args.encoder_learned_pos = safe_getattr(args, "encoder_learned_pos", True)
So, since learned position embeddings are the default here, I need to find a variant that uses sinusoidal position embeddings to test longer inputs.
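As a rough illustration of why the sinusoidal variant matters for long inputs, here is a minimal sketch of fixed sinusoidal position embeddings in plain Python. It follows the interleaved sin/cos formulation from the original Transformer paper, not necessarily the exact layout fairseq uses, and the function name `sinusoidal_positions` is made up for this example. The table is computed from a formula rather than learned, so it can be built for any length.

```python
import math


def sinusoidal_positions(num_positions, dim):
    """Return a num_positions x dim table of fixed sinusoidal embeddings.

    Hypothetical helper for illustration only; assumes an even dim.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even in this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # Each sin/cos pair shares one frequency, which shrinks as i grows.
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# No trained parameters are involved, so a longer table is just more rows.
short = sinusoidal_positions(512, 8)
longer = sinusoidal_positions(2048, 8)
```

The first 512 rows of `longer` are identical to `short`, which is what makes this kind of embedding usable beyond the training length. A learned embedding table has no rows for positions it never saw.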
Is "Transformer Language Models without Positional Encodings Still Learn Positional Information" the only preprint/paper that mentions MLM ALiBi so far?