Thursday, August 4, 2022

RoBERTa uses learned position embeddings!

 # default in fairseq's RoBERTa base_architecture:
 args.encoder_learned_pos = safe_getattr(args, "encoder_learned_pos", True)

So... I need to find a variant that uses sinusoidal position embeddings to test on longer sequences.
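For reference, the fixed sinusoidal encoding from "Attention Is All You Need" is a deterministic function of the position index, so it can at least be evaluated past the training length. (In fairseq, flipping encoder_learned_pos to False should select SinusoidalPositionalEmbedding instead of a learned table, though I haven't verified that on the RoBERTa path.) A minimal sketch, function name mine:

import math
import torch

def sinusoidal_positions(num_positions: int, dim: int) -> torch.Tensor:
    """Fixed sinusoidal position encodings (Vaswani et al., 2017).

    PE[pos, 2i]   = sin(pos / 10000^(2i/dim))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/dim))
    Assumes dim is even.
    """
    pe = torch.zeros(num_positions, dim)
    position = torch.arange(num_positions, dtype=torch.float).unsqueeze(1)
    # 10000^(-2i/dim) computed in log space for stability
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float) * (-math.log(10000.0) / dim)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

Because nothing here is a trained parameter, the same function can be called with a num_positions larger than anything seen in pretraining, which is exactly what a long-sequence test needs.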

Is "Transformer Language Models without Positional Encodings Still Learn Positional Information" the only preprint/paper mentioning ALiBi for MLM so far?
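For context: ALiBi (Press et al., 2021) drops position embeddings entirely and instead adds a per-head linear penalty to the attention logits, proportional to token distance. The causal original penalizes (i - j) under a causal mask; the natural bidirectional/MLM adaptation (my assumption here) penalizes the symmetric distance |i - j|. A minimal sketch, names mine:

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention biases in the style of ALiBi.

    Returns a (num_heads, seq_len, seq_len) tensor to add to the raw
    attention scores before softmax. Slopes follow the paper's
    geometric sequence 2^(-8h/H) for h = 1..H; the symmetric |i - j|
    penalty is an assumed MLM adaptation of the causal original.
    """
    slopes = torch.tensor(
        [2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).abs()  # |i - j|, shape (L, L)
    return -slopes[:, None, None] * distance[None, :, :]

Since the bias depends only on relative distance, nothing is tied to a trained maximum length, which is why ALiBi is the obvious candidate for long-sequence MLM.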
