A deep dive into absolute, relative, and rotary positional embeddings with code examples

Rotary position embedding — Image from [6]

One of the key components of transformers is the positional embedding. You may ask: why? Because the self-attention mechanism in transformers is permutation-invariant; it computes how much `attention` each token in the sequence receives from the other tokens, but it does not take the order of the tokens into account. In fact, the attention mechanism treats the sequence as a bag of tokens (the short sketch after the table of contents illustrates this). For this reason, we need another component, the positional embedding, which encodes the order of the tokens and injects that information into the token embeddings. But what are the different types of positional embeddings, and how are they implemented? In this post, we take a look at three major types of positional embeddings and dive deep into their implementation.

Here is the table of contents for this post:

1. Context and Background
2. Absolute Positional Embedding
2.1 Learned Approach
2.2 Fixed Approach (Sinusoidal)
2.3 Code Example: RoBERTa Implementation
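
To make the permutation-invariance point concrete, here is a minimal sketch (not part of the original post; it assumes PyTorch and a toy `self_attention` helper with no projection matrices or positional information). It shows that permuting the input tokens simply permutes the attention output, i.e. the order of the tokens carries no signal on its own:

```python
# Minimal sketch (assumed, not from the post): self-attention without
# positional embeddings does not see token order.
import torch

torch.manual_seed(0)

def self_attention(x):
    """Plain scaled dot-product self-attention, no positional information."""
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d**0.5  # (seq_len, seq_len) similarity scores
    weights = torch.softmax(scores, dim=-1)    # row-wise attention weights
    return weights @ x                         # weighted sum of token embeddings

tokens = torch.randn(5, 8)   # 5 tokens, embedding dimension 8
perm = torch.randperm(5)     # a random reordering of the sequence

out_original = self_attention(tokens)
out_permuted = self_attention(tokens[perm])

# Permuting the input only permutes the output: each token gets the same
# representation regardless of where it sits in the sequence.
print(torch.allclose(out_permuted, out_original[perm], atol=1e-6))  # True
```

Adding a positional embedding to `tokens` before the attention call breaks this symmetry, which is exactly the job of the techniques discussed in the rest of this post.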