AMD interview question

How does the self-attention layer work in transformers?