Vectors in Transformer Neural Networks



STRATASCRATCH
For thousands of data science interview questions and solutions, sign up for StrataScratch: https://www.stratascratch.com/?via=CodeEmporium


REFERENCES
[1] Why it's okay to add position embeddings: https://randorithms.com/2020/11/17/Adding-Embeddings.html
[2] Main Transformer Paper: https://arxiv.org/abs/1706.03762
[3] Word2Vec vs. Transformers: https://www.quora.com/What-are-the-main-differences-between-the-word-embeddings-of-ELMo-BERT-Word2vec-and-GloVe
[4] Using sub-words in BERT: https://handsonnlpmodelreview.quora.com/Latest-trend-in-input-representation-for-state-of-art-NLP-language-models?ch=10&share=172a7f72
[5] In high dimensions, randomly drawn vectors are nearly orthogonal: https://math.stackexchange.com/questions/995623/why-are-randomly-drawn-vectors-nearly-perpendicular-in-high-dimensions
[6] Stack Exchange answer on positional encodings: https://datascience.stackexchange.com/questions/51065/what-is-the-positional-encoding-in-the-transformer-model
[7] Good information on positional encoding: https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
[8] Master Positional Encoding: https://towardsdatascience.com/master-positional-encoding-part-i-63c05d90a0c3
[9] Reddit Thread on Positional Encoding: https://www.reddit.com/r/MachineLearning/comments/cttefo/d_positional_encoding_in_transformer/exs7d08/?utm_source=reddit&utm_medium=web2x&context=3
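
Reference [5] claims that randomly drawn high-dimensional vectors are nearly perpendicular, which is part of why adding position embeddings to word embeddings (reference [1]) works in practice. A minimal NumPy sketch of that claim follows; the helper name `mean_abs_cosine` is just for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_cosine(dim, trials=1000):
    """Average |cosine similarity| between pairs of random Gaussian vectors."""
    a = rng.standard_normal((trials, dim))
    b = rng.standard_normal((trials, dim))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return np.abs(cos).mean()

# As the dimension grows, the average cosine similarity shrinks toward 0,
# i.e. random vectors become closer to perpendicular.
print(mean_abs_cosine(2))    # noticeably large in 2-D
print(mean_abs_cosine(512))  # close to 0 in 512-D
```

In 512 dimensions the average absolute cosine similarity is a few percent, so a random position vector barely interferes with the word embedding it is added to.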


TIMESTAMPS
0:00 Introduction
0:44 Transformer Architecture
1:54 Data Science Interview Sponsor
3:04 Vectors
5:09 Role of Vectors in Transformers
7:07 Position Encoding
10:35 Multi-Head Attention
11:37 Vector Operations: Addition vs. Concatenation
13:42 Beyond Transformers (BERT, Sentence Transformer)