Generative AI Notes

Backend Engineer (ML in Progress) 📉 | Learning in Public | Systems, APIs, Architecture
1. What is Generative AI (Gen AI)?
Generative AI (Gen AI) is a type of AI that can create new content – like text, images, audio, or video.
It learns from existing data and produces something new, similar to how humans create.
Examples:
ChatGPT → writes text
DALL·E → creates images
Example Scenario:
You type:
"Write a short poem about cats."
AI generates a brand new poem by itself.
2. What is ChatGPT?
ChatGPT is an AI chatbot developed by OpenAI.
It is based on GPT (Generative Pretrained Transformer) models.
How it works:
You type text (the prompt).
The Transformer model reads the whole prompt to capture its context.
It predicts the next token (roughly, the next word) over and over until the reply is complete.
Example:
Input:
"Hello, how are you?"
Output:
"I'm good! How about you?"
3. Google Paper – “Attention Is All You Need”
Published by Google researchers in 2017.
Introduced the Transformer architecture, which replaced the recurrence used in older RNNs & LSTMs with attention.
Core Idea: Self-Attention → the model weighs how relevant every word is to every other word, all at once instead of reading one word at a time.
Foundation for: GPT, BERT, etc.
Example:
Sentence: "The cat sat on the mat"
- Word "cat" pays more attention to "sat" → action is important for meaning.
4. What is a Transformer & How it Predicts the Next Word
A Transformer is a neural network that uses self-attention to understand relationships between words.
Steps:
Input sequence (tokens)
Apply embeddings + self-attention + feed-forward layers
Output → probability distribution of next word (via Softmax)
5. How GPT Generates the Next Word
Input text → split into tokens → each token mapped to an embedding vector
Passed through Transformer layers
Output layer gives probabilities for all vocabulary words
Model picks the word with the highest probability (greedy) or samples from the distribution
Repeat until an end-of-text token or another stopping condition (such as a length limit) is reached
Example:
Input: "I am feeling"
Probabilities:
"happy" → 0.6
"sad" → 0.3
"hungry" → 0.1
GPT picks "happy" → Output: "I am feeling happy"
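The loop above can be sketched in plain Python. This is only a toy: `NEXT_WORD_PROBS` is a made-up lookup table standing in for a real model, and it only looks at the last word, whereas GPT conditions on the whole input. Only the "feeling" row comes from the example above; every other name and number is an assumption for illustration.

```python
import random

# Hypothetical stand-in for a trained model: last word -> next-word probabilities.
# Only the "feeling" row is from the notes; the rest is invented.
NEXT_WORD_PROBS = {
    "feeling": {"happy": 0.6, "sad": 0.3, "hungry": 0.1},
    "happy": {"today": 0.7, "<end>": 0.3},
    "sad": {"today": 0.5, "<end>": 0.5},
    "hungry": {"<end>": 1.0},
    "today": {"<end>": 1.0},
}

def pick_greedy(probs):
    """Return the word with the highest probability."""
    return max(probs, key=probs.get)

def pick_sample(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(prompt, greedy=True, max_new_words=5, seed=0):
    """Append one word at a time until <end> or the length limit."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_new_words):
        probs = NEXT_WORD_PROBS.get(words[-1])
        if probs is None:  # the toy table has no entry for this word
            break
        word = pick_greedy(probs) if greedy else pick_sample(probs, rng)
        if word == "<end>":  # stopping condition
            break
        words.append(word)
    return " ".join(words)

print(generate("I am feeling", max_new_words=1))  # I am feeling happy
print(generate("I am feeling"))                   # I am feeling happy today
```

Calling `generate("I am feeling", greedy=False)` samples instead of always taking the top word, so different seeds can give "sad" or "hungry".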
6. What are Input Tokens?
Tokens = smallest units of input text (word, subword, or character).
Example:
Text: "ChatGPT is cool"
Tokens → [Chat, G, PT, is, cool] (illustrative; the exact split depends on the tokenizer)
7. What is Input Sequence?
Ordered list of tokens fed into the model.
Example:
Text: "I love AI"
Input sequence → [I, love, AI]
8. Vocabulary, Encoding, Decoding & AI Model
Vocabulary (Vocab): Set of all tokens the model knows
Encoding: Converts text → tokens/IDs
Decoding: Converts tokens/IDs → text
AI Model: Neural network (like GPT) that processes input and generates output
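A minimal sketch of vocabulary, encoding, and decoding, assuming a toy word-level vocabulary with arbitrary IDs. Real models use subword vocabularies with tens of thousands of entries; the `<unk>` token for unknown words is an assumption added here.

```python
# Toy vocabulary: token -> ID (IDs are arbitrary, chosen for illustration).
vocab = {"I": 0, "love": 1, "AI": 2, "<unk>": 3}
id_to_token = {i: t for t, i in vocab.items()}

def encode(text):
    """Text -> list of token IDs (unknown words map to <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

def decode(ids):
    """List of token IDs -> text."""
    return " ".join(id_to_token[i] for i in ids)

ids = encode("I love AI")
print(ids)          # [0, 1, 2]
print(decode(ids))  # I love AI
```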
9. Transformer Architecture
Two main parts:
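Encoder: Reads and represents the input; used on its own in BERT and together with the decoder in translation models
Decoder: Generates output one token at a time; used on its own in GPT for text generation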
Core components:
Input Embedding
Positional Encoding
Self-Attention
Multi-Head Attention
Feed Forward Network
Output Layer
10. What is Tokenizer?
Tokenizer splits text into tokens and maps them to IDs the model can understand.
Example:
Text: "Hello" → Token ID [15496] (the ID in the GPT-2 tokenizer; other tokenizers assign different IDs)
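To make the "split, then map to IDs" idea concrete, here is a toy subword tokenizer. It uses simple greedy longest-match against a tiny made-up vocabulary, which is not how GPT tokenizers actually work (they use byte pair encoding); the vocabulary, the IDs, and the `<unk>` fallback are all assumptions for illustration.

```python
# Toy subword vocabulary with made-up IDs (not a real model's vocabulary).
VOCAB = {"Chat": 0, "G": 1, "PT": 2, "is": 3, "cool": 4, "<unk>": 5}

def tokenize_word(word):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink.
        for end in range(len(word), start, -1):
            if word[start:end] in VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing matched: emit <unk> for this character and move on.
            pieces.append("<unk>")
            start += 1
    return pieces

def tokenize(text):
    """Split text on spaces, then split each word into subword tokens."""
    tokens = []
    for word in text.split():
        tokens.extend(tokenize_word(word))
    return tokens

tokens = tokenize("ChatGPT is cool")
ids = [VOCAB[t] for t in tokens]
print(tokens)  # ['Chat', 'G', 'PT', 'is', 'cool']
print(ids)     # [0, 1, 2, 3, 4]
```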
11. What is Input Embedding?
Converts each token into a numerical vector that represents its meaning; context is mixed in later by the attention layers.
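In code, an embedding is just a table lookup: one row of numbers per token ID. A minimal sketch, assuming a 3-word vocabulary and 4 numbers per vector; the values are random here, while a real model learns them during training.

```python
import random

vocab = {"I": 0, "love": 1, "AI": 2}
EMBED_DIM = 4  # tiny for readability; real models use hundreds or thousands

# One vector per token ID. Random placeholders, not learned values.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(len(vocab))
]

def embed(token_ids):
    """Look up the vector for each token ID."""
    return [embedding_table[i] for i in token_ids]

words = "I love AI".split()
vectors = embed([vocab[w] for w in words])
for word, vec in zip(words, vectors):
    print(word, [round(x, 2) for x in vec])
```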
12. What is Positional Embedding?
Adds position information (word order) to each token's embedding, because self-attention looks at all tokens at once and has no built-in sense of order.
Example:
"I love AI" ≠ "AI love I"
Positional encoding ensures the model understands correct word order.
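One common way to build these position vectors is the sine/cosine scheme from "Attention Is All You Need": even dimensions use sin, odd dimensions use cos, each at a different wavelength. A minimal sketch, with function names and the tiny dimension chosen here for illustration:

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: sin on even dims, cos on odd dims."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the position vector to each token embedding, element by element."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

# Same made-up embedding for every token, so any difference in the
# output comes from position alone.
same = [[0.5, 0.5, 0.5, 0.5]] * 3
for row in add_positions(same):
    print([round(x, 3) for x in row])
```

Not every model uses this fixed scheme. GPT-2, for example, learns its position vectors during training instead.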
13. What is Self-Attention?
Each word looks at every word in the sentence (including itself) and scores how much each one matters for its own meaning.
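A minimal sketch of this idea (scaled dot-product attention) in plain Python, using the first three words of "The cat sat on the mat". The 2-number vectors are hand-picked for illustration so that "cat" and "sat" point in similar directions; real embeddings are learned and much larger. The raw vectors are used directly as queries, keys, and values, whereas a real Transformer first passes them through learned projection matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Return (outputs, weights); weights[i][j] = how much word i attends to word j."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / scale for k in keys]
        weights = softmax(scores)
        # Output = weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

words = ["The", "cat", "sat"]
x = [[0.1, 0.0], [1.0, 0.8], [0.9, 1.0]]  # made-up vectors
outputs, weights = self_attention(x, x, x)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

With these vectors, the largest weight in the "cat" row falls on "sat".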
Example:
Sentence: "The cat sat on the mat"
"cat" focuses more on "sat" (the main action).
14. What is Multi-Head Self-Attention?
Runs several self-attention operations (heads) in parallel and combines their outputs.
- Each head can learn different relationships → syntax, meaning, context
Example:
Sentence: "The cat sat on the mat"
Head 1 → Subject-action relationship
Head 2 → Word positions
Head 3 → Contextual relationships
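A rough sketch of the mechanics: each token vector is cut into equal slices, every head runs attention on its own slice, and the results are joined back together. This is simplified on purpose. A real Transformer gives each head its own learned projection matrices and applies one more projection after joining; both are left out here, and the input numbers are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention; returns one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

def multi_head_attention(x, num_heads):
    """Slice each vector into num_heads parts, attend per slice, concatenate."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads
    head_outputs = []
    for h in range(num_heads):
        start, stop = h * head_dim, (h + 1) * head_dim
        head_x = [vec[start:stop] for vec in x]
        head_outputs.append(attention(head_x, head_x, head_x))
    # Join the heads back together, token by token.
    return [
        [value for head in head_outputs for value in head[t]]
        for t in range(len(x))
    ]

# Three tokens, four numbers each (made-up values).
x = [
    [0.1, 0.0, 0.3, 0.2],
    [1.0, 0.8, 0.1, 0.9],
    [0.9, 1.0, 0.7, 0.1],
]
out = multi_head_attention(x, num_heads=2)
for row in out:
    print([round(v, 2) for v in row])
```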
15. Transformer Phases (Training & Inference)
Training: Model learns patterns from large datasets using backpropagation and gradient descent
Inference: Model uses trained knowledge to generate outputs (like ChatGPT replying)
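Both phases can be sketched with a tiny model that only learns which word follows "I am feeling". The training data is made up for illustration, and the "model" is just three numbers (one logit per word) rather than a real Transformer, so the gradient can be written by hand instead of running backpropagation through many layers.

```python
import math

WORDS = ["happy", "sad", "hungry"]

# Made-up training data: the word that followed "I am feeling" in 10 sentences.
TRAINING_DATA = ["happy"] * 6 + ["sad"] * 3 + ["hungry"] * 1

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(data, steps=500, learning_rate=0.5):
    """Training: adjust the logits by gradient descent on cross-entropy loss."""
    logits = [0.0] * len(WORDS)
    for _ in range(steps):
        probs = softmax(logits)
        grad = [0.0] * len(WORDS)
        for word in data:
            target = WORDS.index(word)
            for i in range(len(WORDS)):
                # Gradient of cross-entropy w.r.t. logit i: prob - one_hot.
                grad[i] += (probs[i] - (1.0 if i == target else 0.0)) / len(data)
        logits = [l - learning_rate * g for l, g in zip(logits, grad)]
    return logits

def predict(logits):
    """Inference: use the learned logits to pick the most likely next word."""
    probs = softmax(logits)
    best = max(range(len(WORDS)), key=lambda i: probs[i])
    return WORDS[best], probs

logits = train(TRAINING_DATA)
word, probs = predict(logits)
print(word)                          # happy
print([round(p, 2) for p in probs])  # [0.6, 0.3, 0.1]
```

After training, the predicted probabilities match how often each word appeared in the data. Inference never changes the logits; it only reads them.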
Example:
Training: Reads millions of sentences → learns patterns
Inference: You type "I am feeling" → model predicts next word "happy"
16. What is Softmax Function?
Converts raw scores (logits) → probabilities that sum to 1.
Used to pick the next word in generation.
Example:
Logits: [2.1, 1.0, 0.1]
Probabilities: [0.68, 0.23, 0.09]
- Highest probability → next word selected
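Softmax itself is only a few lines: exponentiate each logit, then divide by the sum. A minimal sketch in plain Python, applied to the logits [2.1, 1.0, 0.1]. Subtracting the largest logit first is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.1, 1.0, 0.1])
print([round(p, 2) for p in probs])  # [0.68, 0.23, 0.09]
print(probs.index(max(probs)))       # 0 (the first word wins)
```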


