Generative AI Notes


Backend Engineer (ML in Progress) 📉 | Learning in Public | Systems, APIs, Architecture


1. What is Generative AI (Gen AI)?

Generative AI (Gen AI) is a type of AI that can create new content – like text, images, audio, or video.

It learns patterns from existing data and uses them to produce new content, loosely similar to how people draw on what they have seen when they create.

Examples:

  • ChatGPT → writes text

  • DALL·E → creates images

Example Scenario:
You type:

"Write a short poem about cats."

The AI generates a brand-new poem on its own.


2. What is ChatGPT?

ChatGPT is an AI chatbot developed by OpenAI.
It is based on GPT (Generative Pre-trained Transformer) models.

How it works:

  1. You type text (the prompt).

  2. The Transformer model processes the prompt to capture its context.

  3. It predicts the next word (token) repeatedly, appending each prediction to the text, until the reply is complete.

Example:
Input:

"Hello, how are you?"

Output:

"I'm good! How about you?"

3. Google Paper – “Attention Is All You Need”

  • Published in 2017 by researchers at Google (Vaswani et al.).

  • Introduced the Transformer architecture, which replaced RNNs and LSTMs for most language tasks.

  • Core Idea: Self-Attention → the model weighs how relevant every word is to every other word, processing the whole sequence in parallel instead of one word at a time.

Foundation for: GPT, BERT, etc.

Example:
Sentence: "The cat sat on the mat"

  • Word "cat" pays more attention to "sat" → action is important for meaning.

4. What is a Transformer & How it Predicts the Next Word

A Transformer is a neural network architecture that uses self-attention to model relationships between the words (tokens) in a sequence.

Steps:

  1. Input sequence (tokens)

  2. Apply embeddings + self-attention + feed-forward layers

  3. Output → probability distribution of next word (via Softmax)
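The three steps above can be sketched as a single forward pass. This is a minimal illustration with random, untrained weights, so the probabilities it produces are meaningless; the toy vocabulary, the size `d_model`, and all weight names are assumptions made for the example. It also leaves out positional encoding, layer normalization, and multiple heads.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["I", "am", "feeling", "happy", "sad", "hungry"]  # hypothetical toy vocabulary
d_model = 8                                               # hypothetical embedding size

# Random (untrained) weights, for illustration only.
embedding = rng.normal(size=(len(vocab), d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_ff1 = rng.normal(size=(d_model, 4 * d_model))
W_ff2 = rng.normal(size=(4 * d_model, d_model))
W_out = rng.normal(size=(d_model, len(vocab)))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def next_token_probs(token_ids):
    # Step 1: input sequence of token IDs -> embedding vectors
    x = embedding[token_ids]                                # (seq_len, d_model)

    # Step 2a: single-head self-attention with a causal mask,
    # so each position only sees itself and earlier positions
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d_model)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    x = x + softmax(scores) @ v                             # residual connection

    # Step 2b: position-wise feed-forward layer with ReLU
    x = x + np.maximum(0, x @ W_ff1) @ W_ff2

    # Step 3: logits for the last position -> softmax over the vocabulary
    logits = x[-1] @ W_out
    return softmax(logits)


probs = next_token_probs([0, 1, 2])  # "I am feeling"
for word, p in zip(vocab, probs):
    print(f"{word:>8}: {p:.3f}")
```

A real model stacks many such layers and learns the weights during training; only the shape of the computation is the same.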


5. How GPT Generates the Next Word

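  1. Input text → split into tokens, each mapped to an ID and then to an embedding vector

  2. Embeddings are passed through the Transformer layers

  3. Output layer gives a probability for every token in the vocabulary

  4. Model picks the token with the highest probability (greedy decoding) or samples from the distribution

  5. The chosen token is appended to the input, and the process repeats until an end-of-sequence token or another stopping condition (such as a length limit) is reached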

Example:
Input: "I am feeling"
Probabilities:

  • "happy" → 0.6

  • "sad" → 0.3

  • "hungry" → 0.1

GPT picks "happy" → Output: "I am feeling happy"
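Step 4 offers two ways to choose the next word. The sketch below applies both to the probabilities from the example; the function names `pick_greedy` and `pick_sampled` are made up for illustration, and the probabilities are hard-coded rather than produced by a model.

```python
import random

# Probabilities from the example for the prompt "I am feeling".
next_word_probs = {"happy": 0.6, "sad": 0.3, "hungry": 0.1}


def pick_greedy(probs):
    """Return the word with the highest probability."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]


prompt = "I am feeling"
greedy_word = pick_greedy(next_word_probs)
print(prompt, greedy_word)  # I am feeling happy

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [pick_sampled(next_word_probs, rng) for _ in range(1000)]
counts = {word: samples.count(word) for word in next_word_probs}
print(counts)  # roughly 600 / 300 / 100
```

Greedy picking always returns "happy" here, while sampling sometimes returns "sad" or "hungry", which is why sampled replies vary from run to run.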


6. What are Input Tokens?

Tokens = smallest units of input text (word, subword, or character).

Example:
Text: "ChatGPT is cool"
Tokens → [Chat, G, PT, is, cool] (the exact split depends on the tokenizer)
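One simple way to get subword pieces like these is greedy longest-match against a fixed vocabulary. The sketch below is a simplified stand-in, not how GPT tokenizers work (they use byte-pair encoding), and the vocabulary `VOCAB` is hypothetical, chosen so the example splits as shown.

```python
# Hypothetical subword vocabulary, chosen to reproduce the example split.
VOCAB = {"Chat", "G", "PT", "is", "cool"}


def split_word(word, vocab):
    """Greedy longest-match: take the longest vocabulary piece at each position."""
    pieces = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary piece matches: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


def tokenize(text, vocab=VOCAB):
    """Split on whitespace, then break each word into subword pieces."""
    tokens = []
    for word in text.split():
        tokens.extend(split_word(word, vocab))
    return tokens


print(tokenize("ChatGPT is cool"))  # ['Chat', 'G', 'PT', 'is', 'cool']
```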


7. What is an Input Sequence?

The ordered list of tokens fed into the model.

Example:
Text: "I love AI"
Input sequence → [I, love, AI]


8. Vocabulary, Encoding, Decoding & AI Model

  • Vocabulary (Vocab): Set of all tokens the model knows

  • Encoding: Converts text → tokens/IDs

  • Decoding: Converts tokens/IDs → text

  • AI Model: Neural network (like GPT) that processes input and generates output
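The first three terms fit together as a round trip. Here is a minimal sketch with a hypothetical four-entry vocabulary; the IDs and the `<unk>` placeholder for unknown words are assumptions for the example, and real models use far larger vocabularies with subword tokens.

```python
# Hypothetical toy vocabulary: token -> ID. Real IDs differ per model.
vocab = {"<unk>": 0, "I": 1, "love": 2, "AI": 3}
id_to_token = {token_id: token for token, token_id in vocab.items()}


def encode(text):
    """Text -> list of token IDs; unknown words map to <unk>."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]


def decode(ids):
    """List of token IDs -> text."""
    return " ".join(id_to_token[token_id] for token_id in ids)


ids = encode("I love AI")
print(ids)                     # [1, 2, 3]
print(decode(ids))             # I love AI
print(encode("I love pizza"))  # [1, 2, 0]
```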


9. Transformer Architecture

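Two main parts:

  • Encoder: Reads the whole input and builds a representation of it. Used on its own in BERT, and together with a decoder in translation models

  • Decoder: Generates output one token at a time. Used on its own in GPT for text generation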

Core components:

  • Input Embedding

  • Positional Encoding

  • Self-Attention

  • Multi-Head Attention

  • Feed Forward Network

  • Output Layer


10. What is a Tokenizer?

A tokenizer splits text into tokens and maps each token to an integer ID that the model can process.

Example:
Text: "Hello" → Token ID [15496] (in the GPT-2 vocabulary)


11. What is an Input Embedding?

Converts each token ID into a numerical vector that represents the token's meaning. Context from the surrounding words is added later by the attention layers.
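In code, an embedding layer is usually just a table lookup: one row of numbers per token ID. A minimal sketch, assuming a hypothetical vocabulary of 4 tokens, vectors of size 3, and made-up IDs for "I love AI"; the table is random here, whereas a real model learns it during training.

```python
import numpy as np

vocab_size, d_model = 4, 3  # hypothetical sizes
rng = np.random.default_rng(0)

# One row per token ID. Random here; learned in a real model.
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = [1, 2, 3]                 # hypothetical IDs for "I love AI"
vectors = embedding_table[token_ids]  # row lookup: one vector per token

print(vectors.shape)  # (3, 3)
print(vectors)
```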


12. What is Positional Embedding?

Adds position information (word order), because self-attention processes all tokens in parallel and has no built-in sense of order.

Example:

"I love AI" vs. "AI love I" (same words, different order)

Positional encoding lets the model tell these two sequences apart.
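One concrete scheme is the sinusoidal encoding from the "Attention Is All You Need" paper, sketched below. GPT models use learned position embeddings instead, so treat this as one option rather than the only method. The word vectors are random placeholders and `d_model = 4` is an arbitrary choice (the function assumes an even `d_model`).

```python
import numpy as np


def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # even dimension indices 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=4) for w in ["I", "love", "AI"]}  # placeholder embeddings


def embed(words):
    return np.stack([word_vectors[w] for w in words])


a = embed(["I", "love", "AI"])
b = embed(["AI", "love", "I"])
pe = positional_encoding(seq_len=3, d_model=4)

# Without positions, b is just a reversed: the same vectors in another order.
same_without_pe = np.allclose(a, b[::-1])
# With positions added, each vector also depends on where the word sits.
same_with_pe = np.allclose(a + pe, (b + pe)[::-1])

print(same_without_pe)  # True
print(same_with_pe)     # False
```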


13. What is Self-Attention?

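Each word computes an attention weight for every word in the sequence, including itself, to decide how much each one contributes to its updated representation.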

Example:
Sentence: "The cat sat on the mat"

  • "cat" focuses more on "sat" (the main action).
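The weights behind "focuses more on" come from scaled dot-product attention. A minimal single-head sketch follows; the embeddings and projection matrices are random, so the printed weights will not show "cat" favoring "sat". Only a trained model would learn that. The size `d_model = 4` and all variable names are assumptions for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one head."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity of every pair of tokens
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


tokens = ["The", "cat", "sat", "on", "the", "mat"]
d_model = 4
rng = np.random.default_rng(0)
x = rng.normal(size=(len(tokens), d_model))  # placeholder embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(x, W_q, W_k, W_v)

# How much "cat" (row 1) attends to each token in the sentence.
for token, weight in zip(tokens, weights[1]):
    print(f"{token:>4}: {weight:.2f}")
```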

14. What is Multi-Head Self-Attention?

Runs several attention operations (heads) in parallel and combines their outputs.

  • Each head learns different relationships → syntax, meaning, context

Example (illustrative; trained heads rarely divide roles this cleanly):
Sentence: "The cat sat on the mat"

  • Head 1 → Subject-action relationship

  • Head 2 → Word positions

  • Head 3 → Contextual relationships
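A common way to implement the heads is to split the model dimension into equal slices, run attention on each slice, and concatenate the results. The sketch below assumes 3 heads and `d_model = 6`, with random weights, so the heads do not correspond to the roles listed above.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # one attention map per head
    heads = weights @ v                                  # (heads, seq, d_head)

    # Concatenate the heads back into (seq_len, d_model), then mix them.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights


tokens = ["The", "cat", "sat", "on", "the", "mat"]
d_model, num_heads = 6, 3  # hypothetical sizes
rng = np.random.default_rng(0)
x = rng.normal(size=(len(tokens), d_model))  # placeholder embeddings
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, head_weights = multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)           # (6, 6)
print(head_weights.shape)  # (3, 6, 6)
```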


15. Transformer Phases (Training & Inference)

  • Training: Model learns patterns from large datasets using backpropagation and gradient descent

  • Inference: Model uses trained knowledge to generate outputs (like ChatGPT replying)

Example:

  • Training: Reads millions of sentences → learns patterns

  • Inference: You type "I am feeling" → model predicts next word "happy"
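The two phases can be seen in a model far smaller than a Transformer. The sketch below trains a bigram model (it predicts the next word from the current word only) with gradient descent on a made-up three-sentence corpus, then uses it for inference. The corpus, learning rate, and step count are arbitrary choices for illustration.

```python
import numpy as np

# Made-up training data: "happy" follows "feeling" more often than "sad".
corpus = [
    "I am feeling happy",
    "I am feeling happy",
    "I am feeling sad",
]

words = sorted({w for sentence in corpus for w in sentence.split()})
word_to_id = {w: i for i, w in enumerate(words)}
pairs = [
    (word_to_id[cur], word_to_id[nxt])
    for sentence in corpus
    for cur, nxt in zip(sentence.split(), sentence.split()[1:])
]

V = len(words)
W = np.zeros((V, V))  # logits: row = current word, column = next word


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Training: gradient descent on the cross-entropy loss.
learning_rate = 0.5
for _ in range(200):
    grad = np.zeros_like(W)
    for cur, nxt in pairs:
        probs = softmax(W[cur])
        probs[nxt] -= 1.0  # gradient of cross-entropy w.r.t. the logits
        grad[cur] += probs
    W -= learning_rate * grad / len(pairs)


# Inference: no more weight updates, just predict.
def predict_next(word):
    probs = softmax(W[word_to_id[word]])
    return words[int(np.argmax(probs))]


print(predict_next("feeling"))  # happy
```

A Transformer has many layers, so its gradients are computed by backpropagation rather than by one closed-form line, but the train-then-predict split is the same.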


16. What is the Softmax Function?

Converts raw scores (logits) → probabilities that sum to 1.
In generation, the next word is chosen from these probabilities.

Example:

Logits: [2.1, 1.0, 0.1]
Probabilities: [0.68, 0.23, 0.09]

  • Highest probability → next word selected
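Softmax itself is only a few lines: exponentiate each logit, then divide by the sum. A minimal sketch in plain Python, using the logits from the example; subtracting the maximum first is a standard trick to avoid overflow and does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.1, 1.0, 0.1]
probs = softmax(logits)

print([round(p, 2) for p in probs])
print(sum(probs))                # 1.0 up to floating-point rounding
print(probs.index(max(probs)))   # index of the most likely next word
```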