Detailed Notes: RAG, Agents, and Memory in AI Applications
1. Task Solving and Context
- Instructions vs. Context:
  - Instructions: general and static for the application (how to solve).
  - Context: specific to each query (what info to use).
  - Missing context → AI more likely to hallucinate or err.
- Two Major Context Patterns:
  - RAG (Retrieval-Augmented Generation): retrieve info from external sources for each query.
  - Agents: use tools (e.g., web search, APIs) to gather and act on information, enabling automation and interaction with the world.
2. RAG (Retrieval-Augmented Generation)
Definition & Purpose
- Enhances generation by retrieving relevant info from external memory (DBs, previous chats, the internet, etc.).
- Origin: the term was coined by Lewis et al. (2020) for knowledge-intensive NLP tasks.
- Why RAG:
  - Works within model context limits: only the info relevant to each query is retrieved.
  - Helps with user-specific data and reduces hallucination.
Key Concepts
- Context Construction: Like feature engineering in classical ML; the data you feed is vital.
- Persistent Need: No matter how long model context grows, RAG remains necessary—data always grows faster.
RAG Architecture
- Two Main Components:
  - Retriever: finds relevant info in external memory (via indexing and querying).
  - Generator: produces the response based on the retrieved info.
- Chunking:
  - Documents are split into manageable "chunks".
  - Only the chunks most relevant to a query are retrieved.
  - Post-processing combines the prompt with the retrieved chunks into the final model input.
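The retriever/generator split above can be sketched in a few lines. This is a toy illustration, not a real system: the chunks and the keyword-overlap scorer are made-up stand-ins for an indexed corpus and a real retrieval algorithm.

```python
# Minimal RAG flow: a (stub) retriever scores chunks against the query,
# and the top-k chunks are spliced into the prompt for the generator.
CHUNKS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Python is a programming language.",
]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Score chunks by naive keyword overlap and return the top-k."""
    q_terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_terms & set(c.lower().split())))
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Combine instruction, retrieved context, and query into one model input."""
    context = "\n".join(retrieve(query, CHUNKS))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the capital of France?")
```

A real retriever would replace the keyword scorer with one of the term-based or embedding-based algorithms described next.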
Retrieval Algorithms
Sparse vs. Dense (Term vs. Embedding)
- Sparse (Term-based / Lexical) Retrieval:
  - Data as sparse vectors (one-hot encoding).
  - Fast, strong baselines (BM25, BM25+, BM25F).
  - Sensitive to exact term match; may return irrelevant docs due to ambiguity.
  - Key Metrics:
    - TF (Term Frequency): how often a term appears.
    - IDF (Inverse Document Frequency): weights rare, informative terms more heavily.
    - Tokenization: split text into terms; handle n-grams for multi-word terms.
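The TF and IDF definitions above can be made concrete with a toy corpus (the three "documents" here are invented for illustration; BM25 adds length normalization and saturation on top of this basic scheme):

```python
import math

# Toy TF-IDF: rare terms (high IDF) dominate the score,
# while terms common across the corpus are down-weighted.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "quantum computing uses qubits",
]

def tf(term: str, doc: str) -> float:
    """Term frequency: fraction of the document's words that are `term`."""
    words = doc.split()
    return words.count(term) / len(words)

def idf(term: str, corpus: list[str]) -> float:
    """Smoothed inverse document frequency."""
    n_containing = sum(term in d.split() for d in corpus)
    return math.log(len(corpus) / (1 + n_containing)) + 1

def tf_idf(term: str, doc: str, corpus: list[str]) -> float:
    return tf(term, doc) * idf(term, corpus)
```

Here `idf("qubits", docs)` exceeds `idf("the", docs)` because "qubits" appears in only one document, which is exactly the weighting behavior described above.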
- Dense (Embedding-based / Semantic) Retrieval:
  - Data as dense vectors (embeddings).
  - Vector databases store the embeddings and support nearest-neighbor search.
  - More semantic understanding: retrieves documents by meaning.
  - Uses ANN (Approximate Nearest Neighbor) search.
  - Libraries: FAISS, ScaNN, Annoy, Hnswlib.
  - Algorithms: LSH, HNSW, product quantization, IVF, random-projection trees (as in Annoy).
  - Tradeoff: more expensive and slower, but improves with fine-tuning.
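Embedding-based retrieval in miniature: brute-force cosine similarity over tiny hand-made 3-d vectors (stand-ins for real model embeddings). ANN libraries like the ones listed above exist precisely because this exact, exhaustive scan doesn't scale to millions of vectors.

```python
import math

# Hypothetical document embeddings; in practice these come from an embedding model.
EMBEDDINGS = {
    "doc_cats":    [0.9, 0.1, 0.0],
    "doc_dogs":    [0.8, 0.2, 0.1],
    "doc_finance": [0.0, 0.1, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    """Exact nearest-neighbor search by exhaustive scan."""
    ranked = sorted(EMBEDDINGS, key=lambda d: -cosine(query_vec, EMBEDDINGS[d]))
    return ranked[:k]
```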
Comparison Table

| | Term-based | Embedding-based |
|---|---|---|
| Query Speed | Much faster | Slower |
| Performance | Strong baseline, hard to improve; can misfire on ambiguity | Outperforms with fine-tuning; semantic |
| Cost | Cheap | Expensive (embedding, storage, search) |
Hybrid Search
- Combine both approaches: Use term-based for initial candidate fetch, embedding-based for re-ranking.
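A sketch of that two-stage pattern, with both scorers as illustrative stand-ins (term overlap for the cheap first pass, Jaccard similarity pretending to be an embedding score for the rerank):

```python
# Hybrid search: cheap term-based pass fetches candidates,
# then a "semantic" scorer reranks only those candidates.
DOCS = {
    "a": "how to train a neural network",
    "b": "neural network training tips and tricks",
    "c": "banana bread recipe",
}

def term_candidates(query: str, docs: dict, k: int = 2) -> list[str]:
    """First stage: rank by raw term overlap (stand-in for BM25)."""
    q = set(query.split())
    return sorted(docs, key=lambda d: -len(q & set(docs[d].split())))[:k]

def embed_score(query: str, text: str) -> float:
    """Second stage: Jaccard similarity as a toy stand-in for embedding cosine."""
    q, t = set(query.split()), set(text.split())
    return len(q & t) / len(q | t)

def hybrid_search(query: str, docs: dict) -> str:
    candidates = term_candidates(query, docs)
    return max(candidates, key=lambda d: embed_score(query, docs[d]))
```

The design point: the expensive scorer only ever sees the small candidate set, so cost stays close to term-based search while quality approaches embedding-based search.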
Evaluation Metrics
- Context Precision: % of retrieved docs that are relevant.
- Context Recall: % of all relevant docs that are retrieved.
- Other: NDCG, MAP, MRR, MTEB, BEIR.
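Context precision and recall follow directly from the definitions above; here they are computed on sets of made-up document IDs:

```python
def context_precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved docs that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved: set, relevant: set) -> float:
    """Fraction of all relevant docs that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical retrieval result: 4 docs returned, 3 docs actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d3", "d5"}
```

With these example sets, precision is 2/4 and recall is 2/3, showing how the two metrics can disagree.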
Retrieval Optimization Tactics
- Chunking Strategy:
  - Chunk size and overlap affect both retrieval and downstream model performance.
  - Smaller chunks = more variety, but more computational overhead and potential loss of context.
  - Chunking by tokens matches model limits, but requires reindexing if you change models.
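A minimal sketch of chunking with overlap. For simplicity "tokens" here are whitespace words; a real pipeline would count tokens with the model's own tokenizer, which is exactly why switching models can force a reindex.

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into word chunks of `size`, each overlapping the previous by `overlap`."""
    words = text.split()
    step = size - overlap  # advance by this many words per chunk
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

parts = chunk("a b c d e f g h", size=4, overlap=2)
```

With size 4 and overlap 2, each chunk shares its last two words with the next chunk's first two, so a sentence split across a boundary still appears intact in at least one chunk.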
- Reranking:
  - Further prioritize or prune candidate docs after retrieval (e.g., by recency or relevance).
- Query Rewriting:
  - Reformulate ambiguous or elliptical queries for better retrieval (often using a model).
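To make the idea concrete: an elliptical follow-up like "What about Germany?" retrieves poorly on its own, so it is rewritten using conversation history first. The `rewrite` function below is a crude rule-based stand-in for the model call a real system would make.

```python
def rewrite(history: list[str], query: str) -> str:
    """Expand an elliptical follow-up query using the previous question (toy heuristic)."""
    if query.lower().startswith("what about"):
        topic = query[len("what about"):].strip(" ?")
        prev = history[-1]
        # Swap the previous question's final subject for the new topic.
        return prev.rstrip("?").replace(prev.split()[-1].rstrip("?"), topic) + "?"
    return query

history = ["What is the capital of France?"]
new_q = rewrite(history, "What about Germany?")
```

The rewritten query now contains all the terms the retriever needs, whereas the original follow-up contained none of them.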
- Contextual Retrieval:
  - Augment chunks with metadata, tags, or short AI-generated context to aid future retrieval.
RAG Beyond Text
- Multimodal RAG:
  - Retrieve not just text, but images, audio, video, etc.
  - Requires multimodal embedding models (e.g., CLIP).
- Tabular Data:
  - Requires text-to-SQL: predict the relevant table schema, generate and execute SQL, then synthesize the result into an answer.
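The text-to-SQL pipeline can be sketched end-to-end with Python's built-in `sqlite3`. The table, its rows, and the "generated" SQL string are all invented for illustration; in a real system the SQL would come from the model after it is shown the schema.

```python
import sqlite3

# Toy table standing in for the user's tabular data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100), ("EU", 50), ("US", 200)])

# Stub for model output: in practice the model generates this from the
# schema and the user's question ("What are total EU sales?").
generated_sql = "SELECT SUM(amount) FROM sales WHERE region = 'EU'"

# Execute the SQL and synthesize the result into a natural-language answer.
total = conn.execute(generated_sql).fetchone()[0]
answer = f"Total EU sales: {total}"
```

Note that executing model-generated SQL is a write-capable action in general, so production systems typically restrict it to read-only queries and validate it first.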
3. Agents
Definition
- Agent: anything that perceives and acts in an environment, defined by:
  - Environment: where the agent operates.
  - Actions/Tools: what the agent can do.
- Examples:
  - Chatbots, code editors, robots, self-driving cars.
  - RAG systems are themselves agents (the retriever is their tool).
Agent Workflow
- Planning:
  - The model reasons about the task, selects tools, and sequences actions.
  - Complex tasks → task decomposition (a chain of steps).
- Execution:
  - Calls external tools (APIs, code interpreters, DBs, etc.).
  - Actions can both read (fetch info) and write (make changes).
- Compound Mistakes:
  - Each step carries failure risk; errors compound across multi-step plans.
  - Stakes are higher than with static model use because of automation.
Tools
- External tools massively expand agent capabilities (calculation, web search, image generation, code execution, etc.).
- Multimodality is enabled by chaining models/tools together.
Planning and Control Flow
- Plan Generation:
  - Prompt engineering for planning (step-by-step, chain-of-thought).
  - Planning should be decoupled from execution (validate the plan before running it).
  - Plans can be expressed in natural language or as function names, depending on the desired granularity.
- Control Flows:
  - Sequential, parallel, if-statements, loops.
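The four control flows can be sketched over toy "tools" (plain functions standing in for real API calls); parallel execution uses a thread pool, and everything else is ordinary Python:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy tools standing in for real API calls.
def search(q: str) -> str: return f"results for {q}"
def summarize(text: str) -> str: return text[:20]

# Sequential: the output of one tool feeds the next.
seq = summarize(search("RAG"))

# Parallel: independent tool calls run concurrently.
with ThreadPoolExecutor() as pool:
    par = list(pool.map(search, ["RAG", "agents"]))

# If-statement: branch on an intermediate result.
branch = "cached" if "RAG" in par[0] else search("RAG")

# Loop: repeat until a stopping condition (or an attempt budget) is met.
attempts = 0
while "results" not in search("memory") and attempts < 3:
    attempts += 1
```

In an agent, the planner decides which of these flows to use; parallelism saves latency but only applies to steps with no data dependency between them.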
Reflection and Error Correction
- Constantly evaluate at each step:
  - After receiving the user query, after planning, after each action, and after completion.
  - Use ReAct (Reason + Act + Observe + Reflect) for multi-step agent loops.
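A minimal ReAct-style loop: at each step the "model" emits a thought and either an action or a final answer; actions are executed and their observations would be fed back into the next model call. Here the model outputs are a fixed script for illustration; a real system replaces `SCRIPT` with live LLM calls.

```python
def calculator(expr: str) -> str:
    """Toy tool. Never eval untrusted input in a real system."""
    return str(eval(expr))

TOOLS = {"calculator": calculator}

# Scripted stand-in for model outputs across two reasoning steps.
SCRIPT = [
    {"thought": "I need 12 * 7.", "action": ("calculator", "12 * 7")},
    {"thought": "I have the result.", "answer": "84"},
]

def react_loop():
    observations = []
    for step in SCRIPT:          # reason: the model produced this step
        if "answer" in step:     # reflect: model decides it is done
            return step["answer"], observations
        tool, arg = step["action"]
        observations.append(TOOLS[tool](arg))  # act, then observe

answer, obs = react_loop()
```

Evaluating after every action (as the list above prescribes) is what lets the loop catch a failed tool call before the error compounds into later steps.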
Tool Selection
- Choose tools based on environment, task, and model strengths.
- Too many tools can overwhelm model/context—ablation studies help trim inventory.
- Agents can learn new tools (skill library).
4. Memory
Types of Memory in AI Models
- Internal Knowledge:
  - Information embedded in the model weights (doesn't change unless the model is retrained).
- Short-term Memory:
  - The model's context window (input tokens, recent conversation).
  - Limited in size, fast to access, session-scoped.
- Long-term Memory:
  - External data sources accessed via retrieval (DBs, files, etc.).
  - Can be persistent across sessions and tasks.
  - Allows retention of user preferences, conversation history, etc.
Benefits of Memory
- Handles information overflow beyond context length.
- Allows persistence between sessions (personalization).
- Boosts consistency (e.g., in subjective ratings).
- Can maintain structured data (tables, queues) for complex reasoning.
Memory Management Strategies
- FIFO:
  - Remove the earliest info as new data arrives (common, but can drop critical early context).
- Redundancy Removal:
  - Summarize, deduplicate, and track entities.
- Reflection-based Update:
  - After each action, reflect on what info to add to or replace in memory.
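The FIFO strategy above is one line with a bounded deque; the budget of 3 turns is an arbitrary stand-in for a real token or turn budget, and the dropped first turn illustrates exactly the failure mode noted above:

```python
from collections import deque

# Bounded short-term memory: appending beyond maxlen silently evicts
# the oldest turn, so critical early context can be lost.
memory = deque(maxlen=3)
for turn in ["t1", "t2", "t3", "t4"]:
    memory.append(turn)
```

Redundancy removal and reflection-based updates exist precisely to avoid this blind eviction: they decide *what* to drop rather than always dropping the oldest.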
5. Summary and Takeaways
- RAG:
  - Developed to overcome context window limitations; enables external knowledge integration.
  - Retriever quality is crucial: term-based is fast and cheap, embedding-based is semantic but slower and more expensive.
- Agents:
  - Generalize beyond RAG: they can perceive, plan, and act (including write actions).
  - Planning, reflection, memory, and tool use are key for complex tasks.
- Memory:
  - Essential for both RAG and agents; supports handling of large, persistent, or session-spanning information.
- Security:
  - More automation and tool use → more risks. Defensive mechanisms and oversight (human-in-the-loop) are critical.
- Prompt-based Methods:
  - RAG and agents improve quality via inputs rather than model modification; combining them with model finetuning holds future potential.
Practical Points for Builders
- When to use RAG?
  - For any knowledge-rich task where the data can't all fit in context or changes often.
- When to use Agents?
  - For tasks requiring multiple steps, tool use, or real-world actions/automation.
- Memory Considerations:
  - Build memory systems for info overflow, personalization, and consistency.
- Evaluate All Components:
  - Test retrievers, agents, tools, memory, and the system end-to-end.
- Tool/Plan Management:
  - Start simple, measure failures, and trim or add tools as required. Plan for parallelism and error correction.