Detailed Notes: RAG, Agents, and Memory in AI Applications
1. Task Solving and Context
- Instructions vs. Context:
  - Instructions: general and static for the application (how to solve).
  - Context: specific to each query (what info to use).
  - Missing context → AI more likely to hallucinate or err.
- Two Major Context Patterns:
  - RAG (Retrieval-Augmented Generation): retrieve info from external sources for each query.
  - Agents: use tools (e.g., web search, APIs) to gather and act on information, enabling automation and interaction with the world.
2. RAG (Retrieval-Augmented Generation)
Definition & Purpose
- Enhances generation by retrieving relevant info from external memory (DBs, previous chats, the internet, etc.).
- Origin: the term was coined by Lewis et al. (2020) for knowledge-intensive NLP tasks.
- Why RAG:
  - Works within model context limits: only the info relevant to each query is retrieved.
  - Helps with user-specific data and reduces hallucination.
Key Concepts
- Context Construction: Like feature engineering in classical ML; the data you feed is vital.
- Persistent Need: No matter how long model context grows, RAG remains necessary—data always grows faster.
RAG Architecture
- Two Main Components:
  - Retriever: finds relevant info in external memory (via indexing and querying).
  - Generator: produces the response based on the retrieved info.
- Chunking:
  - Documents are split into manageable "chunks".
  - Only the chunks most relevant to a query are retrieved.
  - Post-processing combines the prompt with the retrieved chunks into the final model input.
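The retriever/generator split above can be sketched in a few lines. This is a toy illustration, not a real system: the chunks and the keyword-overlap scorer are made-up stand-ins for an indexed corpus and a real retrieval algorithm.

```python
# Minimal RAG flow: a (stub) retriever scores chunks against the query,
# and the top-k chunks are spliced into the prompt for the generator.
CHUNKS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Python is a programming language.",
]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Score chunks by naive keyword overlap and return the top-k."""
    q_terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_terms & set(c.lower().split())))
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Combine instruction, retrieved context, and query into one model input."""
    context = "\n".join(retrieve(query, CHUNKS))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the capital of France?")
```

A real retriever would replace the keyword scorer with one of the term-based or embedding-based algorithms described next.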
Retrieval Algorithms
Sparse vs. Dense (Term vs. Embedding)
- Sparse (Term-based / Lexical) Retrieval:
  - Data as sparse vectors (one-hot encoding).
  - Fast, strong baselines (BM25, BM25+, BM25F).
  - Sensitive to exact term match; may return irrelevant docs due to ambiguity.
  - Key Metrics:
    - TF (Term Frequency): how often a term appears.
    - IDF (Inverse Document Frequency): weights rare, informative terms more heavily.
    - Tokenization: split text into terms; handle n-grams for multi-word terms.
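The TF and IDF definitions above can be made concrete with a toy corpus (the three "documents" here are invented for illustration; BM25 adds length normalization and saturation on top of this basic scheme):

```python
import math

# Toy TF-IDF: rare terms (high IDF) dominate the score,
# while terms common across the corpus are down-weighted.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "quantum computing uses qubits",
]

def tf(term: str, doc: str) -> float:
    """Term frequency: fraction of the document's words that are `term`."""
    words = doc.split()
    return words.count(term) / len(words)

def idf(term: str, corpus: list[str]) -> float:
    """Smoothed inverse document frequency."""
    n_containing = sum(term in d.split() for d in corpus)
    return math.log(len(corpus) / (1 + n_containing)) + 1

def tf_idf(term: str, doc: str, corpus: list[str]) -> float:
    return tf(term, doc) * idf(term, corpus)
```

Here `idf("qubits", docs)` exceeds `idf("the", docs)` because "qubits" appears in only one document, which is exactly the weighting behavior described above.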
- Dense (Embedding-based / Semantic) Retrieval:
  - Data as dense vectors (embeddings).
  - Vector databases store the embeddings and support nearest-neighbor search.
  - More semantic understanding: retrieves documents by meaning.
  - Uses ANN (Approximate Nearest Neighbor) search.
  - Libraries: FAISS, ScaNN, Annoy, Hnswlib.
  - Algorithms: LSH, HNSW, product quantization, IVF, random-projection trees (as in Annoy).
  - Tradeoff: more expensive and slower, but improves with fine-tuning.
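Embedding-based retrieval in miniature: brute-force cosine similarity over tiny hand-made 3-d vectors (stand-ins for real model embeddings). ANN libraries like the ones listed above exist precisely because this exact, exhaustive scan doesn't scale to millions of vectors.

```python
import math

# Hypothetical document embeddings; in practice these come from an embedding model.
EMBEDDINGS = {
    "doc_cats":    [0.9, 0.1, 0.0],
    "doc_dogs":    [0.8, 0.2, 0.1],
    "doc_finance": [0.0, 0.1, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec: list[float], k: int = 1) -> list[str]:
    """Exact nearest-neighbor search by exhaustive scan."""
    ranked = sorted(EMBEDDINGS, key=lambda d: -cosine(query_vec, EMBEDDINGS[d]))
    return ranked[:k]
```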
Comparison Table

| | Term-based | Embedding-based |
|---|---|---|
| Query Speed | Much faster | Slower |
| Performance | Strong baseline, hard to improve; can misfire on ambiguity | Outperforms with fine-tuning; semantic |
| Cost | Cheap | Expensive (embedding, storage, search) |
Hybrid Search
- Combine both approaches: Use term-based for initial candidate fetch, embedding-based for re-ranking.
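A sketch of that two-stage pattern, with both scorers as illustrative stand-ins (term overlap for the cheap first pass, Jaccard similarity pretending to be an embedding score for the rerank):

```python
# Hybrid search: cheap term-based pass fetches candidates,
# then a "semantic" scorer reranks only those candidates.
DOCS = {
    "a": "how to train a neural network",
    "b": "neural network training tips and tricks",
    "c": "banana bread recipe",
}

def term_candidates(query: str, docs: dict, k: int = 2) -> list[str]:
    """First stage: rank by raw term overlap (stand-in for BM25)."""
    q = set(query.split())
    return sorted(docs, key=lambda d: -len(q & set(docs[d].split())))[:k]

def embed_score(query: str, text: str) -> float:
    """Second stage: Jaccard similarity as a toy stand-in for embedding cosine."""
    q, t = set(query.split()), set(text.split())
    return len(q & t) / len(q | t)

def hybrid_search(query: str, docs: dict) -> str:
    candidates = term_candidates(query, docs)
    return max(candidates, key=lambda d: embed_score(query, docs[d]))
```

The design point: the expensive scorer only ever sees the small candidate set, so cost stays close to term-based search while quality approaches embedding-based search.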
Evaluation Metrics
- Context Precision: % of retrieved docs that are relevant.
- Context Recall: % of all relevant docs that are retrieved.
- Other: NDCG, MAP, MRR, MTEB, BEIR.
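Context precision and recall follow directly from the definitions above; here they are computed on sets of made-up document IDs:

```python
def context_precision(retrieved: set, relevant: set) -> float:
    """Fraction of retrieved docs that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved: set, relevant: set) -> float:
    """Fraction of all relevant docs that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical retrieval result: 4 docs returned, 3 docs actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d3", "d5"}
```

With these example sets, precision is 2/4 and recall is 2/3, showing how the two metrics can disagree.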
Retrieval Optimization Tactics
- Chunking Strategy:
  - Chunk size and overlap affect both retrieval and downstream model performance.
  - Smaller chunks = more variety, but more computational overhead and potential loss of context.
  - Chunking by tokens matches model limits, but requires reindexing if you change models.
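A minimal sketch of chunking with overlap. For simplicity "tokens" here are whitespace words; a real pipeline would count tokens with the model's own tokenizer, which is exactly why switching models can force a reindex.

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into word chunks of `size`, each overlapping the previous by `overlap`."""
    words = text.split()
    step = size - overlap  # advance by this many words per chunk
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

parts = chunk("a b c d e f g h", size=4, overlap=2)
```

With size 4 and overlap 2, each chunk shares its last two words with the next chunk's first two, so a sentence split across a boundary still appears intact in at least one chunk.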
- Reranking:
  - Further prioritize or prune candidate docs after retrieval (e.g., by recency or relevance).
- Query Rewriting:
  - Reformulate ambiguous or elliptical queries for better retrieval (often using a model).
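To make the idea concrete: an elliptical follow-up like "What about Germany?" retrieves poorly on its own, so it is rewritten using conversation history first. The `rewrite` function below is a crude rule-based stand-in for the model call a real system would make.

```python
def rewrite(history: list[str], query: str) -> str:
    """Expand an elliptical follow-up query using the previous question (toy heuristic)."""
    if query.lower().startswith("what about"):
        topic = query[len("what about"):].strip(" ?")
        prev = history[-1]
        # Swap the previous question's final subject for the new topic.
        return prev.rstrip("?").replace(prev.split()[-1].rstrip("?"), topic) + "?"
    return query

history = ["What is the capital of France?"]
new_q = rewrite(history, "What about Germany?")
```

The rewritten query now contains all the terms the retriever needs, whereas the original follow-up contained none of them.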
- Contextual Retrieval:
  - Augment chunks with metadata, tags, or short AI-generated context to aid future retrieval.
RAG Beyond Text
- Multimodal RAG:
  - Retrieve not just text, but images, audio, video, etc.
  - Requires multimodal embedding models (e.g., CLIP).
- Tabular Data:
  - Requires text-to-SQL: predict the relevant table schema, generate and execute SQL, then synthesize the result into an answer.
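The text-to-SQL pipeline can be sketched end-to-end with Python's built-in `sqlite3`. The table, its rows, and the "generated" SQL string are all invented for illustration; in a real system the SQL would come from the model after it is shown the schema.

```python
import sqlite3

# Toy table standing in for the user's tabular data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100), ("EU", 50), ("US", 200)])

# Stub for model output: in practice the model generates this from the
# schema and the user's question ("What are total EU sales?").
generated_sql = "SELECT SUM(amount) FROM sales WHERE region = 'EU'"

# Execute the SQL and synthesize the result into a natural-language answer.
total = conn.execute(generated_sql).fetchone()[0]
answer = f"Total EU sales: {total}"
```

Note that executing model-generated SQL is a write-capable action in general, so production systems typically restrict it to read-only queries and validate it first.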
3. Agents
Definition
- Agent: anything that perceives and acts in an environment, defined by:
  - Environment: where the agent operates.
  - Actions/Tools: what the agent can do.
- Examples:
  - Chatbots, code editors, robots, self-driving cars.
  - RAG systems are themselves agents (the retriever is their tool).
Agent Workflow
- Planning:
  - The model reasons about the task, selects tools, and sequences actions.
  - Complex tasks → task decomposition (a chain of steps).
- Execution:
  - Calls external tools (APIs, code interpreters, DBs, etc.).
  - Actions can both read (fetch info) and write (make changes).
- Compound Mistakes:
  - Each step carries failure risk; errors compound across multi-step plans.
  - Stakes are higher than with static model use because of automation.
Tools
- External tools massively expand agent capabilities (calculation, web search, image generation, code execution, etc.).
- Multimodality is enabled by chaining models/tools together.
Planning and Control Flow
- Plan Generation:
  - Prompt engineering for planning (step-by-step, chain-of-thought).
  - Planning should be decoupled from execution (validate the plan before running it).
  - Plans can be expressed in natural language or as function names, depending on the desired granularity.
- Control Flows:
  - Sequential, parallel, if-statements, loops.
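The four control flows can be sketched over toy "tools" (plain functions standing in for real API calls); parallel execution uses a thread pool, and everything else is ordinary Python:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy tools standing in for real API calls.
def search(q: str) -> str: return f"results for {q}"
def summarize(text: str) -> str: return text[:20]

# Sequential: the output of one tool feeds the next.
seq = summarize(search("RAG"))

# Parallel: independent tool calls run concurrently.
with ThreadPoolExecutor() as pool:
    par = list(pool.map(search, ["RAG", "agents"]))

# If-statement: branch on an intermediate result.
branch = "cached" if "RAG" in par[0] else search("RAG")

# Loop: repeat until a stopping condition (or an attempt budget) is met.
attempts = 0
while "results" not in search("memory") and attempts < 3:
    attempts += 1
```

In an agent, the planner decides which of these flows to use; parallelism saves latency but only applies to steps with no data dependency between them.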
Reflection and Error Correction
- Constantly evaluate at each step:
  - After receiving the user query, after planning, after each action, and after completion.
  - Use ReAct (Reason + Act + Observe + Reflect) for multi-step agent loops.
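A minimal ReAct-style loop: at each step the "model" emits a thought and either an action or a final answer; actions are executed and their observations would be fed back into the next model call. Here the model outputs are a fixed script for illustration; a real system replaces `SCRIPT` with live LLM calls.

```python
def calculator(expr: str) -> str:
    """Toy tool. Never eval untrusted input in a real system."""
    return str(eval(expr))

TOOLS = {"calculator": calculator}

# Scripted stand-in for model outputs across two reasoning steps.
SCRIPT = [
    {"thought": "I need 12 * 7.", "action": ("calculator", "12 * 7")},
    {"thought": "I have the result.", "answer": "84"},
]

def react_loop():
    observations = []
    for step in SCRIPT:          # reason: the model produced this step
        if "answer" in step:     # reflect: model decides it is done
            return step["answer"], observations
        tool, arg = step["action"]
        observations.append(TOOLS[tool](arg))  # act, then observe

answer, obs = react_loop()
```

Evaluating after every action (as the list above prescribes) is what lets the loop catch a failed tool call before the error compounds into later steps.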
Tool Selection
- Choose tools based on environment, task, and model strengths.
- Too many tools can overwhelm model/context—ablation studies help trim inventory.
- Agents can learn new tools (skill library).
4. Memory
Types of Memory in AI Models
- Internal Knowledge:
  - Information embedded in the model weights (doesn't change unless the model is retrained).
- Short-term Memory:
  - The model's context window (input tokens, recent conversation).
  - Limited in size, fast to access, session-scoped.
- Long-term Memory:
  - External data sources accessed via retrieval (DBs, files, etc.).
  - Can be persistent across sessions and tasks.
  - Allows retention of user preferences, conversation history, etc.
Benefits of Memory
- Handles information overflow beyond context length.
- Allows persistence between sessions (personalization).
- Boosts consistency (e.g., in subjective ratings).
- Can maintain structured data (tables, queues) for complex reasoning.
Memory Management Strategies
- FIFO:
  - Remove the earliest info as new data arrives (common, but can drop critical early context).
- Redundancy Removal:
  - Summarize, deduplicate, and track entities.
- Reflection-based Update:
  - After each action, reflect on what info to add to or replace in memory.
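The FIFO strategy above is one line with a bounded deque; the budget of 3 turns is an arbitrary stand-in for a real token or turn budget, and the dropped first turn illustrates exactly the failure mode noted above:

```python
from collections import deque

# Bounded short-term memory: appending beyond maxlen silently evicts
# the oldest turn, so critical early context can be lost.
memory = deque(maxlen=3)
for turn in ["t1", "t2", "t3", "t4"]:
    memory.append(turn)
```

Redundancy removal and reflection-based updates exist precisely to avoid this blind eviction: they decide *what* to drop rather than always dropping the oldest.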
5. Summary and Takeaways
- RAG:
  - Developed to overcome context window limitations; enables external knowledge integration.
  - Retriever quality is crucial: term-based is fast and cheap, embedding-based is semantic but slower and more expensive.
- Agents:
  - Generalize beyond RAG: they can perceive, plan, and act (including write actions).
  - Planning, reflection, memory, and tool use are key for complex tasks.
- Memory:
  - Essential for both RAG and agents; supports handling of large, persistent, or session-spanning information.
- Security:
  - More automation and tool use → more risks. Defensive mechanisms and oversight (human-in-the-loop) are critical.
- Prompt-based Methods:
  - RAG and agents improve quality via inputs rather than model modification; combining them with model finetuning holds future potential.
Practical Points for Builders
- When to use RAG?
  - For any knowledge-rich task where the data can't all fit in context or changes often.
- When to use Agents?
  - For tasks requiring multiple steps, tool use, or real-world actions/automation.
- Memory Considerations:
  - Build memory systems for info overflow, personalization, and consistency.
- Evaluate All Components:
  - Test retrievers, agents, tools, memory, and the system end-to-end.
- Tool/Plan Management:
  - Start simple, measure failures, and trim or add tools as required. Plan for parallelism and error correction.