Detailed Notes: RAG, Agents, and Memory in AI Applications


1. Task Solving and Context

  • Instructions vs. Context:

    • Instructions: General, static for the application (how to solve).
    • Context: Specific to each query (what info to use).
    • Missing context → AI more likely to hallucinate or err.
  • Two Major Context Patterns:

    • RAG (Retrieval-Augmented Generation): Retrieve info from external sources for each query.
    • Agents: Use tools (e.g., web search, APIs) to gather/act on information, enabling automation and world interaction.

2. RAG (Retrieval-Augmented Generation)

Definition & Purpose

  • Enhances generation by retrieving relevant info from external memory (DBs, previous chats, internet, etc).
  • Origin: Coined in Lewis et al., 2020 for knowledge-intensive NLP tasks.
  • Why RAG:

    • Model context limits: Only relevant info retrieved for each query.
    • Helps with user-specific data, reduces hallucination.

Key Concepts

  • Context Construction: Like feature engineering in classical ML; the data you feed is vital.
  • Persistent Need: No matter how long model context grows, RAG remains necessary—data always grows faster.

RAG Architecture

  • Two Main Components:

    1. Retriever: Finds info from external memory (via indexing and querying).
    2. Generator: Generates response based on retrieved info.
  • Chunking:

    • Documents are split into manageable “chunks”.
    • Only the most relevant chunks for a query are retrieved.
    • Post-processing combines prompt + retrieved data for final input.
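The chunking step above can be sketched in a few lines. This is a toy version using fixed-size word windows with overlap; real systems often chunk by tokens, sentences, or document structure, and the sizes here are illustrative defaults:

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into chunks of ~chunk_size words, overlapping by
    `overlap` words so content cut at a boundary survives whole in
    at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

The overlap is what preserves context across boundaries: the last `overlap` words of each chunk reappear at the start of the next one.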

Retrieval Algorithms

Sparse vs. Dense (Term vs. Embedding)

  • Sparse (Term-based / Lexical) Retrieval:

    • Data as sparse vectors (mostly zeros, e.g., one dimension per vocabulary term).

    • Fast, strong baselines (BM25, BM25+, BM25F).

    • Sensitive to exact term match, may return irrelevant docs due to ambiguity.

    • Key Metrics:

      • TF (Term Frequency): How often a term appears in a document.

      • IDF (Inverse Document Frequency): Up-weights rare, informative terms.

      • Tokenization: Split text into terms; use n-grams for multi-word terms.
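How TF and IDF combine can be shown with a toy TF-IDF scorer (a simplification: BM25 additionally adds term-frequency saturation and document-length normalization, and whitespace tokenization stands in for a real tokenizer):

```python
import math
from collections import Counter

def tfidf_scores(query, docs):
    """Score each doc against the query with TF-IDF.
    TF: term count in the doc. IDF: log(N / df), where df is the number
    of docs containing the term, so rare terms weigh more and a term
    that appears in every doc contributes nothing."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split() if df[t]
        ))
    return scores
```

Note how a stopword like "the", present in every document, gets IDF log(N/N) = 0 and drops out automatically.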

  • Dense (Embedding-based / Semantic) Retrieval:

    • Data as dense vectors (embeddings).

    • Vector databases: Store embeddings, allow nearest-neighbor search.

    • More semantic understanding—retrieves documents by meaning.

    • Uses ANN (Approximate Nearest Neighbor) algorithms.

    • Libraries: FAISS, ScaNN, Annoy, Hnswlib.

    • Index algorithms: LSH, HNSW, product quantization, IVF, random-projection trees (used by Annoy).

    • Tradeoff: More expensive, slower, but improves with fine-tuning.
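At its core, dense retrieval is nearest-neighbor search over embedding vectors. A brute-force sketch fits in a few lines (embeddings are assumed given here; a real system produces them with an embedding model and replaces this O(N) scan with one of the ANN indexes listed above):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query_vec, doc_vecs, k=2):
    """Exact top-k nearest neighbors by cosine similarity.
    Vector databases approximate this scan with ANN structures
    (HNSW graphs, IVF partitions, ...) to stay fast at scale."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```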

Comparison Table

                Term-based                           Embedding-based
  Query speed   Much faster                          Slower
  Performance   Strong baseline, hard to improve;    Outperforms with fine-tuning;
                can misfire on ambiguity             semantic matching
  Cost          Cheap                                Expensive (embedding, storage, search)

Hybrid Search

  • Combine both approaches: Use term-based for initial candidate fetch, embedding-based for re-ranking.
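The two result lists still have to be merged. One widely used fusion scheme (an addition here, not named in the notes above) is reciprocal rank fusion, which needs only the ranked lists, not comparable scores:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranking.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k = 60 is a conventional default that damps the top-rank bonus."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A doc ranked well by both the term-based and the embedding-based retriever beats a doc that only one of them liked.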

Evaluation Metrics

  • Context Precision: % of retrieved docs that are relevant.
  • Context Recall: % of all relevant docs that are retrieved.
  • Other ranking metrics: NDCG, MAP, MRR; retrieval benchmarks: MTEB, BEIR.
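Given relevance labels, the two headline metrics are direct to compute:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved docs that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of all relevant docs that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)
```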

Retrieval Optimization Tactics

  • Chunking Strategy:

    • Chunk size/overlap affects retrieval and downstream model performance.
    • Smaller chunks = more variety, but more computational overhead and potential loss of context.
    • Chunking by tokens matches model limits, but requires reindexing if you change models.
  • Reranking:

    • Further prioritize/reduce candidate docs post-retrieval (e.g., time-based, relevance).
  • Query Rewriting:

    • Reformulate ambiguous/elliptical queries for better retrieval (often with a model).
  • Contextual Retrieval:

    • Augment chunks with metadata, tags, or short AI-generated contexts to aid future retrieval.
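Query rewriting is usually just another model call wrapped around the retriever. A sketch, where `llm` is a hypothetical text-completion callable supplied by the caller:

```python
def rewrite_query(history, query, llm):
    """Resolve pronouns/ellipsis in a follow-up query before retrieval.
    `llm` is a hypothetical callable (prompt -> completion text)."""
    prompt = (
        "Conversation so far:\n"
        + "\n".join(history)
        + "\n\nRewrite this follow-up question so it is fully "
        + f"self-contained: {query}"
    )
    return llm(prompt)
```

The rewritten, self-contained query is what gets embedded and searched, not the user's elliptical one.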

RAG Beyond Text

  • Multimodal RAG:

    • Retrieve not just text, but images, audio, video, etc.
    • Requires multimodal embedding models (e.g., CLIP).
  • Tabular Data:

    • Requires text-to-SQL, table schema prediction, SQL execution, and result synthesis.
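A minimal tabular-RAG pipeline, with the model call stubbed out (`generate_sql` is a hypothetical LLM wrapper; everything else is standard `sqlite3`):

```python
import sqlite3

def answer_from_table(question, conn, generate_sql):
    """Tabular RAG (sketch): read the schema, ask a model for SQL,
    execute it, return rows for answer synthesis.
    `generate_sql(question, schema)` stands in for an LLM call;
    model-written SQL should be validated before running for real."""
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type='table'")
        if row[0]
    )
    sql = generate_sql(question, schema)
    return conn.execute(sql).fetchall()
```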

3. Agents

Definition

  • Agent: Anything that perceives and acts in an environment, defined by:

    • Environment: Where the agent operates.
    • Actions/Tools: What the agent can do.
  • Examples:

    • Chatbots, code editors, robots, self-driving cars.
    • RAG systems are themselves simple agents (the retriever is their tool).

Agent Workflow

  • Planning:

    • Model reasons about task, selects tools, sequences actions.
    • Complex tasks → task decomposition (chain of steps).
  • Execution:

    • Calls external tools (APIs, code interpreters, DBs, etc.).
    • Can be both read (fetch info) and write (make changes).
  • Compound Mistakes:

    • Each step has failure risk; errors compound across multi-step plans.
    • Higher stakes than static model use due to automation.
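The compounding effect is easy to quantify: if steps are independent and each succeeds with probability p, an n-step plan succeeds with probability p**n (independence is of course a simplifying assumption):

```python
def plan_success_probability(p_step, n_steps):
    """Probability that all n independent steps succeed: p ** n.
    Even with 95%-reliable steps, a 10-step plan succeeds only
    about 60% of the time."""
    return p_step ** n_steps
```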

Tools

  • External tools massively expand agent capabilities (calculation, web search, image generation, code execution, etc.).
  • Multimodality is enabled by chaining models/tools together.

Planning and Control Flow

  • Plan Generation:

    • Prompt-engineering for planning (step-by-step, chain-of-thought).
    • Planning should be decoupled from execution (validate plan before running).
    • Can use natural language or function names for plan granularity.
  • Control Flows:

    • Sequential, Parallel, If-statements, Loops.
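A toy plan interpreter shows how these control flows compose. The step format here is invented for illustration (real agent frameworks add parallel branches and loops):

```python
def run_plan(plan, tools):
    """Minimal plan interpreter (sketch). A plan is a list of steps:
      ("call", tool_name, args)     -- sequential tool call
      ("if", predicate, then_plan)  -- conditional sub-plan
    Tools are plain functions; results accumulate in order."""
    results = []
    for step in plan:
        if step[0] == "call":
            _, name, args = step
            results.append(tools[name](*args))
        elif step[0] == "if":
            _, predicate, then_plan = step
            if predicate(results):
                results.extend(run_plan(then_plan, tools))
    return results
```

Keeping the plan as data like this is what lets you validate it before execution, as recommended above.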

Reflection and Error Correction

  • Constantly evaluate at each step:

    • After receiving user query, after planning, after each action, after completion.
  • Use ReAct-style loops (interleaved Reasoning and Acting: thought → action → observation, repeated) for multi-step agents.
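A skeleton of such a loop, with the model decision stubbed out (`llm` is a hypothetical callable that returns either an action to take or a final answer):

```python
def react_loop(question, llm, tools, max_steps=5):
    """ReAct-style loop (sketch): the model alternates deciding on an
    action and observing its result. `llm(transcript)` is hypothetical
    and returns ("act", tool_name, tool_input) or ("finish", answer)."""
    transcript = [("question", question)]
    for _ in range(max_steps):
        decision = llm(transcript)
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)   # act
        transcript.append((tool_name, observation))  # observe
    return None  # step budget exhausted
```

The `max_steps` cap is one of the defensive mechanisms mentioned later: it bounds how far a misbehaving agent can run.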

Tool Selection

  • Choose tools based on environment, task, and model strengths.
  • Too many tools can overwhelm model/context—ablation studies help trim inventory.
  • Agents can learn new tools (skill library).

4. Memory

Types of Memory in AI Models

  • Internal Knowledge:

    • The information embedded in model weights (doesn't change unless retrained).
  • Short-term Memory:

    • The context window for the model (input tokens, recent conversation).
    • Limited, fast access, session-scoped.
  • Long-term Memory:

    • External data sources accessed via retrieval (DBs, files, etc.).
    • Can be persistent across sessions/tasks.
    • Allows retention of user preferences, conversation history, etc.

Benefits of Memory

  • Handles information overflow beyond context length.
  • Allows persistence between sessions (personalization).
  • Boosts consistency (e.g., in subjective ratings).
  • Can maintain structured data (tables, queues) for complex reasoning.

Memory Management Strategies

  • FIFO:

    • Remove earliest info as new data arrives (common, but can drop critical early context).
  • Redundancy Removal:

    • Summarize, deduplicate, track entities.
  • Reflection-based Update:

    • After each action, reflect on what info to add/replace in memory.
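A FIFO buffer with a token budget takes only a few lines. Evicted messages are returned so a caller could summarize them into long-term memory instead of losing them (word count stands in for a real tokenizer):

```python
from collections import deque

class ConversationMemory:
    """FIFO short-term memory with a token budget (sketch).
    Oldest messages are evicted first, which is exactly the failure
    mode noted above: critical early context can be dropped."""
    def __init__(self, max_tokens=100):
        self.max_tokens = max_tokens
        self.messages = deque()
        self.tokens = 0

    def add(self, message):
        n = len(message.split())           # crude token count
        self.messages.append((message, n))
        self.tokens += n
        evicted = []
        while self.tokens > self.max_tokens:
            old, old_n = self.messages.popleft()
            self.tokens -= old_n
            evicted.append(old)            # candidates for summarization
        return evicted
```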

5. Summary and Takeaways

  • RAG:

    • Developed to overcome context window limitations; enables external knowledge integration.
    • Retriever quality is crucial; term-based is fast/cheap, embedding-based is semantic but slower/expensive.
  • Agents:

    • Generalize beyond RAG: can perceive, plan, and act (including with write actions).
    • Planning, reflection, memory, and tool use are key for complex tasks.
  • Memory:

    • Essential for both RAG and agents—supports handling of large, persistent, or session-spanning information.
  • Security:

    • More automation and tool use → more risks. Defensive mechanisms and oversight (human-in-the-loop) are critical.
  • Prompt-based Methods:

    • RAG and agents improve quality via better inputs rather than model modification; combining them with model fine-tuning is a promising direction.

Practical Points for Builders

  • When to use RAG?

    • Any knowledge-rich task where all data can’t fit in context or changes often.
  • When to use Agents?

    • For tasks requiring multiple steps, tool use, or real-world actions/automation.
  • Memory Considerations:

    • Build memory systems for info overflow, personalization, and consistency.
  • Evaluate All Components:

    • Test retrievers, agents, tools, memory, and the system end-to-end.
  • Tool/Plan Management:

    • Start simple, measure failures, trim or add as required. Plan for parallelism and error correction.