TomorrowBye | WAKAJA

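AI agents have a fatal flaw: they forget.

After every restart, prior conversations, knowledge, and context are all reset to zero. It is like the protagonist of Memento, who forgets what just happened every 15 minutes or so.

Today we fix that: with RAG (Retrieval-Augmented Generation) plus a vector database, an agent can have long-term memory.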

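The problem: the Context Window ceiling

Claude 3.5 Sonnet has a 200K-token context window. That sounds large, but in practice:

One novel ≈ 100K tokens
A year of conversation logs ≈ 500K+ tokens
Documentation for a mid-sized project ≈ 1M+ tokens

Worse, the more of the context window you fill, the higher the cost and the slower the response.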

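import { generateText } from "ai"
import { anthropic } from "@ai-sdk/anthropic"

// ❌ Bad practice: stuffing the entire history into the context
const messages = [
  ...last1000Messages, // overflows the context window
  { role: "user", content: query },
]

// generateText needs a model as well as the messages
const response = await generateText({
  model: anthropic("claude-3-5-sonnet-20241022"),
  messages,
})
// Result: slow, expensive, and may exceed the token limit

We need a smarter approach.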

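The solution: RAG (Retrieval-Augmented Generation)

The core idea of RAG: instead of stuffing all knowledge into the context, retrieve only the knowledge that is relevant.

RAG workflow

User asks a question → convert it to a vector → search the vector database → find the most relevant content → inject it into the context → generate the answer

Key advantages:

Effectively unlimited knowledge capacity (the database can keep growing)
Low cost (only the relevant content is sent to the model)
Fast responses (indexed vector search avoids scanning every document; a speedup of around 100x over a full scan is plausible, though it depends on the index type and data size)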
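
To make the "search the vector database" step of the workflow concrete, here is a minimal in-memory sketch of similarity search. It is not how Pinecone works internally (production databases use approximate nearest-neighbor indexes); the `Entry` type, the `topK` helper, and the hand-made 3-dimensional vectors are all assumptions for illustration.

```typescript
// Toy in-memory store; a real system would use a vector database instead.
type Entry = { id: string; vector: number[]; text: string }

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the k entries most similar to the query vector (brute force).
function topK(query: number[], entries: Entry[], k: number): Entry[] {
  return [...entries]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector),
    )
    .slice(0, k)
}

// Hand-made vectors standing in for real embeddings.
const store: Entry[] = [
  { id: "1", vector: [1, 0, 0], text: "sandboxing" },
  { id: "2", vector: [0, 1, 0], text: "durable execution" },
  { id: "3", vector: [0, 0.9, 0.1], text: "event sourcing" },
]

const nearest = topK([0, 1, 0], store, 2)
console.log(nearest.map((e) => e.id)) // ["2", "3"]
```

The brute-force sort is fine for a handful of entries; the point of a vector database is to get the same kind of ranking without comparing the query against every stored vector.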

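Hands-on: building a knowledge-base agent

1. Choose a vector database

# Pinecone - cloud-native, works out of the box
npm install @pinecone-database/pinecone

# Weaviate - open source, feature-rich
npm install weaviate-ts-client

# Qdrant - high performance, supports local deployment
npm install @qdrant/qdrant-js

We will use Pinecone (the simplest to get started with).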

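2. Convert knowledge into vectors

import { embed } from "ai"
import { openai } from "@ai-sdk/openai"
import { Pinecone } from "@pinecone-database/pinecone"

// Handle to a Pinecone index. The index name "agent-memory" and the
// PINECONE_API_KEY environment variable are assumptions; the index must
// already exist with dimension 1536.
const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
}).index("agent-memory")

// Convert text into a vector
async function createEmbedding(text: string) {
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: text,
  })

  return embedding // a 1536-dimensional vector
}

// Example: store project documentation
const docs = [
  { id: "1", text: "Agent infrastructure includes sandboxing, durable execution, multimodal capabilities..." },
  { id: "2", text: "Durable Execution uses event sourcing to guarantee that tasks are not lost..." },
  { id: "3", text: "Tool Calling lets an agent call external APIs and tools..." },
]

for (const doc of docs) {
  const vector = await createEmbedding(doc.text)

  // Keep the original text in metadata so it can be returned at query time
  await pinecone.upsert([
    {
      id: doc.id,
      values: vector,
      metadata: { text: doc.text },
    },
  ])
}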

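3. Retrieve relevant knowledge

async function searchKnowledge(query: string) {
  // 1. Convert the question into a vector
  const queryVector = await createEmbedding(query)

  // 2. Search the vector database for the most similar content
  const results = await pinecone.query({
    vector: queryVector,
    topK: 3, // return the 3 most relevant results
    includeMetadata: true,
  })

  // 3. Extract the text (metadata can be missing, so guard against undefined)
  return results.matches.map((match) => String(match.metadata?.text ?? ""))
}

// Example
const knowledge = await searchKnowledge("What is Durable Execution?")
// Returns: ['Durable Execution uses event sourcing to guarantee that tasks are not lost...', ...]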

4. Combine with an LLM to generate the answer

import { generateText } from "ai"
import { anthropic } from "@ai-sdk/anthropic"

async function answerQuestion(question: string) {
  // 1. Retrieve relevant knowledge
  const relevantDocs = await searchKnowledge(question)

  // 2. Build the augmented prompt
  const context = relevantDocs.join("\n\n---\n\n")

  const prompt = `
You are an AI agent expert. Answer the question based on the knowledge base below:

## Knowledge base
${context}

## User question
${question}

Answer from the knowledge base. If it contains no relevant information, say so honestly.
  `

  // 3. Generate the answer
  const { text } = await generateText({
    model: anthropic("claude-3-5-sonnet-20241022"),
    prompt,
  })

  return text
}

// Usage
const answer = await answerQuestion("What is the core principle behind Durable Execution?")
console.log(answer)

Advanced: multi-turn conversation memory

RAG can store conversation history as well as documents.

async function storeConversation(
  userId: string,
  message: string,
  response: string,
) {
  // Embed the whole exchange so either side of it can match a later query
  const conversationText = `User: ${message}\nAgent: ${response}`
  const vector = await createEmbedding(conversationText)

  await pinecone.upsert([
    {
      id: `${userId}-${Date.now()}`,
      values: vector,
      metadata: {
        userId,
        message,
        response,
        timestamp: Date.now(),
      },
    },
  ])
}

async function searchHistory(userId: string, query: string) {
  const queryVector = await createEmbedding(query)

  const results = await pinecone.query({
    vector: queryVector,
    filter: { userId }, // search only this user's history
    topK: 5,
    includeMetadata: true, // without this, match.metadata is not returned
  })

  return results.matches.map((m) => m.metadata)
}

// Usage
await storeConversation("user123", "What is RAG?", "RAG is Retrieval-Augmented Generation...")

// Query later
const history = await searchHistory("user123", "What did we discuss earlier?")
console.log(history)
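
Retrieved history still has to be turned into text before it can go into a prompt. The sketch below shows one way to do that, assuming the metadata shape written by `storeConversation`; the `Turn` type and the `formatHistory` helper are hypothetical names, not part of any SDK.

```typescript
// Shape of the metadata stored per conversation turn (assumed to match
// what storeConversation writes).
type Turn = {
  userId: string
  message: string
  response: string
  timestamp: number
}

// Vector search returns matches ordered by similarity, so re-sort them
// chronologically before rendering them as a transcript.
function formatHistory(turns: Turn[]): string {
  return [...turns]
    .sort((a, b) => a.timestamp - b.timestamp)
    .map((t) => `User: ${t.message}\nAgent: ${t.response}`)
    .join("\n\n")
}

// Example with hand-written turns standing in for query results.
const retrieved: Turn[] = [
  { userId: "user123", message: "How does it cut cost?", response: "It sends less context.", timestamp: 2000 },
  { userId: "user123", message: "What is RAG?", response: "Retrieval-Augmented Generation.", timestamp: 1000 },
]

console.log(formatHistory(retrieved))
```

The resulting string can be injected into the prompt the same way `context` is in `answerQuestion`.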

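Cost comparison

Taking 100,000 conversation records as an example:

Approach	Cost per query	Response latency
Everything in context	$2.50	10-15 s
RAG (3 relevant records)	$0.015	1-2 s

RAG cuts the per-query cost by roughly 167x ($2.50 / $0.015 ≈ 166.7).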
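
The ratio can be checked directly from the two per-query figures in the table. The snippet below only redoes that arithmetic; the dollar amounts are the table's own estimates, and the 1,000-queries figure is an assumption added purely to illustrate scale.

```typescript
// Per-query costs taken from the comparison table (estimates, in USD).
const fullContextCost = 2.5
const ragCost = 0.015

// How many times cheaper RAG is per query.
const ratio = fullContextCost / ragCost
console.log(ratio.toFixed(1)) // "166.7"

// Hypothetical workload: 1,000 queries (an assumed number, not from the table).
const queries = 1000
const saved = (fullContextCost - ragCost) * queries
console.log(`Saved over ${queries} queries: $${saved.toFixed(2)}`)
```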

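Best practices

1. Chunking strategy

function chunkText(text: string, maxLength = 500): string[] {
  // Split after sentence-ending punctuation and keep the punctuation:
  // Latin marks must be followed by whitespace, CJK marks need none.
  const sentences = text.split(/(?<=[.!?])\s+|(?<=[。！？])/)
  const chunks: string[] = []
  let currentChunk = ""

  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit
    // (the currentChunk check avoids pushing an empty first chunk)
    if (currentChunk && currentChunk.length + sentence.length > maxLength) {
      chunks.push(currentChunk.trim())
      currentChunk = sentence
    } else {
      currentChunk += " " + sentence
    }
  }

  if (currentChunk.trim()) chunks.push(currentChunk.trim())
  return chunks
}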

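Why chunk?

When a document is too long, a single vector cannot capture its meaning accurately
Smaller chunks are more precise, so retrieval is more accurate
Recommended: 200-500 characters per chunk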
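
One common refinement, not covered in the article itself, is to let neighboring chunks overlap so that a sentence cut at a boundary still appears whole in at least one chunk. The following is a minimal character-based sketch; the function name `chunkWithOverlap` and its default sizes are assumptions.

```typescript
// Fixed-size character chunks where each chunk repeats the last
// `overlap` characters of the previous one.
function chunkWithOverlap(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0) throw new Error("size must be positive")
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be non-negative and smaller than size")
  }

  const chunks: string[] = []
  const step = size - overlap // how far the window advances each time

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size))
    // Stop once the window has reached the end of the text
    if (start + size >= text.length) break
  }
  return chunks
}

// Tiny example so the overlap is easy to see.
console.log(chunkWithOverlap("abcdefghij", 4, 1)) // ["abcd", "defg", "ghij"]
```

The trade-off is storage: larger overlaps mean more vectors to embed and store for the same document.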

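2. Hybrid search

// vectorSearch, fullTextSearch, and mergeResults are placeholders for
// your own implementations; they are not defined in this article.
async function hybridSearch(query: string) {
  // Run both searches in parallel, since neither depends on the other
  const [semanticResults, keywordResults] = await Promise.all([
    vectorSearch(query), // vector search (semantic similarity)
    fullTextSearch(query), // keyword search (exact match)
  ])

  // Merge and deduplicate
  return mergeResults(semanticResults, keywordResults)
}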
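
The article leaves `mergeResults` undefined. One widely used way to merge two ranked lists is Reciprocal Rank Fusion (RRF), sketched below. This is a swapped-in technique, not necessarily what the author had in mind; the `Hit` type and the constant `k = 60` (a conventional default for RRF) are assumptions.

```typescript
// A search hit; both searches are assumed to return hits with stable ids.
type Hit = { id: string; text: string }

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per hit,
// with rank starting at 1. Hits found by both searches accumulate score,
// and duplicates collapse into a single entry keyed by id.
function mergeResults(semantic: Hit[], keyword: Hit[], k = 60): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>()

  for (const list of [semantic, keyword]) {
    list.forEach((hit, index) => {
      const entry = scores.get(hit.id) ?? { hit, score: 0 }
      entry.score += 1 / (k + index + 1)
      scores.set(hit.id, entry)
    })
  }

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.hit)
}

// Example: "b" appears in both lists, so it ranks first.
const semanticHits: Hit[] = [
  { id: "a", text: "doc a" },
  { id: "b", text: "doc b" },
  { id: "c", text: "doc c" },
]
const keywordHits: Hit[] = [
  { id: "b", text: "doc b" },
  { id: "d", text: "doc d" },
]

console.log(mergeResults(semanticHits, keywordHits).map((h) => h.id))
// ["b", "a", "d", "c"]
```

RRF only looks at rank positions, so it works even when the two searches produce scores on incompatible scales.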

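3. Clean up expired data regularly

async function cleanupOldConversations(daysToKeep: number) {
  const cutoff = Date.now() - daysToKeep * 24 * 60 * 60 * 1000

  // Delete by metadata filter. The Pinecone TypeScript SDK exposes this as
  // deleteMany; whether filter-based deletes are supported, and the exact
  // argument shape, depend on your index type and SDK version, so verify first.
  await pinecone.deleteMany({
    timestamp: { $lt: cutoff },
  })
}

// Once a week, delete conversations older than 30 days
setInterval(
  () => cleanupOldConversations(30).catch(console.error),
  7 * 24 * 60 * 60 * 1000,
)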

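Next steps

Long-term memory fixes the agent's forgetfulness, but one key capability is still missing: understanding and operating user interfaces.

Tomorrow we cover agent vision: how to let an agent read web pages and operate a UI so that it can carry out tasks autonomously.

Stay tuned 👀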

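Articles in this series

Part 1: Why do agents need their own infrastructure?
Part 2: Security comes first
Part 3: Durable Execution - reliability for agents
Part 4: Multimodal capabilities - beyond plain text
Part 5: Vercel AI SDK in practice - building an image and document assistant from scratch
Part 6: Tool Calling in practice - letting agents call external tools on their own
Part 7: Long-term memory for agents - RAG and vector databases in practice (this article)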