12 Principles for Building Reliable, Production-Grade AI Agents

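Inspired by the classic 12-Factor App, Dexter Horthy distilled 12 engineering principles for building reliable, scalable large language model (LLM) agents. This article presents the key points.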

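Preface: What an Agent Really Is

An agent is an AI program that can complete tasks on its own.

A traditional program needs you to tell it how to do every step, e.g. "first query the database, then send an email, and finally write a log entry."

With an agent, you state the goal and it decides how to reach it. Ask it to "handle this customer complaint", for instance, and it will judge for itself whether to look up the order, whether to issue a refund, and whether to ask someone for approval.

Likewise, a data-analysis agent decides which analysis methods to apply to the data you give it, and a coding agent takes a description of the requirements and writes the code, tests it, and fixes bugs on its own.

Good agents are not simply "a prompt + a toolset + a loop". They consist mostly of conventional software and use an LLM only at the key points where probabilistic reasoning is needed.

Most "AI agent" products on the market are not actually very agentic: they are mostly deterministic code, with an LLM brought in at just the right places to create a good experience.

In production, frameworks such as LangChain and LangGraph are rarely adopted; teams usually build their own solutions.

This leads to a typical dilemma: choose a framework to move fast → reach 70-80% quality → discover that this isn't good enough for customers → dig into the framework's source to improve it → end up tearing it down and rewriting.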

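Principle 1: Let the AI turn natural language into structured instructions

When a user says "book a meeting room for 3 p.m. tomorrow", the AI should output structured data such as {"action": "book_room", "time": "2024-12-02 15:00", "duration": 60}.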

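import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{role: "user", content: "Book a meeting room for 3 p.m. tomorrow"}],
  tools: [{
    type: "function",
    function: {
      name: "book_room",
      description: "Book a meeting room",
      parameters: {
        type: "object",
        properties: {
          time: {type: "string", description: "Start time, e.g. 2024-12-02 15:00"},
          duration: {type: "number", description: "Duration in minutes"}
        },
        required: ["time", "duration"]
      }
    }
  }]
});

// The model only proposes a call; its arguments arrive as a JSON string
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  // Your own code now runs the real business logic with `args`
}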

The LLM returns which tool to call and with which arguments; your code receives this JSON and executes the actual business logic.
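Because the model's JSON is only a proposal, it is worth validating it before any business logic runs. The sketch below is a minimal, dependency-free example; `parseBookRoomCall` is a hypothetical helper, and the accepted time format (`YYYY-MM-DD HH:mm`), the 60-minute default, and the 15-480 minute range are illustrative assumptions rather than part of the original design.

```javascript
// Hypothetical validator: turns the raw `arguments` string of a book_room
// tool call into a checked command object, or a structured error.
function parseBookRoomCall(rawArguments) {
  let args;
  try {
    args = JSON.parse(rawArguments);
  } catch {
    return { ok: false, error: "arguments are not valid JSON" };
  }

  // Assumed format: "YYYY-MM-DD HH:mm", matching the example above
  if (typeof args.time !== "string" || !/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}$/.test(args.time)) {
    return { ok: false, error: "time must look like 2024-12-02 15:00" };
  }

  // Assumed business rules: default 60 minutes, allowed range 15-480
  const duration = args.duration ?? 60;
  if (!Number.isInteger(duration) || duration < 15 || duration > 480) {
    return { ok: false, error: "duration must be an integer between 15 and 480" };
  }

  return { ok: true, command: { action: "book_room", time: args.time, duration } };
}

const parsed = parseBookRoomCall('{"time": "2024-12-02 15:00", "duration": 60}');
console.log(parsed);
```

Returning an error object instead of throwing lets the error be fed back to the model so it can correct its own output.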

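Principle 2: Write your own prompts; don't use the framework's

Keep prompts explicitly in your code instead of letting a framework hide them.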

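// Not recommended: the framework builds the prompt behind the scenes
const result = await framework.chat(userInput);

// Recommended: the prompt lives in your own code
const systemPrompt = `You are a customer-service assistant.
Rules:
- Be friendly and polite
- Never disclose users' private information
- When unsure, reply "Let me look that up for you"`;

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {role: "system", content: systemPrompt},
    {role: "user", content: userInput}
  ]
});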

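Store prompts in separate files (e.g. prompts/customer-service.txt), keep them under version control, and require team review for significant changes. This makes A/B testing and iterative tuning easier and avoids the black-box problem that framework abstractions introduce.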
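One way to keep prompts in version-controlled files while still filling in runtime values is a tiny template renderer. This is a sketch under assumptions: the `{{name}}` placeholder syntax and the `renderPrompt` helper are hypothetical, and in a real project the template string would be read from a file such as prompts/customer-service.txt (for example with Node's `fs.readFileSync(path, "utf8")`).

```javascript
// Hypothetical helper: fills {{placeholders}} in a prompt template.
// Throws on a missing variable so a broken prompt never reaches the model.
function renderPrompt(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    // A replacer function means "$" in user input is inserted literally
    return String(vars[name]);
  });
}

// In practice this text would be loaded from prompts/customer-service.txt
const template = `You are a customer-service assistant.
Rules:
- Be friendly and polite
- Never disclose users' private information
- When unsure, reply "Let me look that up for you"

User question: {{userInput}}`;

const prompt = renderPrompt(template, { userInput: "Where is my order?" });
console.log(prompt);
```

Failing loudly on a missing variable is a deliberate choice: a silently empty slot in a prompt is much harder to spot than an exception in a test run.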

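Principle 3: Decide for yourself what the AI sees

You should control which information the AI sees and how it is organized, rather than letting a framework stuff in whatever it likes.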

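// Not recommended: dump in the entire history indiscriminately
const naiveContext = allMessages; // wastes tokens and adds noise

// Recommended: hand-pick relevant information
const context = [
  systemPrompt,
  ...getRecentMessages(5), // the 5 most recent messages
  ...getRelevantHistory(userQuery, 3) // 3 past messages relevant to the current query
];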

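Optimizing the context window means raising information density (expressing the same content in fewer tokens), choosing formats flexibly (XML, JSON, Markdown, etc.), and filtering out irrelevant information. For example, when a user asks about an order's status, you only need the most recent order details and the context of the current question, not small talk from a month ago.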
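The `getRecentMessages` and `getRelevantHistory` helpers above are not defined in the original. Below is one minimal way that selection could work, assuming messages are plain `{role, content}` objects and using naive keyword overlap as a stand-in for relevance; a production system would more likely use embeddings or a search index. `buildContext` and `keywordScore` are hypothetical names.

```javascript
// Naive relevance score: how many query words (3+ letters) appear in the text.
// English-only tokenization is an assumption of this sketch.
function keywordScore(query, text) {
  const words = query.toLowerCase().split(/\W+/).filter(w => w.length > 2);
  const lower = text.toLowerCase();
  return words.filter(w => lower.includes(w)).length;
}

// Hypothetical context builder: system prompt + up to `relevant` older
// messages that share keywords with the query + the last `recent` messages.
function buildContext(systemPrompt, history, query, { recent = 5, relevant = 3 } = {}) {
  const cut = Math.max(0, history.length - recent);
  const recentMessages = history.slice(cut);
  const older = history.slice(0, cut);

  const relevantMessages = older
    .map((msg, index) => ({ msg, index, score: keywordScore(query, msg.content) }))
    .filter(item => item.score > 0)
    .sort((a, b) => b.score - a.score) // best matches first
    .slice(0, relevant)
    .sort((a, b) => a.index - b.index) // restore chronological order
    .map(item => item.msg);

  return [
    { role: "system", content: systemPrompt },
    ...relevantMessages,
    ...recentMessages,
  ];
}
```

Keeping the selected older messages in chronological order, ahead of the recent ones, avoids presenting the model with a history that appears to jump backwards in time.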

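Principle 4: A tool call is just the AI outputting JSON

An AI "tool call" is really just a piece of JSON telling you what to do; whether and how it actually gets done is up to your code.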

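async function handleToolCall(context) {
  const toolCall = await ai.getNextAction(context);
  // toolCall = {tool: "send_email", to: "boss@company.com", subject: "..."}

  // Business-logic checks decide whether the call actually runs
  if (toolCall.tool === "send_email") {
    if (needsApproval(toolCall.to)) {
      // Sensitive recipients require approval
      await requestApproval(toolCall);
      return {status: "awaiting_approval"}; // pause until approved
    }

    if (isSpam(toolCall.subject)) {
      // Content check
      return {error: "Inappropriate email content"};
    }

    // Execute only after all checks pass
    await sendEmail(toolCall);
    return {status: "sent"};
  }

  return {error: `Unknown tool: ${toolCall.tool}`};
}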

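A tool does not have to be a "pure function": it can abstract a complex operation, kick off an asynchronous job, or represent an action that requires human approval.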
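Since the model only proposes a call, one common pattern is a registry that maps tool names to handlers plus a policy flag, so your code, not the model, decides what runs. The registry shape, the `requiresApproval` flag, the stubbed handlers, and the tool names below are illustrative assumptions.

```javascript
// Hypothetical tool registry: how to run each tool, and whether
// a human must approve it first. Handlers are stubs for the sketch.
const toolRegistry = {
  lookup_order: {
    requiresApproval: false,
    run: async ({ orderId }) => ({ orderId, status: "shipped" }),
  },
  issue_refund: {
    requiresApproval: true,
    run: async ({ orderId, amount }) => ({ orderId, refunded: amount }),
  },
};

// Treat the model's JSON as a request, not a command.
// `approved` comes from your own approval flow, never from the model's output.
async function dispatch(toolCall, { approved = false, registry = toolRegistry } = {}) {
  // Own-property check so names like "constructor" are not treated as tools
  if (!Object.prototype.hasOwnProperty.call(registry, toolCall.tool)) {
    return { status: "rejected", reason: `unknown tool: ${toolCall.tool}` };
  }
  const entry = registry[toolCall.tool];
  if (entry.requiresApproval && !approved) {
    // Don't execute; let the control loop pause for approval
    return { status: "needs_approval", call: toolCall };
  }
  const output = await entry.run(toolCall.args ?? {});
  return { status: "done", output };
}
```

Keeping `approved` as a separate argument, rather than a field on `toolCall`, means the model cannot approve its own request by emitting `"approved": true`.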

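Principle 5: Don't keep state in two places

Don't store one copy of state in the database and another in the conversation log. As far as possible, the conversation history alone should show what the agent is doing.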

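// Not recommended: maintain two copies of state
database.save({status: "waiting_approval", step: 3});
messages.push({role: "assistant", content: "Request sent"});
// The two copies can drift out of sync

// Recommended: infer state from the message history
messages.push({
  role: "assistant",
  content: "Sent an approval request to Manager Zhang; waiting for a reply",
  metadata: { // for your own log; send only role/content to the model
    action: "request_approval",
    approver: "zhang",
    timestamp: Date.now()
  }
});
// The current state can be read directly from the message history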

This simplifies state management, supports branching and rewinding workflows, and improves observability.
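To make "infer state from the message history" concrete, here is a minimal sketch that derives the agent's current status purely by scanning the log backwards. It assumes each message may carry a `metadata.action` field as in the example above; the action names `approval_granted`, `approval_denied`, and `done` are hypothetical.

```javascript
// Derive the current status from the message log alone; no separate DB field.
// The most recent message with a recognized metadata.action decides.
function deriveStatus(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    const action = messages[i].metadata?.action;
    switch (action) {
      case "request_approval":
        return { status: "waiting_approval", approver: messages[i].metadata.approver };
      case "approval_granted":
      case "approval_denied":
        return { status: "running" };
      case "done":
        return { status: "done" };
      default:
        break; // ordinary chat message: keep scanning backwards
    }
  }
  return { status: "running" }; // nothing special recorded yet
}
```

Because the status is computed rather than stored, there is no second copy that can disagree with the log.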

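Principle 6: Agents can pause and resume

An agent must be able to stop gracefully whenever it has to wait (for human approval, for an API response) and pick up where it left off once the result arrives.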

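class Agent {
  constructor(ai) {
    this.ai = ai; // LLM wrapper; execute() and sendApprovalRequest() are app-specific
    this.state = {messages: [], status: "idle"};
  }

  async run(task) {
    // Initialize state only once, when the task starts
    this.state = {messages: [{role: "user", content: task}], status: "running"};
    return this.loop();
  }

  async loop() {
    while (this.state.status === "running") {
      const action = await this.ai.getNextAction(this.state.messages);

      if (action.type === "request_approval") {
        this.state.status = "paused";
        this.state.waitingFor = "approval";
        await this.sendApprovalRequest(action);
        return; // pause; state is kept for resume()
      }

      if (action.type === "done") {
        this.state.status = "done";
        return;
      }

      const result = await this.execute(action);
      this.state.messages.push(result);
    }
  }

  async resume(input) {
    // Append the external result (e.g. the approval decision) to the history
    this.state.messages.push({role: "user", content: input});
    this.state.status = "running";
    this.state.waitingFor = null;
    return this.loop(); // continue without resetting the history
  }
}

// Usage
const agent = new Agent(ai);
await agent.run("Email the boss");
// ... the agent pauses while waiting for approval
await agent.resume("Approved"); // continue with the approval result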

This is essential for long-running tasks, for scenarios that need human intervention, and for debugging and testing.
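An in-memory agent object loses a paused run if the process restarts. A hedged sketch of the same pause/resume idea with serializable state: each run is a function of plain data, so a paused run can be written to any store as JSON and resumed later, even in another process. `decideNextAction` is a hypothetical stub standing in for the LLM call, and the 20-step cap is an arbitrary assumption.

```javascript
// Stub for the LLM: asks for approval, then finishes once "Approved" arrives.
function decideNextAction(messages) {
  const last = messages[messages.length - 1];
  if (last.content === "Approved") return { type: "done" };
  return { type: "request_approval", summary: "Send email to boss" };
}

// Run until the agent pauses or finishes; returns plain, JSON-serializable state.
// If maxSteps runs out, status stays "running" and the caller decides what to do.
function runUntilPause(state, maxSteps = 20) {
  const current = { ...state, messages: [...state.messages] };
  for (let step = 0; step < maxSteps && current.status === "running"; step++) {
    const action = decideNextAction(current.messages);
    if (action.type === "request_approval") {
      current.messages.push({ role: "assistant", content: `Approval requested: ${action.summary}` });
      current.status = "paused";
    } else if (action.type === "done") {
      current.status = "done";
    }
  }
  return current;
}

// Load a saved run, append the human's answer, and continue.
function resume(savedJson, humanInput) {
  const state = JSON.parse(savedJson);
  state.messages.push({ role: "user", content: humanInput });
  state.status = "running";
  return runUntilPause(state);
}

// Usage: pause, persist as JSON (e.g. in a database row), resume later.
const paused = runUntilPause({ status: "running", messages: [{ role: "user", content: "Email the boss" }] });
const saved = JSON.stringify(paused);
const finished = resume(saved, "Approved");
```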

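Principle 7: Asking a human for help is just another tool

Define and use "contact a human" as an ordinary tool, just like "send an email" or "query the database".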

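const tools = [
  {
    name: "query_database",
    description: "Query the database"
  },
  {
    name: "ask_human",
    description: "Consult a human expert on an uncertain question",
    parameters: {
      type: "object",
      properties: {
        question: {type: "string", description: "The question to ask"},
        urgency: {type: "string", enum: ["low", "medium", "high"]}
      },
      required: ["question"]
    }
  }
];

// What the LLM might return
const action = {
  tool: "ask_human",
  question: "Is this refund amount reasonable?",
  urgency: "high"
};

// Execution logic (`slack` is a placeholder for your notification client)
async function executeAskHuman(params) {
  await slack.sendMessage("#support",
    `🤖 The AI needs human assistance: ${params.question}\nUrgency: ${params.urgency}`
  );
  // Pause the agent until a human replies (see Principle 6)
  return {status: "paused", waitingFor: "human_response"};
}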

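This unifies how human-AI collaboration is handled, supports multiple notification channels (Slack, email, web UI, etc.), and makes human intervention a natural part of the workflow.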
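Supporting several notification channels mostly comes down to hiding them behind one small interface. In this sketch the channel objects are hypothetical in-memory stand-ins that only record messages (real ones would wrap Slack, email, or a web UI), and the routing rule ("high" urgency goes to every channel, otherwise only the first) is an assumed policy. The return value tells the control loop to pause until a human replies.

```javascript
// Each channel only needs a send(text) method; these record what was sent.
function makeRecordingChannel(name) {
  const sent = [];
  return { name, sent, send: async (text) => { sent.push(text); } };
}

// Route an ask_human call to channels based on urgency (assumed policy).
async function executeAskHuman(params, channels) {
  const urgency = params.urgency ?? "medium";
  const text = `AI needs human assistance: ${params.question}\nUrgency: ${urgency}`;
  const targets = urgency === "high" ? channels : channels.slice(0, 1);
  await Promise.all(targets.map(channel => channel.send(text)));
  return { status: "paused", waitingFor: "human_response", notified: targets.map(c => c.name) };
}
```

Adding a new channel then means writing one more object with `send`, without touching the agent loop.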

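Principle 8: Control critical flows with your own code

Don't let the AI fully control how the program runs. Critical logic (approvals, retries, safety checks) must be guaranteed by code you write.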

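// Not recommended: rely entirely on the LLM's judgment
while (true) {
  const action = await ai.getNextAction();
  await execute(action);
  if (action.type === "done") break;
}

// Recommended: insert control logic
const MAX_CONSECUTIVE_FAILURES = 3;
let failureCount = 0;
while (true) {
  const action = await ai.getNextAction();
  if (action.type === "done") break;

  // Approval gate
  if (action.requiresApproval) {
    await requestApproval(action);
    break; // pause and wait
  }

  const result = await execute(action);

  // Error handling
  if (result.error) {
    failureCount++;
    if (failureCount >= MAX_CONSECUTIVE_FAILURES) {
      await notifyHuman(`Failed ${MAX_CONSECUTIVE_FAILURES} times in a row; human intervention needed`);
      break; // avoid retrying forever
    }
  } else {
    failureCount = 0; // only consecutive failures count
  }
}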

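Complex control logic, error handling, safety checks, and retry policies should not depend on the LLM's probabilistic judgment; they should be guaranteed by deterministic code.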
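The same guarantees can be packaged into a small loop that is easy to unit-test. This sketch assumes the model call and the executor are passed in as plain functions (synchronous here to keep it short); `controlledLoop` is a hypothetical name, and the limits of 3 consecutive failures and 10 total steps are illustrative. Each result is appended to the history so the model can see earlier errors on its next turn.

```javascript
// Deterministic control loop around a probabilistic decision-maker.
function controlledLoop({ nextAction, execute, maxFailures = 3, maxSteps = 10 }) {
  const history = [];
  let failures = 0;

  for (let step = 0; step < maxSteps; step++) {
    const action = nextAction(history);
    if (action.type === "done") return { outcome: "done", history };
    if (action.requiresApproval) return { outcome: "awaiting_approval", action, history };

    const result = execute(action);
    history.push({ action, result }); // the model sees errors next turn

    if (result.error) {
      failures++;
      if (failures >= maxFailures) return { outcome: "escalated", history }; // hand off to a human
    } else {
      failures = 0; // only consecutive failures count
    }
  }
  return { outcome: "step_limit", history }; // hard cap on total work
}
```

The step cap matters as much as the failure cap: a model that keeps "succeeding" without ever declaring `done` would otherwise loop forever.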

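Principle 9: Compress error messages before passing them to the AI

Error messages should be concise and useful; don't dump a 500-line stack trace straight onto the AI.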

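import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);
const hints = {
  card_declined: "The card was declined, e.g. due to insufficient funds",
  expired_card: "The card has expired; ask the user for another card"
};

try {
  await stripe.paymentIntents.create({amount, currency: "usd"}); // amount in cents
} catch (error) {
  // Not recommended: put the full error into the context
  // context.push({error: error.stack}); // could be 500 lines

  // Recommended: extract the key information
  context.push({
    error: `Stripe payment failed: ${error.message}`,
    code: error.code,
    suggestion: hints[error.code] ?? "Check the payment details and retry"
  });
}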

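The extracted error information should include which service failed, the error type, likely causes, and a suggested fix. This lets the LLM understand the problem and choose an appropriate way to handle it.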
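A reusable helper can enforce those fields for every error, not just Stripe's. This is a sketch: `compactError`, the hint table, the 200-character message cap, and the 3-line stack budget are assumptions for illustration. `card_declined` and `expired_card` are real Stripe error codes, but the hint wording is ours.

```javascript
// Hypothetical hint table: error code -> suggested next step for the LLM.
const HINTS = {
  card_declined: "Card was declined; ask the user for another payment method",
  expired_card: "Card has expired; ask the user to update their card",
};

// Reduce an arbitrary error to a few fields the model can act on.
function compactError(service, error, { maxStackLines = 3 } = {}) {
  const stackLines = String(error.stack ?? "")
    .split("\n")
    .slice(1, 1 + maxStackLines) // skip line 1, which repeats the message
    .map(line => line.trim());

  return {
    service,                                                // which service failed
    type: error.type ?? error.name ?? "Error",              // error type
    message: String(error.message ?? error).slice(0, 200),  // capped message
    code: error.code,
    suggestion: HINTS[error.code] ?? "Check the inputs and retry, or ask a human",
    where: stackLines,                                      // a few frames at most
  };
}
```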

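Principle 10: One agent does one thing

Don't build an all-purpose agent; keep each agent focused on one class of tasks. A suggested limit is at most 20 tools and at most 100 steps.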

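// Not recommended: an all-purpose agent
class SuperAgent {
  tools = [
    "Look up orders", "Process refunds", "Change address", "Customer inquiries",
    "Data analysis", "Send email", "Inventory management", /* ... */
  ]; // 30+ tools, blurry responsibilities
}

// Recommended: small, focused agents
class OrderQueryAgent {
  tools = ["Query order", "Track shipment"];
}

class RefundAgent {
  tools = ["Calculate refund amount", "Issue refund", "Notify user"];
}

class CustomerSupportAgent {
  tools = ["Answer FAQs", "Transfer to a human"];

  // orderQueryAgent / refundAgent are instances of the classes above
  async handle(question) {
    if (isOrderRelated(question)) {
      return orderQueryAgent.handle(question);
    }
    if (isRefundRelated(question)) {
      return refundAgent.handle(question);
    }
    // Handle general inquiries
  }
}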

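Complex tasks are accomplished by several small agents working together. Even if LLM capabilities improve dramatically, focused agents remain easier to maintain, test, and debug.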
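The `isOrderRelated` / `isRefundRelated` checks are left undefined in the original. A minimal, deterministic way to fill that role is an ordered rule table that maps a question to the agent responsible for it. The keyword lists, agent keys, and stub handlers are hypothetical; many teams would use a cheap LLM classification call here instead, while still keeping the routing table itself in code.

```javascript
// Ordered routing rules: first match wins, so more specific rules go first.
const routes = [
  { agent: "refund", keywords: ["refund", "money back", "return"] },
  { agent: "orderQuery", keywords: ["order", "shipping", "tracking", "delivery"] },
];

function routeQuestion(question, fallback = "customerSupport") {
  const text = question.toLowerCase();
  const match = routes.find(route => route.keywords.some(k => text.includes(k)));
  return match ? match.agent : fallback;
}

// Each small agent exposes the same handle() shape (stubs here).
const agents = {
  refund: { handle: q => `RefundAgent handling: ${q}` },
  orderQuery: { handle: q => `OrderQueryAgent handling: ${q}` },
  customerSupport: { handle: q => `CustomerSupportAgent handling: ${q}` },
};

function handle(question) {
  return agents[routeQuestion(question)].handle(question);
}
```

Putting the refund rule first means "I want a refund for my order" goes to the refund agent even though it also mentions an order.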

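Principle 11: Let users reach the agent from anywhere

An agent should be usable from Slack, web pages, the command line, APIs, and so on; serve users wherever they already work.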

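import express from "express";
import { program } from "commander";

// Core business logic, shared by every entry point
class TaskAgent {
  async createTask(description) {
    return await database.createTask(description);
  }
}
const taskAgent = new TaskAgent();

// Slack entry point (slackBot is a placeholder for your Slack integration)
slackBot.command("/create-task", async (req) => {
  const result = await taskAgent.createTask(req.text);
  return `Task created: ${result.id}`;
});

// Web API entry point (Express)
const app = express();
app.use(express.json()); // parse JSON bodies into req.body
app.post("/api/tasks", async (req, res) => {
  const result = await taskAgent.createTask(req.body.description);
  res.json(result);
});

// Command-line entry point (commander)
program.command("create <description>")
  .action(async (description) => {
    const result = await taskAgent.createTask(description);
    console.log(`Task created: ${result.id}`);
  });
program.parseAsync(process.argv); // in practice, each entry point runs in its own process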

A single shared back end combined with a variety of front-end entry points accommodates different users' working habits.
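One way to keep the back end truly shared is to have each entry point translate its native request into one common event shape and format the common result back. The event shape `{channel, userId, text}`, the adapter table, and the stubbed `handleCreateTask` (standing in for `taskAgent.createTask`) are assumptions for illustration; the `user_id` and `text` field names follow Slack's slash-command payload.

```javascript
// Core logic sees only a channel-neutral event (task creation is stubbed).
let nextId = 1;
async function handleCreateTask(event) {
  return { id: `T${nextId++}`, description: event.text, createdBy: event.userId };
}

// Adapters: native input -> common event, common result -> native reply.
const adapters = {
  slack: {
    toEvent: req => ({ channel: "slack", userId: req.user_id, text: req.text }),
    reply: task => `Task created: ${task.id}`,
  },
  http: {
    toEvent: body => ({ channel: "http", userId: body.userId, text: body.description }),
    reply: task => ({ id: task.id, description: task.description }),
  },
  cli: {
    toEvent: argv => ({ channel: "cli", userId: "local", text: argv.join(" ") }),
    reply: task => `Task created: ${task.id}`,
  },
};

async function handleRequest(channel, input) {
  const adapter = adapters[channel];
  if (!adapter) throw new Error(`Unsupported channel: ${channel}`);
  const task = await handleCreateTask(adapter.toEvent(input));
  return adapter.reply(task);
}
```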

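Principle 12: Agents work like pure functions

Design the agent to take the current state and a new event as input and return a new state. Don't keep variables inside the agent.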

// Not recommended: internal state
class Agent {
  #currentStep = 0;
  #history = [];

  async run(input) {
    this.#currentStep++;
    this.#history.push(input);
    // Hard to debug and replay
  }
}

// Recommended: stateless reducer
function agentReducer(state, event) {
  const newMessages = [...state.messages, event];
  // calculateNextAction (app-specific) should read nothing but its argument
  const action = calculateNextAction(newMessages);

  return {
    ...state,
    messages: newMessages,
    nextAction: action
  };
}

// Usage
let state = {messages: [], nextAction: null};
state = agentReducer(state, {role: "user", content: "Hello"});
state = agentReducer(state, {role: "assistant", content: "Hi there!"});

// Benefit: any saved state can be restored and continued
localStorage.setItem("state", JSON.stringify(state));
const savedState = JSON.parse(localStorage.getItem("state"));
state = agentReducer(savedState, newEvent);

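This pattern makes testing easy (clear inputs and outputs), debugging easy (any moment can be replayed), and distributed deployment easy (stateless services scale out readily). Much like Redux's time-travel debugging, you can go back to any earlier state and continue from there.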
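Because the reducer is a function of (state, event), "time travel" is just replaying a prefix of the event log with `Array.prototype.reduce`. A minimal sketch, assuming a deterministic stub `nextActionFor` in place of `calculateNextAction`; since a real LLM call is not deterministic, a replayable system would typically record the model's outputs as events so that replay never calls the model again.

```javascript
// Deterministic stand-in for calculateNextAction (hypothetical).
function nextActionFor(messages) {
  const last = messages[messages.length - 1];
  return last.role === "user" ? { type: "reply" } : { type: "wait_for_user" };
}

function agentReducer(state, event) {
  const messages = [...state.messages, event];
  return { ...state, messages, nextAction: nextActionFor(messages) };
}

const initialState = { messages: [], nextAction: null };

// Rebuild the state exactly as it was after the first `n` events.
function stateAt(events, n) {
  return events.slice(0, n).reduce(agentReducer, initialState);
}

const events = [
  { role: "user", content: "Hello" },
  { role: "assistant", content: "Hi there!" },
  { role: "user", content: "Where is my order?" },
];

// Rewind to step 2 and branch with a different event; the log is untouched.
const branched = agentReducer(stateAt(events, 2), { role: "user", content: "Cancel it" });
```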

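Practical Advice

Adopt incrementally

There's no need to implement all 12 principles at once. Pick the ones that match your current challenge:

Prompts hard to optimize? Apply Principle 2 (own your prompts) and Principle 3 (control the context window)

Need human approval? Implement Principle 7 (human interaction) and Principle 6 (pause/resume)

Hard to debug? Adopt Principle 5 (unified state) and Principle 12 (stateless reducer)

Reliability problems? Use Principle 10 (small and focused) and Principle 8 (own the control flow)

Start with a small feature

Pick a small, self-contained feature to validate the principles, for example:

Customer-service auto-reply: answer common questions automatically and hand uncertain ones to a human (Principle 7)

Order-lookup assistant: take an order number, call the query_order tool, and return the result (Principles 1 and 4)

Approval workflow: check the request format, assign an approver automatically, and continue once approved (Principles 6, 7, and 8)

Expand to other features once you have validated the results.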

Engineering infrastructure

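// Logging: record LLM decisions
logger.info("LLM decision", {action, context});

// Monitoring: track success rate (count each outcome once)
metrics.increment(result.error ? "agent.failure" : "agent.success");

// Testing: make sure changes don't break functionality
// (this calls the real model, so outputs can vary between runs)
test("recognizes a refund request", async () => {
  const action = await ai.getNextAction([
    {role: "user", content: "I want a refund"}
  ]);
  expect(action.tool).toBe("process_refund");
});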

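As with traditional software, apply engineering practices such as version control, unit tests, integration tests, and observability monitoring.

Key Takeaways

Agents are fundamentally software: mostly deterministic code, with the LLM stepping in only where probabilistic reasoning is needed
Own the critical interfaces: prompts, the context window, and control flow should be in your hands, not hidden in a framework's black box
Engineering discipline first: apply traditional software-engineering best practices; the LLM is just one component among many
Reliability beats intelligence: in production, a reliable agent is worth more than one that merely looks "smarter"

Even as LLMs keep improving, these engineering principles will remain valuable, because they focus on making LLM applications more reliable, scalable, and maintainable.

Full documentation for further reading: https://www.humanlayer.dev/12-factor-agents

GitHub repository: https://github.com/humanlayer/12-factor-agents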