Evolving the Telegram Bot: Integrating the xAI Grok Responses API and Image Analysis

In the previous post, we built a basic AI Telegram bot with Bun + Vercel. This time we are making a major upgrade: integrating xAI's Grok 4.1 to unlock the following advanced features:

🌐 Live search (Web Search + X Search)
📷 Image analysis (multimodal understanding)
🔗 Server-side multi-turn conversations (tracked via response_id)

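Why xAI Grok?

Feature	OpenRouter	xAI Grok
Live search	❌ Requires extra integration	✅ Native support
X (Twitter) search	❌	✅ Real-time social data
Image understanding	⚠️ Some models only	✅ Full support
Server-side conversation memory	❌ Must be managed yourself	✅ Tracked automatically via `response_id`
Context window	32K-128K tokens	Up to 2M tokens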

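Architecture: Dual-Endpoint Routing

Because xAI's Responses API rejected multimodal input when we tried it, we use a dual-endpoint strategy:

📝 Text messages → /v1/responses (Responses API)
   ├─ Live search (web_search + x_search)
   └─ Server-side conversation tracking (previous_response_id)

📷 Image messages → /v1/chat/completions (Chat Completions API)
   └─ Multimodal image analysis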
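
The routing decision above can be sketched as a small pure function. This is only an illustration, not the bot's actual code: the `IncomingMessage` shape and the `pickEndpoint` name are assumptions made for this example, and only the two endpoint paths come from the design above.

```typescript
// Hypothetical, trimmed-down view of a Telegram message (only the fields used here).
interface IncomingMessage {
  text?: string;
  caption?: string;
  photo?: Array<{ file_id: string }>;
}

type GrokEndpoint = '/v1/responses' | '/v1/chat/completions';

// Messages that carry at least one photo go to Chat Completions;
// everything else goes to the Responses API.
function pickEndpoint(message: IncomingMessage): GrokEndpoint {
  const hasPhoto = Array.isArray(message.photo) && message.photo.length > 0;
  return hasPhoto ? '/v1/chat/completions' : '/v1/responses';
}
```

Keeping the decision in one place makes it easy to change later, for example if image input can be moved to the Responses API as well.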

Step 1: Set the Environment Variables

In Vercel, under Settings > Environment Variables, add:

XAI_API_KEY: obtained from the xAI Console
TELEGRAM_TOKEN: your existing bot token
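
A missing variable otherwise only shows up as a failed API call at runtime, so it can help to check both values once at startup. The sketch below is an assumption-laden example: `requireEnv` is a hypothetical helper name, not part of Bun, Vercel, or the original bot.

```typescript
// Hypothetical helper: fail fast when a required variable is missing or empty.
function requireEnv(
  env: Record<string, string | undefined>,
  names: string[]
): Record<string, string> {
  const values: Record<string, string> = {};
  const missing: string[] = [];

  for (const name of names) {
    const value = env[name];
    if (value === undefined || value.trim() === '') {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }

  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return values;
}
```

In the bot this would be called once with `process.env` and the two names above, for example `requireEnv(process.env, ['XAI_API_KEY', 'TELEGRAM_TOKEN'])`.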

Step 2: Implement the Two API Functions

Responses API (text + live search)

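// Set in step 1; shared by both API helpers.
const XAI_API_KEY = process.env.XAI_API_KEY;

async function callGrokResponsesAPI(
  input: Array<{role: string, content: string}>,
  enableWebSearch: boolean,
  previousResponseId?: string
): Promise<any> {
  const body: any = {
    model: 'grok-4-1-fast-reasoning',
    input: input,
    store: true,  // keep the conversation on xAI's servers
  };
  
  // Continue an existing server-side conversation if we have its ID
  if (previousResponseId) {
    body.previous_response_id = previousResponseId;
  }
  
  if (enableWebSearch) {
    body.tools = [
      { type: 'web_search' },  // web search
      { type: 'x_search' }     // X (Twitter) search
    ];
  }

  const res = await fetch('https://api.x.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${XAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });

  // Surface HTTP errors instead of trying to parse them as a normal reply
  if (!res.ok) {
    throw new Error(`Responses API error ${res.status}: ${await res.text()}`);
  }

  return res.json();
}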

Chat Completions API (image analysis)

async function callGrokChatAPI(
  messages: Array<{role: string, content: any}>
): Promise<any> {
  const res = await fetch('https://api.x.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${XAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'grok-4-1-fast-reasoning',
      messages: messages,
    }),
  });

  // Surface HTTP errors instead of trying to parse them as a normal reply
  if (!res.ok) {
    throw new Error(`Chat Completions API error ${res.status}: ${await res.text()}`);
  }

  return res.json();
}

Step 3: Handle Image Messages

A Telegram image has to be downloaded first and then converted to base64:

if (hasPhoto) {
  // Use the largest available size of the photo (the last entry)
  const photo = message.photo[message.photo.length - 1];
  const caption = message.caption || 'Please describe this image';
  
  // 1. Look up the file path
  const fileRes = await fetch(
    `https://api.telegram.org/bot${TELEGRAM_TOKEN}/getFile?file_id=${photo.file_id}`
  );
  const fileData = await fileRes.json();
  if (!fileData.ok) {
    throw new Error(`getFile failed: ${fileData.description}`);
  }
  
  // 2. Download the image and encode it as base64
  const imageUrl = `https://api.telegram.org/file/bot${TELEGRAM_TOKEN}/${fileData.result.file_path}`;
  const imageRes = await fetch(imageUrl);
  const imageBuffer = await imageRes.arrayBuffer();
  const base64Image = Buffer.from(imageBuffer).toString('base64');
  
  // 3. Build the multimodal message (image part + text part)
  const userContent = [
    {
      type: 'image_url',
      image_url: {
        url: `data:image/jpeg;base64,${base64Image}`,
        detail: 'high'
      }
    },
    { type: 'text', text: caption }
  ];
  
  // 4. Call the Chat Completions API
  const aiData = await callGrokChatAPI([
    { role: 'system', content: 'You are a friendly assistant...' },
    { role: 'user', content: userContent }
  ]);
}
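
The two pure parts of this flow, choosing the photo size and building the message content, can be pulled out so they are testable without any network calls. The sketch below is one possible way to do that; `pickLargestPhoto` and `buildImageContent` are hypothetical names, and choosing by pixel area (rather than taking the last array entry, as the code above does) is a substitution made here so the result does not depend on array order.

```typescript
// Subset of Telegram's PhotoSize object used in this sketch.
interface PhotoSize {
  file_id: string;
  width: number;
  height: number;
}

type ImagePart = { type: 'image_url'; image_url: { url: string; detail: 'high' } };
type TextPart = { type: 'text'; text: string };

// Pick the variant with the largest pixel area instead of relying on array order.
function pickLargestPhoto(photos: PhotoSize[]): PhotoSize | undefined {
  let best: PhotoSize | undefined;
  for (const p of photos) {
    if (!best || p.width * p.height > best.width * best.height) {
      best = p;
    }
  }
  return best;
}

// Build the multimodal user content: the image as a data URL, then the caption.
// Falls back to a default prompt when the caption is missing or blank.
function buildImageContent(base64Image: string, caption?: string): [ImagePart, TextPart] {
  const text = caption && caption.trim() !== '' ? caption : 'Please describe this image';
  return [
    {
      type: 'image_url',
      image_url: { url: `data:image/jpeg;base64,${base64Image}`, detail: 'high' },
    },
    { type: 'text', text },
  ];
}
```

The returned pair can be passed directly as the `content` of the user message sent to `callGrokChatAPI`.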

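Step 4: Server-Side Multi-Turn Conversations (response_id)

This is arguably Grok's most powerful feature: the client no longer has to store the entire conversation history. With store: true, xAI keeps the history on its servers, and the bot only needs to remember the ID of the last response.

Old vs. new

Old: history stored in KV	New: response_id
Send the full history every time	Send only the new message plus the ID
Manually truncate with `.slice(-6)`	Managed automatically by xAI
Payload keeps growing	Roughly constant size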
Implementation

// Read the previous response_id from KV
const previousResponseId = await kv.get<string>(`grok:${chatId}`);

// Call the API
const aiData = await callGrokResponsesAPI(
  input,
  true, // enable live search
  previousResponseId || undefined
);

// Store the new response_id
if (aiData.id) {
  await kv.set(`grok:${chatId}`, aiData.id, { ex: 86400 }); // 24hr TTL
}
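
To make the "constant payload" point concrete, here is a minimal sketch of how the request body for one turn could be built. It is not the bot's actual code: `buildTurnBody` is a hypothetical helper, and it assumes that once a conversation exists on the server, the system prompt sent in the first turn is already part of it and does not need to be repeated.

```typescript
// Request body shape used in this sketch (only the fields discussed in this post).
interface ResponsesRequestBody {
  model: string;
  input: Array<{ role: 'system' | 'user'; content: string }>;
  store: boolean;
  previous_response_id?: string;
}

// Hypothetical helper: build the body for one conversation turn.
// `storedId` is whatever came back from the KV store, so it is validated first.
function buildTurnBody(
  userText: string,
  storedId: unknown,
  systemPrompt: string
): ResponsesRequestBody {
  const previousId =
    typeof storedId === 'string' && storedId !== '' ? storedId : undefined;

  // Assumption: the system prompt is only needed on the first turn,
  // because later turns continue the conversation stored server-side.
  const input: ResponsesRequestBody['input'] = previousId
    ? [{ role: 'user', content: userText }]
    : [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: userText },
      ];

  const body: ResponsesRequestBody = {
    model: 'grok-4-1-fast-reasoning',
    input,
    store: true,
  };
  if (previousId) {
    body.previous_response_id = previousId;
  }
  return body;
}
```

Whatever the length of the conversation, the follow-up body contains one message and one ID, which is what keeps the payload size flat.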

Step 5: Parse the Responses API Reply

The Responses API returns a different structure from Chat Completions:

// Chat Completions: 
aiData.choices?.[0]?.message?.content

// Responses API:
const messageItem = aiData.output?.find((item: any) => item.type === 'message');
const content = messageItem?.content?.[0]?.text || aiData.output_text;

When web_search is used, the output array also contains the search entries. The actual reply is the type: "message" item that follows them, so look it up by type rather than by index.
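
The lookup can be wrapped in a small typed helper. This is a sketch under assumptions: the interfaces below describe only the fields this post relies on, `extractReplyText` is a hypothetical name, and taking the last message item and joining all of its text parts is a defensive choice made here rather than something the API requires.

```typescript
// Only the fields this post relies on; the real payload contains more.
interface OutputContentPart {
  type: string;
  text?: string;
}
interface OutputItem {
  type: string;
  content?: OutputContentPart[];
}
interface ResponsesPayload {
  output?: OutputItem[];
  output_text?: string;
}

// Return the assistant's reply text, or undefined if none can be found.
function extractReplyText(payload: ResponsesPayload): string | undefined {
  // Skip tool/search entries and keep only message items.
  const messages = (payload.output ?? []).filter((item) => item.type === 'message');
  const last = messages[messages.length - 1];

  // Join every text part of the last message item.
  const text = (last?.content ?? [])
    .map((part) => part.text ?? '')
    .join('');

  // Fall back to the top-level output_text field when no message text exists.
  return text !== '' ? text : payload.output_text;
}
```

Returning undefined instead of an empty string lets the caller decide what to send back to the user when the model produced no text.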

Full Routing Logic

if (hasPhoto) {
  // Image → Chat Completions API
  console.log('📷 Using Chat Completions API for image...');
  aiData = await callGrokChatAPI(input);
  assistantContent = aiData.choices?.[0]?.message?.content;
} else {
  // Text → Responses API (with live search)
  console.log('🔍 Using Responses API with web search...');
  // kv.get returns null when there is no stored ID, so normalize it to undefined
  aiData = await callGrokResponsesAPI(input, true, previousResponseId ?? undefined);
  
  const messageItem = aiData.output?.find((item: any) => item.type === 'message');
  assistantContent = messageItem?.content?.[0]?.text;
  
  // Store the response_id for the next turn
  if (aiData.id) {
    await kv.set(`grok:${chatId}`, aiData.id, { ex: 86400 });
  }
}

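Pitfalls

1. The Responses API rejected multimodal input

Error: Failed to deserialize... ModelInput

Fix: send images through /v1/chat/completions instead.

2. Stale KV data in a conflicting format

Error: previous_response_id: expected a string

This happens when the key still holds data written in the old format, which is not a plain string.

Fix: check the type of the KV value before using it.

const raw = await kv.get(conversationKey);
const responseId = typeof raw === 'string' ? raw : null;

3. Live-search replies have a different structure

The output field is an array, with the search results first and the message after them.

Fix: locate the message with .find(item => item.type === 'message').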

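Summary

We have upgraded the Telegram bot to xAI Grok:

Feature	Status
🌐 Live search	✅ Web + X Search
📷 Image analysis	✅ Multimodal understanding
🔗 Multi-turn conversations	✅ Server-side response_id
💻 Code execution	✅ code_execution
⏰ Conversation expiry	✅ 24hr TTL

Grok's Responses API is well suited to building agents: live search, tool calling, code execution, and long-term memory are all supported natively, which greatly reduces development complexity.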