Migrating from LangChain to LlamaIndex: A Hands-On PropertyGraphIndex Integration

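As LLM orchestration libraries multiply, choosing the right tool becomes critical. This post walks through the full process of migrating the career-kb project from LangChain to LlamaIndex, and shows how LlamaIndex's PropertyGraphIndex was integrated with the existing high-performance Rust petgraph backend.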

Why Migrate?

career-kb originally used LangChain for LLM orchestration, but as the project grew we ran into several pain points:

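Pain point	LangChain (before)	LlamaIndex advantage
GraphRAG	Requires the separate LangGraph library	Native PropertyGraphIndex
Code duplication	Every file has its own `get_langchain_model()`	Single shared provider module
Structured Output	`with_structured_output().invoke()`	More concise `structured_predict()`
Focus	Primarily workflow orchestration	Primarily RAG and data processing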

Architecture Overview

The architecture after the migration:

graph TB
    subgraph LlamaIndex["LlamaIndex orchestration layer"]
        PGI["PropertyGraphIndex"]
        SP["structured_predict"]
    end
    
    subgraph Adapter["Adapter layer"]
        PPGS["PetgraphPropertyGraphStore"]
    end
    
    subgraph Backend["High-performance backend"]
        PG["petgraph (Rust)
~10-100x faster"]
        LDB["LanceDB (Arrow)
vector + FTS storage"]
    end
    
    PGI --> PPGS
    SP --> PPGS
    PPGS --> PG
    PPGS --> LDB
    PG <--> LDB

Phase 1: Updating Dependencies

pyproject.toml

-    "langchain>=1.0",
-    "langchain-openai>=0.3",
-    "langchain-google-genai>=2.0",
+    "llama-index>=0.14",
+    "llama-index-llms-openai>=0.5",
+    "llama-index-llms-google-genai>=0.4",

Phase 2: Unifying the LLM Provider

Previously, every file contained similar duplicated code:

# nli.py, reflection.py, and entity_extractor.py all contain this block
def get_langchain_model():
    if os.getenv("XAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(...)
    elif os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(...)
    # ... 20+ more lines

Now it is consolidated in llm/provider.py:

"""Unified LLM provider for career-kb using LlamaIndex."""

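import os

from llama_index.llms.google_genai import GoogleGenAI
from llama_index.llms.openai import OpenAI

def get_llm(temperature: float = 0.1):
    """Get LlamaIndex LLM with xAI/Google/OpenAI fallback."""
    if os.getenv("XAI_API_KEY"):
        # xAI exposes an OpenAI-compatible API. llama_index's OpenAI class checks
        # model names against known OpenAI models and may raise "Unknown model"
        # for Grok names; OpenAILike (llama-index-llms-openai-like) avoids that.
        return OpenAI(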
            api_key=os.getenv("XAI_API_KEY"),
            api_base=os.getenv("XAI_BASE_URL", "https://api.x.ai/v1"),
            model=os.getenv("XAI_MODEL", "grok-4-1-fast-reasoning"),
            temperature=temperature,
        )
    elif os.getenv("GOOGLE_API_KEY"):
        return GoogleGenAI(
            api_key=os.getenv("GOOGLE_API_KEY"),
            model=os.getenv("GOOGLE_MODEL", "gemini-3-flash-preview"),
            temperature=temperature,
        )
    else:
        return OpenAI(
            api_key=os.getenv("OPENAI_API_KEY"),
            model=os.getenv("OPENAI_MODEL", "gpt-5-mini"),
            temperature=temperature,
        )
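Keeping the fallback order in a pure function makes it testable without API keys or network access. The sketch below is a hypothetical helper (`select_provider` and `ProviderConfig` are not part of career-kb); it mirrors the order and default values of `get_llm()` and returns plain data that a caller could pass to the LlamaIndex constructors.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical: the settings get_llm() would pass to an LLM class."""
    provider: str
    model: str
    api_key: Optional[str]
    api_base: Optional[str] = None


def select_provider(env: Mapping[str, str] = os.environ) -> ProviderConfig:
    """Pick xAI, then Google, then OpenAI, with the same defaults as get_llm()."""
    if env.get("XAI_API_KEY"):
        return ProviderConfig(
            provider="xai",
            model=env.get("XAI_MODEL", "grok-4-1-fast-reasoning"),
            api_key=env["XAI_API_KEY"],
            api_base=env.get("XAI_BASE_URL", "https://api.x.ai/v1"),
        )
    if env.get("GOOGLE_API_KEY"):
        return ProviderConfig(
            provider="google",
            model=env.get("GOOGLE_MODEL", "gemini-3-flash-preview"),
            api_key=env["GOOGLE_API_KEY"],
        )
    # Final fallback: OpenAI (the key may be None, just as in get_llm())
    return ProviderConfig(
        provider="openai",
        model=env.get("OPENAI_MODEL", "gpt-5-mini"),
        api_key=env.get("OPENAI_API_KEY"),
    )
```

Passing a plain dict, e.g. `select_provider({"GOOGLE_API_KEY": "..."})`, exercises each branch in a unit test without touching the real environment.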

All other files now need only a single import:

from career_kb.llm.provider import get_llm

Phase 3: Migrating Structured Output

LangChain version (Before)

llm = get_langchain_model()
structured_llm = llm.with_structured_output(NLIVerificationResult)
prompt = NLI_PROMPT_TEMPLATE.format(skill=skill, evidence=evidence_str)
result = structured_llm.invoke(prompt)

LlamaIndex version (After)

from llama_index.core.prompts import PromptTemplate

llm = get_llm()
prompt = PromptTemplate(NLI_PROMPT_TEMPLATE)

result = llm.structured_predict(
    NLIVerificationResult,
    prompt,
    skill=skill,
    evidence=evidence_str,
)

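Key differences:

No separate structured_llm object needs to be created
Template variables are passed directly to structured_predict()
A more functional style: one call takes the output class, the template, and its variables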
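Conceptually, `structured_predict()` bundles three steps: fill the template with keyword arguments, call the model, and parse the reply into the output class. The stdlib-only sketch below imitates that flow with a fake model so it runs offline. It is not LlamaIndex's implementation (which may use function calling or an output parser depending on the model), and `NLIResult`, `fake_llm`, and the template text are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable, Type, TypeVar

T = TypeVar("T")

# Hypothetical template using the same {placeholder} syntax as PromptTemplate
NLI_TEMPLATE = (
    "Does the evidence support the skill '{skill}'?\n"
    "Evidence: {evidence}\n"
    "Reply as JSON with keys: skill, supported."
)


@dataclass
class NLIResult:
    """Hypothetical stand-in for NLIVerificationResult."""
    skill: str
    supported: bool


def structured_predict_sketch(
    output_cls: Type[T], template: str, llm: Callable[[str], str], **prompt_args
) -> T:
    prompt = template.format(**prompt_args)  # 1. fill template variables
    raw = llm(prompt)                        # 2. call the model
    data = json.loads(raw)                   # 3. parse the JSON reply
    allowed = {f.name for f in fields(output_cls)}
    # Keep only the fields the output class declares
    return output_cls(**{k: v for k, v in data.items() if k in allowed})


def fake_llm(prompt: str) -> str:
    """Offline stand-in: echoes the quoted skill and claims support."""
    skill = prompt.split("'")[1]
    return json.dumps({"skill": skill, "supported": True, "extra": "ignored"})


result = structured_predict_sketch(
    NLIResult, NLI_TEMPLATE, fake_llm, skill="FastAPI", evidence="Built REST APIs"
)
print(result)  # NLIResult(skill='FastAPI', supported=True)
```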

Phase 4: Integrating PropertyGraphIndex

This is the most valuable part: integrating LlamaIndex's PropertyGraphIndex with Rust petgraph.

Design Philosophy

Rather than replacing petgraph, we add an orchestration layer on top of it:

# LlamaIndex handles: query orchestration, LLM calls, retriever composition
# petgraph handles: high-performance graph computation, BFS/DFS, the Leiden algorithm

PetgraphPropertyGraphStore

Implementing LlamaIndex's PropertyGraphStore interface:

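from llama_index.core.graph_stores.types import PropertyGraphStore, EntityNode, Relation, Triplet

# CareerKnowledgeGraph is career-kb's wrapper around the petgraph binding (import omitted)

class PetgraphPropertyGraphStore(PropertyGraphStore):
    """PropertyGraphStore backed by petgraph (Rust).

    Only the read path is shown; the remaining abstract methods of PropertyGraphStore
    (such as get, upsert_nodes, upsert_relations, delete, and structured_query) must
    also be implemented before the class can be instantiated.
    """
    
    def __init__(self, knowledge_graph: CareerKnowledgeGraph | None = None):
        self._kg = knowledge_graph or CareerKnowledgeGraph()
        self._kg.load_from_skill_graph()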
    
    def get_triplets(self, entity_names=None, relation_names=None, **kwargs) -> list[Triplet]:
        """Get triplets from petgraph."""
        triplets = []
        for source, target, relation in self._get_all_edges():
            if entity_names and source not in entity_names and target not in entity_names:
                continue
            if relation_names and relation not in relation_names:
                continue
            
            triplets.append((
                EntityNode(name=source, label="entity"),
                Relation(source_id=source, target_id=target, label=relation),
                EntityNode(name=target, label="entity"),
            ))
        return triplets
    
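    def get_rel_map(self, graph_nodes, depth=2, limit=30, ignore_rels=None, **kwargs) -> list[Triplet]:
        """Leverage petgraph's efficient BFS traversal.

        The PropertyGraphStore interface expects a list of triplets here, not a dict.
        """
        triplets: list[Triplet] = []
        for node in graph_nodes:
            neighbors = self._kg.get_neighbors(node.name, hops=depth)
            # ... build (EntityNode, Relation, EntityNode) triplets from the BFS paths,
            # skipping relations listed in ignore_rels
        return triplets[:limit]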
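`get_rel_map()` should return the triplets around the seed nodes, up to `depth` hops and capped at `limit`. The sketch below is a pure-Python reference for that contract, which can be handy for checking the Rust-backed results in tests. It assumes edges are available as `(source, relation, target)` string tuples and that edges may be traversed in either direction; plain tuples stand in for `EntityNode`/`Relation`, and `rel_map_bfs` is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Iterable, Optional

Edge = tuple[str, str, str]  # (source, relation, target)


def rel_map_bfs(
    edges: Iterable[Edge],
    seeds: Iterable[str],
    depth: int = 2,
    limit: int = 30,
    ignore_rels: Optional[set[str]] = None,
) -> list[Edge]:
    """Collect edges on paths of at most `depth` hops from any seed, breadth-first."""
    ignore_rels = ignore_rels or set()
    adjacency: dict[str, list[Edge]] = defaultdict(list)
    for edge in edges:
        source, relation, target = edge
        if relation in ignore_rels:
            continue
        # Index each edge under both endpoints so traversal is undirected
        adjacency[source].append(edge)
        adjacency[target].append(edge)

    seed_list = list(seeds)
    seen_nodes = set(seed_list)
    queue = deque((seed, 0) for seed in seed_list)
    seen_edges: set[Edge] = set()
    result: list[Edge] = []
    while queue and len(result) < limit:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for edge in adjacency[node]:
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            result.append(edge)
            if len(result) == limit:
                break
            source, _, target = edge
            neighbor = target if node == source else source
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append((neighbor, hops + 1))
    return result
```

With the edges `(LangChain, implies, Python)` and `(Python, relatedTo, Data Science)`, seeding `LangChain` with `depth=1` returns only the first edge, while `depth=2` returns both.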

Usage Example

from career_kb.graph.property_graph_index import create_property_graph_index

# Build the index (backed by petgraph)
index = create_property_graph_index(use_llm_extractor=False)

# Query
retriever = index.as_retriever()
nodes = retriever.retrieve("Python backend experience")

Phase 5: CLI Integration

Two new CLI commands were added:

# Show PropertyGraphStore information
career-kb graph property-store

# Query with PropertyGraphIndex
career-kb graph property-query "Python backend"

Output

$ career-kb graph property-store

        PropertyGraphStore Info        
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Metric         ┃ Value              ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ Backend        │ rust (petgraph)    │
│ Node Count     │ 93                 │
│ Edge Count     │ 217                │
│ Node Types     │ skill              │
│ Relation Types │ implies, relatedTo │
└────────────────┴────────────────────┘

Sample Triplets:
  (Python) --[relatedTo]--> (Data Science)
  (LangChain) --[implies]--> (Python)
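Every metric in this table can be derived from the store's triplets. The sketch below shows one way to compute them, assuming edges are `(source, relation, target)` string tuples and a single node type is reported; `summarize_store` is a hypothetical helper, not the code behind `career-kb graph property-store`. Because it counts only nodes that appear in some edge, isolated nodes would be missed.

```python
from typing import Iterable

Edge = tuple[str, str, str]  # (source, relation, target)


def summarize_store(edges: Iterable[Edge], node_type: str = "skill", samples: int = 2) -> dict:
    """Compute PropertyGraphStore Info-style metrics from an edge list."""
    edge_list = list(edges)
    # Nodes are every distinct endpoint that appears in an edge
    nodes = {n for source, _, target in edge_list for n in (source, target)}
    return {
        "Node Count": len(nodes),
        "Edge Count": len(edge_list),
        "Node Types": node_type,
        "Relation Types": ", ".join(sorted({rel for _, rel, _ in edge_list})),
        "Sample Triplets": [f"({s}) --[{r}]--> ({t})" for s, r, t in edge_list[:samples]],
    }
```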

Migrated Modules

Module	Change
`llm/provider.py`	[NEW] Unified LLM provider
`llm/nli.py`	`get_langchain_model()` → `get_llm()`
`llm/reflection.py`	Writer/Critic/Reviser switched to `structured_predict`
`graph/entity_extractor.py`	Removed duplicated LLM initialization
`graph/evidence_selection.py`	`ChatPromptTemplate` → `PromptTemplate`
`graph/property_graph_store.py`	[NEW] PropertyGraphStore adapter
`graph/property_graph_index.py`	[NEW] PropertyGraphIndex integration

Verification

LLM Calls

$ career-kb verify -s "Python,FastAPI" --enhanced-nli

              Skill Verification Results               
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Skill   ┃ Status  ┃ Severity ┃ Evidence               ┃ Source ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Python  │ COVERED │    ✅    │ Developed Chat-Native  │ LLM+1  │
│ FastAPI │ COVERED │    ✅    │ Positioned Python as   │ LLM+1  │
└─────────┴─────────┴──────────┴────────────────────────┴────────┘

Graph Operations

# Rust petgraph performance (g is the petgraph-backed graph object)
g.get_neighbors("Python", hops=2)  # 20 nodes, <1ms

Future Extensions

Zero-Copy Arrow Integration

The current petgraph binding relies on standard PyO3 type conversion:

pub fn get_all_nodes(&self) -> Vec<(String, String)> {
    // copies the data on every call
}

In the future, Arrow could enable zero-copy transfer:

pub fn get_all_nodes_arrow(&self) -> PyResult<PyObject> {
    // return an Arrow RecordBatch that Python reads directly
}
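The difference between the two bindings is copying data on every transfer versus sharing one buffer. Arrow's zero-copy path relies on exposing existing memory rather than building new Python objects. The stdlib-only sketch below illustrates that distinction with `array` and `memoryview`; it does not use Arrow or PyO3, and the node-ID column is purely illustrative.

```python
from array import array

# A column of node IDs stored contiguously, the way a columnar format holds it
node_ids = array("q", [10, 20, 30, 40])

copied = node_ids.tolist()     # builds new Python ints: a copy, like converting a Vec into a list
shared = memoryview(node_ids)  # a view over the same memory: nothing is copied
window = shared[1:3]           # slicing a view is also zero-copy

node_ids[0] = 99               # mutate the underlying buffer
print(copied[0], shared[0])    # 10 99
print(window.tolist())         # [20, 30]
```

The copy keeps the old value while the view sees the update, which is exactly why a zero-copy interface avoids per-call conversion cost.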

LLM Entity Extraction

Entity extraction currently requires calling entity_extractor.py manually; in the future, LlamaIndex's automatic extraction could be enabled:

index = create_property_graph_index(
    documents=docs,
    use_llm_extractor=True,  # extract entities automatically
)

Summary

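Migration item	Benefit
LangChain → LlamaIndex	More concise API, data-oriented design
Unified provider	~100 fewer lines of duplicated code
PropertyGraphIndex	Standard interface, easy to extend
petgraph retained	High-performance graph operations unaffected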

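Key takeaway: a good migration doesn't tear everything down and start over; it layers a better abstraction on top of the existing architecture.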

Career Knowledge Base is a local-first résumé knowledge-base system built with Python + LanceDB + petgraph (Rust) + LlamaIndex.
