Integrating seekdb with LangChain
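seekdb provides vector type storage, vector indexes, and embedding vector search. You can store vectorized data in seekdb and use it in subsequent searches.
LangChain is a framework for developing applications powered by language models. It enables applications that are:
- Context-aware: they connect a language model to sources of context (prompt instructions, few-shot examples, content to respond to, and so on).
- Capable of reasoning: they rely on a language model to reason (how to answer based on the provided context, which actions to take, and so on).
This tutorial uses the Tongyi Qianwen (Qwen) API to demonstrate how to integrate seekdb vector search, Tongyi Qianwen, and LangChain to implement document question answering.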
Prerequisites
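- You have deployed a seekdb database.
- Your environment has a database and an account that you can use, and the account has been granted read and write privileges on the database.
- Python 3.9 or later is installed.
- The required dependencies are installed:
  python3 -m pip install -U langchain-oceanbase
  python3 -m pip install langchain_community
  python3 -m pip install dashscope
- You can set the ob_vector_memory_limit_percentage configuration item to enable vector search. We recommend keeping the default value 0 (adaptive mode). To set this item more precisely, see the related configuration documentation.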
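Step 1: Obtain the database connection information
Contact the person who deployed the seekdb database, or its administrator, to obtain the database connection string. For example: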
obclient -h$host -P$port -u$user_name -p$password -D$database_name
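Parameters:
- $host: the IP address used to connect to the seekdb database.
- $port: the port used to connect to the seekdb database. The default is 2881, and it can be customized during deployment.
- $database_name: the name of the database to access.
  Note: The user connecting to the database needs the CREATE, INSERT, DROP, and SELECT privileges on that database. For more information about user privileges, see Privilege types in MySQL mode.
- $user_name: the database account, in the format: username.
- $password: the account password.
For more information about connection strings, see Connect to OceanBase by using OBClient.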
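The connection parameters above can also be collected in Python so that credentials stay out of the source code. The following is a minimal sketch, not part of seekdb or LangChain: the SEEKDB_* environment variable names and the load_connection_args helper are hypothetical, and the dictionary keys mirror the connection_args dictionary used later in this tutorial.

```python
import os


def load_connection_args(env=os.environ):
    """Build a connection dictionary from environment variables.

    The SEEKDB_* variable names are hypothetical; rename them to match
    your own deployment.
    """
    return {
        "host": env.get("SEEKDB_HOST", "127.0.0.1"),
        "port": env.get("SEEKDB_PORT", "2881"),  # 2881 is the default port
        "user": env.get("SEEKDB_USER", "root"),
        "password": env.get("SEEKDB_PASSWORD", ""),
        "db_name": env.get("SEEKDB_DATABASE", "test"),
    }


if __name__ == "__main__":
    args = load_connection_args()
    # Mask the password so it never reaches the terminal or logs.
    print({k: ("***" if k == "password" else v) for k, v in args.items()})
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise in tests without touching the real environment.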
Step 2: Build your AI assistant
Set the Tongyi Qianwen API key environment variable
Obtain a Tongyi Qianwen API key and configure it as an environment variable.
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
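Because the tutorial code falls back to an empty string when the variable is missing, a forgotten export may only surface later as an authentication error. The sketch below fails early instead. The require_api_key helper is hypothetical and is not part of the DashScope SDK.

```python
import os


def require_api_key(env=os.environ, name="DASHSCOPE_API_KEY"):
    """Return the API key, or raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == "YOUR_DASHSCOPE_API_KEY":
        raise RuntimeError(
            f"{name} is not set; export it before running the tutorial code"
        )
    return key
```

Calling require_api_key() once at the top of a script gives a clear error message before any network request is made.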
Load and split the document
Download the sample data and use CharacterTextSplitter to split it into chunks of about 1,000 characters each.
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_oceanbase.vectorstores import OceanbaseVectorStore
import os
import requests

# Read the API key exported in the previous step.
DASHSCOPE_API = os.environ.get("DASHSCOPE_API_KEY", "")
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1", dashscope_api_key=DASHSCOPE_API
)

# Download the sample document and save it locally as UTF-8.
url = "https://raw.githubusercontent.com/GITHUBear/langchain/refs/heads/master/docs/docs/how_to/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w", encoding="utf-8") as f:
    f.write(res.text)

# Load the file and split it into chunks of about 1,000 characters.
loader = TextLoader("./state_of_the_union.txt", encoding="utf-8")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
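To show why the chunks are only "about" 1,000 characters long, here is a simplified, pure-Python sketch of separator-based chunking. It is not the LangChain implementation: split_by_separator is a hypothetical stand-in that splits on blank lines ("\n\n", the default separator of CharacterTextSplitter) and greedily merges neighboring pieces while they fit within chunk_size.

```python
def split_by_separator(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks.

    A piece longer than chunk_size is kept whole, so a chunk can exceed
    the limit; cuts only ever happen at separator boundaries.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "aaaa\n\nbbbb\n\ncccc"
    print(split_by_separator(sample, chunk_size=10))
```

Because the text is cut only where the separator occurs, chunk lengths depend on where paragraph breaks fall in the source document.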
Insert the data into seekdb
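# Replace these values with the connection information from Step 1.
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@user_name",
    "password": "",
    "db_name": "test",
}
DEMO_TABLE_NAME = "demo_ann"

# drop_old=True discards any existing table with the same name.
ob = OceanbaseVectorStore(
    embedding_function=embeddings,
    table_name=DEMO_TABLE_NAME,
    connection_args=connection_args,
    drop_old=True,
    normalize=True,
)

# Embed the chunks and write them to the table.
res = ob.add_documents(documents=docs)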
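The normalize=True option is not explained in this tutorial. Assuming it has the usual meaning, that is, each embedding is rescaled to unit length before it is stored, the operation can be sketched as follows. The l2_normalize function is a hypothetical illustration, not code from langchain-oceanbase; check that package's documentation for the exact behavior.

```python
import math


def l2_normalize(vector):
    """Rescale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


if __name__ == "__main__":
    unit = l2_normalize([3.0, 4.0])
    # The length of the rescaled vector should be 1 up to rounding error.
    print(unit, math.sqrt(sum(x * x for x in unit)))
```

With unit-length vectors, distance comparisons depend only on the direction of each embedding, not on its magnitude.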
Vector search
This step shows how to query the document state_of_the_union.txt with "What did the president say about Ketanji Brown Jackson".
query = "What did the president say about Ketanji Brown Jackson"
# Retrieve the three chunks closest to the query, each with its score.
docs_with_score = ob.similarity_search_with_score(query, k=3)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
Expected output:
--------------------------------------------------------------------------------
Score: 1.204783671324283
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2146663629717394
It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China.
As I’ve told Xi Jinping, it is never a good bet to bet against the American people.
We’ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America.
And we’ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.
We’ll build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities.
4,000 projects have already been announced.
And tonight, I’m announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 1.2193955178945004
Vice President Harris and I ran for office with a new economic vision for America.
Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up
and the middle out, not from the top down.
Because we know that when the middle class grows, the poor have a ladder up and the wealthy do very well.
America used to have the best roads, bridges, and airports on Earth.
Now our infrastructure is ranked 13th in the world.
We won’t be able to compete for the jobs of the 21st Century if we don’t fix that.
That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history.
This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen.
We’re done talking about infrastructure weeks.
We’re going to have an infrastructure decade.
--------------------------------------------------------------------------------
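The scores in this output increase from the first result to the last, which is consistent with a distance metric where a smaller value means a closer match. The sketch below illustrates that kind of ranking with a brute-force search. It assumes Euclidean (L2) distance and uses toy two-dimensional vectors; l2_distance and top_k are hypothetical helpers, and a real vector index avoids scanning every row.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k=3):
    """Return the k (distance, index) pairs closest to the query, nearest first."""
    scored = [(l2_distance(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored)[:k]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
    for distance, index in top_k([1.0, 0.0], stored, k=2):
        print(index, distance)
```

For unit-length vectors the L2 distance always falls between 0 and 2, so values around 1.2 indicate chunks that are only moderately close to the query.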