How to Set Up the Vearch Vector Database

2025/4/26 12:52:03 Source: https://blog.csdn.net/ttyy1112/article/details/147479981

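Vearch is an open-source distributed vector search system, developed and open-sourced by JD.com, designed for large-scale vector similarity search. The steps below walk through setting it up.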

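I. Environment Preparation

System Requirements

Linux (Ubuntu 18.04+ or CentOS 7+ recommended)
Docker and Docker Compose (for containerized deployment)
Go 1.13+ (only if building from source)
Python 3.6+ (for the client)

Hardware Recommendations

CPU: 4 cores or more
Memory: 8 GB or more (adjust to the data size)
Disk: SSD storage recommended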

II. Choosing an Installation Method

1. Quick Start (Docker)

# Pull the latest image
docker pull vearch/vearch:latest
# Start a standalone instance
docker run -d -p 8817:8817 -p 9001:9001 vearch/vearch:latest all
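After the container starts, it can take a moment before the published ports accept connections. The sketch below polls a TCP port until a connection succeeds; it uses only the Python standard library, and the helper name `wait_for_port` is hypothetical, not part of Vearch. An open port only shows that a process is listening, not that the cluster has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For the command above, one would call `wait_for_port("localhost", 8817)` and `wait_for_port("localhost", 9001)` before sending the first request.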

2. Production Deployment (Cluster Mode)

Download the release package

wget https://github.com/vearch/vearch/releases/download/v3.3.7/vearch-3.3.7.tar.gz
tar -zxvf vearch-3.3.7.tar.gz
cd vearch-3.3.7

Configure the cluster

Edit the config.toml file:

[global]
name = "vearch"
data = ["datas/"]
log = "logs/"
level = "info"
signkey = "vearch"

[monitor]
port = 9008

[router]
port = 9001
skip_auth = true

[master]
name = "m1"
address = "127.0.0.1"
port = 8817
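When several roles run on the same machine, each section needs its own port. The following is a minimal sketch that scans a parsed configuration for ports used more than once. The helper names are hypothetical, the check treats any key named `port` or ending in `_port` as a port (an assumption about the file layout), and `load_config` relies on `tomllib`, which ships with Python 3.11 and later.

```python
def load_config(path):
    """Parse a TOML file into a dict (requires Python 3.11+ for tomllib)."""
    import tomllib

    with open(path, "rb") as f:
        return tomllib.load(f)


def find_port_conflicts(config):
    """Map each port used more than once to the 'section.key' entries that use it."""
    seen = {}
    for section, values in config.items():
        if not isinstance(values, dict):
            continue  # skip top-level scalars
        for key, value in values.items():
            if key == "port" or key.endswith("_port"):
                seen.setdefault(value, []).append(f"{section}.{key}")
    return {port: users for port, users in seen.items() if len(users) > 1}
```

Calling `find_port_conflicts(load_config("config.toml"))` on the file above should return an empty dict, since 9008, 9001, and 8817 are all distinct.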

Start the cluster

# Start the master node
./bin/vearch -conf config.toml master start
# Start the router node
./bin/vearch -conf config.toml router start
# Start the ps node (storage and compute node)
./bin/vearch -conf config.toml ps start

III. Basic Operations

1. Creating a Collection

import vearch

# Connect to the cluster
client = vearch.Client("http://localhost:8817")

# Define the schema
schema = {
    "name": "image_search",
    "partition_num": 1,
    "replica_num": 1,
    "engine": {
        "name": "gamma",
        "index_size": 10000,
        "retrieval_type": "IVFPQ",
        "retrieval_param": {
            "metric_type": "InnerProduct",
            "ncentroids": 256,
            "nsubvector": 32,
        },
    },
    "properties": {
        "image_vec": {
            "type": "vector",
            "dimension": 128,
            "store_type": "MemoryOnly",
        },
        "image_id": {"type": "string", "index": True},
    },
}

# Create the collection
client.create_collection("test_db", schema)

2. Inserting Vector Data

collection = client.collection("test_db", "image_search")

doc = {
    "image_id": "img001",
    "image_vec": [0.12, 0.23, ..., 0.98],  # 128-dimensional vector (elided)
}
collection.insert(doc)
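The vector in the example above is elided. For a smoke test, placeholder documents with the right dimension can be generated as sketched below. The helper names are hypothetical and the values are random, not real image embeddings. The vectors are scaled to unit length because, with the InnerProduct metric, the inner product of unit vectors equals their cosine similarity.

```python
import math
import random


def unit_vector(dim, rng):
    """Return a random vector of length `dim` scaled to unit L2 norm."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_doc(image_id, dim=128, seed=0):
    """Build a placeholder document with the two fields used in the schema above."""
    return {"image_id": image_id, "image_vec": unit_vector(dim, random.Random(seed))}
```

A document built with `make_doc("img001")` has the same shape as the `doc` dictionary above.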

3. Vector Search

query_vec = [0.11, 0.22, ..., 0.99]  # query vector (elided)
results = collection.search(
    vector=query_vec,
    vector_field="image_vec",
    topk=10,
    params={"nprobe": 20},
)
for result in results:
    print(result["image_id"], result["_score"])
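IVFPQ search is approximate, so it can help to compare its results against an exact search on a small sample. The sketch below is a brute-force inner-product search in pure Python plus a recall measure; it does not call Vearch, and the function names are hypothetical.

```python
import heapq


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_topk(query, vectors, k):
    """vectors maps id -> vector. Return [(id, score)] for the k highest scores, best first."""
    scored = ((inner_product(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)
```

If recall on the sample is low, raising `nprobe` is the usual first adjustment, at the cost of scanning more clusters per query.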

IV. Performance Tuning

1. Index Parameters

"retrieval_param": {
    "metric_type": "L2",  # distance metric
    "ncentroids": 512,    # number of cluster centroids
    "nsubvector": 64      # number of subvectors
}
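Product quantization splits each vector into `nsubvector` equal parts, so the dimension has to be divisible by `nsubvector`. The sketch below checks that, and also applies a Faiss-style rule of thumb of at least 39 training vectors per centroid. Whether Vearch's gamma engine enforces the same thresholds, and whether `index_size` is the number of vectors available when training starts, are assumptions here. The code size assumes 8 bits per subvector.

```python
def check_ivfpq_params(dimension, ncentroids, nsubvector, training_points):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if dimension % nsubvector != 0:
        problems.append(
            f"dimension {dimension} is not divisible by nsubvector {nsubvector}"
        )
    min_points = 39 * ncentroids  # rule of thumb borrowed from Faiss k-means
    if training_points < min_points:
        problems.append(
            f"{training_points} training points is below {min_points} (39 per centroid)"
        )
    return problems


def pq_code_bytes(nsubvector, bits_per_code=8):
    """Bytes needed to store one PQ-encoded vector, rounded up to whole bytes."""
    return (nsubvector * bits_per_code + 7) // 8
```

With 128 dimensions, `nsubvector = 64` gives a 64-byte code per vector versus 512 bytes for the raw float32 values. With `ncentroids = 512` the rule of thumb asks for 19968 training vectors, more than an `index_size` of 10000 would provide.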

2. Resource Allocation

[ps]
# Ports used by each PS node
rpc_port = 9002
heartbeat_port = 9003
raft_port = 9004
recover_port = 9005
data_port = 9006

3. Cache Configuration

[engine]
max_size = 1000000  # maximum number of cached documents
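To judge whether a given `max_size` fits in memory, a rough lower bound is the size of the raw vectors alone. The sketch below assumes float32 components (4 bytes each) and ignores index structures, scalar fields, and engine overhead, so real usage will be higher. The function names are hypothetical.

```python
def raw_vector_bytes(num_docs, dimension, bytes_per_value=4):
    """Bytes needed to hold num_docs vectors of the given dimension in memory."""
    return num_docs * dimension * bytes_per_value


def to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20
```

For 1,000,000 documents with 128-dimensional vectors, this comes to 512,000,000 bytes, or about 488 MiB.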

V. Scaling the Cluster

Adding PS Nodes

Deploy the same program on the new machine
Set the master address in config.toml
Specify the role at startup:

./bin/vearch -conf config.toml ps start

Data Sharding

schema = {
    "partition_num": 4,  # number of partitions (shards)
    "replica_num": 2     # number of replicas
}
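With `partition_num = 4` and `replica_num = 2`, the cluster holds 4 × 2 = 8 partition replicas in total. The sketch below spreads them over PS nodes round-robin to show how load grows per node. This is an illustration only, not Vearch's actual scheduler; the rule that replicas of one partition sit on different nodes is an assumption, and the node names are made up.

```python
def place_replicas(partition_num, replica_num, ps_nodes):
    """Assign (partition, replica) pairs to nodes round-robin.

    Replicas of the same partition always land on different nodes,
    which requires replica_num <= len(ps_nodes).
    """
    if replica_num > len(ps_nodes):
        raise ValueError("replica_num cannot exceed the number of PS nodes")
    placement = {node: [] for node in ps_nodes}
    for p in range(partition_num):
        for r in range(replica_num):
            # Offsetting by r keeps the replicas of partition p on distinct nodes
            node = ps_nodes[(p + r) % len(ps_nodes)]
            placement[node].append((p, r))
    return placement
```

With three nodes, `place_replicas(4, 2, ["ps1", "ps2", "ps3"])` puts three replicas on two of the nodes and two on the third.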

VI. Monitoring and Maintenance

Built-in Monitoring Endpoints

http://<master_ip>:8817/_cluster/stats
http://<router_ip>:9001/_cluster/health
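These endpoints can be polled from a script. The sketch below builds the two URLs listed above and fetches one of them with the Python standard library. It assumes the endpoints return JSON and need no authentication, and it makes no assumption about which fields the response contains. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def monitoring_urls(master_ip, router_ip, master_port=8817, router_port=9001):
    """Build the stats and health URLs from the node addresses."""
    return {
        "stats": f"http://{master_ip}:{master_port}/_cluster/stats",
        "health": f"http://{router_ip}:{router_port}/_cluster/health",
    }


def fetch_json(url, timeout=3.0):
    """Return the decoded JSON body, or None if the request fails or the body is not JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```

A `None` result means the node was unreachable or did not answer with JSON, which is a reasonable trigger for checking the logs.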

Viewing Logs

tail -f logs/vearch.log

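VII. Troubleshooting

Startup failure: check for port conflicts and make sure ports such as 8817 and 9001 are available
Poor search performance: adjust the nprobe parameter and the index type
Out of memory: reduce max_size or add machine resources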
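For the startup-failure case, a quick way to see whether a port is already taken is to try binding to it. The sketch below does that with the standard library; the helper name `port_is_free` is hypothetical. A port in TIME_WAIT may briefly be reported as busy, so a negative result is worth re-checking after a few seconds.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True
```

Running `port_is_free(8817)` and `port_is_free(9001)` on the target machine before starting the master and router covers the two ports named above.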

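Vearch offers a rich set of APIs and configuration options that can be tuned to the application. For production, deploy at least three master nodes for high availability, and size the number of PS nodes according to the data volume.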
