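Exploring Tencent Cloud VectorDB: Getting Started

Overview

A vector database is a database built specifically for storing and querying vector data. A vector is typically a one-dimensional array of numeric values (usually floating-point numbers) that describe the position, features, or attributes of an object or data point in a multidimensional space. For example:

- In natural language processing, an article can be represented by word vectors, where each word is mapped to a position in the word-vector space.
- In image processing, an image can be represented by a pixel vector, where each pixel contributes three numbers for its RGB values.
- In recommender systems, a user can be represented by a user vector, where each dimension captures a different interest or behavioral preference.

Tencent Cloud VectorDB is a fully managed, self-developed, enterprise-grade distributed database service. A single index supports vectors on the scale of one billion, and the service can sustain millions of QPS with millisecond-level query latency. Besides improving the accuracy of large-model answers, it is widely applicable to recommender systems, natural language processing, and similar fields.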

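What Is a Vector Database

A vector database is a database system designed to store, retrieve, and compute on vectors. By representing data as vectors, it can handle tasks such as similarity search and clustering efficiently. Such databases are typically used for large-scale, high-dimensional data such as images, text, and audio, where they make complex data analysis and pattern recognition more practical.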

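An Example

Imagine you have a set of movies, each of which can be represented by a vector that encodes features of the film such as genre, director, and cast. Now you want to find the movies most similar to a given one.

With a vector database, you represent each movie as a vector such as [3, 1, 4, 2, ...], where each number is the value of one feature. By computing the similarity between vectors, you can find the movie vectors closest to the vector of the given movie.

As a result, when you look for movies similar to a particular one, you do not need to scan the entire movie library. The vector database can quickly locate the best matches, which makes the similarity search far more efficient.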
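The similarity computation described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `MovieSimilarity`, the movie names, and the feature values are hypothetical, cosine similarity is just one possible metric, and the lookup is a brute-force scan. A real vector database avoids such a full scan by using an index (the SDK example later in this article uses HNSW).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MovieSimilarity {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the key of the stored vector most similar to the query (brute force). */
    static String mostSimilar(Map<String, double[]> movies, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : movies.entrySet()) {
            double score = cosine(e.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical feature vectors; the values carry no real meaning.
        Map<String, double[]> movies = new LinkedHashMap<>();
        movies.put("movie-A", new double[]{3, 1, 4, 2});
        movies.put("movie-B", new double[]{0, 5, 0, 1});
        movies.put("movie-C", new double[]{1, 0, 0, 5});
        System.out.println(mostSimilar(movies, new double[]{3, 1, 3, 2}));
    }
}
```

The query vector differs from `movie-A` in a single feature, so `movie-A` scores highest under cosine similarity.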

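Inverted Indexes

Vector databases and inverted indexes share some similarities, particularly when it comes to search.

In an inverted index, you build a mapping from each keyword (or feature) to the documents that contain it. At search time you can then quickly find the documents containing a given keyword without scanning the whole document collection.

Similarly, in a vector database you represent each data point (such as an image or a movie) as a vector and search by computing the similarity between vectors. This makes finding similar items in a high-dimensional space more efficient, because items that lie far away in the vector space can be ruled out quickly.

Both therefore aim to make search more efficient, but in different ways: an inverted index focuses on keyword matching, whereas a vector database focuses on finding similar vectors in a high-dimensional space.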
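For contrast with vector search, the keyword-to-documents mapping of an inverted index can be sketched as follows. The class name `InvertedIndexSketch`, the sample documents, and the whitespace tokenization are assumptions made for illustration. Production search engines use far more elaborate analysis and storage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InvertedIndexSketch {

    /** Builds a map from each lower-cased term to the ids of the documents containing it. */
    static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    /** Looks a term up directly, without scanning the documents. */
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "space adventure film", "romantic comedy film", "space documentary");
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(lookup(index, "space")); // documents 0 and 2
        System.out.println(lookup(index, "film"));  // documents 0 and 1
    }
}
```

The lookup only finds exact term matches, which is the key difference from a vector search that ranks items by distance.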

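Product Architecture

Tencent Cloud VectorDB uses a distributed deployment architecture in which the nodes communicate and coordinate with one another to store and retrieve data. Client requests are distributed to the nodes through a load balancer.

Load Balancer (LB): a service that distributes traffic across multiple backend servers. A VectorDB cluster has at least 3 nodes, and access is balanced across them automatically by the load balancer.

Distributed Storage Node: a VectorDB cluster consists of multiple nodes, each of which can serve reads and writes directly and is responsible for both computing on and storing data. A Collection is the basic unit of organization for vector data. Each collection is split into multiple shards that are assigned to different nodes for storage and processing, and each shard is also replicated to other nodes, which gives the database service scalability and high availability.

Meta Server: the cluster management module, made up of a group of Master nodes. It stores metadata such as cluster node information and shard information.

Embedding Service: a service that converts unstructured data (text, images, audio, and so on) into vector representations so that it can be analyzed, clustered, and otherwise processed. For details, see Embedding Introduction.

Split Service: a service that splits text into phrases, sentences, or similar units.

Note: The Split Service model capability of Tencent Cloud VectorDB is currently in development and testing. For the release date, follow Product Updates.

Object Service: imports data in bulk into a specified collection and supports multiple import formats.

Object Storage: stores and manages the data files uploaded through the data import service.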
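To make the sharding and replication of the Storage Nodes more concrete, here is a toy placement scheme. It is purely hypothetical: the article does not describe how Tencent Cloud VectorDB actually assigns documents to shards or shards to nodes, so the hash-modulo rule, the round-robin replica placement, and the class name `ShardingSketch` are all assumptions chosen for simplicity.

```java
final class ShardingSketch {

    /** Maps a document id to a shard index in [0, shardNum) using its hash code. */
    static int shardOf(String docId, int shardNum) {
        return Math.floorMod(docId.hashCode(), shardNum);
    }

    /** Node hosting a given copy of a shard: copy 0 is the primary, copies 1..n are replicas. */
    static int nodeOf(int shard, int copy, int nodeCount) {
        return (shard + copy) % nodeCount;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 2 shards, 1 replica per shard, 3 nodes.
        int shardNum = 2, replicaNum = 1, nodeCount = 3;
        for (String id : new String[]{"0001", "0002", "0003"}) {
            System.out.println(id + " -> shard " + shardOf(id, shardNum));
        }
        for (int shard = 0; shard < shardNum; shard++) {
            for (int copy = 0; copy <= replicaNum; copy++) {
                System.out.println("shard " + shard + " copy " + copy
                        + " -> node " + nodeOf(shard, copy, nodeCount));
            }
        }
    }
}
```

As long as the number of nodes exceeds the number of replicas, this scheme places each copy of a shard on a different node, which is the property that lets a cluster survive the loss of a single node.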

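Comparison of Database Types

The table below briefly describes how the three kinds of database differ, with a concrete example for each:

| Feature | Relational database | Non-relational database | Vector database |
| --- | --- | --- | --- |
| Data organization | Tables, e.g. a students table and a courses table | Key-value pairs, e.g. key-value pairs holding user configuration | Vector representations, e.g. image vectors |
| Data structure | Tables, rows, columns | May be key-value pairs, documents, column families, etc. | High-dimensional vectors; each dimension represents a feature |
| Query language | SQL | Usually no unified query language | Query interfaces focused on similarity search |
| Consistency and transactions | Emphasizes consistency and transaction processing | Emphasizes distribution and horizontal scaling | Emphasizes similarity computation and search |
| Use cases | Complex queries, transaction processing | Large-scale, distributed, dynamic data | Similarity search, recommender systems, image recognition, etc. |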

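Examples:

Relational database:

- Tabular data: a Students table holds each student's ID and name; a Courses table holds course information.
- SQL query: `SELECT * FROM Students WHERE Grade > 90;`

Non-relational database:

- Key-value data: user configuration such as `{"username": "user1", "email": "user1@example.com"}`.
- Dynamic data: real-time user updates on social media.

Vector database:

- Vector data: an image can be represented as a high-dimensional vector in which each dimension describes one feature of the image.
- Similarity search: finding other image vectors that are similar to a given image vector.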

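Quick Start

Purchasing a Database Instance

Scenario

This section describes how to purchase and configure your first Tencent Cloud VectorDB instance.

Regions

Currently supported regions are Beijing, Shanghai, Guangzhou, Shanghai Autonomous Driving Cloud, Hong Kong (China), and Singapore. Other regions are being planned.

Prerequisites

- You have registered a Tencent Cloud account and completed identity verification.
  - To register a Tencent Cloud account, click Register a Tencent Cloud Account.
  - To complete identity verification, click Identity Verification.
- You have planned the specifications the database instance must meet. For details, see Product Specifications.
- You have planned the VPC and security group for the database instance. See VPC and Security Group.

Procedure

  1. Log in to the VectorDB console with your Tencent Cloud account.
  2. Click Create to open the page for creating a VectorDB instance.
  3. Configure the parameters as described in the table below and purchase the instance.

Logging In over the Private Network

Creating a Database

Enabling Public Network Access

Testing the Connection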

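HTTP API

Tencent Cloud VectorDB performs data writes, queries, and other operations over HTTP. You place each request message in JSON format in the body of an HTTP request and send it to the HTTP API address of VectorDB. VectorDB parses the JSON data in the request body automatically and stores it in the database.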
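As a rough sketch of what such a request consists of, the helper below assembles the URL, credential string, and JSON body for a /database/create call without sending anything. The credential format and body shape are copied from the SDK debug log reproduced at the end of this article. The class name `HttpApiRequestSketch`, the placeholder address `10.0.0.1:80`, and the lack of JSON escaping are assumptions, and the article does not state which HTTP header carries the credential, so consult the official API reference before relying on this.

```java
final class HttpApiRequestSketch {

    /** Joins the instance address with an API path such as "/database/create". */
    static String url(String host, int port, String path) {
        return "http://" + host + ":" + port + path;
    }

    /** Credential string in the form that appears in the SDK debug log. */
    static String bearer(String account, String apiKey) {
        return "Bearer account=" + account + "&api_key=" + apiKey;
    }

    /** JSON body for /database/create; assumes the name needs no JSON escaping. */
    static String createDatabaseBody(String database) {
        return "{\"database\":\"" + database + "\"}";
    }

    public static void main(String[] args) {
        // 10.0.0.1 and 80 are placeholders, not real instance values.
        System.out.println("POST " + url("10.0.0.1", 80, "/database/create"));
        System.out.println("credential: " + bearer("root", "xxx"));
        System.out.println("body: " + createDatabaseBody("book"));
    }
}
```

In practice the Java SDK described below builds and sends these requests for you, so hand-assembled requests are mainly useful for debugging.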

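API List

| Level | Endpoint | Description | Method | URL |
| --- | --- | --- | --- | --- |
| Database | /database/create | Create a database | POST | http://{instance private IP}:{instance port}/database/create |
| Database | /database/drop | Drop a database | POST | http://{instance private IP}:{instance port}/database/drop |
| Database | /database/list | List all databases | GET | http://{instance private IP}:{instance port}/database/list |
| Collection | /collection/create | Create a collection | POST | http://{instance private IP}:{instance port}/collection/create |
| Collection | /collection/drop | Drop a collection | POST | http://{instance private IP}:{instance port}/collection/drop |
| Collection | /collection/list | List collections | POST | http://{instance private IP}:{instance port}/collection/list |
| Collection | /collection/describe | Describe a specific collection | POST | http://{instance private IP}:{instance port}/collection/describe |
| Collection | /collection/truncate | Truncate (empty) a collection | POST | http://{instance private IP}:{instance port}/collection/truncate |
| Alias | /alias/set | Set an alias for a collection | POST | http://{instance private IP}:{instance port}/alias/set |
| Alias | /alias/delete | Delete a collection alias | POST | http://{instance private IP}:{instance port}/alias/delete |
| Document | /document/upsert | Insert data | POST | http://{instance private IP}:{instance port}/document/upsert |
| Document | /document/query | Exact lookup of data | POST | http://{instance private IP}:{instance port}/document/query |
| Document | /document/search | Search for similar vectors | POST | http://{instance private IP}:{instance port}/document/search |
| Document | /document/delete | Delete data | POST | http://{instance private IP}:{instance port}/document/delete |
| Document | /document/update | Update data | POST | http://{instance private IP}:{instance port}/document/update |
| Index | /index/rebuild | Rebuild an index | POST | http://{instance private IP}:{instance port}/index/rebuild |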

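Java SDK

The Java SDK for Tencent Cloud VectorDB wraps the HTTP API in easy-to-use Java classes and methods, so developers can work with the database more conveniently.

SDK Information

| Language | Version | SDK download | SDK source | API |
| --- | --- | --- | --- | --- |
| Java | Java 8 or later | vectordb-sdk-java.tar.gz (Note: the latest SDK version is 1.0.3.) | vectordb-sdk-java-source.tar.gz | Create a client: VectorDBClient(); manage databases: createDatabase(); manage collections: createCollection(); manage documents: upsert() |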

Integration

The following shows how to add the latest SDK version, 1.0.3, as a dependency in Gradle and Maven projects. Use whichever matches your project.

Gradle

Add the following dependency to the build.gradle file of your Gradle project.

com.tencent.tcvectordb:vectordatabase-sdk-java:1.0.3

Maven

Add the following dependencies to the pom.xml file of your Maven project.

<dependency>
    <groupId>com.tencent.tcvectordb</groupId>
    <artifactId>vectordatabase-sdk-java</artifactId>
    <version>1.0.3</version>
</dependency>

<dependency>
    <groupId>org.web3j</groupId>
    <artifactId>core</artifactId>
    <version>5.0.0</version>
</dependency>

<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.9.2</version>
</dependency>

okhttp3 is pinned to version 4.9.2 because that is the version vectordatabase-sdk-java 1.0.3 depends on.

Code Example
package com.example.demo.controller;

import com.tencent.tcvectordb.client.VectorDBClient;
import com.tencent.tcvectordb.exception.VectorDBException;
import com.tencent.tcvectordb.model.Collection;
import com.tencent.tcvectordb.model.Database;
import com.tencent.tcvectordb.model.DocField;
import com.tencent.tcvectordb.model.Document;
import com.tencent.tcvectordb.model.param.collection.*;
import com.tencent.tcvectordb.model.param.database.ConnectParam;
import com.tencent.tcvectordb.model.param.dml.*;
import com.tencent.tcvectordb.model.param.entity.AffectRes;
import com.tencent.tcvectordb.model.param.enums.ReadConsistencyEnum;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * VectorDB Java SDK usage example
 */
public class VectorDBExample {

    private static final String DBNAME = "book";
    private static final String COLL_NAME = "book_segments";
    private static final String COLL_NAME_ALIAS = "collection_alias";

    public static void main(String[] args) throws InterruptedException {
        example();
    }

    public static void example() throws InterruptedException {
        // Create the VectorDB client
        ConnectParam connectParam = initConnectParam();
        VectorDBClient client = new VectorDBClient(connectParam, ReadConsistencyEnum.EVENTUAL_CONSISTENCY);

        // Clean up the environment before the test
        System.out.println("---------------------- clear before test ----------------------");
        anySafe(() -> clear(client));
        createDatabaseAndCollection(client);
        upsertData(client);
        queryData(client);
        updateAndDelete(client);
        deleteAndDrop(client);
        testFilter();
    }

    /**
     * Initializes the connection parameters.
     *
     * @return {@link ConnectParam}
     */
    private static ConnectParam initConnectParam() {
        // System.out.println("\tvdb_url: " + System.getProperty("vdb_url"));
        // System.out.println("\tvdb_key: " + System.getProperty("vdb_key"));
        return ConnectParam.newBuilder()
                .withUrl("http://lb-xxxxxxxxxx.com")
                .withUsername("your-username")
                .withKey("your-api-key")
                .withTimeout(30)
                .build();
    }

    /**
     * Runs the {@link Runnable} and catches any {@link VectorDBException}.
     *
     * @param runnable {@link Runnable}
     */
    private static void anySafe(Runnable runnable) {
        try {
            runnable.run();
        } catch (VectorDBException e) {
            System.err.println(e);
            e.printStackTrace();
        }
    }

    private static void createDatabaseAndCollection(VectorDBClient client) {
        // 1. Create the database
        System.out.println("---------------------- createDatabase ----------------------");
        Database db = client.createDatabase(DBNAME);

        // 2. List all databases
        System.out.println("---------------------- listCollections ----------------------");
        List<String> database = client.listDatabase();
        for (String s : database) {
            System.out.println("\tres: " + s);
        }

        // 3. Create the collection
        System.out.println("---------------------- createCollection ----------------------");
        CreateCollectionParam collectionParam = initCreateCollectionParam(COLL_NAME);
        db.createCollection(collectionParam);

        // 4. List all collections
        System.out.println("---------------------- listCollections ----------------------");
        List<Collection> cols = db.listCollections();
        for (Collection col : cols) {
            System.out.println("\tres: " + col.toString());
        }

        // 5. Set an alias for the collection
        System.out.println("---------------------- setAlias ----------------------");
        AffectRes affectRes = db.setAlias(COLL_NAME, COLL_NAME_ALIAS);
        System.out.println("\tres: " + affectRes.toString());

        // 6. Describe the collection (the alias is now listed)
        System.out.println("---------------------- describeCollection ----------------------");
        Collection descCollRes = db.describeCollection(COLL_NAME);
        System.out.println("\tres: " + descCollRes.toString());

        // 7. Delete the alias
        System.out.println("---------------------- deleteAlias ----------------------");
        AffectRes affectRes1 = db.deleteAlias(COLL_NAME_ALIAS);
        System.out.println("\tres: " + affectRes1);

        // 8. Describe the collection again (the alias is gone)
        System.out.println("---------------------- describeCollection ----------------------");
        Collection descCollRes1 = db.describeCollection(COLL_NAME);
        System.out.println("\tres: " + descCollRes1.toString());
    }

    private static void upsertData(VectorDBClient client) throws InterruptedException {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);
        // Five sample documents: a 3-dimensional vector plus scalar fields each
        List<Document> documentList = new ArrayList<>(Arrays.asList(
                Document.newBuilder()
                        .withId("0001")
                        .withVector(Arrays.asList(0.2123, 0.21, 0.213))
                        .addDocField(new DocField("bookName", "西游记"))
                        .addDocField(new DocField("author", "吴承恩"))
                        .addDocField(new DocField("page", 21))
                        .addDocField(new DocField("segment", "富贵功名,前缘分定,为人切莫欺心。"))
                        .build(),
                Document.newBuilder()
                        .withId("0002")
                        .withVector(Arrays.asList(0.2123, 0.22, 0.213))
                        .addDocField(new DocField("bookName", "西游记"))
                        .addDocField(new DocField("author", "吴承恩"))
                        .addDocField(new DocField("page", 22))
                        .addDocField(new DocField("segment",
                                "正大光明,忠良善果弥深。些些狂妄天加谴,眼前不遇待时临。"))
                        .build(),
                Document.newBuilder()
                        .withId("0003")
                        .withVector(Arrays.asList(0.2123, 0.23, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 23))
                        .addDocField(new DocField("segment", "细作探知这个消息,飞报吕布。"))
                        .build(),
                Document.newBuilder()
                        .withId("0004")
                        .withVector(Arrays.asList(0.2123, 0.24, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 24))
                        .addDocField(new DocField("segment", "富贵功名,前缘分定,为人切莫欺心。"))
                        .build(),
                Document.newBuilder()
                        .withId("0005")
                        .withVector(Arrays.asList(0.2123, 0.25, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 25))
                        .addDocField(new DocField("segment",
                                "布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。"))
                        .build()));
        System.out.println("---------------------- upsert ----------------------");
        InsertParam insertParam = InsertParam.newBuilder().addAllDocument(documentList).withBuildIndex(true).build();
        collection.upsert(insertParam);

        // Note: upserted data may take a moment to become available
        Thread.sleep(1000 * 5);
    }

    private static void queryData(VectorDBClient client) {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);

        System.out.println("---------------------- query ----------------------");
        List<String> documentIds = Arrays.asList("0001", "0002", "0003", "0004", "0005");
        Filter filterParam = new Filter("bookName=\"三国演义\"");
        List<String> outputFields = Arrays.asList("id", "bookName");
        QueryParam queryParam = QueryParam.newBuilder()
                .withDocumentIds(documentIds)
                // Filter the data
                .withFilter(filterParam)
                // limit caps the number of rows returned, between 1 and 16384
                .withLimit(2)
                // Offset
                .withOffset(1)
                // Fields to return
                .withOutputFields(outputFields)
                // Whether to return the vector data
                .withRetrieveVector(false)
                .build();
        List<Document> qdos = collection.query(queryParam);
        for (Document doc : qdos) {
            System.out.println("\tres: " + doc.toString());
        }

        // searchById
        // 1. searchById searches by document id
        // 2. A filter can be applied to the data
        // 3. If only some fields are needed, output_fields selects which fields are returned; by default all are returned
        // 4. limit applies to each search unit: if three vectors are passed and limit is 3,
        //    the top 3 most similar vectors are returned for each of the three

        System.out.println("---------------------- searchById ----------------------");
        SearchByIdParam searchByIdParam = SearchByIdParam.newBuilder()
                .withDocumentIds(documentIds)
                // With an HNSW index the ef parameter is required; a larger ef raises recall but slows retrieval
                .withParams(new HNSWSearchParams(100))
                // K for Top K
                .withLimit(2)
                // Filter the results
                .withFilter(filterParam)
                .build();
        List<List<Document>> siDocs = collection.searchById(searchByIdParam);
        int i = 0;
        for (List<Document> docs : siDocs) {
            System.out.println("\tres: " + i++);
            for (Document doc : docs) {
                System.out.println("\tres: " + doc.toString());
            }
        }

        // search
        // 1. search searches by vector
        // Other options are similar to searchById

        System.out.println("---------------------- search ----------------------");
        SearchByVectorParam searchByVectorParam = SearchByVectorParam.newBuilder()
                .addVector(Arrays.asList(0.2123, 0.23, 0.213))
                // With an HNSW index the ef parameter is required; a larger ef raises recall but slows retrieval
                .withParams(new HNSWSearchParams(100))
                // K for Top K
                .withLimit(2)
                // Filter the results
                .withFilter(filterParam)
                .build();
        // The result is a two-dimensional list: one inner list per vector passed to search
        List<List<Document>> svDocs = collection.search(searchByVectorParam);
        i = 0;
        for (List<Document> docs : svDocs) {
            System.out.println("\tres: " + i);
            i++;
            for (Document doc : docs) {
                System.out.println("\tres: " + doc.toString());
            }
        }
    }

    private static void updateAndDelete(VectorDBClient client) throws InterruptedException {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);

        System.out.println("---------------------- update ----------------------");
        // update
        // 1. update modifies selected fields, or adds non-indexed fields, on documents
        //    selected by primary key and by filter

        // Together with the ids below, the filter means only id = "0003" is updated
        Filter filterParam = new Filter("bookName=\"三国演义\"");
        List<String> documentIds = Arrays.asList("0001", "0003");
        UpdateParam updateParam = UpdateParam
                .newBuilder()
                .addAllDocumentId(documentIds)
                .withFilter(filterParam)
                .build();
        Document updateDoc = Document
                .newBuilder()
                .addDocField(new DocField("page", 100))
                // New fields can be added as well
                .addDocField(new DocField("extend", "extendContent"))
                .build();
        collection.update(updateParam, updateDoc);

        System.out.println("---------------------- delete ----------------------");
        // delete
        // 1. delete removes documents selected by primary key and by filter
        // 2. Deletion depends on the collection's index type; some index types do not support it

        // Together with the ids above, the filter means only id = "0001" is deleted
        filterParam = new Filter("bookName=\"西游记\"");
        DeleteParam build = DeleteParam
                .newBuilder()
                .addAllDocumentId(documentIds)
                .withFilter(filterParam)
                .build();
        collection.delete(build);

        // Rebuild the index
        System.out.println("---------------------- rebuildIndex ----------------------");

        RebuildIndexParam rebuildIndexParam = RebuildIndexParam
                .newBuilder()
                .withDropBeforeRebuild(false)
                .withThrottle(1)
                .build();
        collection.rebuildIndex(rebuildIndexParam);

        Thread.sleep(1000 * 5);

        // Query again to inspect the effect of the update and delete
        System.out.println("----------------------  query ----------------------");
        documentIds = Arrays.asList("0001", "0002", "0003", "0004", "0005");
        List<String> outputFields = Arrays.asList("id", "bookName", "page", "extend");
        QueryParam queryParam = QueryParam.newBuilder()
                .withDocumentIds(documentIds)
                // Fields to return
                .withOutputFields(outputFields)
                // Whether to return the vector data
                .withRetrieveVector(false)
                .build();
        List<Document> qdos = collection.query(queryParam);
        for (Document doc : qdos) {
            System.out.println("\tres: " + doc.toString());
        }

        // truncate clears all data in the collection, including its indexes
        System.out.println("---------------------- truncate collection ----------------------");
        AffectRes affectRes = database.truncateCollections(COLL_NAME);
        System.out.println("\tres: " + affectRes.toString());

        // Note: the effect of a delete may take a moment to become visible
        Thread.sleep(1000 * 5);
    }

    private static void deleteAndDrop(VectorDBClient client) {
        Database database = client.database(DBNAME);

        // Drop the collection
        System.out.println("---------------------- truncate collection ----------------------");
        database.dropCollection(COLL_NAME);

        // Drop the database
        System.out.println("---------------------- truncate collection ----------------------");
        client.dropDatabase(DBNAME);
    }

    // Drops every database on the instance; only use this against a test instance
    private static void clear(VectorDBClient client) {
        List<String> databases = client.listDatabase();
        for (String database : databases) {
            client.dropDatabase(database);
        }
    }

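    /**
     * Initializes the parameters for creating a collection.
     * The addField calls define the indexes, not the structure of the collection.
     * <ol>
     * <li>[Important] Do not index the text field the vector is derived from; it wastes a lot of memory and serves no purpose.</li>
     * <li>[Required indexes] The primary key id and the vector field vector are currently fixed and mandatory; see the example below.</li>
     * <li>[Other indexes] Fields used as query conditions during retrieval. For example, to filter by a book's author,
     * the author field must be indexed, otherwise it cannot be used as a filter at query time.
     * Fields that are never filtered on need no index, which would only waste memory.</li>
     * <li>The vector database supports a dynamic schema: any field can be written at insert time without being defined in advance, similar to MongoDB.</li>
     * <li>This example creates indexes for book segments, whose information includes {id, vector, segment, bookName, author, page}.
     * id is the primary key and must be globally unique, segment is the text fragment, and vector needs a vector index.
     * To filter by a specific book or author at query time, bookName and author are indexed as filter fields below.</li>
     * </ol>
     *
     * @param collName name of the collection to create
     * @return the parameters for creating the collection
     */
    private static CreateCollectionParam initCreateCollectionParam(String collName) {
        return CreateCollectionParam.newBuilder()
                .withName(collName)
                .withShardNum(2)
                .withReplicaNum(0)
                .withDescription("test collection0")
                .addField(new FilterIndex("id", FieldType.String, IndexType.PRIMARY_KEY))
                .addField(new VectorIndex("vector", 3, IndexType.HNSW,
                        MetricType.COSINE, new HNSWParams(16, 200)))
                .addField(new FilterIndex("bookName", FieldType.String, IndexType.FILTER))
                .addField(new FilterIndex("author", FieldType.String, IndexType.FILTER))
                .build();
    }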

    /**
     * Tests building Filter expressions.
     */
    public static void testFilter() {
        System.out.println("---------------------- testFilter ----------------------");
        System.out.println("\tres: " + new Filter("author=\"jerry\"")
                .and("a=1")
                .or("r=\"or\"")
                .orNot("rn=2")
                .andNot("an=\"andNot\"")
                .getCond());
        System.out.println("\tres: " + Filter.in("key", Arrays.asList("v1", "v2", "v3")));
        System.out.println("\tres: " + Filter.in("key", Arrays.asList(1, 2, 3)));
    }
}

Execution Result

Execution log:

15:30:56.058 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - header: Bearer account=root&api_key=xxx
---------------------- clear before test ----------------------
15:30:56.539 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/list, code=200, msg=OK, result={"code":0,"msg":"operation success","databases":["book"],"info":{"book":{"createTime":"2023-11-23 15:29:46","dbType":"BASE"}}}
15:30:56.594 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/drop, body={"database":"book"}
15:30:56.656 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- createDatabase ----------------------
15:30:56.657 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/create, body={"database":"book"}
15:30:56.717 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/create, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- listCollections ----------------------
15:30:56.776 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/list, code=200, msg=OK, result={"code":0,"msg":"operation success","databases":["book"],"info":{"book":{"createTime":"2023-11-23 15:30:56","dbType":"BASE"}}}
res: book
---------------------- createCollection ----------------------
15:30:56.890 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/create, body={"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"}],"documentCount":0}
15:31:00.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/create, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- listCollections ----------------------
15:31:00.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/list, body={"database":"book"}
15:31:00.537 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/list, code=200, msg=OK, result={"code":0,"msg":"operation success","collections":[{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}]}
res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"}}
---------------------- setAlias ----------------------
15:31:00.625 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/alias/set, body={"database":"book","collection":"book_segments","alias":"collection_alias"}
15:31:00.684 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/alias/set, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- describeCollection ----------------------
15:31:00.687 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.746 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"alias":["collection_alias"],"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"},"alias":["collection_alias"]}
---------------------- deleteAlias ----------------------
15:31:00.759 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/alias/delete, body={"database":"book","alias":"collection_alias"}
15:31:00.820 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/alias/delete, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- describeCollection ----------------------
15:31:00.822 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.881 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"}],"indexStatus":{"status":"ready","startTime":""}}}
res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"}}
15:31:00.894 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.956 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- upsert ----------------------
15:31:00.977 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/upsert, body={"database":"book","collection":"book_segments","buildIndex":true,"documents":[{"id":"0001","vector":[0.2123,0.21,0.213],"bookName":"西游记","author":"吴承恩","page":21,"segment":"富贵功名,前缘分定,为人切莫欺心。"},{"id":"0002","vector":[0.2123,0.22,0.213],"bookName":"西游记","author":"吴承恩","page":22,"segment":"正大光明,忠良善果弥深。些些狂妄天加谴,眼前不遇待时临。"},{"id":"0003","vector":[0.2123,0.23,0.213],"bookName":"三国演义","author":"罗贯中","page":23,"segment":"细作探知这个消息,飞报吕布。"},{"id":"0004","vector":[0.2123,0.24,0.213],"bookName":"三国演义","author":"罗贯中","page":24,"segment":"富贵功名,前缘分定,为人切莫欺心。"},{"id":"0005","vector":[0.2123,0.25,0.213],"bookName":"三国演义","author":"罗贯中","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。"}]}
15:31:01.049 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/upsert, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":5}
15:31:06.051 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:06.111 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":5,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":5,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- query ----------------------
15:31:06.130 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/query, body={"database":"book","collection":"book_segments","readConsistency":"eventualConsistency","query":{"filter":"bookName=\"三国演义\"","documentIds":["0001","0002","0003","0004","0005"],"retrieveVector":false,"limit":2,"offset":1,"outputFields":["id","bookName"]}}
15:31:06.189 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/query, code=200, msg=OK, result={"code":0,"msg":"operation success","count":2,"documents":[{"id":"0003","bookName":"三国演义"},{"id":"0004","bookName":"三国演义"}]}
res: {"id":"0003","bookName":"三国演义"}
res: {"id":"0004","bookName":"三国演义"}
---------------------- searchById ----------------------
15:31:06.202 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/search, body={"database":"book","collection":"book_segments","search":{"params":{"ef":100},"filter":"bookName=\"三国演义\"","retrieveVector":false,"limit":2,"documentIds":["0001","0002","0003","0004","0005"]},"readConsistency":"eventualConsistency"}
15:31:06.264 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/search, code=200, msg=OK, result={"code":0,"msg":"operation success","documents":[[{"id":"0003","score":0.999062,"bookName":"三国演义","author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23},{"id":"0004","score":0.997955,"bookName":"三国演义","page":24,"author":"罗贯中","segment":"富贵功名,前缘分定,为人切莫欺心。"}],[{"id":"0003","score":0.999773,"author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23,"bookName":"三国演义"},{"id":"0004","score":0.99912,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}],[{"id":"0003","score":1.0,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。","author":"罗贯中","page":23},{"id":"0004","score":0.999787,"page":24,"bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中"}],[{"id":"0004","score":1.0,"segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中","page":24,"bookName":"三国演义"},{"id":"0005","score":0.9998,"bookName":"三国演义","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","author":"罗贯中"}],[{"id":"0005","score":1.0,"author":"罗贯中","bookName":"三国演义","segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","page":25},{"id":"0004","score":0.9998,"segment":"富贵功名,前缘分定,为人切莫欺心。","bookName":"三国演义","page":24,"author":"罗贯中"}]]}
res: 0
res: {"id":"0003","score":0.999062,"bookName":"三国演义","author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23}
res: {"id":"0004","score":0.997955,"bookName":"三国演义","page":24,"author":"罗贯中","segment":"富贵功名,前缘分定,为人切莫欺心。"}
res: 1
res: {"id":"0003","score":0.999773,"author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23,"bookName":"三国演义"}
res: {"id":"0004","score":0.99912,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}
res: 2
res: {"id":"0003","score":1.0,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。","author":"罗贯中","page":23}
res: {"id":"0004","score":0.999787,"page":24,"bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中"}
res: 3
res: {"id":"0004","score":1.0,"segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中","page":24,"bookName":"三国演义"}
res: {"id":"0005","score":0.9998,"bookName":"三国演义","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","author":"罗贯中"}
res: 4
res: {"id":"0005","score":1.0,"author":"罗贯中","bookName":"三国演义","segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","page":25}
res: {"id":"0004","score":0.9998,"segment":"富贵功名,前缘分定,为人切莫欺心。","bookName":"三国演义","page":24,"author":"罗贯中"}
---------------------- search ----------------------
15:31:06.277 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/search, body={"database":"book","collection":"book_segments","search":{"params":{"ef":100},"filter":"bookName=\"三国演义\"","retrieveVector":false,"limit":2,"vectors":[[0.2123,0.23,0.213]]},"readConsistency":"eventualConsistency"}
15:31:06.336 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/search, code=200, msg=OK, result={"code":0,"msg":"operation success","documents":[[{"id":"0003","score":1.0,"author":"罗贯中","page":23,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。"},{"id":"0004","score":0.999787,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}]]}
res: 0
res: {"id":"0003","score":1.0,"author":"罗贯中","page":23,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。"}
res: {"id":"0004","score":0.999787,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}
15:31:06.337 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:06.396 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":5,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":5,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- update ----------------------
15:31:06.411 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/update, body={"database":"book","collection":"book_segments","query":{"filter":"bookName=\"三国演义\"","documentIds":["0001","0003"]},"update":{"page":100,"extend":"extendContent"}}
15:31:06.471 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/update, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- delete ----------------------
15:31:06.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/delete, body={"database":"book","collection":"book_segments","query":{"filter":"bookName=\"西游记\"","documentIds":["0001","0003"]}}
15:31:06.549 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/delete, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- rebuildIndex ----------------------
15:31:06.554 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/index/rebuild, body={"database":"book","collection":"book_segments","dropBeforeRebuild":false,"throttle":1}
15:31:06.627 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/index/rebuild, code=200, msg=OK, result={"code":0,"msg":"operation success"}
---------------------- query ----------------------
15:31:11.637 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/query, body={"database":"book","collection":"book_segments","readConsistency":"eventualConsistency","query":{"documentIds":["0001","0002","0003","0004","0005"],"retrieveVector":false,"limit":10,"offset":0,"outputFields":["id","bookName","page","extend"]}}
15:31:11.695 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/document/query, code=200, msg=OK, result={"code":0,"msg":"operation success","count":4,"documents":[{"id":"0002","bookName":"西游记","page":22},{"id":"0003","extend":"extendContent","bookName":"三国演义","page":100},{"id":"0004","page":24,"bookName":"三国演义"},{"id":"0005","page":25,"bookName":"三国演义"}]}
res: {"id":"0002","bookName":"西游记","page":22}
res: {"id":"0003","extend":"extendContent","bookName":"三国演义","page":100}
res: {"id":"0004","page":24,"bookName":"三国演义"}
res: {"id":"0005","page":25,"bookName":"三国演义"}
---------------------- truncate collection ----------------------
15:31:11.697 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/truncate, body={"database":"book","collection":"book_segments"}
15:31:11.788 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/truncate, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- truncate collection ----------------------
15:31:16.803 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/drop, body={"database":"book","collection":"book_segments"}
15:31:16.874 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/collection/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- truncate collection ----------------------
15:31:16.874 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/drop, body={"database":"book"}
15:31:16.935 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com/database/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- testFilter ----------------------
res: author="jerry" and a=1 or r="or" or not rn=2 and not an="andNot"
res: key in ("v1","v2","v3")
res: key in (1,2,3)
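
In the log above, the update and delete calls each pass two document IDs ("0001" and "0003") together with a filter, yet both report affectedCount of 1. That is consistent with the filter and the documentIds list being applied together, so that only documents satisfying both are touched. The sketch below reproduces that reading with a plain in-memory map. It is an inference from the log, not a statement of the service's documented behavior, and the class and method names (FilterAndIdsDemo, select, sampleBooks) are hypothetical and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; it does not call the Tencent Cloud VectorDB SDK.
class FilterAndIdsDemo {

    // The five documents from the upsert log, reduced to id -> bookName.
    static Map<String, String> sampleBooks() {
        Map<String, String> books = new LinkedHashMap<>();
        books.put("0001", "西游记");
        books.put("0002", "西游记");
        books.put("0003", "三国演义");
        books.put("0004", "三国演义");
        books.put("0005", "三国演义");
        return books;
    }

    // Keeps only the IDs that are in documentIds AND whose bookName equals the filter value.
    static List<String> select(Map<String, String> books, List<String> documentIds, String bookName) {
        List<String> matched = new ArrayList<>();
        for (String id : documentIds) {
            if (bookName.equals(books.get(id))) {
                matched.add(id);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, String> books = sampleBooks();
        List<String> ids = Arrays.asList("0001", "0003");
        // Update request: filter bookName="三国演义" with IDs 0001 and 0003.
        System.out.println(select(books, ids, "三国演义")); // [0003]
        // Delete request: filter bookName="西游记" with the same IDs.
        System.out.println(select(books, ids, "西游记"));   // [0001]
    }
}
```

Under this reading, the update touches only 0003 and the delete removes only 0001, which matches the final query in the log: four documents remain, and only 0003 carries the new page value and the extend field.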

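Similarity computation

  • Inner product (IP)
  • Euclidean distance (L2)
  • Cosine similarity (COSINE)

Similarity metrics are the foundation of vector retrieval: they measure how similar two high-dimensional vectors are. When creating a Collection, choose the metric that fits the characteristics of your data. The table below shows how these widely used metrics map to input data formats and to Tencent Cloud VectorDB index types.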

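Similarity metric | Data format | Index types
Inner product (IP) | Floating point | FLAT, HNSW, IVF family
Euclidean distance (L2) | Floating point | FLAT, HNSW, IVF family
Cosine similarity (COSINE) | Floating point | FLAT, HNSW, IVF family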

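Inner product (IP)

The inner product (IP), also called the dot product, combines two vectors into a single number. Its formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space. The larger the result, the more similar the vector is to the query.

$$\mathrm{IP}(a, b) = a \cdot b = \sum_{i=1}^{n} a_i b_i$$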
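
As a minimal sketch of the inner product described above, the following Java snippet sums the element-wise products of two vectors. The class and method names are illustrative only and are not part of the Tencent Cloud VectorDB SDK.

```java
// Illustrative helper; not an SDK class.
class InnerProductDemo {

    // Sums a[i] * b[i] over all dimensions.
    static double innerProduct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        // 1*4 + 2*5 + 3*6 = 32
        System.out.println(innerProduct(a, b)); // 32.0
    }
}
```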

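Euclidean distance (L2)

Euclidean distance (L2) is the straight-line distance between two vector points in space. Its formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space. It is one of the most commonly used distance metrics. The smaller the result, the more similar the vector is to the query. L2 works well in low-dimensional spaces, but in high-dimensional spaces its effectiveness gradually degrades because of the curse of dimensionality.

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$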
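
The Euclidean distance described above can be sketched in a few lines of Java. The names below are illustrative and not taken from the SDK.

```java
// Illustrative helper; not an SDK class.
class EuclideanDemo {

    // Square root of the summed squared differences across all dimensions.
    static double l2Distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 6.0, 3.0};
        // Differences are (3, 4, 0), so the distance is sqrt(9 + 16) = 5.
        System.out.println(l2Distance(a, b)); // 5.0
    }
}
```

Unlike IP and COSINE, a smaller value here means a closer match, so results ranked by L2 are ordered ascending.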

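Cosine similarity (COSINE)

Cosine similarity is widely used for measuring text similarity. It measures how similar two vectors are by the cosine of the angle between them in multi-dimensional space. Its formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space, |a| and |b| are the lengths (Euclidean norms) of a and b, and cosθ is the cosine of the angle between a and b. The larger the result, the more similar the vector is to the query. The value lies in the range [-1, 1].

$$\cos\theta = \frac{a \cdot b}{\lvert a \rvert \, \lvert b \rvert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$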
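
To make the cosine computation concrete, the sketch below recomputes one score from the search log earlier in this article. The collection there uses metricType COSINE, and documents 0003 and 0004 were upserted with the vectors [0.2123, 0.23, 0.213] and [0.2123, 0.24, 0.213]. The class and method names are illustrative and are not SDK APIs.

```java
import java.util.Locale;

// Illustrative helper; not an SDK class.
class CosineDemo {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Dot product divided by the product of the two vector lengths.
    static double cosine(double[] a, double[] b) {
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot(a, b) / (normA * normB);
    }

    public static void main(String[] args) {
        double[] doc0003 = {0.2123, 0.23, 0.213};
        double[] doc0004 = {0.2123, 0.24, 0.213};
        System.out.println(String.format(Locale.ROOT, "%.6f", cosine(doc0003, doc0004))); // 0.999787
    }
}
```

The printed value agrees with the score 0.999787 that the service returned for document 0004 when searching with the vector of document 0003.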

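Note:

Once vectors are normalized to unit length, the inner product and cosine similarity are equivalent. Cosine similarity depends only on the angle between the vectors, whereas the inner product depends on both the angle and the lengths of the vectors.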
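
The note above can be checked numerically. The following sketch, with illustrative names that are not part of the SDK, compares the raw inner product, the cosine similarity, and the inner product of the normalized vectors for a pair of vectors and for the same pair with one vector scaled by 10.

```java
import java.util.Locale;

// Illustrative helper; not an SDK class.
class NormalizedIpDemo {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scales v to unit length (assumes v is not the zero vector).
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = v[i] / norm;
        }
        return unit;
    }

    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    static String fmt(double x) {
        return String.format(Locale.ROOT, "%.2f", x);
    }

    public static void main(String[] args) {
        double[] a = {3.0, 4.0};
        double[] b = {4.0, 3.0};
        double[] bScaled = {40.0, 30.0}; // same direction as b, ten times longer

        // The raw inner product grows with vector length.
        System.out.println(fmt(dot(a, b)));       // 24.00
        System.out.println(fmt(dot(a, bScaled))); // 240.00

        // Cosine similarity ignores length.
        System.out.println(fmt(cosine(a, b)));       // 0.96
        System.out.println(fmt(cosine(a, bScaled))); // 0.96

        // After normalization, the inner product equals the cosine similarity.
        System.out.println(fmt(dot(normalize(a), normalize(bScaled)))); // 0.96
    }
}
```

If the embeddings are already unit-length, IP and COSINE produce the same ranking, so the choice between them mainly matters when vector length carries meaning.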

For more operational details, refer to the official API documentation.

API documentation: https://cloud.tencent.com/document/product/1709/95110
