[Elasticsearch] Document Operations: Adding, Updating, and Deleting

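🧑 About the author: CSDN blog expert and chief architect of 历代文学网 (Lidai Wenxue; on PC visit https://literature.sinhy.com/#/?__c=1000, on mobile search for the WeChat mini program “历代文学”). 15 years of experience; proficient in Java programming, high-concurrency design, Spring Boot, and microservices; familiar with Linux, ESXi virtualization, and cloud-native Docker and K8s. Passionate about exploring the frontiers of technology and turning theory into practice, I stay curious about new technologies and enjoy sharing what I learn, in the hope that my experience and insights can spark other people's ideas. I look forward to exchanging ideas with like-minded readers here and growing together in the world of technology.
For technical collaboration, add me on WeChat (mention that you came from CSDN): foreast_sea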



Introduction

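In today's era of exploding data volumes, storing, retrieving, and managing data efficiently has become a key challenge for many applications. Elasticsearch, a powerful open-source distributed search and analytics engine, holds an important place in data processing thanks to its performance and flexibility. Document operations, that is, adding, updating, and deleting documents, are among its core functions.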

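Adding a document is the first step in getting data into Elasticsearch: we need to decide whether to assign each document a unique identifier ourselves or let the system generate one, and how to deal with possible version conflicts. Updating a document modifies existing data, either some of its fields or the whole document, and understanding the mechanism behind it is essential for using it correctly. Deleting documents requires removing exactly the intended documents, selected by ID or by a query, while keeping the data consistent.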

In a previous article, we covered Elasticsearch's query methods in detail: [Elasticsearch] Eight Query Search Types Explained.

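In this article, we look closely at these document operations (create, update, delete) in Elasticsearch, with working code examples and explanations of how each one works, to help readers handle documents in Elasticsearch confidently and efficiently.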

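1. Introduction to Elasticsearch and the Maven Dependencies

1.1 Elasticsearch Overview

Elasticsearch is a distributed, RESTful search and analytics engine built on the Apache Lucene library. It scales horizontally, offers near-real-time search, and can store, search, and analyze large volumes of data quickly. Its core concepts include indices, documents, and mappings.

Index: a collection of documents with similar characteristics; in Elasticsearch 7.x it is roughly comparable to a table in a relational database.
Document: the basic unit of data in Elasticsearch, stored as JSON and made up of one or more fields.
Mapping: defines the type, format, and other properties of each field in a document, similar to a table schema in a relational database.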

1.2 Maven Dependencies

To use Elasticsearch from a Java project, add the corresponding Maven dependencies. A typical configuration looks like this:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.17.3</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.17.3</version>
</dependency>

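Here we use elasticsearch-rest-high-level-client, which offers a higher-level, more convenient API for talking to Elasticsearch. The elasticsearch artifact contains the core Elasticsearch classes and data structures the client relies on; the client already pulls it in transitively, and declaring it explicitly keeps both at the same version. Note that the High Level REST Client has been deprecated since 7.15.0 in favor of the Java API Client (co.elastic.clients:elasticsearch-java); the examples below still work with 7.17.x.

With these dependencies in place, we can start writing code that works with documents in Elasticsearch. The following sections cover adding, updating, and deleting documents in turn.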

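2. Adding Documents

2.1 Adding a Document with a Specified ID

Every document in Elasticsearch is identified by a unique _id within its index. We can specify this _id when adding the document, which makes later operations such as updates and deletes straightforward.

The following Java example adds a document with a specified _id: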

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class AddDocumentWithIdExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it even if a request fails
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Document data (the client serializes the map to JSON)
            Map<String, Object> source = new HashMap<>();
            source.put("title", "Getting Started with Elasticsearch");
            source.put("content", "This is an article about Elasticsearch");

            // Index request: target index, explicit _id, and document source
            IndexRequest request = new IndexRequest("my_index").id("1").source(source);

            // Execute the request
            IndexResponse response = client.index(request, RequestOptions.DEFAULT);

            // Print the result
            System.out.println("Index: " + response.getIndex());
            System.out.println("Document ID: " + response.getId());
            System.out.println("Version: " + response.getVersion());
        }
    }
}

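In the code above, we first create a RestHighLevelClient to communicate with Elasticsearch, inside a try-with-resources block so that it is closed even if a request fails. We then put the document fields into a Map, which the client serializes to JSON, and create an IndexRequest that specifies the index name, the document's _id, and the source. Finally, client.index() executes the request and we print the result.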
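Indexing with an explicit _id is not insert-only: if a document with that _id already exists, Elasticsearch overwrites it, reports the result UPDATED, and increments _version. To make the request fail instead, the client offers request.opType(DocWriteRequest.OpType.CREATE) (the REST equivalent is PUT /my_index/_create/1), which is rejected with a 409 conflict when the _id is taken. The toy in-memory model below only sketches those semantics; ToyIdIndex and its methods are hypothetical names, not part of the Elasticsearch client.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model (hypothetical, not Elasticsearch code) of indexing with an explicit _id.
class ToyIdIndex {
    enum Result { CREATED, UPDATED }

    // Stands in for the 409 conflict returned when op_type=create hits an existing _id
    static class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    private final Map<String, Map<String, Object>> sources = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // op_type=index (the default): create the document, or overwrite it if the _id exists
    Result index(String id, Map<String, Object> source) {
        boolean existed = sources.containsKey(id);
        sources.put(id, new HashMap<>(source));
        versions.merge(id, 1L, Long::sum); // every write bumps the version
        return existed ? Result.UPDATED : Result.CREATED;
    }

    // op_type=create: refuse to overwrite an existing document
    Result create(String id, Map<String, Object> source) {
        if (sources.containsKey(id)) {
            throw new ConflictException("document [" + id + "] already exists");
        }
        return index(id, source);
    }

    long version(String id) {
        return versions.getOrDefault(id, 0L);
    }
}
```

Calling index twice with the same _id yields CREATED and then UPDATED, while create on that _id throws.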

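2.2 Adding a Document with a System-Generated ID

If we don't want to specify the _id ourselves, Elasticsearch can generate one. In that case, simply don't call id() when creating the IndexRequest.

The following Java example lets Elasticsearch generate the _id: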

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class AddDocumentAutoIdExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Document data
            Map<String, Object> source = new HashMap<>();
            source.put("title", "Advanced Elasticsearch");
            source.put("content", "A deeper look at Elasticsearch's advanced features");

            // Index request with index name and source only: no id() call, so Elasticsearch generates the _id
            IndexRequest request = new IndexRequest("my_index").source(source);

            // Execute the request
            IndexResponse response = client.index(request, RequestOptions.DEFAULT);

            // Print the result, including the generated _id
            System.out.println("Index: " + response.getIndex());
            System.out.println("Document ID: " + response.getId());
            System.out.println("Version: " + response.getVersion());
        }
    }
}

In this example we don't call id(), so Elasticsearch generates a unique _id for the document; the generated value is available from response.getId().
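Auto-generated IDs come with a practical trade-off: if a client times out and blindly retries an auto-ID request, the first attempt may already have been indexed, and the retry then creates a second copy under a new _id. Supplying a deterministic _id (for example, the primary key from the source system) makes the retry overwrite the same document instead. The toy model below illustrates only that difference; ToyRetryIndex is hypothetical, a HashMap stands in for the index, and UUID stands in for the server's own ID generator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model (hypothetical, not Elasticsearch code) contrasting server-generated
// and client-supplied _id values when the same write is retried.
class ToyRetryIndex {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Like indexing without id(): a fresh _id is chosen on every call
    String indexWithAutoId(Map<String, Object> source) {
        String id = UUID.randomUUID().toString();
        docs.put(id, new HashMap<>(source));
        return id;
    }

    // Like indexing with id(...): the same _id always targets the same document
    String indexWithId(String id, Map<String, Object> source) {
        docs.put(id, new HashMap<>(source));
        return id;
    }

    int count() {
        return docs.size();
    }
}
```

Retrying the auto-ID write leaves two documents; retrying with a fixed _id leaves one.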

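2.3 Handling Version Conflicts

In a distributed system, several clients may write to the same document at the same time, and one client's change can silently overwrite another's. Elasticsearch addresses this with optimistic concurrency control (OCC).

Every document has a version number, _version, which Elasticsearch increments on each write (index, update, or delete) to that document. Concurrency control is opt-in: a write request may carry a condition describing the state of the document it expects, and if the stored document no longer matches, the write is rejected with a version conflict (HTTP 409). Since Elasticsearch 7.0 this condition is expressed with the document's sequence number and primary term (if_seq_no and if_primary_term), which GET, index, and update responses return; the version parameter is accepted for this purpose only with external versioning (version_type=external).

The following Java example performs such a conditional write: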

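import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchStatusException;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class HandleVersionConflictExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Read the current document to learn its sequence number and primary term
            GetResponse current = client.get(new GetRequest("my_index", "1"), RequestOptions.DEFAULT);
            if (!current.isExists()) {
                System.out.println("Document 1 does not exist; nothing to update conditionally.");
                return;
            }

            // New document data
            Map<String, Object> source = new HashMap<>();
            source.put("title", "Elasticsearch Update");
            source.put("content", "Updated Elasticsearch document content");

            // Conditional write: applied only if the document is unchanged since it was read
            IndexRequest request = new IndexRequest("my_index")
                    .id("1")
                    .setIfSeqNo(current.getSeqNo())
                    .setIfPrimaryTerm(current.getPrimaryTerm())
                    .source(source);
            try {
                IndexResponse response = client.index(request, RequestOptions.DEFAULT);
                System.out.println("Index: " + response.getIndex());
                System.out.println("Document ID: " + response.getId());
                System.out.println("Version: " + response.getVersion());
            } catch (ElasticsearchStatusException e) {
                if (e.status() == RestStatus.CONFLICT) {
                    // HTTP 409: another client modified the document in between
                    System.out.println("Version conflict: re-read the document and retry.");
                } else {
                    throw e;
                }
            }
        }
    }
}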

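In the code above, we first read the document to obtain its _seq_no and _primary_term, then pass them to setIfSeqNo() and setIfPrimaryTerm(). If another write changed the document in between, Elasticsearch rejects the request with version_conflict_engine_exception (HTTP 409), which the client surfaces as an ElasticsearchStatusException whose status() is RestStatus.CONFLICT; checking the status is more robust than matching the exception message. Note that in 7.x, calling version() with the default internal versioning is rejected by request validation ("internal versioning can not be used for optimistic concurrency control"), so it cannot be used for this purpose.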
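The sequence-number check can be pictured with a small in-memory model: two writers read the same document, both attempt a conditional write, and only the first succeeds, while the second gets the conflict that version_conflict_engine_exception signals in Elasticsearch. ToyShard and its methods are hypothetical, and the model simplifies real behavior: it has a single shard, a fixed primary term, and no replication. As in Elasticsearch, though, the sequence number is a counter shared by the whole shard rather than a per-document version.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Elasticsearch code) of optimistic concurrency
// control with if_seq_no / if_primary_term on a single shard.
class ToyShard {
    static final class Doc {
        final Map<String, Object> source;
        final long seqNo;
        final long primaryTerm;

        Doc(Map<String, Object> source, long seqNo, long primaryTerm) {
            this.source = source;
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    static class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) { super(message); }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private long nextSeqNo = 0;          // one counter for the whole shard
    private final long primaryTerm = 1;  // would change only if a new primary took over

    Doc get(String id) {
        return docs.get(id);
    }

    // Unconditional write
    Doc index(String id, Map<String, Object> source) {
        Doc doc = new Doc(new HashMap<>(source), nextSeqNo++, primaryTerm);
        docs.put(id, doc);
        return doc;
    }

    // Conditional write: succeeds only if the document is still in the state the caller read
    Doc indexIf(String id, Map<String, Object> source, long ifSeqNo, long ifPrimaryTerm) {
        Doc current = docs.get(id);
        if (current == null || current.seqNo != ifSeqNo || current.primaryTerm != ifPrimaryTerm) {
            throw new VersionConflictException("version conflict on [" + id + "]");
        }
        return index(id, source);
    }
}
```

The losing writer's correct reaction is to re-read the document and decide again, not to resend the same request.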

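3. Updating Documents

3.1 Partial Update

In practice we often need to change only some fields of a document rather than the whole document. For this, the client provides UpdateRequest, which calls the update API.

The following Java example updates only some of a document's fields: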

import org.apache.http.HttpHost;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class PartialUpdateDocumentExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Only the fields to change, with their new values
            Map<String, Object> updateFields = new HashMap<>();
            updateFields.put("content", "This is the updated article content");

            // Update request: index name, document _id, and the partial document to merge
            UpdateRequest request = new UpdateRequest("my_index", "1").doc(updateFields);

            // Execute the request
            UpdateResponse response = client.update(request, RequestOptions.DEFAULT);

            // Print the result
            System.out.println("Index: " + response.getIndex());
            System.out.println("Document ID: " + response.getId());
            System.out.println("Version: " + response.getVersion());
        }
    }
}

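In the code above, we first build a Map holding the fields to change and their new values. We then create an UpdateRequest with the index name, the document's _id, and that partial document, execute it with client.update(), and print the result. Fields not included in the Map keep their current values.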
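The merge that doc(...) triggers can be sketched in plain Java: fields in the partial document replace the stored values, nested objects are merged key by key, and fields not mentioned are kept. Elasticsearch also detects no-op updates by default (detect_noop), so re-sending an identical change returns the result NOOP and leaves _version unchanged. ToyPartialUpdate.merge is a hypothetical helper written for illustration, not the server's implementation. Separately, a partial update of a document that does not exist fails with document_missing_exception unless the request sets docAsUpsert(true) or supplies an upsert(...) document.

```java
import java.util.Map;
import java.util.Objects;

// Toy model (hypothetical, not Elasticsearch code) of merging a partial "doc" into a stored _source.
final class ToyPartialUpdate {
    private ToyPartialUpdate() {}

    // Merges `changes` into `source` in place and reports whether anything actually changed.
    // Nested objects (maps) are merged recursively; any other value, including lists, is replaced.
    @SuppressWarnings("unchecked")
    static boolean merge(Map<String, Object> source, Map<String, Object> changes) {
        boolean modified = false;
        for (Map.Entry<String, Object> change : changes.entrySet()) {
            String key = change.getKey();
            Object oldValue = source.get(key);
            Object newValue = change.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                modified |= merge((Map<String, Object>) oldValue, (Map<String, Object>) newValue);
            } else if (!source.containsKey(key) || !Objects.equals(oldValue, newValue)) {
                source.put(key, newValue);
                modified = true;
            }
        }
        return modified; // false corresponds to a NOOP result
    }
}
```

Applying the same change twice returns true the first time and false the second, mirroring an UPDATED result followed by a NOOP.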

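3.2 Full Update

For a full update, use the same index request as when adding a document, but specify the _id of the existing document in the IndexRequest; the new document then overwrites the original content.

The following Java example performs a full update: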

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class FullUpdateDocumentExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // The complete new document; it replaces the stored one entirely
            Map<String, Object> source = new HashMap<>();
            source.put("title", "Elasticsearch Full Update");
            source.put("content", "Brand-new Elasticsearch document content");

            // Index request with the existing _id, so the document is overwritten
            IndexRequest request = new IndexRequest("my_index").id("1").source(source);

            // Execute the request
            IndexResponse response = client.index(request, RequestOptions.DEFAULT);

            // Print the result
            System.out.println("Index: " + response.getIndex());
            System.out.println("Document ID: " + response.getId());
            System.out.println("Version: " + response.getVersion());
        }
    }
}

In this example we create a new IndexRequest with the index name, the document's _id, and the new source, and execute it with client.index(), which replaces the whole document.
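Because the index API replaces the whole _source, any field left out of the new document disappears, whereas a partial update keeps it. The toy class below sketches that difference with a HashMap as the stored _source; ToyDocument is hypothetical, it ignores nested-object merging, and the tags field used in the usage check is an assumption for illustration (the examples above do not use it).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Elasticsearch code): one stored _source, changed two different ways.
class ToyDocument {
    private Map<String, Object> source;

    ToyDocument(Map<String, Object> initial) {
        this.source = new HashMap<>(initial);
    }

    // Like the index API with an existing _id: the new source replaces the old one completely
    void fullUpdate(Map<String, Object> newSource) {
        this.source = new HashMap<>(newSource);
    }

    // Like the update API with "doc": listed fields are overwritten, all others are kept
    // (top-level fields only in this sketch)
    void partialUpdate(Map<String, Object> fields) {
        this.source.putAll(fields);
    }

    Map<String, Object> source() {
        return source;
    }
}
```

So a full update is the right tool only when the client holds the complete, current document; otherwise fields can be lost.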

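3.3 How Updates Work Under the Hood

Elasticsearch stores documents in immutable Lucene segments, so a stored document is never modified in place. An update instead writes a complete new version of the document and marks the old one as deleted; deleted copies are physically removed later, when segments are merged.

For a partial update, Elasticsearch reads the current document from its _source on the shard that holds it, merges the fields from the UpdateRequest into it in memory, and indexes the result as the new version, incrementing _version. This is also why the update API requires _source to be enabled.

For a full update, Elasticsearch does not need to read the old document: it indexes the new source under the same _id, which replaces the previous version, and again increments _version.

The read-then-write step of a partial update is protected by optimistic concurrency control: if another client modifies the document after it was read, the write fails with a version conflict, unless the request sets retry_on_conflict (retryOnConflict(int) on UpdateRequest), in which case Elasticsearch re-reads the document and applies the change again, up to that many times.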
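The read, modify, and conditional-write cycle described above, including retry_on_conflict, can be sketched as a single-threaded toy model. ToyUpdateShard and its beforeWrite hook are hypothetical; the hook exists only to simulate another client writing between the read and the write, and the model ignores primary terms and replication.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy model (hypothetical, not Elasticsearch code) of the update API's internal cycle:
// read the document, apply the change to a copy, then write it back only if the
// sequence number is unchanged, retrying up to retryOnConflict times.
class ToyUpdateShard {
    static class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) { super(message); }
    }

    private final Map<String, Map<String, Object>> sources = new HashMap<>();
    private final Map<String, Long> seqNos = new HashMap<>();
    private long nextSeqNo = 0; // shard-wide counter

    // Runs between the read and the write; used to simulate a concurrent writer
    Runnable beforeWrite = () -> { };

    void index(String id, Map<String, Object> source) {
        sources.put(id, new HashMap<>(source));
        seqNos.put(id, nextSeqNo++);
    }

    Map<String, Object> get(String id) {
        return sources.get(id);
    }

    void update(String id, Consumer<Map<String, Object>> change, int retryOnConflict) {
        if (!sources.containsKey(id)) {
            throw new IllegalArgumentException("document [" + id + "] missing");
        }
        for (int attempt = 0; ; attempt++) {
            long seenSeqNo = seqNos.get(id);                              // 1. read
            Map<String, Object> updated = new HashMap<>(sources.get(id));
            change.accept(updated);                                       // 2. modify a copy in memory
            beforeWrite.run();
            if (seqNos.get(id) == seenSeqNo) {                            // 3. conditional write
                index(id, updated);
                return;
            }
            if (attempt >= retryOnConflict) {
                throw new VersionConflictException("version conflict on [" + id + "]");
            }
        }
    }
}
```

With one interfering write, an update allowed one retry re-reads the newer document and applies the change on top of it; with zero retries the conflict surfaces to the caller.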

4. Deleting Documents

4.1 Deleting a Document by ID

Deleting a document by its _id is the most common way to delete. The client provides DeleteRequest for this.

The following Java example deletes a document by its _id:

import org.apache.http.HttpHost;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;

public class DeleteDocumentByIdExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Delete request: index name and document _id
            DeleteRequest request = new DeleteRequest("my_index", "1");

            // Execute the request
            DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);

            // Print the result
            System.out.println("Index: " + response.getIndex());
            System.out.println("Document ID: " + response.getId());
            System.out.println("Version: " + response.getVersion());
        }
    }
}

In the code above, we create a DeleteRequest with the index name and the document's _id, execute it with client.delete(), and print the result.
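One detail the example output does not show: deleting an _id that does not exist does not throw. The client returns a DeleteResponse whose getResult() is DocWriteResponse.Result.NOT_FOUND (the REST layer answers 404), so code that must know whether anything was removed should check the result rather than rely on an exception. A successful delete is itself a write and increments _version. The toy model below sketches those two outcomes; ToyDeleteIndex is hypothetical and not Elasticsearch code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, not Elasticsearch code) of delete-by-_id outcomes.
class ToyDeleteIndex {
    enum Result { DELETED, NOT_FOUND }

    private final Map<String, Map<String, Object>> docs = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    void index(String id, Map<String, Object> source) {
        docs.put(id, new HashMap<>(source));
        versions.merge(id, 1L, Long::sum);
    }

    // A missing _id is reported as NOT_FOUND instead of raising an error;
    // a successful delete is a write and bumps the version.
    Result delete(String id) {
        if (docs.remove(id) == null) {
            return Result.NOT_FOUND;
        }
        versions.merge(id, 1L, Long::sum);
        return Result.DELETED;
    }

    long version(String id) {
        return versions.getOrDefault(id, 0L);
    }
}
```

With the real client, the equivalent check is comparing response.getResult() against DocWriteResponse.Result.NOT_FOUND.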

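4.2 Deleting Documents by Query

Besides deleting by _id, we can delete every document that matches a query. The client provides DeleteByQueryRequest, which calls the delete-by-query API.

The following Java example deletes documents by query: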

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;

import java.io.IOException;

public class DeleteDocumentByQueryExample {
    public static void main(String[] args) throws IOException {
        // Create the Elasticsearch client; try-with-resources closes it when done
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Delete-by-query request: target index plus the query selecting the documents to delete
            DeleteByQueryRequest request = new DeleteByQueryRequest("my_index");
            request.setQuery(QueryBuilders.matchQuery("title", "Elasticsearch"));

            // Execute the request; delete-by-query returns a BulkByScrollResponse
            BulkByScrollResponse response = client.deleteByQuery(request, RequestOptions.DEFAULT);

            // Print the number of deleted documents
            System.out.println("Documents deleted: " + response.getDeleted());
        }
    }
}

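In this example we create a DeleteByQueryRequest with the index name and a query. The classes live in org.elasticsearch.index.reindex, and the call returns a BulkByScrollResponse; there is no DeleteByQueryResponse class in the 7.x client. QueryBuilders.matchQuery() builds a full-text match query on the title field: the query text Elasticsearch is analyzed, so every document whose title contains the token elasticsearch matches, which with the sample data above is every remaining document. client.deleteByQuery() executes the request, and getDeleted() reports how many documents were removed.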
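Delete-by-query is not atomic. Elasticsearch takes a snapshot of the matching documents and then deletes them; if a document changes after the snapshot, its delete hits a version conflict. By default the first conflict aborts the request (deletes already performed are not rolled back), while request.setConflicts("proceed") counts conflicts and keeps going, and the count is available from BulkByScrollResponse.getVersionConflicts(). The toy model below sketches that behavior; ToyDeleteByQuery, its Predicate stand-in for the match query, and the concurrentWrites hook are hypothetical, as is the sample title "Lucene Internals" used in the usage check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model (hypothetical, not Elasticsearch code) of delete-by-query: snapshot the
// matching documents, then delete each one only if it is unchanged since the snapshot.
class ToyDeleteByQuery {
    static final class Outcome {
        int deleted;
        int versionConflicts;
    }

    final Map<String, Map<String, Object>> sources = new HashMap<>();
    final Map<String, Long> seqNos = new HashMap<>();
    private long nextSeqNo = 0;

    void index(String id, Map<String, Object> source) {
        sources.put(id, new HashMap<>(source));
        seqNos.put(id, nextSeqNo++);
    }

    // proceedOnConflict mirrors conflicts=proceed; otherwise the first conflict aborts.
    // concurrentWrites runs right after the snapshot to simulate other clients.
    Outcome deleteByQuery(Predicate<Map<String, Object>> query, boolean proceedOnConflict,
                          Runnable concurrentWrites) {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map.Entry<String, Map<String, Object>> e : sources.entrySet()) {
            if (query.test(e.getValue())) {
                snapshot.put(e.getKey(), seqNos.get(e.getKey()));
            }
        }
        concurrentWrites.run();
        Outcome outcome = new Outcome();
        for (Map.Entry<String, Long> e : snapshot.entrySet()) {
            if (!e.getValue().equals(seqNos.get(e.getKey()))) { // changed since the snapshot
                outcome.versionConflicts++;
                if (!proceedOnConflict) {
                    throw new IllegalStateException("version conflict on [" + e.getKey() + "], aborting");
                }
                continue;
            }
            sources.remove(e.getKey());
            seqNos.remove(e.getKey());
            outcome.deleted++;
        }
        return outcome;
    }
}
```

A document edited after the snapshot survives and is counted as a conflict rather than deleted.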

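5. Summary

This article covered adding, updating, and deleting documents in Elasticsearch. When adding a document, we can specify its _id or let the system generate one, and use optimistic concurrency control (if_seq_no and if_primary_term) to guard against conflicting writes. Updates come in two forms, partial and full; under the hood Elasticsearch never modifies a document in place, but indexes a new version and marks the old one as deleted, reading and merging the original first in the case of a partial update. Documents can be deleted by _id or, in bulk, by query.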