一、前言
上一篇文章我们学习了ES的基本操作和数据类型,接下来就是ES中比较重要的查询操作了,ES的出现就是为了解决搜索问题,正如他的标语 You Know For Search。当然搜索是一个很复杂的功能,我们也是循序渐进的学习,一开始会是一些比较简单的案例。
二、数据准备
1、创建索引
PUT /hotel
{"mappings": {"properties": {"title":{"type": "text" },"city":{"type": "keyword"},"price":{"type": "double"},"create_time":{"type": "date","format": "yyyy-MM-dd HH:mm:ss"},"amenities":{"type": "text"},"full_room":{"type": "boolean"},"location":{"type": "geo_point"},"praise":{"type": "integer"}}}
说明:我们创建了一个索引,有以下几个字段
字段 | 类型 | 含义 |
title | text | 标题 |
city | keyword | 所在城市 |
price | double | 价格 |
create_time | date | 创建时间 |
amenities | text | 便利设施 |
full_room | boolean | 是否满房 |
location | gen_point | 地理位置(经纬度) |
praise | integer | 好评数量 |
2、写入数据,这里我们使用批量写入数据,使用bulk
POST /_bulk
{"index":{"_index":"hotel","_id":"001"}}
{"title":"文雅酒店","city":"青岛","price":556.00,"create_time":"2020-04-18 12:00:00","amenities":"浴池,普通停车场/充电停车场","full_room":false,"location":{"lat":36.083078,"lon":120.37566},"praise":10}
{"index":{"_index":"hotel","_id":"002"}}
{"title":"金都嘉怡假日酒店","city":"北京","price":337.00,"create_time":"2021-03015 20:00:00","amenities":"wifi,充电停车场/可升降停车场","full_room":false,"location":{"lat":39.915153,"lon":116.4030},"praise":60}
{"index":{"_index":"hotel","_id":"003"}}
{"itle":"金都欣欣酒店","city":"天津","price":200.00,"create_ime":"2021-05-09 16:00:00","amenities":"提供假日party,免费早餐,可充电停车场","full_room":true,"location":{"lat":39.186555,"lon":117.162007},"praise":30}
{"index":{"_index":"hotel","_id":"004"}}
{"title":"金都酒店","city":"北京","price":500.00,"create_time":"2021-02-18 08:00:00","amenities":"浴池(假日需预定),室内游泳池,普通停车场","full_room":true,"location":{"lat":39.915343,"lon":116.4239},"praise":20}
{"index":{"_index":"hotel","_id":"005"}}
{"title":"文雅精选酒店","city":"北京","price":800.00,"create_time":"2021-01-01 08:00:00","amenities":"浴池(假日需预定),wifi,室内游泳池,普通停车场","full_room":true,"location":{"lat":39.918229,"lon":116.422011},"praise":20}
3、我们先查看一下所有的数据
命令:GET /hotel/_search
结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 852,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "hotel","_type" : "_doc","_id" : "001","_score" : 1.0,"_source" : {"title" : "文雅酒店","city" : "青岛","price" : 556.0,"create_time" : "2020-04-18 12:00:00","amenities" : "浴池,普通停车场/充电停车场","full_room" : false,"location" : {"lat" : 36.083078,"lon" : 120.37566},"praise" : 10}},{"_index" : "hotel","_type" : "_doc","_id" : "003","_score" : 1.0,"_source" : {"itle" : "金都欣欣酒店","city" : "天津","price" : 200.0,"create_ime" : "2021-05-09 16:00:00","amenities" : "提供假日party,免费早餐,可充电停车场","full_room" : true,"location" : {"lat" : 39.186555,"lon" : 117.162007},"praise" : 30}},{"_index" : "hotel","_type" : "_doc","_id" : "004","_score" : 1.0,"_source" : {"title" : "金都酒店","city" : "北京","price" : 500.0,"create_time" : "2021-02-18 08:00:00","amenities" : "浴池(假日需预定),室内游泳池,普通停车场","full_room" : true,"location" : {"lat" : 39.915343,"lon" : 116.4239},"praise" : 20}},{"_index" : "hotel","_type" : "_doc","_id" : "005","_score" : 1.0,"_source" : {"title" : "文雅精选酒店","city" : "北京","price" : 800.0,"create_time" : "2021-01-01 08:00:00","amenities" : "浴池(假日需预定),wifi,室内游泳池,普通停车场","full_room" : true,"location" : {"lat" : 39.918229,"lon" : 116.422011},"praise" : 20}}]}
}
三、开始查询
1、返回指定字段
很多场景我们并不需要返回所有的字段,比如在列表页的时候我们可能只需要返回 标题、价格而不需要把所的字段都查询出来,当然你也可以这么做,只不过这么做会带来一些性能上的损耗,这部分损耗包含从ES查询、从ES返回到客户端,从后端返回给前端等等,所以我们这里做指定字段的查询。使用到的关键字是_source
命令:
GET /hotel/_search
{"_source": ["title","price"]
}结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "hotel","_type" : "_doc","_id" : "001","_score" : 1.0,"_source" : {"price" : 556.0,"title" : "文雅酒店"}},{"_index" : "hotel","_type" : "_doc","_id" : "003","_score" : 1.0,"_source" : {"price" : 200.0}},{"_index" : "hotel","_type" : "_doc","_id" : "004","_score" : 1.0,"_source" : {"price" : 500.0,"title" : "金都酒店"}},{"_index" : "hotel","_type" : "_doc","_id" : "005","_score" : 1.0,"_source" : {"price" : 800.0,"title" : "文雅精选酒店"}}]}
}可以看到ES值返回了价格和标题,其余字段并没有返回。这个搜索就等价于Mysql中的
Select title,price from hotel
2、计数查询
有些场景我们只想知道有多少符合条件的数据,而不需要知道确定的数据是什么,此时就可以用ES提供的计数查询来做。例如我们想知道城市为北京的数据有多少条。
GET /hotel/_count
{"query": {"term": {"city": {"value": "北京"}}}
}结果:
{"count" : 2,//下面这个是分片信息可以暂时忽略"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0}
}可以看到结果只返回了满足条件的数据的数量,而并没有返回具体是那两条数据。这里用了query 和 term
这些后面也会讲到。这个搜索等价于Mysql中的
SELECT count(*) FROM hotel WHERE city = '北京'
3、分页查询
分页查询,这个也很好理解,我们不可能一次性把所有数据都给客户,一方面是性能会很差,另一方面客户或许根本不关心这么多数据,所以此时我们需要将查询结果分页,让用户有选择的查看数据,话不多说直接上代码
GET /hotel/_search
{"from": 0,"size": 1,"query": {"term": {"city": {"value": "北京"}}}
}
结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.87546873,"hits" : [{"_index" : "hotel","_type" : "_doc","_id" : "004","_score" : 0.87546873,"_source" : {"title" : "金都酒店","city" : "北京","price" : 500.0,"create_time" : "2021-02-18 08:00:00","amenities" : "浴池(假日需预定),室内游泳池,普通停车场","full_room" : true,"location" : {"lat" : 39.915343,"lon" : 116.4239},"praise" : 20}}]}
}默认ES是只返回10条数据,我们可以通过from和size来设置分页参数,当ES默认最大返回值是
1000。当然这个数字是可以改的,在创建索引或者修改索引时设置settings index中有一个
max_result_window属性PUT /hotel/_settings
{"index":{"max_result_window":2000}
}这个搜索相当于Mysql中的
SELECT * FROM hotel WHERE city = '北京' limit 0,1
留个坑:ES会有深分页的问题,这个问题会放到后续的文章中讲述!!
4、性能分析
虽然ES是一个很强的搜索引擎,但是如果DSL写的过于抽象(过于烂)或者说索引设计的不合理也会导致ES搜索变慢,那么该如何解决呢?那肯定是要先知道那慢了,此时我们就可以用ES给我们提供的性能分析命令。我们还是以上面的DSL作为例子
DSL:
GET /hotel/_search
{"profile": true,"query": {"match": {"title": "金都"}}
}结果:
{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 2.0834165,"hits" : [{"_index" : "hotel","_type" : "_doc","_id" : "004","_score" : 2.0834165,"_source" : {"title" : "金都酒店","city" : "北京","price" : 500.0,"create_time" : "2021-02-18 08:00:00","amenities" : "浴池(假日需预定),室内游泳池,普通停车场","full_room" : true,"location" : {"lat" : 39.915343,"lon" : 116.4239},"praise" : 20}}]},"profile" : {"shards" : [{"id" : "[dglsus9vTpGyHUlIqpKvlw][hotel][0]","searches" : [{"query" : [{"type" : "BooleanQuery","description" : "title:金 title:都","time_in_nanos" : 2043800,"breakdown" : {"set_min_competitive_score_count" : 0,"match_count" : 1,"shallow_advance_count" : 0,"set_min_competitive_score" : 0,"next_doc" : 6700,"match" : 4000,"next_doc_count" : 1,"score_count" : 1,"compute_max_score_count" : 0,"compute_max_score" : 0,"advance" : 77200,"advance_count" : 1,"score" : 13300,"build_scorer_count" : 2,"create_weight" : 139400,"shallow_advance" : 0,"create_weight_count" : 1,"build_scorer" : 1803200},"children" : [{"type" : "TermQuery","description" : "title:金","time_in_nanos" : 187000,"breakdown" : {"set_min_competitive_score_count" : 0,"match_count" : 0,"shallow_advance_count" : 3,"set_min_competitive_score" : 0,"next_doc" : 0,"match" : 0,"next_doc_count" : 0,"score_count" : 1,"compute_max_score_count" : 3,"compute_max_score" : 36400,"advance" : 1100,"advance_count" : 2,"score" : 7400,"build_scorer_count" : 3,"create_weight" : 37800,"shallow_advance" : 12500,"create_weight_count" : 1,"build_scorer" : 91800}},{"type" : "TermQuery","description" : "title:都","time_in_nanos" : 43700,"breakdown" : {"set_min_competitive_score_count" : 0,"match_count" : 0,"shallow_advance_count" : 3,"set_min_competitive_score" : 0,"next_doc" : 0,"match" : 0,"next_doc_count" : 0,"score_count" : 1,"compute_max_score_count" : 3,"compute_max_score" : 5400,"advance" : 1800,"advance_count" : 2,"score" : 1000,"build_scorer_count" : 3,"create_weight" : 8400,"shallow_advance" : 2400,"create_weight_count" : 1,"build_scorer" : 24700}}]}],"rewrite_time" : 7200,"collector" : [{"name" : "SimpleTopScoreDocCollector","reason" : "search_top_hits","time_in_nanos" : 23900}]}],"aggregations" : [ ],"fetch" : {"type" : "fetch","description" : "","time_in_nanos" : 102400,"breakdown" : {"load_stored_fields" : 19700,"load_stored_fields_count" : 1,"next_reader" : 9000,"next_reader_count" : 1},"debug" : {"stored_fields" : ["_id","_routing","_source"]},"children" : [{"type" : "FetchSourcePhase","description" : "","time_in_nanos" : 4600,"breakdown" : {"process_count" : 1,"process" : 3900,"next_reader" : 700,"next_reader_count" : 1},"debug" : {"fast_path" : 1}}]}}]}
}
这里的结果分析还是很冗长的,读起来也是有一点的难度的。索性我们可以借助kibana来帮我们分析
实话实说,笔者学到这里的时候还是很会看这个,后续再补充!
5、评分分析
ES是会对搜索条件进行评分的,如果用户没有指定按照那个字段进行排序,ES会使用自己的打分算法对文档进行排序,当然也可以人为干预,百度就是这么做的。有时候我们需要知道某个文档的具体打分详情,此时可以使用ES提供的explain来查询,例如
GET /hotel/_explain/002
{"query":{"match":{"title":"金都"}}
}结果:
#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"_index" : "hotel","_type" : "_doc","_id" : "002","matched" : true,"explanation" : {"value" : 1.1689311,"description" : "sum of:","details" : [{"value" : 0.58446556,"description" : "weight(title:金 in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.58446556,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.6931472,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 2,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.38327527,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 8.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.58446556,"description" : "weight(title:都 in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.58446556,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.6931472,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 2,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.38327527,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 8.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}
}这个过于复杂,现在了解即可
四、结束语
今天学习了ES中的查询,当然受限于篇幅,只是一部分。后续还会有更多的复杂搜索,希望对你有所帮助。