Still Haven't Learned CUD in ES?


I've had to work with ES on the job recently, but I wasn't familiar with querying documents in it, so I've spent the last two weeks studying ES and skimming through Elasticsearch: The Definitive Guide. Another day well spent.

ES is a real-time distributed search and analytics engine built on Lucene. Today we won't talk about its use cases; instead, we'll look at creating, modifying, and deleting indices.

Environment: CentOS 7, Elasticsearch 6.8.3, JDK 8

(The latest ES is version 7, which requires JDK 11 or above, so I installed ES 6.8.3.)

The examples below all use a student index.

1. Creating an Index

PUT http://192.168.197.100:9200/student
{
  "mappings": {
    "_doc": {  // "_doc" is the type; an ES 6 index may contain only one type
      "properties": {
        "id": {
          "type": "keyword"
        },
        "name": {
          "type": "text",
          "index": true,
          "analyzer": "standard"
        },
        "age": {
          "type": "integer",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "birthday": {
          "type": "date"
        },
        "gender": {
          "type": "keyword"
        },
        "grade": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "class": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  },
  "settings": {
    // number of primary shards
    "number_of_shards": 1,
    // number of replicas per shard
    "number_of_replicas": 1
  }
}

The difference between the text and keyword types:

(1) text fields are analyzed (split into tokens) at index and search time; they are meant for full-text search

(2) keyword fields are not analyzed; they are meant for exact matching, sorting, and aggregations

The index property controls how a string field is indexed. In ES 2.x it took one of three string values:

(1) analyzed: the field can be matched loosely, similar to LIKE in SQL

(2) not_analyzed: the field can only be matched exactly, similar to = in SQL

(3) no: the field is not searchable

From ES 5.x onward (including the 6.8.3 used here), index is a boolean: true (the default, searchable) or false (not searchable); the old string values are no longer accepted.

The analyzer property selects the analyzer; for Chinese this is usually the ik analyzer, and you can also define a custom one.

The number_of_shards property is the number of primary shards (default 5); it cannot be changed after the index is created.

The number_of_replicas property is the number of replicas per shard (default 1); it can be changed later.

On success, the following JSON is returned:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "student"
}
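The create-index call above can be scripted. Below is a minimal Python sketch (standard library only) that assembles the same PUT request; the host address and index name follow the examples in this post, and nothing is actually sent here:

```python
import json
import urllib.request

# Node address from the examples above; replace with your own.
ES_HOST = "http://192.168.197.100:9200"

# The same mapping as above, expressed as a Python dict.
student_body = {
    "mappings": {
        "_doc": {
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "text", "index": True, "analyzer": "standard"},
                "age": {"type": "integer"},
                "birthday": {"type": "date"},
                "gender": {"type": "keyword"},
                "grade": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
                "class": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
            }
        }
    },
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
}

def create_index_request(name: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{ES_HOST}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = create_index_request("student", student_body)
print(req.get_method(), req.full_url)  # PUT http://192.168.197.100:9200/student
```

To actually execute it, pass the request to urllib.request.urlopen(req) against a running node.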

How do you inspect an index after creating it?

GET http://192.168.197.100:9200/student/_mapping

In ES 6, an index can hold only one type, such as the _doc above.

Comparing ES with a relational database:

index ↔ database
type ↔ table
document ↔ row
field ↔ column

2. Modifying an Index

// set the number of replicas per shard to 2
PUT http://192.168.197.100:9200/student/_settings
{
  "number_of_replicas": 2
}

3. Deleting an Index

// delete a single index
DELETE http://192.168.197.100:9200/student
// delete all indices
DELETE http://192.168.197.100:9200/_all
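Both index-level operations differ only in HTTP method, path, and body. A small sketch with hypothetical helper names that assembles each request as a (method, path, body) triple, without sending anything:

```python
# Hypothetical helpers: build (method, path, body) triples for the
# index-level operations shown above; nothing is sent over the network.

def update_replicas(index: str, replicas: int):
    """PUT /{index}/_settings with the new replica count."""
    return ("PUT", f"/{index}/_settings", {"number_of_replicas": replicas})

def delete_index(index: str):
    """DELETE /{index}; use "_all" to delete every index."""
    return ("DELETE", f"/{index}", None)

print(update_replicas("student", 2))  # ('PUT', '/student/_settings', {'number_of_replicas': 2})
print(delete_index("student"))        # ('DELETE', '/student', None)
```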

4. The Default standard Analyzer vs. the ik Analyzer

ES's default analyzer is standard. For English it splits on whitespace and punctuation, but for Chinese it turns every character into its own token, so it is a poor choice for Chinese text.

For example, standard on English text:

// this API shows how a piece of text is tokenized
POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "standard"
}

The result:

{
  "tokens": [
    { "token": "the",      "start_offset": 0,  "end_offset": 3,  "type": "<ALPHANUM>", "position": 0 },
    { "token": "people's", "start_offset": 4,  "end_offset": 12, "type": "<ALPHANUM>", "position": 1 },
    { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 2 },
    { "token": "of",       "start_offset": 22, "end_offset": 24, "type": "<ALPHANUM>", "position": 3 },
    { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "<ALPHANUM>", "position": 4 }
  ]
}

And standard on Chinese text:

POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "standard"
}

The result:

{
  "tokens": [
    { "token": "中", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
    { "token": "华", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
    { "token": "人", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
    { "token": "民", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
    { "token": "共", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
    { "token": "和", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 },
    { "token": "万", "start_offset": 7, "end_offset": 8, "type": "<IDEOGRAPHIC>", "position": 7 },
    { "token": "岁", "start_offset": 8, "end_offset": 9, "type": "<IDEOGRAPHIC>", "position": 8 }
  ]
}
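The one-token-per-character behaviour above is easy to reproduce: for CJK text the standard analyzer emits each ideograph as its own token, which is essentially what splitting the Python string into characters gives you. A toy illustration, not the real tokenizer:

```python
text = "中华人民共和国万岁"

# The standard analyzer emits one token per CJK ideograph; enumerating the
# characters reproduces the token/offset/position list shown above.
tokens = [
    {"token": ch, "start_offset": i, "end_offset": i + 1, "position": i}
    for i, ch in enumerate(text)
]

print(len(tokens))         # 9
print(tokens[0]["token"])  # 中
```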

The ik analyzer (the third-party elasticsearch-analysis-ik plugin, installed separately) does segment Chinese text into words. It ships two analyzers: ik_smart and ik_max_word.

(1) ik_smart: coarsest-grained segmentation; splits the text into the fewest, longest words

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "ik_smart"
}

The result:

{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "万岁", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
  ]
}

(2) ik_max_word: finest-grained segmentation; extracts as many words as possible from the text

For example:

POST http://192.168.197.100:9200/_analyze
{
  "text": "中华人民共和国万岁",
  "analyzer": "ik_max_word"
}

The result:

{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "万岁", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 },
    { "token": "万", "start_offset": 7, "end_offset": 8, "type": "TYPE_CNUM", "position": 10 },
    { "token": "岁", "start_offset": 8, "end_offset": 9, "type": "COUNT", "position": 11 }
  ]
}

ik on English text:

POST http://192.168.197.100:9200/_analyze
{
  "text": "the People's Republic of China",
  "analyzer": "ik_smart"
}

The result is below: ik drops the unimportant words (stop words), while standard keeps them. (My English has deteriorated to the point where I can't even remember what part of speech a, an, and the are; as a Chinese speaker, that's nothing to be proud of.)

{
  "tokens": [
    { "token": "people",   "start_offset": 4,  "end_offset": 10, "type": "ENGLISH", "position": 0 },
    { "token": "s",        "start_offset": 11, "end_offset": 12, "type": "ENGLISH", "position": 1 },
    { "token": "republic", "start_offset": 13, "end_offset": 21, "type": "ENGLISH", "position": 2 },
    { "token": "china",    "start_offset": 25, "end_offset": 30, "type": "ENGLISH", "position": 3 }
  ]
}
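The difference comes down to stop-word filtering. A toy sketch of the idea: lowercase, split on non-alphanumeric characters, then drop common function words. The stop-word set here is a small illustrative subset, not ik's actual list:

```python
import re

# Small illustrative subset of English stop words; ik's real list is longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}

def tokenize_drop_stopwords(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric runs, drop stop words."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return [w for w in words if w and w not in STOPWORDS]

print(tokenize_drop_stopwords("the People's Republic of China"))
# ['people', 's', 'republic', 'china']
```

This reproduces the four tokens ik_smart returned above, whereas the standard analyzer keeps "the" and "of".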

5. Adding a Document

Fields can be added freely; they need not all appear in the mapping.

// 1 is the value of _id; it is unique, and ES can also generate one automatically
POST http://192.168.197.100:9200/student/_doc/1
{
  "id": 1,
  "name": "tom",
  "age": 20,
  "gender": "male",
  "grade": "7",
  "class": "1"
}

6. Updating a Document

POST http://192.168.197.100:9200/student/_doc/1/_update
{
  "doc": {
    "name": "jack"
  }
}

7. Deleting a Document

// 1 is the value of _id
DELETE http://192.168.197.100:9200/student/_doc/1
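The three document operations above differ only in method, path, and body. A minimal sketch (hypothetical helper names; nothing is sent) that assembles each request following the ES 6 URL scheme used in this post:

```python
import json

def index_doc(index: str, doc_id, doc: dict):
    # POST /{index}/_doc/{id} creates (or replaces) the document with that _id
    return ("POST", f"/{index}/_doc/{doc_id}", json.dumps(doc))

def update_doc(index: str, doc_id, partial: dict):
    # POST /{index}/_doc/{id}/_update applies a partial update wrapped in "doc"
    return ("POST", f"/{index}/_doc/{doc_id}/_update", json.dumps({"doc": partial}))

def delete_doc(index: str, doc_id):
    # DELETE /{index}/_doc/{id} removes the document
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)

print(update_doc("student", 1, {"name": "jack"})[1])  # /student/_doc/1/_update
```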

That covers the basics of creating, modifying, and deleting indices, and adding, updating, and deleting documents in ES. To keep this post from running too long, document queries will be covered in the next one.
