Installing Elasticsearch and Basic Usage

Introduction

Elasticsearch is a highly scalable full-text search and analytics engine built on Apache Lucene. It can store, search, and analyze large volumes of data in near real time.

Installation

Installing Java

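Elasticsearch runs on the Java Virtual Machine. Elasticsearch 7.0 and later ship with a bundled JDK, so a separate Java installation is optional; it is only needed if you want Elasticsearch to use your own JVM.
To check whether Java is already installed, run: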

java -version

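If Java is not installed, download a Windows x64 JDK installer from the Oracle website (for example, jdk-6u45-windows-x64.exe) and install it. Oracle may require you to sign in before downloading. Note that Elasticsearch 8.x needs Java 17 or later when it is pointed at your own JDK through the ES_JAVA_HOME environment variable, so a JDK this old is useful only for other Java tooling, not for running Elasticsearch.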

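Configuring the Java environment variables

In the system environment variables, create a new variable named JAVA_HOME and set it to the JDK installation directory, for example: C:\Program Files\Java\jdk1.6.0_45

Append the following to the Path system variable: ;%JAVA_HOME%\bin

Add a CLASSPATH variable with the value: .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar;
(This step applies only to old JDKs; dt.jar and tools.jar were removed in JDK 9.)

Open a cmd prompt and run javac -version. If the command prints a version number, the JDK is installed correctly.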

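Installing Elasticsearch

Download page: https://www.elastic.co/cn/downloads/elasticsearch

At the time of writing, the latest version is 8.4.1.

After downloading, extract the archive and append the following line to the end of config/elasticsearch.yml: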

ingest.geoip.downloader.enabled: false

Then change the following settings:

network.host: 127.0.0.1
http.port: 9200

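Double-click bin/elasticsearch.bat to start Elasticsearch. If startup reports an error, open elasticsearch.yml again and set the following. Elasticsearch 8.x enables security (authentication and TLS) by default, so turning it off like this is suitable only for local testing: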

xpack.security.enabled: false

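Double-click elasticsearch.bat again and open http://127.0.0.1:9200 in a browser. If the browser shows a JSON response describing the node (its name, cluster name, and version), the installation succeeded.

Basic usage

Creating an index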

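PUT http://127.0.0.1:9200/<index>

<index>: the name of the index
Note: creating the index explicitly is optional. If the index does not exist when you add a document, it is created automatically.

Deleting an index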

DELETE http://127.0.0.1:9200/<index>
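
As an illustration, the create and delete requests above could be built from Python with the standard library. This is only a sketch: BASE_URL and the helper name index_request are assumptions made for this example, and the requests are constructed but not sent, so the code runs without a live node.

```python
import urllib.request

# Assumption: the local node configured in the installation section.
BASE_URL = "http://127.0.0.1:9200"


def index_request(index: str, method: str) -> urllib.request.Request:
    """Build (without sending) a create (PUT) or delete (DELETE) index request."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    return urllib.request.Request(f"{BASE_URL}/{index}", method=method)


create = index_request("movies", "PUT")
delete = index_request("movies", "DELETE")
print(create.get_method(), create.full_url)
print(delete.get_method(), delete.full_url)
```

To actually send a request, pass it to urllib.request.urlopen(), for example urllib.request.urlopen(create).read().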

Adding or updating a document

POST http://127.0.0.1:9200/<index>/_doc/<_id>

<index>: the name of the index
<_id>: the unique identifier of the document

For example:

POST http://127.0.0.1:9200/movies/_doc/1
{"id":1,"title":"Kung Fu Panda","overview":"When the Valley of Peace is threatened, lazy Po the panda discovers his destiny as the \"chosen one\" and trains to become a kung fu hero, but transforming the unsleek slacker into a brave warrior won't be easy. It's up to Master Shifu and the Furious Five -- Tigress, Crane, Mantis, Viper and Monkey -- to give it a try.","genres":["Action","Adventure","Animation","Family","Comedy"],"poster":"https://image.tmdb.org/t/p/w500/wWt4JYXTg5Wr3xBW2phBrMKgp3x.jpg","release_date":1212537600}

You can use Postman to send these requests.
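
If you prefer a script to a GUI client, the same indexing request might be built in Python roughly as follows. This is a sketch under stated assumptions: BASE_URL and index_doc_request are names invented for this example, and the movie document is a shortened version of the one shown above. The request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: the local node configured in the installation section.
BASE_URL = "http://127.0.0.1:9200"


def index_doc_request(index: str, doc_id, doc: dict) -> urllib.request.Request:
    """Build (without sending) a POST request that adds or updates one document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Shortened version of the example document above.
movie = {"id": 1, "title": "Kung Fu Panda", "genres": ["Action", "Adventure"]}
req = index_doc_request("movies", 1, movie)
print(req.get_method(), req.full_url)
```

Sending it is a matter of calling urllib.request.urlopen(req) and reading the JSON response.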

Adding documents in bulk

PUT http://127.0.0.1:9200/<index>/_bulk

For example:

PUT http://127.0.0.1:9200/movies/_bulk
{"index":{"_id":1}}
{"id":1,"title":"Kung Fu Panda","overview":"When the Valley of Peace is threatened, lazy Po the panda discovers his destiny as the \"chosen one\" and trains to become a kung fu hero, but transforming the unsleek slacker into a brave warrior won't be easy. It's up to Master Shifu and the Furious Five -- Tigress, Crane, Mantis, Viper and Monkey -- to give it a try.","genres":["Action","Adventure","Animation","Family","Comedy"],"poster":"https://image.tmdb.org/t/p/w500/wWt4JYXTg5Wr3xBW2phBrMKgp3x.jpg","release_date":1212537600}
{"index":{"_id":2}}
{"id":2,"title":"Batman","overview":"Batman has not been seen for ten years. A new breed of criminal ravages Gotham City, forcing 55-year-old Bruce Wayne back into the cape and cowl. But, does he still have what it takes to fight crime in a new era?","genres":["Action","Animation","Mystery"],"poster":"https://image.tmdb.org/t/p/w500/kkjTbwV1Xnj8wBL52PjOcXzTbnb.jpg","release_date":1345507200}

Note: the request body must end with a newline character (\n) after the last line; it cannot be omitted.
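
Because the bulk body is newline-delimited JSON, generating it from code makes it harder to forget the trailing newline. The helper below is a minimal sketch; the name build_bulk_body and the shortened documents are assumptions for this example, and it uses each document's id field as the _id, as in the request above.

```python
import json


def build_bulk_body(docs: list) -> str:
    """Return a bulk body: one action line and one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))  # action line
        lines.append(json.dumps(doc))                             # source line
    # The final "\n" is the mandatory trailing newline.
    return "\n".join(lines) + "\n"


# Shortened versions of the two example documents above.
docs = [{"id": 1, "title": "Kung Fu Panda"}, {"id": 2, "title": "Batman"}]
body = build_bulk_body(docs)
print(body, end="")
```

When sending this body, set the Content-Type header to application/x-ndjson (application/json is also accepted).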

Getting a document

GET http://127.0.0.1:9200/<index>/_doc/<_id>

For example:

GET http://127.0.0.1:9200/movies/_doc/1

Deleting a document

DELETE http://127.0.0.1:9200/<index>/_doc/<_id>

Searching documents

URI search with query parameters

GET http://127.0.0.1:9200/<index>/_search?q=<keyword>&sort=<field>:<direction>

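q: the query string to search for, passed as a URL parameter
<keyword>: the query text
sort: the sort order, given as <field>:<direction>, where <direction> is asc or desc

For example: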

GET http://127.0.0.1:9200/movies/_search?q=panda&sort=id:asc

See the official documentation for the other URL parameters.
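
When the keyword comes from user input it needs URL encoding, which is easy to get wrong by hand. The following sketch builds the search URL with the standard library; BASE_URL and search_url are assumed names for this example, and only the q and sort parameters described above are covered.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the local node configured in the installation section.
BASE_URL = "http://127.0.0.1:9200"


def search_url(index: str, keyword: str,
               sort_field: Optional[str] = None, direction: str = "asc") -> str:
    """Build a URI-search URL with an encoded q parameter and an optional sort."""
    params = {"q": keyword}
    if sort_field:
        params["sort"] = f"{sort_field}:{direction}"
    # safe=":" keeps the field:direction separator readable in the URL.
    return f"{BASE_URL}/{index}/_search?{urlencode(params, safe=':')}"


print(search_url("movies", "panda", "id"))
print(search_url("movies", "kung fu"))
```

The first call reproduces the example URL above; the second shows a space in the keyword being encoded as +.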

Response body

The search returns a response body like the following:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "movies",
                "_id": "1",
                "_score": null,
                "_ignored": [
                    "overview.keyword"
                ],
                "_source": {
                    "id": 1,
                    "title": "Kung Fu Panda",
                    "overview": "When the Valley of Peace...",
                    "genres": [
                        "Action",
                        ...
                    ],
                    "poster": "https://image.xxx.jpg",
                    "release_date": 1212537600
                },
                "sort": [
                    1
                ]
            }
        ]
    }
}
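  • took – time taken to execute the search, in milliseconds
  • timed_out – whether the search timed out
  • _shards – how many shards were searched, and how many succeeded, were skipped, or failed
  • hits – the search results
  • hits.total – total number of documents matching the search criteria
  • hits.hits – array of matching documents (the first 10 by default)
  • hits.hits.sort – the sort key of each result (present only when a sort is requested)
  • hits.hits._score – relevance score of each document; the higher the score, the more relevant the document. It is null here because the results are sorted by the id field rather than by relevance.

See the official documentation for an explanation of the remaining fields.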
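
To show how these fields are typically read in code, here is a small sketch that pulls the total and a per-document summary out of a parsed response. The function name summarize_hits is an assumption for this example, and the sample dictionary is a trimmed copy of the response above rather than live output.

```python
def summarize_hits(response: dict):
    """Return (total matches, [(_id, title, _score), ...]) from a search response."""
    hits = response["hits"]
    total = hits["total"]["value"]
    rows = [(h["_id"], h["_source"]["title"], h["_score"]) for h in hits["hits"]]
    return total, rows


# Trimmed copy of the response shown above (None stands in for JSON null).
sample = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": None,
        "hits": [
            {
                "_index": "movies",
                "_id": "1",
                "_score": None,
                "_source": {"id": 1, "title": "Kung Fu Panda"},
                "sort": [1],
            }
        ],
    },
}

total, rows = summarize_hits(sample)
print(total, rows)
```

With a live node, the dictionary would come from json.loads() applied to the HTTP response body.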

Query DSL

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON for defining queries.

  • term query
    Find the document whose id is 1:
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "term": {
            "id": 1
        }
    }
}

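term is generally used to look up exact values in non-text fields, such as a product price, product ID, or login account.

  • terms query
    Equivalent to several term queries, similar to the IN keyword in SQL: it matches documents whose field equals any of the given values: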
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "terms": {
            "title.keyword": [
                "Kung Fu Panda", "Batman"
            ]
        }
    }
}

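As with term, terms is meant for exact values; for text fields, query the keyword sub-field, as title.keyword does here.

  • match full-text query
    Find documents whose title contains panda: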
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "match": {
            "title": "panda"
        }
    }
}
  • keyword exact match
    When querying the keyword sub-field, the entire field value must equal the query text exactly:
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "match": {
            "title.keyword": "Kung Fu Panda"
        }
    }
}
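  • match_phrase phrase query
    match analyzes the query text into terms and matches documents containing any of them. To match a complete phrase, with the terms adjacent and in the same order, use match_phrase: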
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "match_phrase": {
            "title": "Kung Fu"
        }
    }
}
  • multi_match multi-field query
    Match the query text against both the title and overview fields:
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "multi_match": {
            "query": "Kung Fu",
            "fields": ["title","overview"]
        }
    }
}

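Note: the query text is analyzed (split into terms).

  • filter query
    Unlike must and should, filter clauses do not affect the score; a document either passes the filter or is excluded: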
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "title": "panda"
                    }
                }
            ],
            "filter": {
                "range": {
                  "release_date": {
                    "gte": 1212537500,
                    "lte": 1212537700
                  }
                }
            }
        }
    }
}
  • Compound query
    For example, find documents whose title contains panda and whose release date falls within a given range:
GET http://127.0.0.1:9200/movies/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "release_date": {
                            "gte": 1212537500,
                            "lte": 1212537700
                        }
                    }
                },
                {
                    "match": {
                        "title": "panda"
                    }
                }
            ],
            "boost": 1.0
        }
    }
}

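A bool query combines several query clauses; a document must satisfy the combination of all of them to match.
The clause types used above are must, must_not, and should:

  • must: the clause must match, and it contributes to the score
  • must_not: the clause must not match
  • should: the clause is optional; documents that match it score higher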
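
Nested bool bodies are tedious to write by hand, so it can help to assemble them in code. The sketch below builds the same shape as the filter example above; the helper name bool_query and the parameter name filter_ (chosen to avoid shadowing Python's built-in filter) are assumptions for this example.

```python
import json


def bool_query(must=(), filter_=(), must_not=(), should=()) -> dict:
    """Build a search body with a bool query, omitting any empty clause types."""
    clauses = {
        "must": must,
        "filter": filter_,
        "must_not": must_not,
        "should": should,
    }
    return {
        "query": {
            "bool": {name: list(items) for name, items in clauses.items() if items}
        }
    }


# Title must match "panda"; the release date range only filters, it does not score.
query = bool_query(
    must=[{"match": {"title": "panda"}}],
    filter_=[{"range": {"release_date": {"gte": 1212537500, "lte": 1212537700}}}],
)
print(json.dumps(query, indent=4))
```

The printed JSON can be sent as the body of a GET or POST request to /movies/_search.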

See the official documentation for more usage.

