ElasticSearch Tutorial for Beginners

EnterKey 2016. 1. 12. 19:34

1. Introduction

ElasticSearch is a search engine that can store large volumes of data and run queries on it extremely fast. It is built on Apache Lucene and provides all the infrastructure needed around bare Lucene to make a highly scalable, highly available and user-friendly search engine. ElasticSearch can store all kinds of structured and unstructured data as JSON documents, which are indexed at insertion time. It is these indexes that make searches so fast.
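The speed-up comes from building an index once, when a document is inserted, so that a search becomes a lookup rather than a scan. The sketch below is a deliberately simplified inverted index and is not Lucene's actual data structure. The class name `TinyInvertedIndex` and its tokenising rule (lower-case, split on non-word characters) are illustrative assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Deliberately simplified inverted index: maps each lower-cased term to the
// sorted set of ids of the documents that contain it. Lucene's real index
// (analyzers, postings lists, scoring) is far more elaborate.
final class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Done once, at insertion time: split the text into terms and record the id.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A search is then a single map lookup instead of a scan over all documents.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Molignestraat");
        index.add(2, "Rue de la Molignee");
        index.add(3, "Molignee Street");
        System.out.println(index.search("Molignee")); // prints [2, 3]
    }
}
```

The cost of tokenising is paid at insertion. This mirrors, in miniature, why ElasticSearch indexes documents when they are added.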

ElasticSearch provides a native Java API as well as a RESTful API. It also integrates nicely with Logstash, a data processing tool that can collect data from diverse sources, enrich, cleanse and transform it, and finally load it into diverse target systems. Logstash is a pluggable framework with a wide range of input, output and filter plugins.

2. Example

In this example we will experiment with the REST APIs provided by ElasticSearch to get a feel for how data can be imported into it, and then run some queries to fetch the data back. We will use only SoapUI and create a simple REST project. Add the endpoint URL http://localhost:9200 to your REST project (this is the URL at which your ElasticSearch instance is running). Then follow the steps described in the sections below.
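Every SoapUI step in this tutorial is just an HTTP call against that endpoint. If you would rather script the steps, a minimal sketch using only the JDK's built-in `java.net.http` client (Java 11+) could look like the one below. The class name `EsClient` and its methods are illustrative assumptions; this is not ElasticSearch's own Java API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a tiny REST helper (Java 11+, JDK HttpClient only). Not the
// ElasticSearch Java API; it just sends the same HTTP calls SoapUI would.
final class EsClient {
    // Local default endpoint used throughout this tutorial.
    static final String BASE_URL = "http://localhost:9200";
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Build a request for a resource path such as "/transactions".
    // A null jsonBody sends no body (e.g. for a plain GET).
    static HttpRequest request(String method, String path, String jsonBody) {
        HttpRequest.BodyPublisher body = (jsonBody == null)
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(jsonBody);
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
                .header("Content-Type", "application/json")
                .method(method, body)
                .build();
    }

    // Send the request and return the raw JSON response body.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // GET on the root URL returns basic information about the running node.
        System.out.println(send(request("GET", "/", null)));
    }
}
```

For example, the mapping step would become `send(request("PUT", "/transactions", mappingJson))`, where `mappingJson` is a hypothetical variable holding the mapping text.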

2.1 Create mapping

ElasticSearch allows you to add data without specifying a mapping beforehand. When data is imported this way, ElasticSearch tries to guess the type of each field in the input document from its value. In some cases this can lead to problems such as parsing exceptions, so it is better to define the mapping explicitly. ElasticSearch supports different kinds of schema modelling: denormalised documents, nested objects and parent-child relationships. Each of these has its own advantages and disadvantages, and we need to choose carefully, based on the use case, before defining our mapping. In our example we will use parent-child modelling. Transactions are mapped as parent documents, and the wordings in different languages associated with each transaction are mapped as child documents. Let's create the mapping as shown below.

  1. In SoapUI, add a new child resource called “CreateMapping” to the REST project.
  2. Set the HTTP verb to PUT.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions.
  4. Paste the schema mapping below into the editor, then click Run.
{
  "mappings": {
    "main_type": {
      "properties": {
        "identifier": { "type": "string" },
        "transactionAccountingDate": { "type": "date" },
        "transactionPostingDate": { "type": "date" },
        "transactionValueDate": { "type": "date" },
        "transactionAmount": { "type": "string" },
        "transactionAmountCurrency": { "type": "string" }
      }
    },
    "content": {
      "_parent": {
        "type": "main_type"
      },
      "properties": {
        "language": {
          "type": "string"
        },
        "description": {
          "type": "string"
        }
      }
    }
  }
}
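Each type's "properties" object in the mapping above follows one regular pattern: a field name maps to `{ "type": ... }`. As a hedged illustration, the sketch below generates such an object from an ordered field-to-type map, so the field list could be kept in Java code. `MappingProperties.render` is a hypothetical helper, not part of any ElasticSearch client. It assumes names and types need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of any ElasticSearch client): renders a
// "properties" object from an ordered field-name -> type map. Assumes names
// and types need no JSON escaping.
final class MappingProperties {
    static String render(Map<String, String> fieldTypes) {
        StringJoiner fields = new StringJoiner(", ", "{ \"properties\": { ", " } }");
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            fields.add("\"" + e.getKey() + "\": { \"type\": \"" + e.getValue() + "\" }");
        }
        return fields.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> content = new LinkedHashMap<>();
        content.put("language", "string");
        content.put("description", "string");
        System.out.println(render(content));
    }
}
```

A `LinkedHashMap` is used so that the generated JSON lists fields in the same order they were declared.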

2.1.1 Document field value type validation

With the schema mapping defined above, in which we declared the type of each field in the document, ElasticSearch validates the values in a document at insertion time. If a value does not match the type specified in the mapping, the insertion fails with a parsing exception, as shown below:

  1. In SoapUI, add a new child resource called “Create Document” to the REST project.
  2. Set the HTTP verb to PUT.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions/main_type/_create.
  4. Paste the content below into the editor, then click Run.
{"identifier":"XY12363113597800","transactionAccountingDate":"2015-11-29", "transactionPostingDate":"2015-11-28","transactionValueDate":"SomeValue","transactionAmount":"0.18","transactionAmountCurrency":"EUR"}

The response returned is:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [transactionValueDate]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [transactionValueDate]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Invalid format: \"SomeValue\""
    }
  },
  "status": 400
}

In this way, values that do not match the declared field types are rejected at insertion time, which helps ensure data quality.
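For illustration only, the check ElasticSearch performed above can be approximated on the client side. Fields mapped as "date" must parse as dates, so "SomeValue" is flagged. This is a rough sketch: it accepts only ISO `yyyy-MM-dd` values, which is narrower than ElasticSearch's default date format. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Rough client-side analogue of the "date" type check shown above.
// Only ISO yyyy-MM-dd values are accepted, which is narrower than
// ElasticSearch's default date format; illustration only.
final class DateFieldCheck {
    // Returns the names of the date fields whose values do not parse as dates.
    static List<String> invalidDateFields(Map<String, String> doc, List<String> dateFields) {
        List<String> invalid = new ArrayList<>();
        for (String field : dateFields) {
            String value = doc.get(field);
            if (value == null) {
                continue; // an absent field is not a type error
            }
            try {
                LocalDate.parse(value); // ISO-8601 calendar date, e.g. 2015-11-28
            } catch (DateTimeParseException e) {
                invalid.add(field);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of(
                "transactionAccountingDate", "2015-11-29",
                "transactionPostingDate", "2015-11-28",
                "transactionValueDate", "SomeValue");
        List<String> dateFields = List.of(
                "transactionAccountingDate", "transactionPostingDate", "transactionValueDate");
        System.out.println(invalidDateFields(doc, dateFields)); // prints [transactionValueDate]
    }
}
```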

2.1.2 Introducing new fields dynamically in the document

Again, with the schema mapping defined in section 2.1 it is possible to add new fields to a document dynamically: ElasticSearch simply accepts them without any error. If, on the other hand, we declare the mapping to be strict, any such additions are rejected. In the scenario below we first modify the schema mapping to enable strict mapping, and then try to add a new field to a document, which results in an error.

2.1.2.1 Update schema mapping

  1. In SoapUI, add a new child resource called “Update Mapping” to the REST project.
  2. Set the HTTP verb to PUT.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions/_mapping/main_type.
  4. Paste the content below into the editor, then click Run.
{
  "dynamic": "strict"
}

2.1.2.2 Adding new field to the document

Add a new field to the document by adding a new method to the “Create Document” resource created in section 2.1.1 above. Paste the content below and click Run.

{"identifier":"XY12363113597800","transactionAccountingDate":"2015-11-29", "transactionPostingDate":"2015-11-28","transactionValueDate":"2015-11-28","transactionAmount":"0.18","transactionAmountCurrency":"EUR","New_Field":"someValue"}

This results in the error as shown below:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [New_Field] within [main_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [New_Field] within [main_type] is not allowed"
  },
  "status": 400
}
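The rule that "dynamic": "strict" enforces can be modelled in a few lines. Any document field not already declared in the mapping causes the whole document to be rejected. This is a simplified model of the behaviour described above, not ElasticSearch code. `StrictMappingCheck` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of "dynamic": "strict" (not ElasticSearch code).
final class StrictMappingCheck {
    // Returns the document fields that the mapping does not declare, in the
    // order they appear in the document. Under strict mapping any such field
    // causes the whole document to be rejected.
    static Set<String> undeclaredFields(Set<String> mappedFields, List<String> docFields) {
        Set<String> unknown = new LinkedHashSet<>(docFields);
        unknown.removeAll(mappedFields);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> mapped = Set.of("identifier", "transactionAccountingDate",
                "transactionPostingDate", "transactionValueDate",
                "transactionAmount", "transactionAmountCurrency");
        List<String> doc = List.of("identifier", "transactionValueDate", "New_Field");
        Set<String> unknown = undeclaredFields(mapped, doc);
        if (!unknown.isEmpty()) {
            // prints: rejected, undeclared fields: [New_Field]
            System.out.println("rejected, undeclared fields: " + unknown);
        }
    }
}
```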

2.2 Import/create data

ElasticSearch provides APIs both to insert documents one at a time and to import them in bulk. Let's do a bulk import using the bulk API, following the steps below:

  1. In SoapUI, add a new child resource called “Bulk Insert” to the REST project.
  2. Set the HTTP verb to POST.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions/_bulk.
  4. Paste the content below into the editor, then click Run.
{"index":{"_type":"main_type","_id":1}}
{"identifier":"XY12363113597800","transactionAccountingDate":"2015-11-29", "transactionPostingDate":"2015-11-28","transactionValueDate":"2015-11-28","transactionAmount":"0.18","transactionAmountCurrency":"EUR"}
{"index":{"_type":"content","_id":1,"_parent":1}}
{"language":"NL","description":"Molignestraat"}
{"index":{"_type":"content","_id":2,"_parent":1}}
{"language":"FR","description":"Rue de la Molignee"}
{"index":{"_type":"main_type","_id":2}}
{"identifier":"XY12363113597801","transactionAccountingDate":"2015-11-29", "transactionPostingDate":"2015-11-28","transactionValueDate":"2015-11-28","transactionAmount":"0.18","transactionAmountCurrency":"EUR"}
{"index":{"_type":"content","_id":3,"_parent":2}}
{"language":"NL","description":"Molignestraat"}
{"index":{"_type":"content","_id":4,"_parent":2}}
{"language":"EN","description":"Molignee Street"}
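The bulk body above is newline-delimited JSON. Each document is an action line followed by a source line, and the body must end with a newline. A hedged sketch of a builder that produces the same lines follows. `BulkBody` is a hypothetical helper, and source documents are assumed to be already-serialised JSON strings.

```java
// Hypothetical builder for _bulk bodies like the one above (not an
// ElasticSearch class). Each document contributes an "index" action line
// and a source line; every line, including the last, ends with '\n'.
final class BulkBody {
    private final StringBuilder body = new StringBuilder();

    // Append one document. parent may be null for documents without a parent.
    // sourceJson is assumed to be valid, single-line JSON.
    BulkBody index(String type, int id, Integer parent, String sourceJson) {
        body.append("{\"index\":{\"_type\":\"").append(type)
            .append("\",\"_id\":").append(id);
        if (parent != null) {
            body.append(",\"_parent\":").append(parent);
        }
        body.append("}}\n").append(sourceJson).append('\n');
        return this;
    }

    String build() {
        return body.toString();
    }

    public static void main(String[] args) {
        String body = new BulkBody()
                .index("main_type", 1, null, "{\"identifier\":\"XY12363113597800\"}")
                .index("content", 1, 1, "{\"language\":\"NL\",\"description\":\"Molignestraat\"}")
                .build();
        System.out.print(body); // this text would be POSTed to /transactions/_bulk
    }
}
```

Keeping the action line and its source line together in one method call makes it hard to emit them out of order. Out-of-order lines would break the bulk format.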

2.3 Update data

ElasticSearch provides an update API that we can use to modify previously imported or created data. It updates a document based on a script provided in the request. Since transaction data is generally immutable, in our example we will add a new field called “tags” that can logically be updated; in practice, users might tag transactions according to their preferences. In this section we will see how to modify an existing mapping to add a new field to a document, and then tag a transaction using the update API.

2.3.1 Update mapping – add new field

To add a new field called “tags” to the document main_type, follow the steps given below:

  1. Add a new method called “Add new field” to the “Update Mapping” resource created in section 2.1.2.1.
  2. Set the HTTP verb to PUT.
  3. Paste the content below into the editor, then click Run.
{
  "properties": {
    "tags": {
      "properties": {
        "name": { "type": "string" }
      }
    }
  }
}

2.3.2 Update transaction using update api

Users may tag their transactions according to their preferences, to help analyse their spending patterns. To support this we will use the update API as shown below:

  1. In SoapUI, add a new child resource called “Update Document” to the REST project.
  2. Set the HTTP verb to POST.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions/main_type/1/_update.
  4. Paste the content below into the editor, then click Run.
{
  "script": {
    "inline": "ctx._source.tags += tags",
    "params": {
      "tags": [
        { "name": "xmas gift" }
      ]
    }
  }
}
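Only the tag name changes between tagging requests, so the body can be generated. The sketch below rebuilds the same path and script/params body for any tag. `TagUpdate` is a hypothetical helper. Its escaping handles only backslashes and double quotes, which is an explicit simplification; a JSON library would be the safer choice for arbitrary input.

```java
// Hypothetical helper that rebuilds the update request above for any tag name.
final class TagUpdate {
    // Path of the update endpoint for one document.
    static String path(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id + "/_update";
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    // Control characters are NOT handled (simplifying assumption).
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Same script and params shape as the request body above.
    static String body(String tagName) {
        return "{\"script\":{\"inline\":\"ctx._source.tags += tags\","
                + "\"params\":{\"tags\":[{\"name\":" + jsonString(tagName) + "}]}}}";
    }

    public static void main(String[] args) {
        System.out.println("POST " + path("transactions", "main_type", "1"));
        System.out.println(body("xmas gift"));
    }
}
```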

Inline scripting for updates may be disabled in your installation, in which case you will get an error like the one below:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[Arkus][127.0.0.1:9300][indices:data/write/update[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to execute script",
    "caused_by": {
      "type": "script_exception",
      "reason": "scripts of type [inline], operation [update] and lang  are disabled"
    }
  },
  "status": 400
}

Inline scripts can be enabled by editing the elasticsearch.yml file in the config directory of your ElasticSearch home. Add the line below to this file, save it and restart ElasticSearch.

script.engine.groovy.inline.update: on

The update above should now work, and you can add as many tags as you want.

2.3.3 Versioning

Updates can cause concurrency issues in a multi-user environment. ElasticSearch addresses these with a built-in versioning mechanism: every time a document is updated, its version number is automatically incremented. In addition, ElasticSearch allows version numbers to come from an external system, in which case the version number is provided as a URL parameter.
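The general idea behind version-based concurrency control is optimistic locking. Every successful write bumps the version. A write that names the version it expects is rejected if another writer has already moved the document on; in that case ElasticSearch reports a version-conflict error. The in-memory model below is a simplified illustration of that idea with an internal counter, not ElasticSearch code. `VersionedStore` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, in-memory model of optimistic concurrency control with an
// internal version counter (not ElasticSearch code).
final class VersionedStore {
    private final Map<String, Long> versions = new HashMap<>();

    // Create or overwrite without a version check; returns the new version.
    long put(String id) {
        return versions.merge(id, 1L, Long::sum);
    }

    // Conditional write: succeeds only if the stored version equals expectedVersion.
    long update(String id, long expectedVersion) {
        Long current = versions.get(id);
        if (current == null || current != expectedVersion) {
            throw new IllegalStateException("version conflict: current=" + current
                    + ", expected=" + expectedVersion);
        }
        versions.put(id, current + 1);
        return current + 1;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        long seen = store.put("1");   // two writers both read version 1
        store.update("1", seen);      // the first writer wins; version becomes 2
        try {
            store.update("1", seen);  // the second writer still expects version 1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // version conflict: current=2, expected=1
        }
    }
}
```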

  1. Add a parameter called “version” to the child resource “Update Document”.
  2. Set its value to the previous version + 1.
  3. Paste the content below into the editor, then click Run.
{
  "script": {
    "inline": "ctx._source.tags += tags",
    "params": {
      "tags": [
        { "name": "drinks" }
      ]
    }
  }
}

Now, when you query the transaction with id 1, you will see that it is tagged with both “xmas gift” and “drinks”, and that its version has been incremented to the value set in the “version” query parameter.

2.4 Run search queries

ElasticSearch provides a Search API. Queries can be run either as a URI search, with a simple query string passed as a parameter, or with a request body.

To fetch a particular transaction using URI search, enter the following in the browser: http://localhost:9200/transactions/main_type/_search?q=identifier:XY12363113597800

This returns that transaction record in JSON format.

Similarly, to fetch particular child documents, enter the following in the browser: http://localhost:9200/transactions/content/_search?q=language:NL

This returns all the content documents whose language is NL.
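The `q` parameter in these URIs is a Lucene query string of the form `field:value`. When building such URLs in code, URL-encoding the query keeps characters such as `:` and spaces safe. The sketch assumes Java 10+ for `URLEncoder.encode(String, Charset)`. `UriSearch` is a hypothetical helper name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds URI-search URLs like the two above. The q parameter is a Lucene
// query string of the form field:value; encoding it keeps ':' and spaces safe.
final class UriSearch {
    static String url(String baseUrl, String index, String type, String field, String value) {
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return baseUrl + "/" + index + "/" + type + "/_search?q=" + q;
    }

    public static void main(String[] args) {
        // prints http://localhost:9200/transactions/content/_search?q=language%3ANL
        System.out.println(url("http://localhost:9200", "transactions", "content", "language", "NL"));
    }
}
```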

Now, to fetch the wording in a given language for a particular transaction, we will write a search query using the query DSL, as shown below:

  1. In SoapUI, add a new child resource called “Search Query” to the REST project.
  2. Set the HTTP verb to POST.
  3. Set the endpoint to your ElasticSearch base URL and the resource to /transactions/main_type/_search.
  4. Paste the content below into the editor, then click Run. The query below fetches the transaction with identifier XY12363113597800 together with its NL wording.
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "identifier": "XY12363113597800" } }
          ],
          "filter": [
            { "range": { "transactionValueDate": { "gte": "2015-11-27" } } }
          ]
        }
      },
      "filter": {
        "has_child": {
          "type": "content",
          "query": {
            "filtered": {
              "query": { "match_all": {} },
              "filter": {
                "and": [
                  { "match": { "language": "NL" } }
                ]
              }
            }
          },
          "inner_hits": {}
        }
      }
    }
  }
}
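To make the parent-child part of that query concrete, the in-memory model below mimics what the `has_child` filter selects: parents with at least one child matching the child query. It also mimics, roughly, what `inner_hits` adds: the matching children themselves. It uses the four content documents from the bulk import in section 2.2. It models the semantics only, ignores the identifier and date clauses, and is not how ElasticSearch executes the join. All names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// In-memory illustration of has_child / inner_hits semantics (not ElasticSearch code).
final class HasChildModel {
    // One "content" child document: its own id, its parent's id and its language.
    static final class Content {
        final int id;
        final int parentId;
        final String language;

        Content(int id, int parentId, String language) {
            this.id = id;
            this.parentId = parentId;
            this.language = language;
        }
    }

    // Parents that have at least one child in the given language (the has_child part).
    static Set<Integer> parentsWithChild(List<Content> children, String language) {
        return children.stream()
                .filter(c -> c.language.equals(language))
                .map(c -> c.parentId)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // The matching children of one parent (roughly what inner_hits reports).
    static List<Integer> matchingChildren(List<Content> children, int parentId, String language) {
        return children.stream()
                .filter(c -> c.parentId == parentId && c.language.equals(language))
                .map(c -> c.id)
                .collect(Collectors.toList());
    }

    // The four content documents from the bulk import in section 2.2.
    static List<Content> sampleData() {
        return List.of(
                new Content(1, 1, "NL"), new Content(2, 1, "FR"),
                new Content(3, 2, "NL"), new Content(4, 2, "EN"));
    }

    public static void main(String[] args) {
        List<Content> children = sampleData();
        System.out.println(parentsWithChild(children, "NL"));    // [1, 2]
        System.out.println(matchingChildren(children, 1, "NL")); // [1]
    }
}
```

In the real query, the `bool` clause first narrows the parents to the one with identifier XY12363113597800. `has_child` then keeps it only if it has an NL child.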

3. Download the SoapUI project

In this example we learnt how to get started with ElasticSearch. Taking a simple example scenario, we saw how the REST APIs provided by ElasticSearch can be used to create a schema mapping, update the mapping, insert data (including with the bulk API) and query it using both URI query parameters and the query DSL.

Download
You can download the SoapUI project for this example from this link: ElasticSearchBiginnersTutorials

Source: http://examples.javacodegeeks.com/elasticsearch/elasticsearch-tutorial-beginners/
