In Elasticsearch we can store closely related entities within a single document. For example, we can store a blog post together with all of its comments by passing the comments as an array:
{
  "title": "Invest Money",
  "body": "Please start investing money as soon...",
  "tags": ["money", "invest"],
  "published_on": "18 Oct 2017",
  "comments": [
    {
      "name": "William",
      "age": 34,
      "rating": 8,
      "comment": "Nice article..",
      "commented_on": "30 Nov 2017"
    },
    {
      "name": "John",
      "age": 38,
      "rating": 9,
      "comment": "I started investing after reading this.",
      "commented_on": "25 Nov 2017"
    },
    {
      "name": "Smith",
      "age": 33,
      "rating": 7,
      "comment": "Very good post",
      "commented_on": "20 Nov 2017"
    }
  ]
}
So we have an Elasticsearch document describing a post, with an inner object, comments, that holds all the comments on the post. But inner objects in Elasticsearch do not work the way we might expect. How? We will see soon.
PROBLEM
Now suppose we want to find all blog posts on which the user {name: john, age: 34} has commented. Let's look at our sample document again and list the users who commented:
name      age
William   34
John      38
Smith     33
The list clearly shows that there is no user named John who is 34 years old. For simplicity, assume the index contains only this one document. Let's verify this by querying the index for a comment by john with age 34.
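A minimal sketch of such a query, assuming a `bool` query with one `match` clause per condition. It is built here as a Python dict and serialized to the JSON body that would be sent to the index's `_search` endpoint; the helper name `build_object_query` is hypothetical.

```python
import json


def build_object_query(name, age):
    """Query body for a comments field mapped as a plain object.

    Both clauses must match somewhere in the document, but not
    necessarily within the same comment.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"comments.name": name}},
                    {"match": {"comments.age": age}},
                ]
            }
        }
    }


body = build_object_query("john", 34)
print(json.dumps(body, indent=2))
```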
Our sample document is returned in the response. Surprised? That is why I said:
inner objects in Elasticsearch do not work as expected
The problem is that Lucene, the library Elasticsearch is built on, has no concept of inner objects. As a result, inner objects are flattened into a simple list of field names and values. Our document is stored internally roughly as:
{
  "title": [ invest, money ],
  "body": [ as, investing, money, please, soon, start ],
  "tags": [ invest, money ],
  "published_on": [ 18 Oct 2017 ],
  "comments.name": [ john, smith, william ],
  "comments.comment": [ after, article, good, i, investing, nice, post, reading, started, this, very ],
  "comments.age": [ 33, 34, 38 ],
  "comments.rating": [ 7, 8, 9 ],
  "comments.commented_on": [ 20 Nov 2017, 25 Nov 2017, 30 Nov 2017 ]
}
As you can see, the relationship between comments.name and comments.age has been lost. The document contains the name john (from one comment) and the age 34 (from another), which is why it matches a query for john and 34.
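To make the flattening concrete, here is a small Python sketch that mimics it. It is a simplification, not Lucene's actual indexing: it skips text analysis and just collects each inner field's values into one list, then checks the two query conditions against those lists independently. The helpers `flatten` and `matches_flat` are hypothetical.

```python
def flatten(doc):
    """Collapse an array of inner objects into per-field value lists,
    losing the boundaries between the individual objects."""
    flat = {}
    for key, value in doc.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            for inner in value:
                for field, field_value in inner.items():
                    flat.setdefault(f"{key}.{field}", []).append(field_value)
        else:
            flat[key] = value
    return flat


def matches_flat(flat, name, age):
    # Each condition is checked on its own list, so the name and the
    # age may come from different comments.
    names = [n.lower() for n in flat.get("comments.name", [])]
    return name.lower() in names and age in flat.get("comments.age", [])


post = {
    "title": "Invest Money",
    "comments": [
        {"name": "William", "age": 34},
        {"name": "John", "age": 38},
        {"name": "Smith", "age": 33},
    ],
}

flat = flatten(post)
print(flat["comments.name"])           # ['William', 'John', 'Smith']
print(flat["comments.age"])            # [34, 38, 33]
print(matches_flat(flat, "john", 34))  # True, a false match
```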
SOLUTION
To solve this problem we only need a small change to the index mapping. If you look at the mapping of the index, you will find that the comments field has type object, the default for inner objects. We need to change it to type nested.
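Note that Elasticsearch does not allow an existing object field to be changed to nested in place. In practice this means creating an index whose mapping declares comments as nested, then reindexing the documents into it.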
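As a minimal sketch, assuming Elasticsearch 7 or later (typeless mappings), here are the two request bodies involved: one for creating a new index whose comments field is nested, and one for the `_reindex` API that copies the existing documents into it. The index names `posts_v1` and `posts_v2` are hypothetical, the sub-field types are illustrative assumptions, and `commented_on` is omitted for brevity.

```python
import json

# Body for PUT /posts_v2: a new index where "comments" is nested.
# "posts_v2" is a hypothetical name; sub-field types are illustrative.
create_index_body = {
    "mappings": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "name": {"type": "text"},
                    "age": {"type": "integer"},
                    "rating": {"type": "integer"},
                    "comment": {"type": "text"},
                },
            }
        }
    }
}

# Body for POST /_reindex: copy documents from the old index
# (hypothetical name "posts_v1") into the new one.
reindex_body = {
    "source": {"index": "posts_v1"},
    "dest": {"index": "posts_v2"},
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```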
After changing the mapping to nested, the way we query the index changes slightly: we need to use a nested query, which runs its inner query against each comment object separately.
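A minimal sketch of a nested query body for our example, built as a Python dict. The `nested` query takes a `path` naming the nested field and an inner `query`; the helper name `build_nested_query` is hypothetical.

```python
import json


def build_nested_query(name, age):
    """Nested query: the inner bool query must match within a single
    comment, because each comment is indexed as its own hidden document."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"comments.name": name}},
                            {"match": {"comments.age": age}},
                        ]
                    }
                },
            }
        }
    }


body = build_nested_query("john", 34)
print(json.dumps(body, indent=2))
```

Compared with a plain `bool` query, the only difference is the `nested` wrapper with its `path`; the two `match` clauses stay the same.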
A nested query for {name: john, age: 34} returns no documents, because no single comment has both that name and that age.
Surprised again? A small change solved the problem. It may be a small change on our side, but a lot has changed in the way Elasticsearch stores our document. Internally, each object in a nested array is indexed as a separate hidden document, so each nested object can be queried independently of the others.
Below is the internal representation of the sample document after changing the mapping:
[
  {
    "comments.name": [ john ],
    "comments.comment": [ after, i, investing, reading, started, this ],
    "comments.age": [ 38 ],
    "comments.rating": [ 9 ],
    "comments.commented_on": [ 25 Nov 2017 ]
  },
  {
    "comments.name": [ william ],
    "comments.comment": [ article, nice ],
    "comments.age": [ 34 ],
    "comments.rating": [ 8 ],
    "comments.commented_on": [ 30 Nov 2017 ]
  },
  {
    "comments.name": [ smith ],
    "comments.comment": [ good, post, very ],
    "comments.age": [ 33 ],
    "comments.rating": [ 7 ],
    "comments.commented_on": [ 20 Nov 2017 ]
  },
  {
    "title": [ invest, money ],
    "body": [ as, investing, money, please, soon, start ],
    "tags": [ invest, money ],
    "published_on": [ 18 Oct 2017 ]
  }
]
As you can see, each inner object is stored internally as a separate hidden document, which preserves the relationship between its fields.
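The same toy model from the flattening discussion can mimic nested indexing: split the post into one hidden document per comment plus a root document, and require both conditions to hold inside the same hidden document. This is a simplified Python sketch of the idea, not how Lucene stores blocks; `split_nested` and `matches_nested` are hypothetical helpers.

```python
def split_nested(doc, path):
    """One hidden document per inner object, plus the root document
    holding the remaining top-level fields."""
    hidden = [
        {f"{path}.{field}": [value] for field, value in inner.items()}
        for inner in doc.get(path, [])
    ]
    root = {key: value for key, value in doc.items() if key != path}
    return hidden, root


def matches_nested(hidden_docs, name, age):
    # Both conditions must be satisfied by the same hidden document.
    return any(
        name.lower() in [n.lower() for n in d.get("comments.name", [])]
        and age in d.get("comments.age", [])
        for d in hidden_docs
    )


post = {
    "title": "Invest Money",
    "comments": [
        {"name": "William", "age": 34},
        {"name": "John", "age": 38},
        {"name": "Smith", "age": 33},
    ],
}

hidden, root = split_nested(post, "comments")
print(len(hidden))                         # 3
print(matches_nested(hidden, "john", 34))  # False
print(matches_nested(hidden, "john", 38))  # True
```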
CONCLUSION
So if you store arrays of inner objects in an index and query on their fields together, verify that the inner object's type is nested. Otherwise the query may return documents that do not actually match.
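One way to run that check is to read the field's type from the response of the get-mapping API (`GET /<index>/_mapping`). Object fields usually omit `"type"` in that response, so a missing type means object. The sketch below assumes a typeless (Elasticsearch 7+) response shape; the helper `field_type` and the sample responses are hypothetical and trimmed to the comments field.

```python
def field_type(mapping_response, index, field):
    """Return the mapping type of a top-level field from a get-mapping
    response. Object fields usually omit "type", so default to "object"."""
    props = mapping_response[index]["mappings"]["properties"]
    return props[field].get("type", "object")


# Hypothetical responses, trimmed to the comments field.
before = {"posts_v1": {"mappings": {"properties": {
    "comments": {"properties": {"name": {"type": "text"}}}}}}}
after = {"posts_v2": {"mappings": {"properties": {
    "comments": {"type": "nested", "properties": {"name": {"type": "text"}}}}}}}

print(field_type(before, "posts_v1", "comments"))  # object
print(field_type(after, "posts_v2", "comments"))   # nested
```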
Thanks for reading. Please like and share so that it can reach other readers too.
Anuj Verma