Anuj Verma
Elasticsearch aggregations give us the ability to ask questions of our data: we can group it and compute statistics (such as sum, average, min and max) with a simple search query. In this post, we will look at some simple examples to see how powerful and easy to use Elasticsearch aggregations are. I will also share a Postman collection link at the bottom of this post in case you want to try out these queries on your own.
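Let’s say we have a car store, and to build some reports we want to answer the following questions:
What is the average price of sold cars made by Audi?
Find all cars made by Ford, and the average price of Ford cars sold in July 2020.
What is the total price of all cars sold in July 2020?
Which are the most popular car manufacturers?
How much did we sell each month?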
So let’s get started and look at the sample data on which we will run our aggregations.
What is the average price of sold cars made by Audi?
From our sample data, let’s first work this out manually.
Now, to find it using an aggregation, we can use the following query:
The query is very simple: we ask Elasticsearch to first filter the documents whose manufacturer is Audi, and then run an average aggregation on the price field over all the matching documents. Now let’s see the response from Elasticsearch:
Wow! Here we go: with such a simple query, we get the correct result. Let’s spice things up and move on to the next one.
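The original request body is not reproduced in this text. As a minimal sketch, assuming the documents have a `manufacturer` field mapped as `keyword` and a numeric `price` field (both mappings, and the aggregation name `avg_price`, are assumptions), a body of this shape filters on Audi and averages the price:

```python
import json


def audi_avg_price_query():
    """Build a search body for the average price of Audi cars.

    Assumed (hypothetical) mapping: `manufacturer` is a keyword field and
    `price` is a numeric field.
    """
    return {
        # Return only the aggregation result, not the matching documents.
        "size": 0,
        # Exact match on the (assumed) keyword field `manufacturer`.
        "query": {"term": {"manufacturer": "Audi"}},
        "aggs": {
            # "avg_price" is an arbitrary name; the response uses it as a key.
            "avg_price": {"avg": {"field": "price"}}
        },
    }


print(json.dumps(audi_avg_price_query(), indent=2))
```

You would send this body to the `_search` endpoint of your index (for example `POST /cars/_search`, where `cars` is a hypothetical index name) and read the average from `aggregations.avg_price.value` in the response. If `manufacturer` was created by dynamic mapping, it is probably a `text` field with a `manufacturer.keyword` sub-field, and the `term` query should target that sub-field instead.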
Find all cars made by Ford, and the average price of Ford cars sold in July 2020
Now this is an interesting one: we want to see all the cars made by Ford, but we need the average price of only those sold in July 2020.
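All cars manufactured by Ford:

| S.No | Car | Price | Sold On |
|---|---|---|---|
| 1 | Ford Fiesta | 580000 | 18 Jul 2020 |
| 2 | Ford Linea | 420000 | 26 May 2020 |
| 3 | Ford Figo | 480000 | 13 Jul 2020 |
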
So there are 3 Ford cars in total, and the average price of the ones sold in July 2020 is (580000 + 480000) / 2 = 530000.
Now let’s see which Elasticsearch query we can use to get this result:
If you compare this query with the first one, the main difference is that we have added an extra date filter inside the aggs block. This is how we can narrow down the documents an aggregation works on: the filter applies only to the aggregation, so the search hits still include every Ford car.
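A minimal sketch of the second request, under the same assumed field names plus a hypothetical `sold_on` date field (the aggregation name `sold_in_jul_2020` is also made up). The `filter` aggregation narrows only the documents that the nested `avg` sees, while the `term` query still decides the search hits. To make the bucket's behaviour concrete, the sketch also mirrors it in plain Python over the three Ford cars from the example (Fiesta: 580000, 18 Jul 2020; Linea: 420000, 26 May 2020; Figo: 480000, 13 Jul 2020):

```python
from datetime import date


def ford_query_with_july_avg():
    """Search body: all Ford cars as hits, plus the average price of July 2020 sales.

    Assumed (hypothetical) fields: `manufacturer` (keyword), `price` (numeric)
    and `sold_on` (date).
    """
    return {
        # No "size": 0 here, so the matching Ford documents come back as hits.
        "query": {"term": {"manufacturer": "Ford"}},
        "aggs": {
            "sold_in_jul_2020": {
                # This filter narrows only the documents seen by the nested avg,
                # not the search hits.
                "filter": {
                    "range": {"sold_on": {"gte": "2020-07-01", "lt": "2020-08-01"}}
                },
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


# The three Ford cars from the example, used to mirror the aggregation locally.
FORD_CARS = [
    {"car": "Ford Fiesta", "price": 580000, "sold_on": date(2020, 7, 18)},
    {"car": "Ford Linea", "price": 420000, "sold_on": date(2020, 5, 26)},
    {"car": "Ford Figo", "price": 480000, "sold_on": date(2020, 7, 13)},
]


def july_2020_avg(cars):
    """Plain-Python equivalent of the filter bucket plus its avg sub-aggregation."""
    prices = [car["price"] for car in cars
              if date(2020, 7, 1) <= car["sold_on"] < date(2020, 8, 1)]
    # Elasticsearch reports a null value for an avg over zero documents.
    return sum(prices) / len(prices) if prices else None


print(july_2020_avg(FORD_CARS))  # 530000.0
```

The upper bound is exclusive (`lt` the first day of the next month), so a sale on 31 July that carries a time of day is still counted.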
Also, if you look carefully, there is one more difference: in the first query we used the "size": 0 parameter. There are many occasions when aggregations are required but search hits are not, and in those cases the hits can be skipped by setting size to 0.
You can verify this in the response to the first query: only the aggregation result was returned, and we do not see the actual documents used to compute it. Now let’s see the response to our second query:
Woohoo! The results are accurate again, and all the cars made by Ford are also returned in the response. Let’s move on to the next one.
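If you read these two responses in code, note that the documents and the aggregation results live in separate parts of a search response: `hits.hits` and `aggregations`. Here is a small sketch of a helper (its name, and the aggregation names passed to it, are hypothetical) that returns how many documents came back and the value of a metric aggregation, following a path through nested buckets:

```python
def read_search_response(response, *agg_path):
    """Return (number of returned hits, metric value) from a search response.

    `agg_path` lists the aggregation names to follow, e.g. ("avg_price",) or
    ("sold_in_jul_2020", "avg_price") for a metric nested in a filter bucket.
    """
    # With "size": 0 this list is empty, even though hits.total still counts matches.
    returned_hits = response.get("hits", {}).get("hits", [])
    node = response.get("aggregations", {})
    for name in agg_path:
        node = node[name]
    # Single-value metric aggregations (avg, sum, ...) report their result under "value".
    return len(returned_hits), node.get("value")
```

For the first query this would report 0 returned hits plus the Audi average; for the second, the Ford documents plus the July 2020 average read from inside the `filter` bucket.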
What is the total price of all cars sold in July 2020?
Again, let’s solve this manually first:
Let’s see what query we can use to solve this:
Cool, a simple one. We apply a range query to select all cars sold in July 2020 and then run a sum aggregation on their prices. Let’s check the response:
Awesome, it’s correct.
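Here the date condition sits in the query itself rather than in the aggs block, because no documents outside July 2020 are needed at all. A minimal sketch, assuming the same hypothetical `sold_on` and `price` fields and a made-up aggregation name `total_price`:

```python
import json


def july_2020_total_query():
    """Search body for the total price of all cars sold in July 2020.

    Assumed (hypothetical) fields: `sold_on` (date) and `price` (numeric).
    """
    return {
        "size": 0,  # only the total is needed, not the documents themselves
        "query": {
            # From 1 July 2020 (inclusive) up to 1 August 2020 (exclusive).
            "range": {"sold_on": {"gte": "2020-07-01", "lt": "2020-08-01"}}
        },
        "aggs": {"total_price": {"sum": {"field": "price"}}},
    }


print(json.dumps(july_2020_total_query(), indent=2))
```

The total would then be found under `aggregations.total_price.value` in the response.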
Which are the most popular car manufacturers?
For the sake of this article, let’s say the most popular manufacturers are the ones that sold the most cars in the last 3 months.
Now let’s see what query we can use to find out this result:
Again, the query is very simple: we first select all the cars sold in the last 3 months and then group them by manufacturer. Cool, let’s see the response:
Great, our query worked fine!
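A minimal sketch of this kind of request, assuming a `keyword` field `manufacturer` (a `terms` aggregation cannot group on a plain `text` field without enabling fielddata) and a date field `sold_on`; the aggregation name `by_manufacturer` is made up. The relative bound `now-3M/d` is also an assumption: "now" is evaluated when the query runs, so for sample data from 2020 you would use fixed dates instead.

```python
import json


def popular_manufacturers_query(top_n=10):
    """Search body: count cars sold in the last 3 months per manufacturer.

    Assumed (hypothetical) fields: `sold_on` (date) and `manufacturer` (keyword).
    """
    return {
        "size": 0,
        "query": {
            # Date math: from three months ago, rounded down to the start of that day.
            "range": {"sold_on": {"gte": "now-3M/d"}}
        },
        "aggs": {
            "by_manufacturer": {
                # One bucket per manufacturer; by default, buckets are ordered
                # by document count, highest first.
                "terms": {"field": "manufacturer", "size": top_n}
            }
        },
    }


print(json.dumps(popular_manufacturers_query(), indent=2))
```

Each bucket in `aggregations.by_manufacturer.buckets` carries a `key` (the manufacturer) and a `doc_count`, so the first bucket belongs to the most popular manufacturer.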
How much did we sell each month?
This is a tricky one!
We can use the following query:
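Don’t worry if the query looks complex. We are just nesting one aggregation inside another: first we group all the data into monthly buckets, and then within each bucket we run a sum metric aggregation on the price. Think of it as a chain of aggregations. (Strictly speaking, this is a bucket aggregation with a sub-aggregation; Elasticsearch uses the term "pipeline aggregation" for aggregations that work on the output of other aggregations.)
Wow! That was accurate, and not difficult to achieve at all.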
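The same nesting can be sketched as follows, assuming the hypothetical `sold_on` and `price` fields and made-up aggregation names. `calendar_interval` is the parameter name in Elasticsearch 7.2 and later (older versions used `interval`). The plain-Python function mirrors the bucket-then-sum logic on the three Ford cars from the earlier example; note that, by default, Elasticsearch's `date_histogram` also returns empty buckets for months between the first and last sale (here June 2020, with a sum of 0), which the sketch leaves out.

```python
import json
from collections import defaultdict
from datetime import date


def monthly_sales_query():
    """Search body: total sales price per calendar month.

    Assumed (hypothetical) fields: `sold_on` (date) and `price` (numeric).
    """
    return {
        "size": 0,
        "aggs": {
            "sales_per_month": {
                # Bucket aggregation: one bucket per calendar month of the sale date.
                "date_histogram": {"field": "sold_on", "calendar_interval": "month"},
                # Sub-aggregation: a sum metric computed inside each monthly bucket.
                "aggs": {"monthly_total": {"sum": {"field": "price"}}},
            }
        },
    }


# The three Ford cars from the example, used to mirror the aggregation locally.
FORD_CARS = [
    {"car": "Ford Fiesta", "price": 580000, "sold_on": date(2020, 7, 18)},
    {"car": "Ford Linea", "price": 420000, "sold_on": date(2020, 5, 26)},
    {"car": "Ford Figo", "price": 480000, "sold_on": date(2020, 7, 13)},
]


def monthly_totals(cars):
    """Plain-Python equivalent: group by "YYYY-MM", then sum prices per group."""
    totals = defaultdict(int)
    for car in cars:
        totals[car["sold_on"].strftime("%Y-%m")] += car["price"]
    return dict(sorted(totals.items()))


print(json.dumps(monthly_sales_query(), indent=2))
print(monthly_totals(FORD_CARS))  # {'2020-05': 420000, '2020-07': 1060000}
```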
All right, that’s it for this post. I hope you had fun and got a feel for the power of Elasticsearch aggregations. If you enjoyed the post, please like and share it so that it reaches other readers too. If you have any questions or feedback, please scroll to the bottom and leave a comment. Thanks again for reading 🙂
Bonus:
Postman collection link: https://www.postman.com/collections/201d2f5fea372d02fc55