
Elasticsearch Percolator Query Implementation in Ruby

  • By Bharanidharan Arumugam
  • March 13, 2017

In this article, we will see when to use the Elasticsearch percolator query and how to implement it in Ruby. The examples were written on Ubuntu, but they work on other Linux distributions too.

How does it work?

We believe most Elasticsearch developers think conventionally: they design documents according to the structure of their data, store them in an index, and then define queries through the search API to retrieve those documents. The percolator works in the opposite (reverse) direction. Meaning, first you store queries in an index, and then, through the Percolate API, you submit documents in order to retrieve the queries that match them.

 

  • All queries are loaded in memory
  • Each document is indexed in memory
  • All queries get executed against it
  • Execution time linear to # of queries
  • Memory index gets cleaned up


When do we need to use percolator?

The Percolate API in Elasticsearch is quite commonly used for document monitoring and alerting.

 

For example, a platform that stores users’ interests in order to send the right content (a notification alert) to the right users every time new content comes in.

 

For instance, a user subscribes to a specific topic, and as soon as a new article for that topic comes in, a notification will be sent to the interested users.

 

How is this done?

By expressing the users’ interests as an Elasticsearch query, using the query DSL, and registering it in Elasticsearch as though it were a document. Every time a new article is published, you can percolate it, without needing to index it, to find out which users are interested in it.
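As a rough illustration (the index, type, id and field names here are made up for this example, not taken from the article), registering a user’s interest as a query could look like this:

require 'elasticsearch'

# Hypothetical sketch: user 42 subscribes to the topic "ruby".
# All names (index, type, id, field) are illustrative only, and the
# index would need a percolator mapping for the query field, as shown
# later in this article.
client   = Elasticsearch::Client.new host: 'localhost:9200'
interest = { query: { match: { body: 'ruby' } } }

client.index index: 'news-alerts', type: 'queries', id: 'user-42', body: interest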

 

At this point you know who needs to receive a notification containing the article link (sending the notification is not done by Elasticsearch, though). An additional step would be to index the content itself, but that is not required.

 

This concept has many uses, such as weather forecast alerts, price monitoring, news alerts, stock alerts, log monitoring and many more.

 

Pre-requisites & Setup:

Java:

Elasticsearch is developed in Java, so we need to make sure Java is installed, with the help of the command below:

java -version

 

Installing Elasticsearch:

Next, install Elasticsearch with the command below (this assumes Elastic’s APT repository has already been added, since Elasticsearch is not in Ubuntu’s default repositories):

sudo apt-get install elasticsearch

 

In order to make sure that Elasticsearch is installed correctly, use the following command:

curl -XGET 'localhost:9200'

 

The result should be something like the following:

{
  "name" : "lNOxiFt",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "r8yOSyCjRtmHFYmdbijjpg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

 

Using Percolator:

The following steps explain how your queries get stored in an index and how you define documents in order to retrieve these queries through the Percolate API.

 

  1. Requirement and service set up
  2. Making a connection
  2. Create an index
  4. Index a query
  5. Percolate a document

Requirement & Service setup

In order to implement the Elasticsearch percolator, we need the elasticsearch gem.

gem 'elasticsearch'
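Because the transport configuration shown later uses the :typhoeus Faraday adapter, the typhoeus gem is needed as well; a minimal Gemfile sketch:

# Gemfile (sketch)
source 'https://rubygems.org'

gem 'elasticsearch'   # client + low-level transport
gem 'typhoeus'        # HTTP adapter used by the Faraday transport below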

 

I created a service object to index the queries; it takes a configuration hash with the Elasticsearch host details (see the initializer below).

index_service = Services::Percolation.new(cfg)
index_service.re_index
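The original snippet does not show what cfg contains; judging from the keys read in the initializer below, a minimal configuration hash for a local node might look like this (the values are assumptions):

# Hypothetical configuration hash; only the 'elastic' => 'url'/'port'
# keys are actually read by the initializer.
cfg = {
  'elastic' => {
    'url'  => 'localhost',
    'port' => 9200
  }
}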

 

Making a connection

In order to make a connection, we need elasticsearch-transport, which provides a low-level Ruby client for connecting to an Elasticsearch cluster.

def initialize(cfg)
  @cfg = cfg

  # Configure the Faraday-based transport: log responses and use the
  # typhoeus HTTP adapter.
  transport_configuration = lambda do |f|
    f.response :logger
    f.adapter  :typhoeus
  end

  transport = Elasticsearch::Transport::Transport::HTTP::Faraday.new(
    hosts: [{ host: @cfg['elastic']['url'], port: @cfg['elastic']['port'] }],
    &transport_configuration
  )

  @server = Elasticsearch::Client.new log: true, transport: transport
end

def re_index
  index_name = "percolator-index"

  # Recreate the index from scratch, then register one percolator query
  # per entry in ds.
  delete_index(index_name)
  create_index(index_name)

  ds = ['foo', 'bar']
  ds.map do |i|
    index(i, index_name)
  end
end
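re_index calls a delete_index helper that the original snippet does not show; a minimal sketch (assuming the same @server client) could be:

# Minimal sketch of the delete_index helper referenced in re_index.
# It ignores the 404 Elasticsearch returns when the index does not exist yet.
def delete_index(index_name)
  @server.indices.delete index: index_name
rescue Elasticsearch::Transport::Transport::Errors::NotFound
  nil # nothing to delete on a first run
end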

 

Create an index

Create an index with two mappings:

def create_index(index_name)
  @server.indices.create index: index_name, body: {
    mappings: {
      doctype: {
        properties: {
          message: {
            type: "text"
          }
        }
      },
      queries: {
        properties: {
          query: {
            type: "percolator"
          }
        }
      }
    }
  }
end

 

The doctype mapping is the mapping used to pre-process the document defined in the elasticsearch percolator query before it gets indexed into a temporary index.

 

The queries mapping is the mapping used for indexing the query documents. A JSON object is stored in the query field, and this JSON object actually constitutes an Elasticsearch query.

 

Further, this query field is configured with the percolator field type, since that is the field type that understands the query DSL.

 

It is also useful because of the manner in which it stores the query: the documents submitted through the elasticsearch percolator query can be matched against it at any later point.

 

Index a query

Register a query in the percolator:

def index(ds, index_name)
  query = { query: { match: { message: "#{ds}" } } }
  begin
    r = @server.index index: index_name, type: 'queries', id: ds, body: query
    puts 'Indexing result:'
    puts r.inspect
  rescue Faraday::Error::ResourceNotFound,
         Faraday::Error::ClientError,
         Faraday::Error::ConnectionFailed => e
    puts "Connection failed: #{e}"
    false
  end
end

 

Percolate a document

Match a document to the registered percolator queries:

def list_document(index_name = 'percolator-index')
  sleep 2 # give Elasticsearch a moment to make the newly indexed queries searchable
  doc = { query: { percolate: { field: "query", document_type: "doctype",
                                document: { message: 'message foo bar' } } } }
  data = @server.search index: index_name, type: 'queries', body: doc
  puts "final result"
  puts data
end
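Putting the pieces together, the percolation could be triggered like this (assuming the cfg hash sketched earlier):

index_service = Services::Percolation.new(cfg)
index_service.re_index                          # register the 'foo' and 'bar' queries
index_service.list_document('percolator-index') # percolate a document against them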

 

The above request will yield the following output response:

{"took"=>8, "timed_out"=>false, "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0}, "hits"=>{"total"=>2, "max_score"=>0.25316024, "hits"=>[{"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"foo", "_score"=>0.25316024, "_source"=>{"query"=>{"match"=>{"message"=>"foo"}}}}, {"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"bar", "_score"=>0.25316024, "_source"=>{"query"=>{"match"=>{"message"=>"bar"}}}}]}}

 

This can then be used in whichever manner needed to render the desired output.
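For example, a small sketch (assuming the response hash above is held in data) that pulls out the IDs of the matching registered queries, which in the alerting use case would identify the users to notify:

# Extract the ids of the matched percolator queries
# ("foo" and "bar" in the sample response above).
matched_ids = data['hits']['hits'].map { |hit| hit['_id'] }
puts matched_ids.inspect # => ["foo", "bar"]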

 

This is a sample implementation of an Elasticsearch percolator query using Ruby, and as mentioned above, the percolator has quite a lot of uses. To learn more, check the Elasticsearch Percolator documentation.

 

To check out this particular example, please see the agiratech GitHub repo.
