Implementation of Elasticsearch with Ruby on Rails for advanced search

Elasticsearch is a search engine that lets you work with very large amounts of data in near real time. It is built on Lucene and offers not only full-text search but also complex data queries, including aggregations.

Ruby on Rails is a framework that emphasizes speed and ease of development. Using the principles of convention over configuration and DRY, Rails allows you to focus on the unique logic of the program, minimizing the amount of boilerplate code.

In this article, we look at using Elasticsearch together with Ruby on Rails to implement search within an application.

Installation and configuration

Download Elasticsearch from the official website and follow the installation instructions for your OS.

P.S. Elasticsearch requires Java; recent releases ship with a bundled JDK.

To integrate Elasticsearch with Rails, add the following lines to the application's Gemfile:

gem 'elasticsearch-model'
gem 'elasticsearch-rails'

Then run bundle install to install the gems into the project.

After installing the gems, configure the connection to Elasticsearch. This can be done by creating an initializer named elasticsearch.rb in config/initializers and adding the following code to it:

Elasticsearch::Model.client = Elasticsearch::Client.new(host: 'localhost:9200')

Check that Elasticsearch is running and reachable at the specified address, in our case localhost:9200.

To use Elasticsearch with Rails models, include the modules Elasticsearch::Model and Elasticsearch::Model::Callbacks in the model to be indexed. Example:
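If the address differs between environments, the host does not have to be hard-coded. The following is only a sketch: the helper name elasticsearch_host and the choice of the ELASTICSEARCH_URL variable are assumptions made for illustration, not something the gems require.

```ruby
# Hypothetical helper (not part of the gems): picks the Elasticsearch address
# from the environment, falling back to the local default used in this article.
DEFAULT_ELASTICSEARCH_HOST = 'localhost:9200'

def elasticsearch_host(env = ENV)
  value = env['ELASTICSEARCH_URL'].to_s.strip
  value.empty? ? DEFAULT_ELASTICSEARCH_HOST : value
end

# In config/initializers/elasticsearch.rb this could then be used as:
#   Elasticsearch::Model.client = Elasticsearch::Client.new(host: elasticsearch_host)
```

With nothing set, the helper returns localhost:9200, so local development keeps working unchanged.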

class Article < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end

The callbacks automatically synchronize the model with the Elasticsearch index when records are created, updated or deleted.

Basic functions

To index a model in Elasticsearch, include the module Elasticsearch::Model and, optionally, Elasticsearch::Model::Callbacks in the Rails model:

class Article < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks
end

For search to work well, you need to configure mappings, which describe how data should be indexed and stored:

class Article < ApplicationRecord
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      indexes :title, type: 'text', analyzer: 'english'
      indexes :content, type: 'text', analyzer: 'english'
      indexes :published_at, type: 'date', format: 'strict_date_optional_time||epoch_millis'
    end
  end

  def as_indexed_json(options={})
    as_json(only: [:title, :content, :published_at])
  end
end

We configure an index with one shard, disable dynamic mapping and define mappings for the fields title, content and published_at. We also define the method as_indexed_json, which specifies which model attributes are serialized for indexing.

After setting up the model, you can index existing data using a rake task or by writing a custom script:

Article.find_each do |article|
  article.__elasticsearch__.index_document
end

The code goes through each article in the database and indexes it in Elasticsearch.
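Indexing one document per request gets slow on large tables. The elasticsearch-model gem also provides Article.import for this purpose. The sketch below only illustrates the idea behind batching: documents are grouped into a single bulk request. The helper bulk_index_actions and the FakeArticle stand-in are hypothetical names introduced here, and the batch size in the comment is arbitrary.

```ruby
# Builds the action list for one bulk request from any objects that respond to
# `id` and `as_indexed_json` (as the Article model above does).
def bulk_index_actions(records)
  records.map do |record|
    { index: { _id: record.id, data: record.as_indexed_json } }
  end
end

# Stand-in for an ActiveRecord model so the sketch runs without Rails.
FakeArticle = Struct.new(:id, :title) do
  def as_indexed_json(_options = {})
    { 'title' => title }
  end
end

actions = bulk_index_actions([FakeArticle.new(1, 'Cats'), FakeArticle.new(2, 'Dogs')])

# With the real model, each batch could be sent roughly like this:
#   Article.find_in_batches(batch_size: 500) do |batch|
#     Article.__elasticsearch__.client.bulk(
#       index: Article.index_name,
#       body:  bulk_index_actions(batch)
#     )
#   end
```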

After the data is indexed, you can use Elasticsearch’s search capabilities to search and analyze the data:

response = Article.search('cats')
response.records.each do |record|
  puts record.title
end

Here we search for articles containing the word "cats" and print the titles of the articles found.

When model data changes, elasticsearch-model (through the Callbacks module) automatically syncs the changes to the corresponding index in Elasticsearch. If you need to update or delete an indexed document manually, you can use the methods update_document and delete_document, which are available through the __elasticsearch__ proxy (for example, article.__elasticsearch__.update_document).

article = Article.find(1)
article.title = "Updated Title"
article.save # automatically updates the document in Elasticsearch

article.destroy # automatically deletes the document from Elasticsearch (delete would skip the callbacks)

Other search options

There are many other ways to implement search. For example, you can search across several fields with different weights:

response = Article.search(query: {
  multi_match: {
    query:    'cats',
    fields:   ['title^10', 'content^2', 'tags'],
    type:     'best_fields',
    tie_breaker: 0.3
  }
})

The query searches for the word "cats" in the fields title, content and tags of the Article model. The field title gets the highest boost (^10), the field content gets a lower boost (^2), and tags is used without a boost. With best_fields, a document's score is taken from its best-matching field; the tie_breaker parameter adds a fraction (here 0.3) of the scores of the other matching fields, so documents that match in several fields rank higher.
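The effect of tie_breaker is easier to see with numbers. This is a small arithmetic sketch, not a reimplementation of Elasticsearch scoring: the function best_fields_score is a hypothetical name, and the per-field scores are made up for illustration.

```ruby
# Combines per-field scores the way a best_fields multi_match does:
# the best field's score plus tie_breaker times every other matching field's score.
def best_fields_score(field_scores, tie_breaker)
  return 0.0 if field_scores.empty?

  best = field_scores.max
  others = field_scores.sum - best
  best + tie_breaker * others
end

# Made-up scores of one document for title, content and tags (boosts already applied).
scores = [3.0, 1.0, 0.5]

with_tie_breaker    = best_fields_score(scores, 0.3) # other fields contribute 30%
without_tie_breaker = best_fields_score(scores, 0.0) # only the best field counts
```

With a tie_breaker of 0 only the best field matters; with 1.0 the field scores are simply summed.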

Aggregations let you summarize the indexed data. For example, a terms aggregation can count the most common tags:

response = Article.search(size: 0, aggs: {
  popular_tags: {
    terms: {
      field: 'tags'
    }
  }
})

When defining the index, you can also specify analyzers that pre-process text before it is indexed. For example, the fields title and content can use a custom analyzer, my_custom_analyzer, that converts text to lowercase and removes accents.
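The analyzer described above could be declared in the index settings roughly as follows. This is a sketch under assumptions: the article does not show the definition, so the standard tokenizer and the lowercase and asciifolding filters are chosen here only because they match the description. The constant ANALYSIS_SETTINGS and the helper normalize_term are hypothetical names.

```ruby
# Analysis settings for the analyzer described above; the tokenizer and filter
# choice is an assumption based on that description (lowercase + accent removal).
ANALYSIS_SETTINGS = {
  analyzer: {
    my_custom_analyzer: {
      type:      'custom',
      tokenizer: 'standard',
      filter:    %w[lowercase asciifolding]
    }
  }
}.freeze

# In the model this could be wired in roughly like so:
#   settings index: { number_of_shards: 1 }, analysis: ANALYSIS_SETTINGS do
#     mappings dynamic: 'false' do
#       indexes :title,   type: 'text', analyzer: 'my_custom_analyzer'
#       indexes :content, type: 'text', analyzer: 'my_custom_analyzer'
#     end
#   end

# Rough pure-Ruby approximation of what the two filters do to a single term:
# decompose accented characters, drop the combining marks, then lowercase.
def normalize_term(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase
end
```

The Ruby helper only approximates the filters (asciifolding handles more characters than combining accents), but it shows why a search for "cafe" can match a title containing "Café".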

You can use fuzzy search to tolerate typos:

response = Article.search(query: {
  fuzzy: {
    title: {
      value: 'cats',
      fuzziness: 2
    }
  }
})

The query looks for words similar to "cats" in the field title, allowing up to two edits (inserted, deleted or changed characters) in the word.
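What "two errors" means can be illustrated with edit distance. The sketch below implements plain Levenshtein distance in Ruby; it is an illustration only, since Elasticsearch computes this internally and by default also treats a swap of two adjacent characters as a single edit. The function name edit_distance is hypothetical.

```ruby
# Levenshtein distance: the minimum number of single-character insertions,
# deletions and substitutions needed to turn string a into string b.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

# Words within two edits of "cats" would be accepted by fuzziness: 2.
candidates = %w[cat bats coats dogs carts]
matches = candidates.select { |word| edit_distance('cats', word) <= 2 }
```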

Let’s say we have an online store and we want to allow users to filter products by price categories:

response = Product.search(size: 0, aggs: {
  price_ranges: {
    range: {
      field: 'price',
      ranges: [
        { to: 50 },
        { from: 50, to: 100 },
        { from: 100 }
      ]
    }
  }
})

The query groups products into three price buckets: under 50, from 50 up to (but not including) 100, and 100 or more. In a range aggregation the from value is inclusive and the to value is exclusive.

Elasticsearch also supports geospatial queries, allowing you to search for objects within a certain radius of a given point.
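The boundary handling is worth spelling out: in a range aggregation, from is inclusive and to is exclusive, so a product priced at exactly 50 lands in the middle bucket. The following pure-Ruby sketch mimics that rule on a list of prices; price_bucket and bucket_counts are hypothetical helpers, not part of any gem.

```ruby
PRICE_RANGES = [
  { to: 50 },
  { from: 50, to: 100 },
  { from: 100 }
].freeze

# Returns the range a price falls into: `from` is inclusive, `to` is exclusive.
def price_bucket(price, ranges = PRICE_RANGES)
  ranges.find do |range|
    (range[:from].nil? || price >= range[:from]) &&
      (range[:to].nil? || price < range[:to])
  end
end

# Counts how many prices fall into each range, like the doc_count of each bucket.
def bucket_counts(prices, ranges = PRICE_RANGES)
  prices.group_by { |price| price_bucket(price, ranges) }
        .transform_values(&:count)
end

counts = bucket_counts([10, 50, 75, 100, 250])
```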

For example, you can search for all stores within a radius of 10 kilometers from the user’s current location:

response = Store.search(query: {
  bool: {
    must: {
      match_all: {}
    },
    filter: {
      geo_distance: {
        distance: "10km",
        location: { 
          lat: 40.715,
          lon: -73.988
        }
      }
    }
  }
})
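For this query to work, the location field has to be mapped as a geo_point. What the filter checks can be sketched with the haversine formula, which gives the great-circle distance between two coordinates. This is an illustration under assumptions: the helper names and the Earth radius constant are chosen here, and Elasticsearch performs its own distance calculation internally.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius, an approximation

# Great-circle distance in kilometers between two latitude/longitude pairs.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# True when the point lies within radius_km of the center.
def within_radius?(center, point, radius_km)
  haversine_km(center[:lat], center[:lon], point[:lat], point[:lon]) <= radius_km
end

user   = { lat: 40.715, lon: -73.988 }
nearby = { lat: 40.73,  lon: -73.99 }  # made-up store a couple of kilometers away
far    = { lat: 41.0,   lon: -73.988 } # made-up store roughly 30 km to the north
```

Here within_radius?(user, nearby, 10) is true and within_radius?(user, far, 10) is false, which is the same decision the geo_distance filter makes for each store.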

You can search for objects located within a certain geographic area defined by a polygon:

response = Property.search(query: {
  geo_polygon: {
    location: {
      points: [
        { lat: 40.73, lon: -74.1 },
        { lat: 40.73, lon: -73.99 },
        { lat: 40.74, lon: -73.99 },
        { lat: 40.74, lon: -74.1 }
      ]
    }
  }
})

A geohash encodes a location as a short string of characters, which is useful for quickly finding objects within a certain area. A related query, geo_bounding_box, finds objects inside a rectangle defined by its top-left and bottom-right corners:

response = Event.search(query: {
  geo_bounding_box: {
    location: {
      top_left: {
        lat: 40.73,
        lon: -74.1
      },
      bottom_right: {
        lat: 40.01,
        lon: -71.12
      }
    }
  }
})

Geospatial aggregations let you analyze data by location, for example by counting the number of objects in different regions:

response = Visitor.search(size: 0, aggs: {
  regions: {
    geohash_grid: {
      field: "location",
      precision: 3
    },
    aggs: {
      top_hits: {
        top_hits: {
          _source: {
            includes: [ "name", "location" ]
          },
          size: 10
        }
      }
    }
  }
})
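The precision value in this aggregation is the length of the geohash that identifies each grid cell, so precision 3 groups documents by the first three geohash characters. The sketch below implements the standard geohash encoding in plain Ruby and groups a few made-up points the same way; the function geohash and the sample coordinates are for illustration only.

```ruby
GEOHASH_ALPHABET = '0123456789bcdefghjkmnpqrstuvwxyz'

# Encodes a coordinate as a geohash of the given length by repeatedly halving
# the longitude and latitude ranges and packing the resulting bits in base 32.
def geohash(lat, lon, precision = 5)
  lat_range = [-90.0, 90.0]
  lon_range = [-180.0, 180.0]
  hash = +''
  bits = 0
  bit_count = 0
  even = true # even-numbered bits describe longitude, odd-numbered bits latitude

  while hash.length < precision
    range, value = even ? [lon_range, lon] : [lat_range, lat]
    mid = (range[0] + range[1]) / 2
    if value >= mid
      bits = (bits << 1) | 1
      range[0] = mid
    else
      bits <<= 1
      range[1] = mid
    end
    even = !even
    bit_count += 1
    next unless bit_count == 5

    hash << GEOHASH_ALPHABET[bits]
    bits = 0
    bit_count = 0
  end
  hash
end

# Made-up visitor coordinates, counted per 3-character cell (precision: 3).
visitors = [[40.715, -73.988], [40.73, -73.99], [57.64911, 10.40744]]
cells = visitors.group_by { |lat, lon| geohash(lat, lon, 3) }
                .transform_values(&:count)
```

The two nearby points share a cell while the third falls into a different one, which is how the aggregation arrives at per-region counts.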

Combined with Ruby on Rails, Elasticsearch lets you serve complex search requests and process large volumes of data.
