Field scheme for faceted search and filtering in an online store

Short description

This article discusses using Elasticsearch (ES) to create a faceted search for an online store based on product characteristics. The author explains how to index facet values and provides examples of aggregation and filtering queries using nested filters and nested aggregations. The article emphasizes the importance of storing and analyzing numerical characteristics separately from text values. Additionally, the author notes a problem with filtering multiple values of the same filter and suggests that a separate article is needed to describe the solution. The article is intended for beginners who are using ES to build a faceted search in an online store.

In this article, I will omit the following details of working with Elasticsearch (hereafter ES):

  1. How to install it

  2. How to connect to it

  3. The complete mapping scheme for an online store product

  4. A detailed description of the entire structure and of all the requests needed to build a product page with search results and a filter.

And a few other things.

Here, as the title says, I will only describe the scheme for the product characteristic fields and how to write aggregation and filtering queries against them.

Preface

I decided to write this article after an unsuccessful experience developing an online store built on a framework and MySQL, with tens of thousands of products that each had several dozen characteristics with many values. Because of the large number of queries needed to get the product filter values, and perhaps because of badly structured tables or some other reason, the site was terribly slow and took a long time to load. It got to the point that Yandex Webmaster reported an error like this one:

Screenshot from the Internet.

The site was not developed by me, and I had neither the knowledge nor the desire to keep working on it. I decided that I would later develop an online store myself, using a non-relational data store instead of MySQL. The choice fell on ES, and while learning it I spent a lot of time figuring out how to structure product characteristics and retrieve their values so that they could later be changed painlessly and without touching the code. Personally, I missed really simple examples on the Russian-language Internet of the kind that are available, for example, for PHP+MySQL.

Everything described here is based only on my personal experience and on the understanding of document scheme and structure for faceted search in an online store that I arrived at during study and development. In other words, the article is intended mostly for beginners who have just started studying ES.

To the point

Elasticsearch is a distributed search and analytics engine based on Apache Lucene. The full description can be found on the official website.

Faceted search (faceted navigation) is product search within a section, a category, or a full-text search results page by characteristics: color, material, price, manufacturer, etc. To the end user it looks like a set of filters. Each filter is a characteristic, and the filter's options are all possible values of that characteristic. For an online store this is the main search function, and users expect it to work quickly.

In the example below, the user is in the “chandeliers” category and has additionally filtered products by a price range of 1,394 to 42,207 rubles and by the color black. 198 products were found, and the filter panel on the left lists the characteristics present in the search results, along with the number of products that have each value (the facet counts):

Here you can test the filter and repeat the steps described above (the site uses ES).

ES has quite powerful aggregation tools for building a faceted search. One of the nice things about aggregations is that they can be nested: you can define top-level aggregations that create buckets of documents, and other aggregations that run inside those buckets. For ease of understanding, this is similar to the SQL GROUP BY clause: documents matching the filters are grouped and summarized by some specific feature.
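To make the GROUP BY analogy concrete, here is a small pure-Python sketch of what nested buckets compute. It does not talk to ES at all; the sample documents and the function name `nested_buckets` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; the field names are illustrative only.
products = [
    {"category": "chandeliers", "color": "black"},
    {"category": "chandeliers", "color": "white"},
    {"category": "chandeliers", "color": "black"},
    {"category": "sconces", "color": "black"},
]


def nested_buckets(docs, outer_field, inner_field):
    """Group docs by outer_field, then count inner_field values per group."""
    buckets = defaultdict(Counter)
    for doc in docs:
        buckets[doc[outer_field]][doc[inner_field]] += 1
    return {key: dict(counter) for key, counter in buckets.items()}


result = nested_buckets(products, "category", "color")
print(result)
```

The outer dictionary plays the role of the top-level buckets, and each inner dictionary is the result of a sub-aggregation running inside one bucket.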

Indexing of facet values

Before aggregations can be built, the document attributes that can serve as facets must be indexed in ES. One way to index them is to list all the attributes and their values in a single field, as in the following example:

"facets": {
  "color": "Черный",
  "style": "Лофт",
  "room": "Гостиная"
}

In this case, the ES mapping should look like this:

"facets": {
  "type": "nested",
  "properties": {
    "color": {
      "type": "keyword"
    },
    "style": {
      "type": "keyword"
    },
    "room": {
      "type": "keyword"
    }
  }
}

This approach may be suitable, but for faceting you will then have to explicitly list in the queries the names of all the fields for which you want an aggregation:

"aggs": {
  "facets": {
    "nested": {
      "path": "facets"
    },
    "aggs": {
      "color": {
        "terms": {
          "field": "facets.color"
        }
      },
      "style": {
        "terms": {
          "field": "facets.style"
        }
      },
      "room": {
        "terms": {
          "field": "facets.room"
        }
      }
    }
  }
}

Obviously, this is not very practical or efficient when products have a large number of characteristics that may later change or be supplemented. For example, when a product characteristic is deleted, changed, or added, you have to change the mapping manually, re-index, and change the query by adding the new field name to it.
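The maintenance cost can be seen in a rough Python sketch that builds the aggregation above from a hard-coded list of field names. The helper `build_flat_aggs` and the `KNOWN_FACETS` list are hypothetical; the point is only that such a list has to be kept in sync with the mapping by hand.

```python
import json


def build_flat_aggs(field_names):
    """Build the nested "facets" aggregation with one terms sub-aggregation per field."""
    return {
        "facets": {
            "nested": {"path": "facets"},
            "aggs": {
                name: {"terms": {"field": f"facets.{name}"}}
                for name in field_names
            },
        }
    }


# Every new characteristic must be added here AND to the mapping.
KNOWN_FACETS = ["color", "style", "room"]

aggs = build_flat_aggs(KNOWN_FACETS)
print(json.dumps({"aggs": aggs}, indent=2))
```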

Instead, I came to the following

Split the facet names and values sent to the ES index like this:

"string_facets": [
  {
    "name": "color",
    "value": "Черный"
  },
  {
    "name": "color",
    "value": "Белый"
  },
  {
    "name": "style",
    "value": "Лофт"
  },
  {
    "name": "style",
    "value": "Техно"
  },
  {
    "name": "room",
    "value": "Гостиная"
  },
  {
    "name": "room",
    "value": "Спальня"
  }
]
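A minimal Python sketch of how a product's characteristics might be flattened into this name/value form before indexing. The input shape (a dictionary whose values are either a string or a list of strings) and the function name `to_string_facets` are assumptions, not something the store necessarily uses.

```python
def to_string_facets(characteristics):
    """Turn {"color": ["Черный", "Белый"], ...} into a list of name/value pairs."""
    facets = []
    for name, values in characteristics.items():
        # Accept a single string as shorthand for a one-element list.
        if isinstance(values, str):
            values = [values]
        for value in values:
            facets.append({"name": name, "value": value})
    return facets


product = {"color": ["Черный", "Белый"], "style": "Лофт"}
string_facets = to_string_facets(product)
print(string_facets)
```

Because the characteristic name is now data rather than a field name, a new characteristic only adds entries to this list and does not require a new field.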

Mapping:

"string_facets": {
  "type": "nested",
  "properties": {
    "name": {
      "type": "keyword"
    },
    "value": {
      "type": "keyword"
    }
  }
}

Filtering and aggregating such a structure requires nested filters and nested aggregations in queries.

Aggregation:

"aggs": {
  "aggs_text_facets": {
    "nested": {
      "path": "string_facets"
    },
    "aggs": {
      "name": {
        "terms": {
          "field": "string_facets.name"
        },
        "aggs": {
          "value": {
            "terms": {
              "field": "string_facets.value"
            }
          }
        }
      }
    }
  }
}
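The response to this aggregation comes back as buckets inside buckets. A sketch of how it might be turned into a simple structure for rendering the filter panel is shown below; the function name `parse_text_facets` and the sample response are illustrative, with the response trimmed to the keys the code reads.

```python
def parse_text_facets(response):
    """Convert the nested terms buckets into {facet name: {value: count}}."""
    facets = {}
    name_buckets = response["aggregations"]["aggs_text_facets"]["name"]["buckets"]
    for name_bucket in name_buckets:
        facets[name_bucket["key"]] = {
            value_bucket["key"]: value_bucket["doc_count"]
            for value_bucket in name_bucket["value"]["buckets"]
        }
    return facets


# Hand-written sample in the shape ES returns for the aggregation above.
sample_response = {
    "aggregations": {
        "aggs_text_facets": {
            "doc_count": 5,
            "name": {
                "buckets": [
                    {
                        "key": "color",
                        "doc_count": 3,
                        "value": {
                            "buckets": [
                                {"key": "Черный", "doc_count": 2},
                                {"key": "Белый", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "style",
                        "doc_count": 2,
                        "value": {"buckets": [{"key": "Лофт", "doc_count": 2}]},
                    },
                ]
            },
        }
    }
}

facets = parse_text_facets(sample_response)
print(facets)
```

Note that inside a nested aggregation doc_count counts nested facet entries rather than products, and that a terms aggregation returns only the 10 largest buckets unless "size" is set, so a real query would probably need an explicit size on both levels.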

Filtering:

"filter": {
  "nested": {
    "path": "string_facets",
    "query": {
      "bool": {
        "must": [
          {
            "term": {
              "string_facets.name": "color"
            }
          },
          {
            "terms": {
              "string_facets.value": [
                "Черный"
              ]
            }
          }
        ]
      }
    }
  }
}
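In application code such a clause would normally be generated from the user's selection. A hedged Python sketch follows: the helper names `build_facet_filter` and `build_filters` are hypothetical, and the clause uses the "query" key inside "nested", which is what current ES versions expect.

```python
def build_facet_filter(name, values):
    """Nested clause matching products where characteristic `name` has any of `values`."""
    return {
        "nested": {
            "path": "string_facets",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"string_facets.name": name}},
                        {"terms": {"string_facets.value": list(values)}},
                    ]
                }
            },
        }
    }


def build_filters(selected):
    """One nested clause per characteristic that has at least one selected value."""
    return [
        build_facet_filter(name, values)
        for name, values in selected.items()
        if values
    ]


# Example selection: two colors chosen, nothing chosen for style.
filters = build_filters({"color": ["Черный", "Белый"], "style": []})
print(filters)
```

The resulting list is meant to be placed in the "filter" part of a bool query. Name and value are matched inside the same nested clause so that a value is only compared against entries of the right characteristic.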

This applies to characteristics that have text values. Characteristics with numerical values must be stored and analyzed separately, because numerical characteristics (for example, dimensions such as width and length) sometimes have many different values. Instead of enumerating all the possible values, it is easier to get the minimum and maximum values and display them as a range selector or slider. This is only possible if the values are stored as numbers.
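One way to do the separation at indexing time is to route each characteristic by the Python type of its value. This is only a sketch under that assumption; the function name `split_facets` and the rule "numbers go to number_facets, everything else to string_facets" are illustrative, not the author's code.

```python
from numbers import Real


def split_facets(characteristics):
    """Route numeric values to number_facets and all other values to string_facets."""
    string_facets, number_facets = [], []
    for name, value in characteristics.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, Real) and not isinstance(value, bool):
            number_facets.append({"name": name, "value": float(value)})
        else:
            string_facets.append({"name": name, "value": str(value)})
    return {"string_facets": string_facets, "number_facets": number_facets}


doc = split_facets({"color": "Черный", "width": 45, "length": 120.5})
print(doc)
```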

In the mapping it looks like this:

"number_facets": {
  "type": "nested",
  "properties": {
    "name": {
      "type": "keyword"
    },
    "value": {
      "type": "double"
    }
  }
}

Aggregation:

"aggs_number_facet": {
  "nested": {
    "path": "number_facets"
  },
  "aggs": {
    "name": {
      "terms": {
        "field": "number_facets.name"
      },
      "aggs": {
        "value": {
          "stats": {
            "field": "number_facets.value"
          }
        }
      }
    }
  }
}
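The stats result can then drive a slider, and the slider's selection can be turned back into a filter. Below is a sketch in Python; `slider_bounds` and `build_range_filter` are hypothetical helpers, the sample response is hand-written in the shape of a stats aggregation result, and the nested clause uses the "query" key expected by current ES versions.

```python
def slider_bounds(response):
    """Extract (min, max) per numeric characteristic from the stats aggregation."""
    bounds = {}
    buckets = response["aggregations"]["aggs_number_facet"]["name"]["buckets"]
    for bucket in buckets:
        stats = bucket["value"]
        bounds[bucket["key"]] = (stats["min"], stats["max"])
    return bounds


def build_range_filter(name, low, high):
    """Nested clause matching products whose `name` value lies in [low, high]."""
    return {
        "nested": {
            "path": "number_facets",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"number_facets.name": name}},
                        {"range": {"number_facets.value": {"gte": low, "lte": high}}},
                    ]
                }
            },
        }
    }


# Hand-written sample response, trimmed to one characteristic.
sample_response = {
    "aggregations": {
        "aggs_number_facet": {
            "doc_count": 3,
            "name": {
                "buckets": [
                    {
                        "key": "width",
                        "doc_count": 3,
                        "value": {"count": 3, "min": 30.0, "max": 60.0,
                                  "avg": 45.0, "sum": 135.0},
                    }
                ]
            },
        }
    }
}

bounds = slider_bounds(sample_response)
width_filter = build_range_filter("width", 35, 50)
print(bounds, width_filter)
```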

This approach eliminates the need to know the list of available characteristics at query time. Also, at any time you can change the data in the index by removing or changing characteristics without touching the mappings or the queries.

P.S. Having organized the document scheme this way and written all the necessary queries, I ran into another problem. After filtering, only products with the selected value remained in the product filter, so it was not possible to select several values of the same filter, which in my case hurt usability. Describing the solution to this problem requires a separate article.
