Site searchers produce up to 14% of overall revenue, and site search optimization increases conversion rates by 43%. Having a site search is not enough though. There is a lot of work behind optimizing its performance to provide the fastest and most accurate match to search queries. There may be various challenges related to performance, such as increased traffic and workloads, search latency, downtime, irrelevant results, and more.
How do you find out which component of your system is misbehaving, or what adds an excess network hop? As an Elasticsearch consulting company, we want to share some of our insights. In this article, we've put together some of the best tips from Wise software engineers on how to tune Elasticsearch search performance and which metrics to watch to know how things are going.
How to improve Elasticsearch search performance
Configuration of internal site search is a part of custom E-commerce development. One of the most common tools that can be used to configure site search is Elasticsearch. It is an incredibly powerful analytics engine that is ideal for huge databases with a large catalog and hundreds of real-time customers. It supports multi-tenancy and has the ability to store, search, and analyze document files in diverse formats. There are lots of benefits to configuring site search with Elasticsearch, but the scaling of its performance is a far more challenging task.
Poor search performance shows up in many Elasticsearch use cases and often goes unnoticed. It can have several causes. The tips below will help you spot slow searches sooner and fix the underlying performance issues.
#1 Keep track of metrics
This piece of advice is related to Elasticsearch monitoring. To keep site search working efficiently, we need to understand the current performance level. The Elasticsearch system unites multiple components, and each of them affects its overall performance. Here are some of the most crucial indicators to keep an eye on.
- Cluster health. This metric provides a quick and simple overview of how well your configuration is done. Cluster health may be used to assess the overall health of a cluster or of a specific index or shard. There are three health statuses: a red status indicates that at least one primary shard is not allocated in the cluster; a yellow status shows that all primary shards are allocated but at least one replica is not; and a green status indicates that all shards are allocated.
- Indexing rate. This metric represents a measure of how many documents are being added to the index per unit of time. The greater the indexing rate, the greater the chance your infrastructure can keep up with all your demands.
- Query rate and latency. The query rate shows how many requests per second your Elasticsearch installation processes. The query latency, in turn, shows how long those requests take to complete. The ideal scenario is a high query rate combined with low latency.
- Refresh time. Refresh time describes how long a refresh operation takes, that is, how long Elasticsearch needs to make newly indexed documents available for search. A shorter refresh time is preferable. Sudden rises and very long refresh times are a major cause for concern.
- CPU usage and disk space. Over time, more and more information is created, necessitating the allocation of more and more storage space. You'll also need a more powerful computer to process the increased volume of data and requests. Disk space and CPU usage should be tracked to ensure sufficient resources. Additional hardware resources may be required if one of these is insufficient.
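The three health statuses described above boil down to a simple rule. The following is a minimal sketch of that rule, not Elasticsearch's own implementation; the dictionary keys `primary` and `assigned` are assumptions chosen for illustration. In a real cluster the status is reported directly by the cluster health API (`GET _cluster/health`).

```python
def cluster_status(shards):
    """Derive a red/yellow/green status from shard allocation flags.

    `shards` is a list of dicts with the hypothetical keys
    "primary" (bool) and "assigned" (bool).
    """
    if any(s["primary"] and not s["assigned"] for s in shards):
        return "red"      # at least one primary shard is unallocated
    if any(not s["assigned"] for s in shards):
        return "yellow"   # all primaries are allocated, some replica is not
    return "green"        # every shard is allocated


# One allocated primary and one unallocated replica
example = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
print(cluster_status(example))  # yellow
```

A monitoring job could apply the same kind of check to alert as soon as the status leaves green.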
#2 Hardware
Hardware choice plays an important role in search performance tuning. Adding extra hardware isn't the only way to scale Elasticsearch, but even the most diligent optimization efforts are useless if you don't have the necessary hardware. When you are dealing with a massive quantity of data and want to provide your customers with speedy access to it, hardware is crucial.
Here are some suggestions for expanding your current Elasticsearch installation.
- Basic aspects. Cache, storage space, CPUs, and RAM are the primary areas to optimize to make good use of your hardware and achieve maximum Elasticsearch performance.
- Scaling. Scaling Elasticsearch isn’t just a matter of throwing higher-specced hardware at a problem. You should resolve performance issues first to ensure that resources are not wasted. From the technological perspective, success depends on whether you can quickly innovate, scale up your computing powers, and leverage new content delivery channels.
- Performance testing. Conduct performance testing regularly to validate your hardware requirements. Performance testing shows how your app reacts to increased traffic volumes, slow server speeds, or network issues, so you can adjust the hardware accordingly. This saves you a lot of embarrassment at a later stage if the app does not function properly.
- Use AWS. AWS provides a secure, reliable, high-performance hosting infrastructure that can scale with your website, to massive proportions if necessary. AWS bills Elastic Compute Cloud (EC2) and Relational Database Service (RDS) instances by running time, with higher rates for more powerful hardware. Spreading the load across multiple smaller instances can often be more cost-effective and result in better performance.
Make sure you can readily increase hardware resources, such as processor performance, memory capacity, and so on, to respond to unforeseen traffic spikes.
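When running the performance tests mentioned above, it helps to reduce the collected query latencies to a few percentiles rather than an average. The following is a minimal sketch using the nearest-rank method; the function names and the sample data are illustrative assumptions, not part of any Elasticsearch tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(samples_ms):
    """Summarize query latencies (in milliseconds) from a test run."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "max": max(samples_ms),
    }


# Hypothetical measurements: 1 ms, 2 ms, ..., 100 ms
summary = summarize_latencies(list(range(1, 101)))
print(summary)  # {'p50': 50, 'p95': 95, 'max': 100}
```

Comparing these numbers before and after a hardware change gives a concrete basis for deciding whether the change was worth it.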
#3 Load balancing
Every time you start an Elasticsearch instance, you start a node. A cluster is a group of nodes that are all connected to each other. By default, HTTP and transport traffic can be handled by every node in the cluster. As a rule, there may be multiple nodes running in production with a large number of requests being processed, which might add an excess network hop.
Load balancing is a straightforward technique to distribute the load coming to an endpoint across multiple nodes. More specifically, it will distribute requests, gather results after processing, and then merge these results to construct and provide a final result. You may also specify the number of load balancers and configure them accordingly. As a result, load balancing decreases the strain on a specific node, enhancing performance.
How do you enable load balancing in Elasticsearch? By default, every node in the cluster can act as a coordinating node, so the cluster already spreads the work of a request across nodes. When setting an elasticsearch.url that refers to the coordinating nodes, we also recommend placing a load balancer or round-robin DNS in front of them. A load balancer allows you to switch, add, or delete data nodes and coordinating nodes without having to edit the main configuration file or restart the service, reducing downtime for any changes.
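The round-robin idea can be illustrated in a few lines. This is a minimal client-side sketch, assuming a fixed list of coordinating-node addresses; the host names are hypothetical, and a production setup would normally rely on a dedicated load balancer or DNS instead.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out node addresses in rotation so no single node takes every request."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = cycle(list(nodes))

    def next_node(self):
        return next(self._nodes)


# Hypothetical coordinating-node addresses
balancer = RoundRobinBalancer(["es-node-1:9200", "es-node-2:9200", "es-node-3:9200"])
picked = [balancer.next_node() for _ in range(4)]
print(picked)
```

The fourth request goes back to the first node, so over time each node receives an equal share of incoming requests.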
#4 Sharding
Data in an Elasticsearch index can grow to massive proportions. To keep it manageable, Elasticsearch supports clusters: multiple nodes, running on one or more host machines, grouped under a unique cluster name. Together these nodes hold all the data in the form of documents and provide indexing and search over those documents.
Sharding is the process of splitting up the data in an index into numerous smaller pieces for easier querying. All queries in Elasticsearch are processed on a single thread for each shard. However, numerous shards can be run simultaneously. Therefore, many shards would result in several concurrent threads. Having the right number of shards is important for performance.
- Adjust the number of shards. _shrink and _split are APIs that allow you to modify the number of primary shards of an existing index: _split increases it and _shrink reduces it. Before version 7.0, Elasticsearch created each index with 5 primary shards and 1 replica of each shard by default, for a total of 10 shards; since 7.0 the default is 1 primary shard with 1 replica.
- At the same time, you'll need to strike a balance between having too few and too many shards. There is no one right answer when deciding how many shards to use; you have to work out the appropriate value for your own workload. Most implementations begin with a single shard and add more as needed to get optimal performance.
- Replicas. Elasticsearch can keep one or more copies of an index's shards, called replica shards or simply replicas. The numbers of shards and replicas are index settings that play a crucial role in Elasticsearch performance, so tune them deliberately. Having additional copies often improves search performance.
- Take care of the shard size. The appropriate number of shards should be determined by the quantity of data in an index. In general, an ideal shard should store 30-50GB of data. For example, if you plan to gather roughly 300GB of application logs every day, having around 10 shards in that index would be reasonable.
- Eliminate a multitude of small shards. Search performance may suffer if there are numerous little shards because of the high number of network requests and concurrent threads that may appear.
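The shard-size guideline above translates into simple arithmetic. The following is a minimal sketch that estimates a primary shard count from the expected index size and builds a matching index-creation body; the function names and the 30 GB default target are assumptions for illustration, while `number_of_shards` and `number_of_replicas` are the standard index settings.

```python
import math


def primary_shard_count(index_size_gb, target_shard_gb=30):
    """Number of primary shards so that each holds roughly target_shard_gb."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def index_settings(index_size_gb, replicas=1, target_shard_gb=30):
    """Request body for creating an index with the estimated shard count."""
    return {
        "settings": {
            "number_of_shards": primary_shard_count(index_size_gb, target_shard_gb),
            "number_of_replicas": replicas,
        }
    }


# The 300 GB daily log index from the example above
print(index_settings(300))
```

For the 300 GB example this yields 10 primary shards; with a 50 GB target per shard the same index would need 6.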
#5 Indices
Elasticsearch organizes its data into indices. There is no hard and fast rule about the number of indices you need to save information, but here are some of our recommendations that may help you.
- Use several indices. All your data doesn't have to be contained in a single index. Indices can be configured to save information for a month, a day, or even an hour, depending on the requirements of the application.
- Use a large index buffer if your nodes are busy indexing a lot of data. The index buffer size is the amount of data that may be held temporarily in memory before being written to disk. The default is 10% of the total heap size, which may need to be increased if your use case relies heavily on indexing.
- Take into account that Elasticsearch documents are immutable once stored. If you need to change values in a document, Elasticsearch does not edit it in place; it writes a new version of the document that contains the most up-to-date information.
- Elasticsearch uses version numbers to keep track of the most up-to-date content, so the most recent version is always the one with the largest number. However, the index now contains both old and new documents, which causes the index size to grow. You can fix this by reindexing the data. Reindexing ensures that you are using the most up-to-date information while also reducing your storage needs.
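Splitting data across several time-based indices mostly comes down to a naming convention. The following is a minimal sketch, assuming a `prefix-YYYY.MM.DD` pattern; the pattern and the `logs` prefix are illustrative choices, not something Elasticsearch requires.

```python
from datetime import date


def index_name(prefix, day, granularity="day"):
    """Build a time-based index name such as 'logs-2024.01.15'."""
    patterns = {"month": "%Y.%m", "day": "%Y.%m.%d"}
    if granularity not in patterns:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{prefix}-{day.strftime(patterns[granularity])}"


print(index_name("logs", date(2024, 1, 15)))           # logs-2024.01.15
print(index_name("logs", date(2024, 1, 15), "month"))  # logs-2024.01
```

With names like these, old data can be dropped by deleting a whole index, and searches can be limited to the indices that cover the period of interest.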
#6 Interval for refreshing
Newly indexed data is not searchable immediately. Elasticsearch first holds it in an in-memory buffer; whenever a refresh occurs, the buffered documents are written to a new segment and become available for search. The frequency of refreshes is controlled by the refresh interval, which defaults to one second, so by default search results pick up new documents about once per second.
By default, Elasticsearch refreshes an index every second only if it has received a search request in the last 30 seconds. If your index is searched often, Elasticsearch will therefore refresh it every second. Increase the refresh interval if you can afford a longer delay between when a document is indexed and when it becomes visible. Setting the refresh interval to a higher value, such as 300 s (5 minutes), may help increase indexing speed.
The standard refresh rate should be sufficient in most situations. However, you should consider what is most suitable for you. There is a cost associated with maintaining and regularly refreshing indexes. A daily refresh is sufficient if you don't need data in near-real-time; for instance, if you only deal with data from the previous day.
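The refresh interval is an index setting named `index.refresh_interval`, which can be changed through the index settings API (`PUT /<index>/_settings`). The following is a minimal sketch that only builds the request body; the helper name is an assumption, and actually sending the request is left to whichever client you use.

```python
def refresh_settings(seconds):
    """Body for an index settings update that changes the refresh interval.

    Pass None to disable periodic refreshes (the value "-1").
    """
    if seconds is None:
        value = "-1"
    elif seconds <= 0:
        raise ValueError("refresh interval must be positive")
    else:
        value = f"{seconds}s"
    return {"index": {"refresh_interval": value}}


print(refresh_settings(300))  # {'index': {'refresh_interval': '300s'}}
```

Raising the interval during a large bulk load and restoring it afterwards is a common way to trade freshness for indexing speed.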
#7 Enhanced queries
- Target only the indices you need. Elasticsearch is typically used to query massive amounts of data, and one combined request is more efficient than several separate requests. If you know that the information you need is only in one index, restrict your search to that index. If, however, you need to search across several indices for a certain piece of information, do it in a single request rather than querying each index separately.
- Use filters to narrow your search to a specific result. You usually need specific information from indices or documents, so there is no point in retrieving all the data and then using only a subset of it. To retrieve only the necessary fields, you may use the _source parameter, or you can run a terms aggregation to obtain unique values. Limit your search to the information you need by specifying details such as date range, matching criteria, and search phrases. By limiting your results, filters speed up your queries.
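A filtered search that returns only selected fields might look like the following. This is a minimal sketch of a request body using the `bool` query's `filter` clause together with `_source` filtering; the field names `category` and `created_at` and the helper name are hypothetical.

```python
def filtered_search(category, date_from, date_to, fields):
    """Search body that filters by category and date range and returns only `fields`."""
    return {
        "_source": list(fields),  # return only these fields per hit
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"created_at": {"gte": date_from, "lte": date_to}}},
                ]
            }
        },
    }


body = filtered_search("shoes", "2024-01-01", "2024-01-31", ["name", "price"])
print(body)
```

Clauses in `filter` do not contribute to relevance scoring, which is one reason they tend to be cheaper than scoring queries.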
#8 Well-formatted documents
Every index stores information in a set of documents. If documents are well-formatted, requests may be fulfilled more quickly. Queries are slowed down by parent-child and nested fields. If you want your queries to return results quickly, you should strive to make the documents as flat as possible. Here are several of our recommendations to do so.
- Remove deleted documents. A large number of deleted documents in an Elasticsearch index hurts search performance. The force merge API may be used to purge deleted documents and optimize the shards.
- Precompute values. Raw data, such as system or application logs, is often what makes it into Elasticsearch. Processing data in ES queries is often unnecessary if the data is preprocessed into the appropriate fields, and you may also add fields for commonly used values. If you regularly need the sum of five integer values in a document, for instance, you may calculate the sum at document creation time and save it in a dedicated field. Instead of calculating the amount on every query, you then just retrieve the sum field.
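Precomputing a value at document creation time can be as simple as the following. This is a minimal sketch run before the document is sent for indexing; the field names and the `total` target field are hypothetical.

```python
def with_precomputed_sum(doc, fields, target="total"):
    """Return a copy of `doc` with the sum of `fields` stored under `target`."""
    enriched = dict(doc)
    enriched[target] = sum(doc[f] for f in fields)
    return enriched


raw = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
enriched = with_precomputed_sum(raw, ["a", "b", "c", "d", "e"])
print(enriched["total"])  # 15
```

Queries can then read `total` directly instead of summing the five fields on every request.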
#9 Database
Some third-party plugins generate a large amount of useless data in your database. If your database transactions become slow, look for large tables and what data is stored in them. Determine the source of the slowness and delete any data that you no longer require.
MySQL is capable of detecting and logging your slowest database requests. Enable this logging to discover the queries that are slowing down your site the most; you may then cache these searches in an in-memory store like Redis or Memcached, or delete the data (and plugin) completely if it is no longer required. If you have influence over the architecture of your database, make sure that columns that are often fetched during site searches are indexed.
#10 Caching strategy
Decide on a caching technique for your site early on. Caching pages and queries allows your site to fetch them faster, which improves speed. At the same time, it is recommended not to cache stock levels and other ephemeral data. Doing so would result in a confusing and irritating experience for your customers.
Keep track of the correct quantity of things in stock so that out-of-stock products aren't displayed and cached as available. If you sell through a lot of different channels, you might want to use a centralized stock management system to keep track of all the goods that come in and go out and to automatically clear your site's cache when this system changes.
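The combination of caching and clearing entries when stock changes can be sketched as follows. This is a minimal in-memory illustration, not a replacement for Redis or Memcached; the class name, the cache keys, and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class QueryCache:
    """Tiny in-memory cache with a time-to-live and explicit invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # entry has expired
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)

    def invalidate(self, key):
        """Call this when stock changes so stale availability is never served."""
        self._entries.pop(key, None)


cache = QueryCache(ttl_seconds=60)
cache.put("search:red shoes", ["sku-1", "sku-2"])
cache.invalidate("search:red shoes")  # e.g. sku-1 just sold out
print(cache.get("search:red shoes"))  # None
```

In a multi-channel setup, the stock management system would trigger `invalidate` (or its equivalent in your cache) whenever inventory changes.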
One of the most important aspects of a successful E-commerce project is high-performing search functionality. Customer expectations nowadays are high, however. Even if you have an effective site search solution in place, you will always need to update and improve search performance to keep it flexible enough to meet ever-changing customer requirements.
Here are key takeaways from our article that will help you to improve Elasticsearch search performance.
- Keep track of metrics such as cluster health, indexing rate, query rate, and more to understand the current state of performance.
- Use faster hardware that suits your requirements.
- Use load balancing to decrease the strain on a specific node and distribute all the load between different nodes.
- Take care of good sharding and indices.
- Use well-formatted documents.
- Decide on a caching technique for your site early on.
Elasticsearch search performance tuning is a rather complex task, so be cautious when adjusting any settings in the production environment. Book Elasticsearch consulting to figure out what's best for your case, or let us configure everything for you.
Whether you've just decided to start using Elasticsearch or you're already an Elasticsearch user who envisions a lot of scope for your company, we are ready to help you. The Wise team can assemble a dedicated team to ensure end-to-end Elasticsearch integration and handle all the technical challenges of your case. Contact us to discuss details.