
Apache Solr is exceptionally reliable, scalable, and fault-tolerant. It provides everything from distributed indexing, replication, and load-balanced queries to centralized configuration and automated failover and recovery. Solr powers the search and navigation features of many of the world's largest internet sites.

What many companies fail to recognize, however, is that an improperly configured Solr cloud can quickly cause costs to soar. By applying the best practices featured in this blog, you can achieve top-notch performance while keeping costs down.

Indexed Fields

You can save your company a considerable amount of money in storage and cloud usage fees by controlling the number of indexed fields you incorporate in your search implementation. Indexing represents a trade-off between quality, performance, and cost. Indexing too many fields may end up degrading performance while increasing costs. The best strategy is to index the minimum number of fields needed to support your performance target.

Each additional indexed field increases the following:

  • Memory usage while indexing
  • Segment merge times
  • Optimization times
  • Index size

Do not mark fields as indexed=true if they are not used in a query.
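
One way to act on this rule is to audit the schema against the fields your application actually queries. The following is a minimal sketch in Python, not a definitive tool. The schema fragment, the field names, and the QUERIED_FIELDS set are all hypothetical; in practice you would build that set from your own query logs or application code. It also only looks at an explicit indexed attribute on each field and ignores defaults inherited from the field type.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; the field names are illustrative only.
SCHEMA_XML = """
<schema name="example" version="1.6">
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="internal_notes" type="text_general" indexed="true" stored="false"/>
  <field name="thumbnail_url" type="string" indexed="false" stored="true"/>
</schema>
"""

# Assumption: the fields your application searches, filters, or sorts on.
QUERIED_FIELDS = {"id", "title"}


def unused_indexed_fields(schema_xml, queried_fields):
    """Return names of fields explicitly marked indexed="true" that no query uses."""
    root = ET.fromstring(schema_xml)
    unused = []
    for field in root.iter("field"):
        is_indexed = field.get("indexed", "").lower() == "true"
        if is_indexed and field.get("name") not in queried_fields:
            unused.append(field.get("name"))
    return sorted(unused)


if __name__ == "__main__":
    # Candidates for switching to indexed="false"
    print(unused_indexed_fields(SCHEMA_XML, QUERIED_FIELDS))
```

Any field this reports is only a candidate for indexed=false. Confirm that nothing filters, sorts, or facets on it before changing the schema.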

Stored Fields

Retrieving the stored fields of a query result can be a significant expense. This cost is affected by the number of bytes stored per document. The higher the byte count, the more sparsely the documents are distributed on disk, which requires more I/O bandwidth to retrieve the desired fields.

These costs start to add up on cloud-based implementations. As with indexed fields, however, you are again confronted with a trade-off between quality, performance, and cost. Storing more fields lets Solr return more data directly in the results, but each additional stored field adds to index size, retrieval I/O, and cost.
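
To get a feel for how stored fields affect the bytes stored per document, you can run a rough estimate over a sample document. The sketch below is only an approximation under stated assumptions: the sample document and field names are made up, and it counts raw UTF-8 bytes without accounting for the compression Solr applies to stored fields on disk. It is meant for comparing schema choices, not for predicting real index size.

```python
def stored_bytes(doc, stored_fields):
    """Approximate raw UTF-8 bytes held by the stored fields of one document."""
    total = 0
    for name, value in doc.items():
        if name in stored_fields:
            total += len(str(value).encode("utf-8"))
    return total


# Hypothetical sample document; field names and sizes are illustrative only.
SAMPLE_DOC = {
    "id": "sku-1",
    "title": "Red shoe",
    "description": "x" * 2000,  # stands in for a long text body
}

if __name__ == "__main__":
    everything = stored_bytes(SAMPLE_DOC, {"id", "title", "description"})
    trimmed = stored_bytes(SAMPLE_DOC, {"id", "title"})
    print(f"all fields stored: {everything} bytes per document")
    print(f"description not stored: {trimmed} bytes per document")
```

In this made-up case, one large text field accounts for nearly all of the stored bytes, which is the kind of field worth reviewing first.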