Automatically optimize your Amazon OpenSearch Service vector database

AWS recently announced the general availability of automatic optimization for the Amazon OpenSearch Service vector engine. This feature streamlines vector index optimization by automatically evaluating configuration trade-offs between search quality, speed, and cost. You can then run a vector ingestion pipeline to create an optimized index on your target collection or domain. Previously, optimizing index configurations, including algorithm, compression, and engine settings, required experts and weeks of testing. That process also had to be repeated, because optimizations are unique to each dataset's characteristics and requirements. Now you can automatically optimize vector databases in less than an hour, without having to manage infrastructure or develop expertise in index tuning.

In this post, we discuss how the auto-optimization feature works and its benefits, and share examples of auto-optimized results.

Overview of vector search and vector indexes

Vector search is a technique that improves search quality and is a cornerstone of generative AI applications. It uses embedding models, a type of artificial intelligence model, to convert content into numerical representations (vectors), so that content can be matched on semantic similarity instead of just keywords. You create vector databases by ingesting vectors into OpenSearch and building indexes that can search billions of vectors in milliseconds.
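To illustrate the search side, the following minimal sketch runs a k-NN query against an existing vector index using the opensearch-py client. The endpoint, index name, field name, and query vector are placeholder assumptions; in practice, the query vector comes from the same embedding model used at ingestion time.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint; point this at your own domain or collection.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# The query vector would normally be produced by the same embedding model
# that generated the indexed vectors (384 dimensions in this hypothetical index).
query_vector = [0.1] * 384

response = client.search(
    index="my-vector-index",  # hypothetical index name
    body={
        "size": 5,  # return the 5 nearest neighbors
        "query": {
            "knn": {
                "embedding": {  # hypothetical knn_vector field name
                    "vector": query_vector,
                    "k": 5,
                }
            }
        },
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```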

The benefits of vector index optimization and how it works

The OpenSearch vector engine provides a number of index configurations to help you achieve a favorable trade-off between search quality (recall), speed (latency), and cost (RAM requirements). There is no universally optimal configuration. Experts must evaluate combinations of index settings such as Hierarchical Navigable Small World (HNSW) algorithm parameters (such as m and ef_construction), quantization techniques (such as scalar, binary, or product quantization), and engine settings (such as memory-optimized, disk-optimized, or cold storage modes). The difference between configurations can amount to 10% or more in search quality, hundreds of milliseconds in search latency, or up to three times the cost. For large deployments, cost optimizations can make or break your budget. A sketch of where these settings appear in an index definition follows the next figure.
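To make these settings concrete, here is a minimal sketch of a k-NN index mapping that sets HNSW parameters using the opensearch-py client. The engine, space type, dimension, and parameter values are illustrative assumptions, not recommendations; selecting suitable values for your data is what the auto-optimization feature does for you.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder endpoint

index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN for this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 384,  # must match your embedding model
                "method": {
                    "name": "hnsw",      # graph-based ANN algorithm
                    "engine": "faiss",   # engine choice affects memory and latency
                    "space_type": "l2",
                    "parameters": {
                        "m": 16,                # edges per node; denser graphs raise recall and RAM
                        "ef_construction": 128  # build-time search width; higher improves quality
                    },
                },
            }
        }
    },
}
client.indices.create(index="my-vector-index", body=index_body)  # hypothetical index name
```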

The following figure is a conceptual illustration of the trade-offs between index configurations.

Optimizing vector indexes is time-consuming. Experts must create an index; evaluate its speed, quality, and cost; and make the appropriate configuration adjustments before repeating the process. These experiments can take weeks to run at scale, because building and evaluating a large index requires significant compute, so processing a single index can take hours to days. Optimizations are unique to each dataset and its business requirements, and trade-off decisions are subjective. The best trade-offs depend on the use case, such as searching an internal wiki versus an e-commerce site. Therefore, this process must be repeated for each index. Finally, if your application's data is constantly changing, the quality of vector searches can degrade, requiring periodic rebuilding and re-optimization of your vector indexes.

Solution overview

With automatic optimization, you can run jobs that generate optimization recommendations, consisting of reports that measure performance in detail and explain the recommended configurations. You configure an auto-optimization job by simply providing the search latency and quality requirements acceptable for your application. Expertise in k-NN algorithms, quantization techniques, and engine setup is not required. The feature avoids the one-size-fits-all limitations of solutions based on a few preconfigured deployment types and instead offers a customized fit for your workloads, automating the manual work described earlier. You simply run serverless auto-optimization jobs for a flat rate per job. These jobs do not consume your collection's or domain's resources: OpenSearch manages a separate warm pool of multi-tenant servers and parallelizes index evaluation across secure single-tenant workers for fast results. Automatic optimization is also integrated with vector ingestion pipelines, so you can quickly create an optimized vector index on a collection or domain from an Amazon Simple Storage Service (Amazon S3) data source.

The following screenshot shows how to configure an auto-optimization job on the OpenSearch console.

After the job completes (typically within 30-60 minutes for datasets of over a million vectors), you can view the recommendations and reports, as shown in the following screenshot.

The screenshot illustrates an example where you have to choose the best trade-off. Do you choose the first option, which provides the highest cost savings (through lower memory requirements)? Or do you choose the third option, which brings a 1.76% improvement in search quality, but at a higher cost? To understand the details of the configurations used to achieve these results, you can view the subtabs on the Details panel, such as the Algorithm parameters tab shown in the previous screenshot.

After you decide, you can create your optimized index on your target domain or OpenSearch collection, as shown in the following screenshot. If you're indexing into a collection or a domain running OpenSearch 3.1+, you can enable GPU acceleration to increase build speeds by up to 10 times at about a quarter of the indexing cost.

Auto-optimization results

The following table shows some examples of auto-optimization results. To quantify the value of running automatic optimization, we present the gains compared to the default settings. Estimated RAM requirements are based on standard domain sizing estimates:

RAM required = 1.1 x (bytes per dimension x dimensions + hnsw.parameters.m x 8) x number of vectors

We estimate cost savings by comparing the minimal infrastructure (just enough RAM) needed to host the index with the default setup versus the optimized setup.
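As a worked example of the formula above, the following sketch estimates the default RAM requirement for the first dataset in the table (10 million 384-dimension float32 vectors). The default HNSW value of m=16 is an assumption for illustration.

```python
def estimate_ram_gb(num_vectors, dimensions, bytes_per_dimension=4, m=16):
    """Estimated RAM in GB: 1.1 x (bytes_per_dimension x dimensions + m x 8) x num_vectors."""
    ram_bytes = 1.1 * (bytes_per_dimension * dimensions + m * 8) * num_vectors
    return ram_bytes / 1e9

# Default configuration for 10M 384-dimension float32 vectors (assumed m=16)
default_gb = estimate_ram_gb(num_vectors=10_000_000, dimensions=384)
print(f"Default: ~{default_gb:.1f} GB")            # ~18.3 GB

# The optimized configuration in the table needs about 5.6 GB
print(f"Reduction: {(1 - 5.6 / default_gb):.1%}")  # ~69.4%
```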

| Dataset | Auto-optimization job configuration | Recommended changes to default values | RAM required (% reduced) | Estimated cost savings (data nodes required, default vs. optimized) | Recall (% gain) |
| --- | --- | --- | --- | --- | --- |
| msmarco-distilbert-base-tas-b: 10M 384D vectors generated from MSMARCO v1 | Acceptable recall >= 0.95; moderate latency (around 200-300 ms) | More indexing and search memory support (ef_search=256, ef_construction=128); Lucene engine; disk-optimized mode with 5x oversampling; 4x compression (4-bit binary quantization) | 5.6 GB (-69.4%) | 75% less (3 x r8g.medium.search vs. 3 x r8g.xlarge.search) | 0.995 (+2.6%) |
| all-mpnet-base-v2: 1M 768D vectors generated from MSMARCO v2.1 | Acceptable recall >= 0.95; moderate latency (around 200-300 ms) | Denser HNSW graph (m=32); more indexing and search memory support (ef_search=256, ef_construction=128); disk-optimized mode with 3x oversampling; 8x compression (4-bit binary quantization) | 0.7 GB (-80.9%) | 50.7% less (t3.small.search vs. t3.medium.search) | 0.999 (+0.9%) |
| Cohere Embed V3: 113M 1024D vectors generated from MSMARCO v2.1 | Acceptable recall >= 0.95; fast latency (approx. <= 50 ms) | Denser HNSW graph (m=32); more indexing and search memory support (ef_search=256, ef_construction=128); Lucene engine; 4x compression (uint8 scalar quantization) | 159 GB (-69.7%) | 50.7% less (6 x r8g.4xlarge.search vs. 6 x r8g.8xlarge.search) | 0.997 (+8.4%) |

Conclusion

You can start building automatically optimized vector databases on the Vector ingestion page of the OpenSearch Service console. Use this feature with GPU-accelerated vector indexes to create optimized billion-scale vector databases in hours.

Automatic optimization is available for OpenSearch vector collections and OpenSearch 2.17+ domains in the US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, Stockholm) AWS regions.


About the authors

Dylan Tong


Dylan is a Senior Product Manager at Amazon Web Services. He leads AI and machine learning (ML) product initiatives on OpenSearch, including OpenSearch vector database capabilities. Dylan has decades of experience working directly with customers and building products and solutions in databases, analytics, and AI/ML. He holds a BSc and an MEng in Computer Science from Cornell University.

Vamshi Vijay Nakkirtha


Vamshi is a software engineering manager working on the OpenSearch project and the Amazon OpenSearch Service. His primary interests include distributed systems.

Vikash Tiwari


Vikash is a Senior Software Development Engineer at AWS specializing in OpenSearch vector search. He is passionate about distributed systems, large-scale machine learning, scalable search architectures, and database internals. His expertise includes vector search, indexing optimization, and efficient data retrieval, and he has a deep interest in learning about and improving modern database systems.

Janelle Arita


Janelle is a UX designer at AWS working on OpenSearch. She focuses on creating intuitive user experiences for observability, security analytics, and discovery workflows. She is passionate about solving complex operational problems through user-centered design and data-driven insights.

Huibin Shen


Huibin is a scientist at AWS who is interested in machine learning and its applications.
