Thursday, September 19, 2024

Lowering Elasticsearch TCO with searchable snapshots

Elasticsearch is a search engine that provides distributed full-text search with an HTTP web interface. Enterprises, including many Scality customers, use Elasticsearch to quickly find virtually any type of document or information in their organization.

With an ever-growing volume of data to index, search and retain, one challenge Elasticsearch customers face is the high cost of the SAN storage they use as the primary storage tier. The good news is that Elasticsearch includes searchable snapshots, which let you take snapshots of indices or even entire clusters and store them on Amazon S3-compatible object storage such as Scality’s RING and ARTESCA. Scality provides a more cost-effective and scalable storage solution than SAN systems. It also allows customers to keep benefiting from the power of Elasticsearch while increasing data durability and reducing total cost of ownership (TCO).

With Elasticsearch searchable snapshots, customers can store all of their inactive data on Scality. This frees up the more expensive SAN storage Elasticsearch uses. With petabyte-scale capacity and up to eleven 9s of data durability, Scality not only protects your valuable Elasticsearch data, but also allows you to continue to search that data without moving it back to the SAN.

How to configure Elasticsearch and Scality RING

In this section, we go through the steps needed to configure Scality as a repository for Disaster Recovery (DR) or searchable snapshots. We used the Elastic Cloud console for illustration; however, the steps are identical whether you run Elasticsearch on Amazon Web Services, Microsoft Azure, Google Cloud or on-premises.

Rather than storing all of our data in the public cloud, we chose to keep our snapshots and searchable snapshots in an on-premises installation of Scality RING. Note that the Scality ARTESCA configuration is identical.

Data ingestion

We added data using Filebeat and simulated ingesting Apache logs from a web server. Rather than generating actual web traffic on the server, we used gogen, scheduled through a crontab job.
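For readers without gogen at hand, a comparable stream of log entries can be sketched in plain Python. This is not the gogen configuration used for this article; it is a stand-in under stated assumptions: the function names, the sample paths, status codes and user agents are all hypothetical, and the output follows the Apache combined log format.

```python
import random
from datetime import datetime, timezone

# Hypothetical sample values; adjust them to resemble your own traffic.
PATHS = ["/", "/index.html", "/products", "/cart", "/login"]
STATUSES = [200, 200, 200, 301, 404, 500]
AGENTS = ["Mozilla/5.0", "curl/8.0.1"]


def fake_apache_line(rng: random.Random, now: datetime) -> str:
    """Return one line in Apache combined log format."""
    ip = ".".join(str(rng.randint(1, 254)) for _ in range(4))
    timestamp = now.strftime("%d/%b/%Y:%H:%M:%S %z")
    request = f"GET {rng.choice(PATHS)} HTTP/1.1"
    status = rng.choice(STATUSES)
    size = rng.randint(200, 5000)
    agent = rng.choice(AGENTS)
    return f'{ip} - - [{timestamp}] "{request}" {status} {size} "-" "{agent}"'


def generate(count: int, seed: int = 0) -> list[str]:
    """Generate `count` simulated access-log lines stamped with the current UTC time."""
    rng = random.Random(seed)
    now = datetime.now(timezone.utc)
    return [fake_apache_line(rng, now) for _ in range(count)]


if __name__ == "__main__":
    for line in generate(5):
        print(line)
```

Run on a schedule, with its output appended to a log file that Filebeat watches, a script like this produces a steady trickle of entries without any real web traffic.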

The logs are shipped to the Elasticsearch cluster using Filebeat. Instructions for setting up Filebeat are provided in the Elasticsearch interface.

Once the logs start loading into the Elasticsearch cluster, Elasticsearch will display, “Data successfully received from this module.”

Configure Scality as a snapshot repository

To set up searchable snapshots, you must first create a bucket in Scality, which can be done using the S3 browser. Note that the optional bucket-encryption setting can be enabled if encryption of data at rest is required.

Once the bucket is created, you can register it on the Elasticsearch cluster by sending a PUT request to the snapshot repository API from “Dev Tools” in the Kibana interface.
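A registration request of this kind takes roughly the following shape. This is a minimal sketch, not the exact request used here: the bucket name `elastic-snapshots` and the helper `s3_repository_request` are hypothetical, and it assumes the Scality endpoint and credentials are already configured on the cluster through the S3 client settings (for example `s3.client.default.endpoint`, with the access and secret keys stored in the Elasticsearch keystore). The code only builds and prints the method, path and JSON body.

```python
import json


def s3_repository_request(repo_name: str, bucket: str, client: str = "default"):
    """Build (method, path, body) for registering an S3 snapshot repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,  # bucket created in Scality beforehand (name is hypothetical)
            "client": client,  # named S3 client whose endpoint points at Scality (assumed)
        },
    }
    return "PUT", f"/_snapshot/{repo_name}", body


method, path, body = s3_repository_request("scality_s3_repository", "elastic-snapshots")
print(method, path)
print(json.dumps(body, indent=2))
```

The printed method and path, followed by the JSON body, can be pasted into Dev Tools as a single request.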

The newly created “scality_s3_repository” is now available for storing cluster snapshots (for DR purposes) or searchable snapshots (in the cold or frozen tiers).

Create snapshots for DR purposes

DR snapshots are used to restore all the data from an Elasticsearch cluster in case of a major failure. Using an S3 target is an easy way to move this backup data off the cluster to a durable repository and ensure it is available when needed.

Creating a snapshot

In the “Snapshot and Restore” section, you can create a new snapshot policy, making sure to select “scality_s3_repository” as the repository.

The snapshots will then be listed as they are created.
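A snapshot policy can also be defined through the snapshot lifecycle management (SLM) API instead of the UI. The sketch below builds such a request; the policy ID `daily-dr-snapshots`, the schedule, the snapshot name pattern and the retention values are illustrative assumptions, not the settings used for this article.

```python
import json


def slm_policy_request(policy_id: str, repository: str, schedule: str = "0 30 1 * * ?"):
    """Build (method, path, body) for creating an SLM snapshot policy."""
    body = {
        "schedule": schedule,         # Elasticsearch cron syntax: every day at 01:30
        "name": "<dr-snap-{now/d}>",  # date-math name, one snapshot per day
        "repository": repository,
        "config": {
            "indices": ["*"],         # snapshot every index
            "include_global_state": True,
        },
        # Hypothetical retention: keep 30 days, between 5 and 50 snapshots.
        "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
    }
    return "PUT", f"/_slm/policy/{policy_id}", body


method, path, body = slm_policy_request("daily-dr-snapshots", "scality_s3_repository")
print(method, path)
print(json.dumps(body, indent=2))
```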

Restoring snapshots

The snapshots are self-describing, so at any point in time you can restore them on a fresh Elasticsearch cluster if your main Elasticsearch cluster is lost. To do this, first register the S3 repository on the new cluster as explained above, and then follow the steps in the “Snapshot and Restore” section.
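The restore step can also be expressed as a call to the snapshot restore API. The following is a sketch only: the snapshot name `dr-snap-2024.09.19` and the index pattern `filebeat-*` are hypothetical placeholders, and the repository is assumed to be registered on the target cluster already.

```python
import json


def restore_request(repository: str, snapshot: str, indices: str = "filebeat-*"):
    """Build (method, path, body) for restoring indices from a snapshot."""
    body = {
        "indices": indices,             # index pattern to restore (hypothetical)
        "include_global_state": False,  # restore index data only, not cluster state
    }
    return "POST", f"/_snapshot/{repository}/{snapshot}/_restore", body


method, path, body = restore_request("scality_s3_repository", "dr-snap-2024.09.19")
print(method, path)
print(json.dumps(body, indent=2))
```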

After the restore is complete, you can start searching your data.

Using snapshots for cold and frozen tiers

Once the snapshot repository is configured, searchable snapshots can be enabled in the cold or frozen tier by configuring an Index Lifecycle Management (ILM) policy that transitions data from one tier to another.

In this example, we use the ILM policy configuration panel to enable Scality as the frozen tier.

Logs stored on the frozen tier remain searchable and are accessed transparently from the Elasticsearch UI or APIs.
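The equivalent of the ILM panel settings can be sketched as an ILM policy request. The policy name `scality-frozen-policy`, the rollover thresholds and the 30-day age for entering the frozen phase are assumptions chosen for illustration; only the repository name comes from the steps above.

```python
import json


def frozen_tier_policy_request(policy_name: str, repository: str, frozen_after: str = "30d"):
    """Build (method, path, body) for an ILM policy with a frozen searchable-snapshot phase."""
    body = {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Hypothetical rollover thresholds for the hot tier.
                        "rollover": {"max_age": "7d", "max_primary_shard_size": "50gb"}
                    }
                },
                "frozen": {
                    "min_age": frozen_after,  # age at which an index moves to the frozen tier
                    "actions": {
                        # The index is mounted as a searchable snapshot from this repository.
                        "searchable_snapshot": {"snapshot_repository": repository}
                    },
                },
            }
        }
    }
    return "PUT", f"/_ilm/policy/{policy_name}", body


method, path, body = frozen_tier_policy_request("scality-frozen-policy", "scality_s3_repository")
print(method, path)
print(json.dumps(body, indent=2))
```

Using the cold phase instead follows the same pattern, with the `searchable_snapshot` action placed under a `cold` phase.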

Conclusion

Like most big data applications, Elasticsearch has been integrated with object storage so that customers can effectively manage an ever-growing amount of data that can no longer be kept solely on expensive tier-one storage. With Elasticsearch and Scality, users gain the benefits of:

  • Optimizing the Elasticsearch infrastructure by letting it manage only hot data, which reduces costs and complexity
  • Storing less frequently accessed data on highly scalable, highly durable, and cost-effective storage
  • Allowing users to quickly search cold or archived data regardless of how old it may be
