Elasticsearch Backup and Restore

In this guide, we’ll walk through the steps required to back up and restore data in Elasticsearch. We’ll cover setting up a snapshot repository, taking snapshots (backups), and restoring data from those snapshots. This is a technical write-up intended for users who are already familiar with Elasticsearch operations.

Prerequisites

Elasticsearch installed and running (version 7.x or 8.x)
Appropriate permissions to access and modify Elasticsearch configurations
Access to the command line or Kibana Dev Tools for executing API calls

Setting Up a Snapshot Repository

Elasticsearch uses the concept of snapshot repositories to store backups. Before taking any snapshots, you need to register a repository where Elasticsearch can store them.

1. Choose a Storage Type

Elasticsearch supports various repository types:

Shared File System: A directory on shared storage (for example, an NFS mount) that every node can reach
AWS S3
Azure Blob Storage
Google Cloud Storage
HDFS (requires the repository-hdfs plugin)

For this guide, we’ll use a shared file system repository. Ensure that the directory is accessible by all Elasticsearch nodes and has the correct permissions.

2. Create the Repository Directory

Mount the same shared file system at the same path on every master and data node, then create the snapshot directory and give the elasticsearch user ownership of it:

sudo mkdir -p /mnt/es_backup 
sudo chown -R elasticsearch:elasticsearch /mnt/es_backup

3. Register the Repository

Use the _snapshot endpoint to register the repository:

PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backup",
    "compress": true
  }
}

my_backup: The name of your snapshot repository.
location: The path to the backup directory.
compress: Compresses snapshot metadata files such as index mappings and settings. Data files are not compressed. Defaults to true.
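For readers who script this step instead of using Kibana Dev Tools, the registration call can be sketched with Python's standard library. This is a minimal sketch, not a definitive client: the helper name build_register_request and the URL http://localhost:9200 are assumptions for illustration, and authentication and TLS (enabled by default on 8.x) are deliberately left out.

```python
import json
import urllib.request


def build_register_request(base_url, repo_name, location, compress=True):
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = {
        "type": "fs",
        "settings": {"location": location, "compress": compress},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local, unsecured cluster; adjust the URL for your environment.
request = build_register_request("http://localhost:9200", "my_backup", "/mnt/es_backup")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen sends it. The example only builds and prints the request so it can be inspected without a running cluster.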

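Note: A shared file system repository can only be registered at a location listed in the path.repo setting. If the path is not listed, the registration request fails with an error saying that the location does not match any of the locations specified by path.repo. Add the setting to the elasticsearch.yml configuration file on every master and data node:

path.repo: ["/mnt/es_backup"]

After updating, restart each node for the change to take effect.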
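Once the nodes are back up, the repository verification API (POST _snapshot/my_backup/_verify) reports which nodes can access the repository. The sketch below assumes the response has already been parsed into a dictionary with a top-level "nodes" object keyed by node ID. The helper name, node IDs, and node names are invented for illustration.

```python
def nodes_missing_from_verify(verify_response, expected_node_names):
    """Return the expected node names that did not report repository access."""
    verified = {node["name"] for node in verify_response.get("nodes", {}).values()}
    return sorted(set(expected_node_names) - verified)


# Illustrative response body; node IDs and names are invented.
sample_response = {
    "nodes": {
        "pQHNt5rXTTWNvUgOrdynKg": {"name": "es-node-1"},
        "F7kx2vYbQ1mAs0rTg9dLcw": {"name": "es-node-2"},
    }
}
print(nodes_missing_from_verify(sample_response, ["es-node-1", "es-node-2", "es-node-3"]))
```

A non-empty result here would suggest that a node has not joined the cluster or was left out of the check. It is a convenience comparison, not a replacement for reading the API's own error output.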

Taking a Snapshot (Backup)

Once the repository is registered, you can take snapshots of your indices.

1. Snapshot All Indices

To snapshot all indices:

PUT _snapshot/my_backup/snapshot_1?wait_for_completion=true

snapshot_1: The name of the snapshot.
wait_for_completion: Waits for the operation to complete before returning a response.
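Snapshot names must be unique within a repository, so scripted backups usually put a date in the name. On large clusters it can also be preferable to leave wait_for_completion at false and check the result later, so the client does not hold a connection open. The following sketch builds such a request path. The naming scheme snapshot-YYYY.MM.DD and the helper name are assumptions, not an Elasticsearch convention.

```python
from datetime import date


def dated_snapshot_path(repo_name, day, wait=False):
    """Return the request path for a snapshot named after the given day."""
    snapshot_name = f"snapshot-{day.strftime('%Y.%m.%d')}"
    wait_flag = "true" if wait else "false"
    return f"_snapshot/{repo_name}/{snapshot_name}?wait_for_completion={wait_flag}"


print(dated_snapshot_path("my_backup", date(2024, 1, 15)))
# prints: _snapshot/my_backup/snapshot-2024.01.15?wait_for_completion=false
```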

2. Snapshot Specific Indices

To snapshot specific indices:

PUT _snapshot/my_backup/snapshot_2?wait_for_completion=true 
{ 
   "indices": "index_1,index_2", 
   "ignore_unavailable": true, 
   "include_global_state": false 
}

indices: A comma-separated list of indices to include.
ignore_unavailable: Ignores missing or closed indices.
include_global_state: When set to false, excludes the cluster state (for example, persistent cluster settings and index templates) from the snapshot. Defaults to true.
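When the index list comes from a script or a configuration file, it can help to build the request body in one place. The helper below is a hypothetical sketch: it joins the names into the comma-separated form shown above and refuses an empty list, so that a mistake in the input cannot quietly change what gets backed up.

```python
import json


def snapshot_body(indices, include_global_state=False, ignore_unavailable=True):
    """Build the JSON body for a snapshot limited to the given indices."""
    names = [name.strip() for name in indices if name.strip()]
    if not names:
        # Refuse an empty selection instead of sending a body with no index filter.
        raise ValueError("at least one index name or pattern is required")
    return {
        "indices": ",".join(names),
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }


print(json.dumps(snapshot_body(["index_1", "index_2"]), indent=2))
```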

3. Verify the Snapshot

To list all snapshots in the repository and confirm that the new one reports a state of SUCCESS:

GET _snapshot/my_backup/_all
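Each entry in the listing carries a state field (for example SUCCESS, IN_PROGRESS, PARTIAL, or FAILED). A small check like the following can flag snapshots that should not be relied on for a restore. It assumes the response has been parsed into a dictionary with a top-level "snapshots" list. The helper name and sample data are illustrative.

```python
def snapshots_needing_attention(listing):
    """Return (name, state) pairs for snapshots whose state is not SUCCESS."""
    return [
        (snap.get("snapshot"), snap.get("state"))
        for snap in listing.get("snapshots", [])
        if snap.get("state") != "SUCCESS"
    ]


# Illustrative listing, trimmed to the fields used here.
sample_listing = {
    "snapshots": [
        {"snapshot": "snapshot_1", "state": "SUCCESS"},
        {"snapshot": "snapshot_2", "state": "PARTIAL"},
    ]
}
print(snapshots_needing_attention(sample_listing))
```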

Restoring from a Snapshot

Restoring data from a snapshot involves selecting the snapshot and specifying the indices to restore.

1. List Available Snapshots

First, list the snapshots to identify which one you want to restore:

GET _snapshot/my_backup/_all

2. Close Indices (If Necessary)

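A restore cannot overwrite an index that is open. If you’re restoring indices that already exist, close them first (alternatively, delete them, or restore under a different name using the rename options in the next step):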

POST index_1/_close

3. Restore the Snapshot

Restore selected indices from a snapshot under new names (omit the indices parameter to restore every index in the snapshot):

POST _snapshot/my_backup/snapshot_1/_restore 
{ 
  "indices": "index_1,index_2", 
  "ignore_unavailable": true, 
  "include_global_state": false, 
  "rename_pattern": "index_(.+)", 
  "rename_replacement": "restored_index_$1" 
}

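rename_pattern and rename_replacement: Rename indices during restore to avoid conflicts. The pattern is a regular expression, and $1 in the replacement refers to its first capture group, so index_1 is restored as restored_index_1.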
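Before running a restore, it can be useful to preview which names a rename will produce. The sketch below approximates the behavior with Python's re module. It is only an approximation: Elasticsearch evaluates the pattern with Java regular expressions, whose dialect differs from Python's in places, and the helper name is hypothetical.

```python
import re


def preview_restored_name(index_name, rename_pattern, rename_replacement):
    """Approximate the restore rename for one index name."""
    # Elasticsearch uses Java-style $1 group references; Python expects \g<1>.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index_name)


for name in ["index_1", "index_2", "logs"]:
    print(name, "->", preview_restored_name(name, "index_(.+)", "restored_index_$1"))
```

A name that does not match the pattern, such as logs here, comes back unchanged.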

4. Monitor the Restore Process

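A restore runs through the normal shard recovery process, so its progress is reported by the index recovery API rather than by the snapshot status API (which covers snapshot creation only):

GET restored_index_*/_recovery

For a compact, tabular view across all indices, use GET _cat/recovery?v instead.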
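Restore progress is reported per shard by the index recovery API (GET <index>/_recovery), where shards restored from a snapshot have a type of SNAPSHOT and a stage that ends at DONE. The following sketch summarizes such a response. It assumes the JSON has been parsed into a dictionary keyed by index name, and the helper name and sample data are illustrative.

```python
def restore_progress(recovery_response):
    """Return {index: (done_shards, total_shards)} for snapshot-based recoveries."""
    progress = {}
    for index_name, details in recovery_response.items():
        shards = [
            shard for shard in details.get("shards", [])
            if str(shard.get("type", "")).upper() == "SNAPSHOT"
        ]
        done = sum(1 for shard in shards if str(shard.get("stage", "")).upper() == "DONE")
        progress[index_name] = (done, len(shards))
    return progress


# Illustrative response, trimmed to the fields used here.
sample_recovery = {
    "restored_index_1": {
        "shards": [
            {"id": 0, "type": "SNAPSHOT", "stage": "DONE"},
            {"id": 1, "type": "SNAPSHOT", "stage": "INDEX"},
        ]
    }
}
print(restore_progress(sample_recovery))
```

Polling until the two numbers match for every index is one simple way to decide that a restore has finished.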

Additional Considerations

1. Automated Snapshots

Consider setting up automated snapshots using Elasticsearch’s Snapshot Lifecycle Management (SLM) feature, which takes snapshots on a schedule and deletes old ones according to a retention policy.
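An SLM policy is created with PUT _slm/policy/<policy_id> and a JSON body describing the schedule, snapshot name, repository, and retention. The sketch below builds one such body. The schedule, retention numbers, and helper name are example values chosen for illustration, not recommendations.

```python
import json


def nightly_slm_policy(repo_name, indices, keep_for="30d"):
    """Build an example SLM policy body; all values are illustrative."""
    return {
        "schedule": "0 30 1 * * ?",          # cron expression: 01:30 every day
        "name": "<nightly-snap-{now/d}>",    # date math yields a dated snapshot name
        "repository": repo_name,
        "config": {"indices": indices, "include_global_state": False},
        "retention": {"expire_after": keep_for, "min_count": 5, "max_count": 50},
    }


policy = nightly_slm_policy("my_backup", ["index_1", "index_2"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools as the body of a request such as PUT _slm/policy/nightly-snapshots, where the policy ID is again an assumed name.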

2. Security Permissions

Ensure that the Elasticsearch process has read/write permissions to the snapshot directory. If you’re using a cloud storage repository, configure the necessary credentials.

3. Cluster State

Including the global cluster state in snapshots allows you to restore cluster-level settings and templates. Be cautious when restoring to a different cluster to avoid overwriting existing configurations.

Conclusion

Backing up and restoring data in Elasticsearch is a straightforward process once the snapshot repository is configured. Regular snapshots are crucial for data recovery and should be integrated into your maintenance routine. Always test your backup and restore procedures to ensure data integrity.

References:

Ashish Srivastava