Elasticsearch Reindex API: A Guide to Data Management

If you've been working with Elasticsearch for a while, you’ll eventually run into a situation where you need to reindex your data. Maybe you’re changing mappings, upgrading versions, or restructuring your documents. That’s where the Elasticsearch Reindex API comes in.

In this guide, we'll walk through everything you need to know about the Reindex API: what it is, how it works, common use cases, performance optimizations, and potential pitfalls. Let’s dive in.

Understanding the Elasticsearch Reindex API and How It Works

The Reindex API is an Elasticsearch tool that lets you copy documents from one index to another. Unlike a simple backup and restore, reindexing allows you to transform, filter, or modify documents during the process.

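It works by reading batches of documents from a source index and writing them into a target index with bulk requests. By default the request blocks until the operation finishes. Since this is a heavy operation, you can pass wait_for_completion=false to run it as a background task; Elasticsearch then returns a task ID that you can use to check on progress.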

Reindexing does not modify the source index. Instead, it creates a new copy of the data, allowing you to make adjustments before finalizing your migration or transformation.

4 Common Scenarios That Require Reindexing

Reindexing is necessary in several situations, including:

Modifying index mappings: If you need to update field types or analyzers, you often have to create a new index with the correct mappings and move the data over.
Elasticsearch version upgrades: Major version upgrades sometimes require reindexing due to breaking changes.
Transforming existing data: You might want to modify documents before storing them in the new index, such as renaming fields or changing data formats.
Splitting or merging indices: If you need to restructure your data, reindexing helps distribute documents properly across new indices.

How to Execute a Simple Reindex Operation

Here’s the most basic way to use the Reindex API:

POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}

This command copies all documents from old_index to new_index without making any modifications.
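For readers who script this from Python rather than from the Kibana console, the same request can be sketched with the standard library alone. The cluster URL (http://localhost:9200, no authentication) and the helper names build_reindex_body and post_json are assumptions made for illustration; they are not part of Elasticsearch.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without security


def build_reindex_body(source_index, dest_index):
    """Return the JSON body for a plain copy from source_index to dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def post_json(path, body, base_url=ES_URL):
    """POST a JSON body to Elasticsearch and return the decoded response."""
    request = urllib.request.Request(
        f"{base_url}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_reindex_body("old_index", "new_index")
    print(json.dumps(body, indent=2))
    # Uncomment to send the request to a real cluster:
    # print(post_json("_reindex", body))
```

Keeping the body builder separate from the HTTP call makes it easy to inspect or log the request before anything touches the cluster.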

Filtering Data During Reindexing with Queries

You can filter documents using a query inside the source block. For example, if you only want to copy documents where status is active, you can use the following command:

POST _reindex
{
  "source": {
    "index": "old_index",
    "query": { "term": { "status": "active" } }
  },
  "dest": { "index": "new_index" }
}

This ensures that only documents meeting the specified criteria are copied to the new index. The source index keeps all of its documents.
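Before running a filtered reindex, it can help to preview how many documents the query matches. The sketch below builds both the _count preview and the reindex body from the same query, so the two cannot drift apart. The helper names are hypothetical, and the example assumes status is mapped as a keyword field, since a term query looks for an exact value.

```python
import json


def term_filter(field, value):
    """Build an exact-match term query for a single field."""
    return {"term": {field: value}}


def filtered_reindex_body(source_index, dest_index, query):
    """Reindex body that copies only the documents matching `query`."""
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
    }


def count_request(index, query):
    """(method, path, body) for a _count call that previews the match size."""
    return ("POST", f"{index}/_count", {"query": query})


query = term_filter("status", "active")
method, path, count_body = count_request("old_index", query)
reindex_body = filtered_reindex_body("old_index", "new_index", query)

print(method, path, json.dumps(count_body))
print("POST _reindex", json.dumps(reindex_body))
```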

Modifying Documents During Reindexing with Scripts

To modify documents while reindexing, use a script block. Here’s an example that adds a timestamp field to each document:

POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" },
  "script": {
    "lang": "painless",
    "source": "ctx._source.timestamp = params.time",
    "params": { "time": "2025-02-25T00:00:00Z" }
  }
}

You can also rename fields, modify values, or remove fields entirely using scripting.
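As a rough illustration of renaming and removing fields, the sketch below generates Painless script blocks from Python and attaches one to a reindex body. The function names and the field names user_name and username are made up for the example. The rename script checks containsKey first so that documents without the old field are left unchanged.

```python
import json


def rename_field_script(old_name, new_name):
    """Painless script block that moves a value from old_name to new_name."""
    return {
        "lang": "painless",
        "source": (
            "if (ctx._source.containsKey(params.old_name)) { "
            "ctx._source.put(params.new_name, ctx._source.remove(params.old_name)); }"
        ),
        "params": {"old_name": old_name, "new_name": new_name},
    }


def remove_field_script(name):
    """Painless script block that drops a single field from each document."""
    return {
        "lang": "painless",
        "source": "ctx._source.remove(params.name)",
        "params": {"name": name},
    }


def reindex_with_script(source_index, dest_index, script):
    """Reindex body that runs `script` against every copied document."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": script,
    }


body = reindex_with_script(
    "old_index", "new_index", rename_field_script("user_name", "username")
)
print(json.dumps(body, indent=2))
```

Passing field names through params rather than concatenating them into the script source keeps the script text constant, which lets Elasticsearch reuse the compiled script.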

How to Optimize Performance When Reindexing Large Datasets

Reindexing a large dataset can be resource-intensive. Here are some best practices to improve performance:

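Use slices for parallel execution: Slicing splits the operation into parts that run at the same time. For manual slicing, the slice block goes inside source:
POST _reindex
{
  "source": {
    "index": "old_index",
    "slice": { "id": 0, "max": 5 }
  },
  "dest": { "index": "new_index" }
}
Repeat this with each id value from 0 to 4 to run the slices concurrently. Alternatively, let Elasticsearch do the slicing for you with POST _reindex?slices=5 (or slices=auto).
Limit the batch size: Too many documents in one batch can overload your cluster. Use size inside source to set how many documents each batch contains (the default is 1000):
POST _reindex
{
  "source": { "index": "old_index", "size": 1000 },
  "dest": { "index": "new_index" }
}
Throttle requests to prevent overloading the cluster: Add requests_per_second to the request URL and send the usual body:
POST _reindex?requests_per_second=500
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}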
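Writing out one request per slice by hand is error-prone, so here is a small sketch that generates the bodies for manual slicing. It places the slice block inside source, which is where Elasticsearch expects it for manual slicing. The function name and its batch_size parameter are illustrative assumptions.

```python
import json


def sliced_reindex_bodies(source_index, dest_index, num_slices, batch_size=1000):
    """Return one reindex body per slice, with ids 0 .. num_slices - 1."""
    if num_slices < 2:
        # A single slice is just an ordinary reindex, so slicing needs 2 or more.
        raise ValueError("num_slices must be at least 2")
    return [
        {
            "source": {
                "index": source_index,
                "size": batch_size,
                "slice": {"id": slice_id, "max": num_slices},
            },
            "dest": {"index": dest_index},
        }
        for slice_id in range(num_slices)
    ]


for body in sliced_reindex_bodies("old_index", "new_index", 5):
    print("POST _reindex", json.dumps(body))
```

Each body is meant to be sent as its own request, in parallel. If you do not need control over individual slices, the slices query parameter achieves the same thing with a single request.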

Monitoring the Progress of Reindex Operations

Reindexing is an expensive operation, so monitoring it is crucial. You can track its progress using:

GET _tasks?actions=*reindex

This returns active reindex tasks with their current status, allowing you to see if they are running smoothly or require intervention.
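The task list is verbose, so a short script can reduce it to a progress figure. The sketch below assumes the usual shape of the tasks response (nodes, then tasks, then a status object with total, created, updated and deleted counters) and estimates progress as the sum of those three counters over total. The sample response is invented for the example.

```python
def reindex_progress(tasks_response):
    """Return {task_id: (processed, total, percent)} for each listed task."""
    progress = {}
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            status = task.get("status", {})
            total = status.get("total", 0)
            processed = sum(
                status.get(key, 0) for key in ("created", "updated", "deleted")
            )
            percent = round(100.0 * processed / total, 1) if total else 0.0
            progress[task_id] = (processed, total, percent)
    return progress


# Invented sample in the shape returned by GET _tasks?actions=*reindex
sample = {
    "nodes": {
        "node-a": {
            "tasks": {
                "node-a:101": {
                    "action": "indices:data/write/reindex",
                    "status": {"total": 1000, "created": 200, "updated": 50, "deleted": 0},
                }
            }
        }
    }
}

for task_id, (processed, total, percent) in reindex_progress(sample).items():
    print(f"{task_id}: {processed}/{total} documents ({percent}%)")
```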

How to Enable Reindexing and Prepare an Index for Reindexing

Before executing a reindex operation, it’s important to ensure that your Elasticsearch indices are properly set up. This involves a few preparatory steps to prevent data inconsistencies and potential write conflicts during the process.

1. Enable Write Blocks on the Source Index

To prevent modifications to the source index while reindexing, it’s best to temporarily block write operations. This ensures data consistency and avoids missing updates.

PUT old_index/_settings
{
  "index.blocks.write": true
}

This makes the source index read-only for the duration of the reindexing process.

2. Create a Temporary Target Index with the Correct Mappings

Before reindexing, ensure that the target index exists with the correct mappings and settings. Reindex does not copy settings or mappings from the source index; if the target does not exist, Elasticsearch creates it with default settings and dynamic mappings. To stay in control, create the new index manually:

PUT new_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" },
      "field2": { "type": "keyword" }
    }
  }
}

This step is crucial when modifying mappings, as Elasticsearch does not allow you to change the type of a field that already exists in an index.

3. Verify Available Resources Before Running the Reindex Operation

Reindexing is resource-intensive. Check cluster health and available disk space to ensure the process won’t overwhelm the system:

GET _cluster/health

Monitor disk space with:

GET _cat/allocation?v

Make sure there is sufficient free disk space to accommodate the additional index.
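A disk check like this can be automated. The sketch below assumes the allocation data was fetched as JSON with byte units (GET _cat/allocation?format=json&bytes=b), where each row carries a disk.avail value as a string and unassigned shards appear as a row with no disk figures. The 1.2 headroom factor and the function names are arbitrary assumptions. The check compares cluster-wide totals only, so it does not account for per-node disk watermarks.

```python
def total_available_bytes(allocation_rows):
    """Sum disk.avail over all rows, skipping rows with no disk figures."""
    total = 0
    for row in allocation_rows:
        available = row.get("disk.avail")
        if available is not None:
            total += int(available)
    return total


def has_room_for_copy(allocation_rows, source_index_bytes, headroom=1.2):
    """True if free space covers the source size times a safety factor."""
    return total_available_bytes(allocation_rows) >= source_index_bytes * headroom


# Invented sample rows in the shape of the _cat/allocation JSON output
rows = [
    {"node": "node-1", "disk.avail": "30000000000"},
    {"node": "node-2", "disk.avail": "20000000000"},
    {"node": "UNASSIGNED", "disk.avail": None},
]

print(total_available_bytes(rows))
print(has_room_for_copy(rows, source_index_bytes=40_000_000_000))
```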

4. Execute the Reindex Operation

Once the preparation steps are completed, you can safely proceed with reindexing:

POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}

After the reindexing process is complete, remember to remove the write block from the source index if necessary:

PUT old_index/_settings
{
  "index.blocks.write": false
}

By following these steps, you can avoid common issues such as inconsistent data, mapping conflicts, and system overloads.
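To make the order of operations explicit, the preparation steps can be written down as an ordered plan of requests. This is only a sketch: reindex_plan and its (method, path, body) tuples are a hypothetical convention, and nothing here sends a request. A real runner would also need to stop at the first failed step and remove the write block if the reindex itself fails.

```python
import json


def reindex_plan(source_index, dest_index, dest_properties):
    """Return the preparation, reindex, and cleanup requests in order."""
    settings_path = f"{source_index}/_settings"
    return [
        # 1. Block writes on the source so no updates are missed.
        ("PUT", settings_path, {"index.blocks.write": True}),
        # 2. Create the target with explicit mappings.
        ("PUT", dest_index, {"mappings": {"properties": dest_properties}}),
        # 3. Check cluster health and disk space.
        ("GET", "_cluster/health", None),
        ("GET", "_cat/allocation?v", None),
        # 4. Copy the documents.
        (
            "POST",
            "_reindex",
            {"source": {"index": source_index}, "dest": {"index": dest_index}},
        ),
        # 5. Remove the write block again.
        ("PUT", settings_path, {"index.blocks.write": False}),
    ]


plan = reindex_plan(
    "old_index",
    "new_index",
    {"field1": {"type": "text"}, "field2": {"type": "keyword"}},
)
for method, path, body in plan:
    print(method, path, "" if body is None else json.dumps(body))
```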

How to Handle Mapping Conflicts During Reindexing

Mapping conflicts can arise when reindexing if the source and destination indices have incompatible field types. Elasticsearch enforces strict typing rules, so any mismatched field types can cause failures. Here’s how to resolve these conflicts.

1. Identify Mapping Differences

Before reindexing, compare the mappings of the source and target indices to detect conflicts.

GET old_index/_mapping

GET new_index/_mapping

Look for differences in field types, such as text vs. keyword or integer vs. long.
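Comparing two mappings by eye gets tedious for wide indices. The sketch below diffs the top-level properties of two mapping responses and reports fields whose types differ. It assumes the response shape of GET <index>/_mapping in recent Elasticsearch versions (index name, then mappings, then properties) and, to stay short, does not recurse into object or nested fields. The function names and sample data are made up.

```python
def properties_of(mapping_response, index_name):
    """Extract the top-level properties dict from a _mapping response."""
    return mapping_response[index_name]["mappings"].get("properties", {})


def type_conflicts(source_props, dest_props):
    """Return {field: (source_type, dest_type)} for fields whose types differ."""
    conflicts = {}
    for field, source_def in source_props.items():
        dest_def = dest_props.get(field)
        if dest_def is None:
            continue  # the field is not mapped in the destination yet
        # Fields without an explicit type are object fields.
        source_type = source_def.get("type", "object")
        dest_type = dest_def.get("type", "object")
        if source_type != dest_type:
            conflicts[field] = (source_type, dest_type)
    return conflicts


old_mapping = {"old_index": {"mappings": {"properties": {
    "field1": {"type": "text"},
    "field2": {"type": "integer"},
    "field3": {"type": "keyword"},
}}}}
new_mapping = {"new_index": {"mappings": {"properties": {
    "field1": {"type": "keyword"},
    "field2": {"type": "long"},
    "field3": {"type": "keyword"},
}}}}

conflicts = type_conflicts(
    properties_of(old_mapping, "old_index"),
    properties_of(new_mapping, "new_index"),
)
for field, (old_type, new_type) in sorted(conflicts.items()):
    print(f"{field}: {old_type} -> {new_type}")
```

Not every difference it reports is a problem. A change from integer to long, for instance, is harmless; the list is a starting point for review.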

2. Create the Target Index with Correct Mappings

If mapping conflicts exist, define the correct mappings in the new index before reindexing. If a field’s type needs to change, update the target index accordingly:

PUT new_index
{
  "mappings": {
    "properties": {
      "field1": { "type": "keyword" },
      "field2": { "type": "text" }
    }
  }
}

3. Use a Script to Transform Conflicting Fields

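If the stored values themselves need to change to fit the target mapping, use a script to transform them on the fly. Changing a field from text to keyword needs only the new mapping, because the string in _source is the same either way. A script is needed when the value's shape differs, for example when a number in the source has to be stored as a string in the target. The null check skips documents that do not have the field:

POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.field1 != null) { ctx._source.field1 = ctx._source.field1.toString() }"
  }
}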

This ensures compatibility by converting data before it is indexed.

4. Remove Conflicting Fields if Necessary

If some fields are no longer needed or cannot be converted, exclude them during reindexing:

POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" },
  "script": {
    "source": "ctx._source.remove('obsolete_field')"
  }
}

5. Validate Reindexed Data

Once the process is complete, verify that documents were correctly indexed:

GET new_index/_search?size=5

This allows you to check if the changes were applied correctly before switching to the new index.
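A spot check of the sampled documents can also be scripted. The sketch below walks the hits of a search response and reports any document that is missing a field you expect or still has a field you meant to remove. The function name, field names and sample response are invented for the example.

```python
def sample_problems(search_response, required_fields=(), forbidden_fields=()):
    """Return {doc_id: {"missing": [...], "leftover": [...]}} for bad hits."""
    problems = {}
    for hit in search_response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        missing = [name for name in required_fields if name not in source]
        leftover = [name for name in forbidden_fields if name in source]
        if missing or leftover:
            problems[hit["_id"]] = {"missing": missing, "leftover": leftover}
    return problems


# Invented sample in the shape returned by GET new_index/_search?size=5
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"field1": "a", "timestamp": "2025-02-25T00:00:00Z"}},
            {"_id": "2", "_source": {"field1": "b", "obsolete_field": 7}},
        ]
    }
}

report = sample_problems(
    response, required_fields=["timestamp"], forbidden_fields=["obsolete_field"]
)
print(report)
```

A sample of five documents only catches systematic mistakes. Comparing document counts between the two indices is a useful complement.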

5 Common Reindexing Issues and How to Fix Them

Reindexing can sometimes run into issues, such as timeouts, missing documents, or performance bottlenecks. Below are some common problems and their solutions.

1. Reindexing Operation Times Out

If your reindexing request times out, Elasticsearch may still be processing it in the background. Check the task status using:

GET _tasks?actions=*reindex

Solution:

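Run the reindex as a background task and poll its task ID instead of holding the HTTP connection open. Note that the timeout query parameter only sets how long each indexing operation waits (for example, for unavailable shards); it does not extend the overall request:
POST _reindex?wait_for_completion=false
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}
Use slicing to run multiple parallel reindexing operations. The slice block goes inside source:
POST _reindex
{
  "source": {
    "index": "old_index",
    "slice": { "id": 0, "max": 5 }
  },
  "dest": { "index": "new_index" }
}
Repeat for id values from 0 to 4.
Reduce the batch size (size inside source, 1000 by default):
POST _reindex
{
  "source": { "index": "old_index", "size": 500 },
  "dest": { "index": "new_index" }
}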
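A reindex started with wait_for_completion=false returns a task ID, and GET _tasks/<task_id> reports whether that task has completed. A polling loop around that call might look like the sketch below. The get_task argument is a hypothetical callable that performs the GET and returns the decoded JSON; it is injected so that the loop can be tried without a cluster. The polling interval and the limit on attempts are arbitrary.

```python
import time


def wait_for_task(get_task, task_id, poll_seconds=5.0, max_polls=120):
    """Poll a task until it reports completed, then return its response."""
    for _ in range(max_polls):
        info = get_task(task_id)
        if info.get("completed"):
            return info.get("response", {})
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not complete after {max_polls} polls")


def make_fake_get_task(polls_until_done):
    """Stand-in for the real HTTP call: completes after a few polls."""
    calls = {"count": 0}

    def fake_get_task(task_id):
        calls["count"] += 1
        if calls["count"] < polls_until_done:
            return {"completed": False}
        return {"completed": True, "response": {"total": 10, "created": 10}}

    return fake_get_task


result = wait_for_task(make_fake_get_task(3), "node-a:101", poll_seconds=0)
print(result)
```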

2. Mapping Conflicts Prevent Reindexing

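If the destination mapping defines a field with a type that cannot parse the source values (for example, an integer field receiving free text), the affected documents are rejected and the reindex aborts with the errors listed in the failures array of the response.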

Solution:

Ensure the target index has compatible mappings before reindexing.
Use scripts to modify field values or types as needed.
Exclude problematic fields from reindexing using a script such as:
"script": { "source": "ctx._source.remove('conflicting_field')" }

3. Missing Documents in the Destination Index

If not all documents appear in the target index, check:

The query filter in the reindex request (ensure it's not excluding documents unintentionally).
The reindex response and the Elasticsearch logs for rejected documents or version conflicts.

Solution:

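Remove any unintentional filters, either by omitting the query or by using:
"query": { "match_all": {} }
Check the failures array and the version_conflicts count in the reindex response.
Refresh the destination index (POST new_index/_refresh) before counting, because newly indexed documents are not visible to searches until a refresh has happened.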
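The counters in the reindex response usually explain where documents went. The sketch below condenses a response into a short summary and flags any documents that none of the counters account for. The summary keys are my own naming, and the sample response is invented.

```python
def summarize_reindex(response):
    """Condense a reindex response into counts that explain missing docs."""
    total = response.get("total", 0)
    created = response.get("created", 0)
    updated = response.get("updated", 0)
    deleted = response.get("deleted", 0)
    noops = response.get("noops", 0)
    conflicts = response.get("version_conflicts", 0)
    return {
        "written": created + updated,
        "version_conflicts": conflicts,
        "failures": len(response.get("failures", [])),
        # Documents matched by the source query but not covered by any counter.
        "unaccounted": total - (created + updated + deleted + noops + conflicts),
    }


# Invented sample response
response = {
    "total": 1000,
    "created": 900,
    "updated": 40,
    "deleted": 0,
    "noops": 0,
    "version_conflicts": 10,
    "failures": [{"index": "new_index", "id": "42", "status": 400}],
}

summary = summarize_reindex(response)
print(summary)
```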

4. Insufficient Disk Space

Reindexing creates a full copy of the data, which can fill up storage quickly.

Solution:

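Check available disk space before reindexing:
GET _cat/allocation?v
Reduce the new index's footprint by creating it with "index.codec": "best_compression". This is a static setting, so it has to be set when the destination index is created (or while the index is closed), and it trades slower stored-field reads for a smaller index.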
Delete old or unnecessary indices before reindexing.

5. Cluster Performance Issues During Reindexing

Reindexing is resource-intensive and can slow down your cluster.

Solution:

Throttle reindexing requests:
POST _reindex?requests_per_second=100
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}
Run reindexing during off-peak hours.
Increase cluster resources if reindexing is a frequent operation.
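The throttle can also be changed on a reindex that is already running, through the _rethrottle endpoint, so there is no need to cancel and restart. The sketch below only builds the request path. The helper name is hypothetical and the task ID is a placeholder. A value of -1 turns throttling off.

```python
def rethrottle_path(task_id, requests_per_second):
    """Build the POST path that changes the throttle of a running reindex."""
    if requests_per_second != -1 and requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive, or -1 to disable")
    return f"_reindex/{task_id}/_rethrottle?requests_per_second={requests_per_second}"


# Placeholder task ID in the node:number form used by the tasks API
print("POST", rethrottle_path("node-a:101", 50))
print("POST", rethrottle_path("node-a:101", -1))
```

Slowing a task down this way gives a busy cluster room to recover while the reindex keeps its progress.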

Wrapping Up

If you're working with Elasticsearch in production, always test reindexing on a staging environment before applying changes to live data. Implement best practices to ensure efficiency, and monitor your cluster's health throughout the process.

Happy indexing!
