Managing OpenSearch on Kubernetes isn’t just about getting it up and running—it’s about ensuring resilience, automation, and seamless scaling.
The OpenSearch Operator takes the guesswork out of these tasks, letting you focus on your data instead of cluster headaches. In this guide, we go beyond the basics, diving into advanced configurations, performance tuning, and troubleshooting complex issues.
What Is OpenSearch Operator and How Does It Work?
The OpenSearch Operator is a Kubernetes-native controller that manages OpenSearch clusters declaratively. It ensures cluster consistency, handles updates, and provides self-healing capabilities—all without manual intervention.
Key Benefits of Using OpenSearch Operator
- Automated Cluster Management: Eliminates the need for manual OpenSearch setup and maintenance.
- Self-Healing Capabilities: Detects and recovers from node failures.
- Optimized for Kubernetes: Fully integrates with Kubernetes RBAC, StatefulSets, and CRDs.
- Declarative Control: Define your cluster state via YAML, and the Operator enforces it automatically.
- Zero-Downtime Upgrades: Handles rolling updates seamlessly without service interruption.
Install OpenSearch Operator: Step-by-Step Guide
1. Prerequisites: What You Need Before Installation
Before installing OpenSearch Operator, ensure your environment meets these requirements:
- Kubernetes cluster v1.19+ (a managed service such as EKS, AKS, or GKE is recommended)
- kubectl installed and configured
- Helm 3+ (if using Helm-based installation)
- Adequate CPU and memory resources for cluster scaling
2. Deploying OpenSearch Operator with Helm (Recommended Method)
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
helm install opensearch-operator opensearch-operator/opensearch-operator -n opensearch-operator --create-namespace
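Before moving on, you can confirm that the chart registered the operator's custom resource definitions. The CRD group below matches the opensearch.opster.io API version used later in this guide:
kubectl get crds | grep opensearch.opster.io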
3. Deploying OpenSearch Operator Manually with YAML
git clone https://github.com/opensearch-project/opensearch-k8s-operator.git
cd opensearch-k8s-operator/opensearch-operator
kubectl apply -f config/crd/bases/
kubectl apply -f config/rbac/
kubectl apply -f config/manager/
4. Verifying OpenSearch Operator Deployment
Check if the Operator is running and healthy:
kubectl get pods -n opensearch-operator
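If the operator pod is not in a Running state, its logs are usually the fastest way to find out why. Substitute the pod name reported by the previous command:
kubectl logs -n opensearch-operator <operator-pod-name>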
How to Deploy and Configure OpenSearch Clusters with OpenSearch Operator
1. Creating a High-Availability OpenSearch Cluster Configuration
Define a robust cluster configuration in a YAML file (opensearch-cluster.yaml):
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: enterprise-opensearch-cluster
  namespace: default
spec:
  general:
    version: 2.11.0
    security: enabled
  nodePools:
    - name: master
      roles: [ "master" ]
      replicas: 3
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
    - name: data
      roles: [ "data" ]
      replicas: 5
      storage:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 100Gi
    - name: client
      roles: [ "ingest" ]
      replicas: 2
- The YAML file (opensearch-cluster.yaml) defines an OpenSearchCluster resource for Kubernetes.
- The cluster is named enterprise-opensearch-cluster and runs OpenSearch version 2.11.0 with security enabled.
- The cluster consists of three node pools:
  - Master nodes: 3 replicas to handle cluster coordination.
  - Data nodes: 5 replicas with 100Gi of storage each to store and manage indexed data.
  - Client nodes: 2 replicas responsible for handling ingest and query traffic.
2. Deploying Your OpenSearch Cluster
Apply the cluster configuration to Kubernetes:
kubectl apply -f opensearch-cluster.yaml
- The command kubectl apply -f opensearch-cluster.yaml deploys the cluster configuration to Kubernetes.
- Kubernetes schedules the necessary pods and provisions resources as defined in the YAML file.
3. Ensuring Cluster Health and Status
Monitor the cluster’s health and node distribution:
kubectl get pods -n default
kubectl describe pod <pod-name>
- The command kubectl get pods -n default lists all OpenSearch-related pods in the default namespace so you can check whether they are running.
- kubectl describe pod <pod-name> provides more detail, including recent events and scheduling conditions, to diagnose any issues.
To check OpenSearch cluster status:
kubectl port-forward svc/enterprise-opensearch-cluster 9200:9200 &
curl -X GET "localhost:9200/_cluster/health?pretty"
- The command kubectl port-forward svc/enterprise-opensearch-cluster 9200:9200 & forwards OpenSearch's API port for local access.
- Running curl -X GET "localhost:9200/_cluster/health?pretty" retrieves a formatted JSON response with details about cluster status, number of nodes, and health indicators.
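A healthy cluster returns a response along these lines (the values are illustrative; with the security plugin enabled you may need to call the endpoint over HTTPS and pass credentials, for example curl -k -u <user>:<password>):
{
  "cluster_name": "enterprise-opensearch-cluster",
  "status": "green",
  "number_of_nodes": 10,
  "number_of_data_nodes": 5,
  "active_primary_shards": 20,
  "active_shards": 40,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}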
Advanced Scaling, Performance Tuning, and Backup Strategies
Scaling OpenSearch Nodes Dynamically
As workloads grow, you may need to scale your OpenSearch cluster dynamically. You can adjust the number of nodes in your cluster by modifying the cluster configuration file and reapplying it.
For example, if you need to increase the number of data nodes from 5 to 7, update your opensearch-cluster.yaml:
    - name: data
      roles: [ "data" ]
      replicas: 7  # Increased from 5
After making the changes, apply the updated configuration:
kubectl apply -f opensearch-cluster.yaml
Kubernetes will automatically adjust the number of data nodes in the cluster, ensuring that OpenSearch scales as needed. This method helps handle increased indexing and query loads without requiring a full redeployment.
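Once the new pods are Ready, you can confirm that the extra data nodes have joined the cluster (assuming the port-forward from the earlier section is still active):
curl -X GET "localhost:9200/_cat/nodes?v"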
Optimizing OpenSearch Performance
To ensure your OpenSearch cluster runs efficiently, consider the following performance optimizations:
Sharding Strategy:
- Oversharding can degrade performance, leading to unnecessary resource consumption.
- Ideally, keep shard sizes between 10GB–50GB to balance query performance and cluster overhead (see the index template example after the command below).
- Use the _cat/shards API to monitor shard sizes:
curl -X GET "localhost:9200/_cat/shards?v"
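If indices routinely end up with shards far outside that range, shard counts can be set up front with an index template. The template name and index pattern below are illustrative:
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1
    }
  }
}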
Query Caching:
- Enable request_cache to speed up repeated queries and reduce the load on data nodes.
- Configure caching at the index level:
PUT /my_index/_settings
{
  "index.requests.cache.enable": true
}
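The cache can also be enabled per request, which is handy for checking whether caching helps a specific query. Note that only size: 0 requests are cached by default:
GET /my_index/_search?request_cache=true
{
  "size": 0,
  "query": { "match_all": {} }
}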
Index State Management (ISM):
- Automate index rollovers and retention policies to prevent outdated indices from consuming unnecessary storage.
- OpenSearch provides this through its Index State Management (ISM) plugin (the equivalent of Elasticsearch's ILM). Define an ISM policy that automatically rolls over indices once they reach 50GB:
PUT _plugins/_ism/policies/my_policy
{
  "policy": {
    "description": "Roll over indices at 50GB",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb"
            }
          }
        ],
        "transitions": []
      }
    ]
  }
}
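A policy does nothing until it is attached to indices. One way is the ISM add API shown below (the index pattern is illustrative); note that rollover also requires a write alias and the plugins.index_state_management.rollover_alias index setting:
POST _plugins/_ism/add/logs-*
{
  "policy_id": "my_policy"
}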
Heap Size Tuning:
- OpenSearch performance heavily depends on proper heap size configuration.
- Allocate 50% of available memory to the JVM heap but do not exceed 32GB (beyond which compressed object pointers lose efficiency).
- Example heap size configuration for a system with 32GB RAM:
-XX:MaxHeapSize=16g -XX:InitialHeapSize=16g
- Monitor heap usage with the following API request:
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
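How you pass the heap flags depends on how OpenSearch is packaged. With the official container images, the heap is commonly set through the OPENSEARCH_JAVA_OPTS environment variable, for example:
OPENSEARCH_JAVA_OPTS="-Xms16g -Xmx16g"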
Automating Backups with OpenSearch Snapshots
Regular backups are critical to preventing data loss. OpenSearch provides a snapshot and restore feature that allows you to create backups of indices, cluster metadata, and settings.
Configuring a Snapshot Repository
- Before taking snapshots, define a repository where backups will be stored.
- Example configuration for a file system-based repository; the location must be listed under path.repo in opensearch.yml and be accessible from every node (for example, a shared volume):
PUT _snapshot/my_backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups"
  }
}
- You can verify the repository configuration with:
GET _snapshot/my_backup_repo
Creating a Snapshot
- Once the repository is set up, initiate a snapshot of your OpenSearch cluster:
PUT _snapshot/my_backup_repo/snapshot_1
- Check the status of your snapshot with:
GET _snapshot/my_backup_repo/snapshot_1
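Snapshot names must be unique within a repository, so scheduled backups (for example, from a cron job) should generate a fresh name per run. Adding wait_for_completion=true makes the call block until the snapshot finishes, which simplifies scripting:
PUT _snapshot/my_backup_repo/snapshot_2024_01_15?wait_for_completion=true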
Restoring from a Snapshot
- If needed, restore a snapshot using:
POST _snapshot/my_backup_repo/snapshot_1/_restore
- By default, the restore process brings back all indices in the snapshot, but you can specify individual indices (an index can only be restored if it does not already exist or is closed):
POST _snapshot/my_backup_repo/snapshot_1/_restore
{
  "indices": "logs-*"
}
Troubleshooting OpenSearch Operator Issues
When managing OpenSearch with the OpenSearch Operator, you may encounter cluster failures, resource constraints, or unexpected performance issues.
Below are some common problems and their corresponding troubleshooting steps to diagnose and resolve them effectively.
1. Cluster Instability
If your OpenSearch cluster is experiencing frequent restarts, data loss, or unresponsiveness, check the master node logs for any critical errors or leader election issues. Run the following command to fetch real-time logs:
kubectl logs -f <master-pod>
Look for messages related to node disconnections, shard failures, or out-of-memory errors. If leader elections happen repeatedly, verify that you are running an odd number of master-eligible nodes (at least three) and that they can reach one another reliably.
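If the logs point at shard failures, the allocation explain API usually states exactly why a shard cannot be assigned, for example insufficient disk space or unsatisfied allocation rules:
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"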
2. High CPU Usage
Excessive CPU usage in OpenSearch can lead to sluggish query responses or even node crashes. To identify resource-intensive queries, use the following API request:
GET _cat/tasks?v&detailed=true
This will provide details on running tasks, including queries consuming high CPU. If you notice frequent long-running queries:
- Optimize index mappings and avoid wildcard searches.
- Enable query caching.
- Consider increasing CPU resource requests and limits in the OpenSearch cluster YAML.
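When the task list alone does not reveal the culprit, the hot threads API shows which threads are consuming CPU on each node, helping you separate expensive queries from merges or garbage collection:
GET _nodes/hot_threads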
3. Pods Stuck in Pending State
If OpenSearch pods remain in a Pending state and fail to start, inspect their resource constraints and scheduling conditions with:
kubectl describe pod <pod-name>
Common reasons for this issue include:
- Insufficient CPU/Memory: Increase resource requests in the OpenSearch cluster spec.
- Persistent Volume Issues: Ensure the requested storage class exists.
- Node Affinity Conflicts: Check if pod scheduling constraints prevent allocation.
To resolve resource issues, you may need to scale worker nodes or modify OpenSearch pod resource allocations in the cluster YAML.
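For the storage-related cases in particular, two quick checks confirm whether persistent volume claims are being bound and whether the requested storage class exists (adjust the namespace if your cluster runs elsewhere):
kubectl get pvc -n default
kubectl get storageclass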
Final Thoughts
The OpenSearch Operator brings a whole new level of automation and efficiency to managing OpenSearch in Kubernetes environments. Whether you are running a small cluster or scaling up to handle massive amounts of data, getting the right configurations, tuning strategies, and best practices in place makes all the difference for smooth operation.
FAQs
1. How do I migrate an existing OpenSearch cluster to Kubernetes with OpenSearch Operator?
You can use OpenSearch snapshots to migrate an existing cluster. Create a snapshot on your current cluster and restore it in the Kubernetes-deployed OpenSearch cluster.
2. Does OpenSearch Operator support multi-cluster setups?
Yes, but you need to configure cross-cluster search (CCS) and snapshot-based data replication between clusters.
3. How do I ensure zero downtime when upgrading OpenSearch?
Use the rolling upgrade strategy by updating the cluster version in your YAML file and applying the changes. OpenSearch Operator handles node restarts sequentially to maintain availability.
4. Can I use OpenSearch Operator with managed Kubernetes services like EKS, GKE, or AKS?
Absolutely. OpenSearch Operator works well with cloud-managed Kubernetes services. Just ensure your cloud storage and networking configurations are optimized for OpenSearch workloads.