Managing OpenSearch on Kubernetes isn’t just about getting it up and running—it’s about ensuring resilience, automation, and seamless scaling.
The OpenSearch Operator takes the guesswork out of these tasks, letting you focus on your data instead of cluster headaches. In this guide, we go beyond the basics, diving into advanced configurations, performance tuning, and troubleshooting complex issues.
What Is OpenSearch Operator and How Does It Work?
The OpenSearch Operator is a Kubernetes-native controller that manages OpenSearch clusters declaratively. It ensures cluster consistency, handles updates, and provides self-healing capabilities—all without manual intervention.
Key Benefits of Using OpenSearch Operator
- Automated Cluster Management: Eliminates the need for manual OpenSearch setup and maintenance.
- Self-Healing Capabilities: Detects and recovers from node failures.
- Optimized for Kubernetes: Fully integrates with Kubernetes RBAC, StatefulSets, and CRDs.
- Declarative Control: Define your cluster state via YAML, and the Operator enforces it automatically.
- Zero-Downtime Upgrades: Handles rolling updates seamlessly without service interruption.
Install OpenSearch Operator: Step-by-Step Guide
1. Prerequisites: What You Need Before Installation
Before installing OpenSearch Operator, ensure your environment meets these requirements:
- Kubernetes cluster v1.19+ (a managed service such as EKS, AKS, or GKE is recommended)
- kubectl installed and configured
- Helm 3+ (if using Helm-based installation)
- Adequate CPU and memory resources for cluster scaling
2. Deploying OpenSearch Operator with Helm (Recommended Method)
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
helm install opensearch-operator opensearch-operator/opensearch-operator -n opensearch-operator --create-namespace
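Before moving on, you can confirm that the chart registered the operator's custom resource definitions. The CRD group below matches the opensearch.opster.io API version used later in this guide:
kubectl get crds | grep opensearch.opster.io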
3. Deploying OpenSearch Operator Manually with YAML
git clone https://github.com/opensearch-project/opensearch-k8s-operator.git
cd opensearch-k8s-operator/opensearch-operator
kubectl apply -f config/crd/bases/
kubectl apply -f config/rbac/
kubectl apply -f config/manager/
4. Verifying OpenSearch Operator Deployment
Check if the Operator is running and healthy:
kubectl get pods -n opensearch-operator
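If the operator pod is not in a Running state, its logs are usually the fastest way to find out why. Substitute the pod name reported by the previous command:
kubectl logs -n opensearch-operator <operator-pod-name>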
How to Deploy and Configure OpenSearch Clusters with OpenSearch Operator
1. Creating a High-Availability OpenSearch Cluster Configuration
Define a robust cluster configuration in a YAML file (opensearch-cluster.yaml):
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: enterprise-opensearch-cluster
  namespace: default
spec:
  general:
    version: 2.11.0
    security: enabled
  nodePools:
    - name: master
      roles: [ "master" ]
      replicas: 3
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
    - name: data
      roles: [ "data" ]
      replicas: 5
      storage:
        volumeClaimTemplate:
          spec:
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 100Gi
    - name: client
      roles: [ "ingest" ]
      replicas: 2
- The YAML file (opensearch-cluster.yaml) defines an OpenSearchCluster resource for Kubernetes.
- The cluster is named enterprise-opensearch-cluster and runs OpenSearch version 2.11.0 with security enabled.
- The cluster consists of three node pools:
  - Master nodes: 3 replicas to handle cluster coordination.
  - Data nodes: 5 replicas with 100Gi of storage each to store and manage indexed data.
  - Client nodes: 2 replicas responsible for handling ingest and query traffic.
2. Deploying Your OpenSearch Cluster
Apply the cluster configuration to Kubernetes:
kubectl apply -f opensearch-cluster.yaml
- The command kubectl apply -f opensearch-cluster.yaml deploys the cluster configuration to Kubernetes.
- Kubernetes schedules the necessary pods and provisions resources as defined in the YAML file.
3. Ensuring Cluster Health and Status
Monitor the cluster’s health and node distribution:
kubectl get pods -n default
kubectl describe pod <pod-name>
- The command kubectl get pods -n default lists all OpenSearch-related pods in the default namespace so you can check whether they are running.
- kubectl describe pod <pod-name> provides more detail, including recent events and scheduling conditions, to diagnose any issues.
To check OpenSearch cluster status:
kubectl port-forward svc/enterprise-opensearch-cluster 9200:9200 &
curl -X GET "localhost:9200/_cluster/health?pretty"
- The command kubectl port-forward svc/enterprise-opensearch-cluster 9200:9200 & forwards OpenSearch's API port for local access.
- Running curl -X GET "localhost:9200/_cluster/health?pretty" retrieves a formatted JSON response with details about cluster status, number of nodes, and health indicators.
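A healthy cluster returns a response along these lines (the values are illustrative; with the security plugin enabled you may need to call the endpoint over HTTPS and pass credentials, for example curl -k -u <user>:<password>):
{
  "cluster_name": "enterprise-opensearch-cluster",
  "status": "green",
  "number_of_nodes": 10,
  "number_of_data_nodes": 5,
  "active_primary_shards": 20,
  "active_shards": 40,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}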
Advanced Scaling, Performance Tuning, and Backup Strategies
Scaling OpenSearch Nodes Dynamically
As workloads grow, you may need to scale your OpenSearch cluster dynamically. You can adjust the number of nodes in your cluster by modifying the cluster configuration file and reapplying it.
For example, if you need to increase the number of data nodes from 5 to 7, update your opensearch-cluster.yaml:
    - name: data
      roles: [ "data" ]
      replicas: 7  # Increased from 5
After making the changes, apply the updated configuration:
kubectl apply -f opensearch-cluster.yaml
Kubernetes will automatically adjust the number of data nodes in the cluster, ensuring that OpenSearch scales as needed. This method helps handle increased indexing and query loads without requiring a full redeployment.
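Once the new pods are Ready, you can confirm that the extra data nodes have joined the cluster (assuming the port-forward from the earlier section is still active):
curl -X GET "localhost:9200/_cat/nodes?v"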
Optimizing OpenSearch Performance
To ensure your OpenSearch cluster runs efficiently, consider the following performance optimizations:
Sharding Strategy:
- Oversharding can degrade performance, leading to unnecessary resource consumption.
- Ideally, keep shard sizes between 10GB–50GB to balance query performance and cluster overhead (see the index template example after the command below).
- Use the _cat/shards API to monitor shard sizes:
curl -X GET "localhost:9200/_cat/shards?v"
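If indices routinely end up with shards far outside that range, shard counts can be set up front with an index template. The template name and index pattern below are illustrative:
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1
    }
  }
}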
Query Caching:
- Enable request_cache to speed up repeated queries and reduce the load on data nodes.
- Configure caching at the index level:
PUT /my_index/_settings
{
  "index.requests.cache.enable": true
}
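The cache can also be enabled per request, which is handy for checking whether caching helps a specific query. Note that only size: 0 requests are cached by default:
GET /my_index/_search?request_cache=true
{
  "size": 0,
  "query": { "match_all": {} }
}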
Index State Management (ISM):
- Automate index rollovers and retention policies to prevent outdated indices from consuming unnecessary storage.
- OpenSearch provides this through its Index State Management (ISM) plugin (the equivalent of Elasticsearch's ILM). Define an ISM policy that automatically rolls over indices once they reach 50GB:
PUT _plugins/_ism/policies/my_policy
{
  "policy": {
    "description": "Roll over indices at 50GB",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb"
            }
          }
        ],
        "transitions": []
      }
    ]
  }
}
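A policy does nothing until it is attached to indices. One way is the ISM add API shown below (the index pattern is illustrative); note that rollover also requires a write alias and the plugins.index_state_management.rollover_alias index setting:
POST _plugins/_ism/add/logs-*
{
  "policy_id": "my_policy"
}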
Heap Size Tuning:
- OpenSearch performance heavily depends on proper heap size configuration.
- Allocate 50% of available memory to the JVM heap but do not exceed 32GB (beyond which compressed object pointers lose efficiency).
- Example heap size configuration for a system with 32GB RAM:
-XX:MaxHeapSize=16g -XX:InitialHeapSize=16g
- Monitor heap usage with the following API request:
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"
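How you pass the heap flags depends on how OpenSearch is packaged. With the official container images, the heap is commonly set through the OPENSEARCH_JAVA_OPTS environment variable, for example:
OPENSEARCH_JAVA_OPTS="-Xms16g -Xmx16g"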
Automating Backups with OpenSearch Snapshots
Regular backups are critical to preventing data loss. OpenSearch provides a snapshot and restore feature that allows you to create backups of indices, cluster metadata, and settings.
Configuring a Snapshot Repository
- Before taking snapshots, define a repository where backups will be stored.
- Example configuration for a file system-based repository; the location must be listed under path.repo in opensearch.yml and be accessible from every node (for example, a shared volume):
PUT _snapshot/my_backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups"
  }
}
- You can verify the repository configuration with:
GET _snapshot/my_backup_repo
Creating a Snapshot
- Once the repository is set up, initiate a snapshot of your OpenSearch cluster:
PUT _snapshot/my_backup_repo/snapshot_1
- Check the status of your snapshot with:
GET _snapshot/my_backup_repo/snapshot_1
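Snapshot names must be unique within a repository, so scheduled backups (for example, from a cron job) should generate a fresh name per run. Adding wait_for_completion=true makes the call block until the snapshot finishes, which simplifies scripting:
PUT _snapshot/my_backup_repo/snapshot_2024_01_15?wait_for_completion=true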
Restoring from a Snapshot
- If needed, restore a snapshot using:
POST _snapshot/my_backup_repo/snapshot_1/_restore
- By default, the restore process brings back all indices in the snapshot, but you can specify individual indices (an index can only be restored if it does not already exist or is closed):
POST _snapshot/my_backup_repo/snapshot_1/_restore
{
  "indices": "logs-*"
}
Troubleshooting OpenSearch Operator Issues
When managing OpenSearch with the OpenSearch Operator, you may encounter cluster failures, resource constraints, or unexpected performance issues.
Below are some common problems and their corresponding troubleshooting steps to diagnose and resolve them effectively.
1. Cluster Instability
If your OpenSearch cluster is experiencing frequent restarts, data loss, or unresponsiveness, check the master node logs for any critical errors or leader election issues. Run the following command to fetch real-time logs:
kubectl logs -f <master-pod>
Look for messages related to node disconnections, shard failures, or out-of-memory errors. If leader elections happen repeatedly, verify that you are running an odd number of master-eligible nodes (at least three) and that they can reach one another reliably.
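If the logs point at shard failures, the allocation explain API usually states exactly why a shard cannot be assigned, for example insufficient disk space or unsatisfied allocation rules:
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"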
2. High CPU Usage
Excessive CPU usage in OpenSearch can lead to sluggish query responses or even node crashes. To identify resource-intensive queries, use the following API request:
GET _cat/tasks?v&detailed=true
This will provide details on running tasks, including queries consuming high CPU. If you notice frequent long-running queries:
- Optimize index mappings and avoid wildcard searches.
- Enable query caching.
- Consider increasing CPU resource requests and limits in the OpenSearch cluster YAML.
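When the task list alone does not reveal the culprit, the hot threads API shows which threads are consuming CPU on each node, helping you separate expensive queries from merges or garbage collection:
GET _nodes/hot_threads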
3. Pods Stuck in Pending State
If OpenSearch pods remain in a Pending state and fail to start, inspect their resource constraints and scheduling conditions with:
kubectl describe pod <pod-name>
Common reasons for this issue include:
- Insufficient CPU/Memory: Increase resource requests in the OpenSearch cluster spec.
- Persistent Volume Issues: Ensure the requested storage class exists.
- Node Affinity Conflicts: Check if pod scheduling constraints prevent allocation.
To resolve resource issues, you may need to scale worker nodes or modify OpenSearch pod resource allocations in the cluster YAML.
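For the storage-related cases in particular, two quick checks confirm whether persistent volume claims are being bound and whether the requested storage class exists (adjust the namespace if your cluster runs elsewhere):
kubectl get pvc -n default
kubectl get storageclass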
Final Thoughts
The OpenSearch Operator brings a whole new level of automation and efficiency to managing OpenSearch in Kubernetes environments. Whether you are running a small cluster or scaling up to handle massive amounts of data, getting the right configurations, tuning strategies, and best practices in place makes all the difference for smooth operation.
FAQs
1. How do I migrate an existing OpenSearch cluster to Kubernetes with OpenSearch Operator?
You can use OpenSearch snapshots to migrate an existing cluster. Create a snapshot on your current cluster and restore it in the Kubernetes-deployed OpenSearch cluster.
2. Does OpenSearch Operator support multi-cluster setups?
Yes, but you need to configure cross-cluster search (CCS) and snapshot-based data replication between clusters.
3. How do I ensure zero downtime when upgrading OpenSearch?
Use the rolling upgrade strategy by updating the cluster version in your YAML file and applying the changes. OpenSearch Operator handles node restarts sequentially to maintain availability.
4. Can I use OpenSearch Operator with managed Kubernetes services like EKS, GKE, or AKS?
Absolutely. OpenSearch Operator works well with cloud-managed Kubernetes services. Just ensure your cloud storage and networking configurations are optimized for OpenSearch workloads.