If you're working with search and analytics, you’ve probably heard about OpenSearch, the open-source alternative to Elasticsearch. OpenSearch is a powerful tool, whether you're building a search engine, running log analytics, or implementing full-text search in your applications. And the best part? You can integrate it easily with Python.
This guide will walk you through everything you need to know to get started with OpenSearch using Python, from installation to advanced querying and performance tuning.
What is OpenSearch?
OpenSearch is an open-source search and analytics suite, originally forked from Elasticsearch 7.10. It provides a scalable, distributed search engine with built-in security, observability, and machine-learning features. OpenSearch is widely used for log analytics, full-text search, and business intelligence applications.
Key Features of OpenSearch:
- Full-Text Search: Powerful search capabilities with tokenization, stemming, and ranking algorithms.
- Scalability: Distributed architecture that allows the handling of large-scale data.
- Observability: Integrated tools for monitoring and analyzing logs and metrics.
- Security: Authentication, access controls, and encryption.
- Machine Learning: Supports anomaly detection and predictive analytics.
Why Use OpenSearch with Python?
Python is a popular language for working with search technologies, thanks to its rich ecosystem of libraries and ease of use.
Here’s why you might want to integrate OpenSearch with Python:
- Simple API access – The OpenSearch Python client makes it easy to interact with the search engine.
- Data analysis capabilities – Python’s data processing libraries (like Pandas and NumPy) complement OpenSearch’s querying power.
- Automation – Automate indexing, searching, and monitoring tasks using Python scripts.
- Integration with Machine Learning – Use OpenSearch with machine learning libraries such as TensorFlow and Scikit-learn.
Step-by-Step Process to Set Up OpenSearch with Python
1. Install and Run OpenSearch Using Docker
Before connecting OpenSearch to Python, you need to have OpenSearch running. You can set it up using Docker:
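docker pull opensearchproject/opensearch:latest
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your-strong-password>" opensearchproject/opensearch:latest
This starts OpenSearch in single-node mode, making it easy to test and develop locally. OpenSearch 2.12 and later require the OPENSEARCH_INITIAL_ADMIN_PASSWORD variable (replace the placeholder with a strong password of your own), and the security plugin is enabled by default, so the node serves HTTPS on port 9200 with a self-signed demo certificate.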
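The container can take a little while to accept requests after it starts. A minimal sketch of a readiness check follows, assuming client is an opensearch-py client like the one created in step 3; wait_for_opensearch and its parameters are hypothetical names used only for illustration.

```python
import time


def wait_for_opensearch(client, attempts=30, delay_seconds=2.0):
    """Poll client.ping() until the node answers or the attempts run out.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        if client.ping():  # ping() returns True once the cluster responds
            return attempt
        time.sleep(delay_seconds)
    raise TimeoutError(f"OpenSearch did not respond after {attempts} attempts")
```

Calling wait_for_opensearch(client) before the first real request avoids connection errors in scripts that start the container and query it right away.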
2. Install the OpenSearch Python Client for Hassle-free Interaction
To interact with OpenSearch from Python, install the OpenSearch client:
pip install opensearch-py
3. Establish a Connection to OpenSearch from Python
Now, let’s set up a connection to OpenSearch:
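from opensearchpy import OpenSearch

# The default Docker image enables the security plugin, so the node speaks HTTPS
# with a self-signed demo certificate. Verification is disabled for local testing only.
client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=('admin', 'admin'),  # replace with the admin password configured for your cluster
    use_ssl=True,
    verify_certs=False,
)
print(client.info())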
If everything is set up correctly, this will return basic cluster information.
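The cluster information comes back as a plain dictionary, so individual fields can be read directly. The sketch below uses a made-up sample_info in the shape client.info() typically returns; describe_cluster is a hypothetical helper, not part of the client.

```python
def describe_cluster(info):
    """Build a one-line summary from the dictionary returned by client.info()."""
    version = info["version"]
    distribution = version.get("distribution", "unknown")
    return f"{info['cluster_name']}: {distribution} {version['number']}"


# Made-up example values; a real cluster reports its own name and version.
sample_info = {
    "name": "node-1",
    "cluster_name": "docker-cluster",
    "version": {"distribution": "opensearch", "number": "2.11.0"},
    "tagline": "The OpenSearch Project: https://opensearch.org/",
}

summary = describe_cluster(sample_info)
print(summary)
```

In a real script you would pass client.info() instead of sample_info.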
How to Index and Search Data in OpenSearch
Before you can search, you need to index some data.
1. Setting Up an Index with Mappings and Settings
index_name = "products"

index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 1
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "float"},
            "in_stock": {"type": "boolean"}
        }
    }
}

client.indices.create(index=index_name, body=index_body)
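Creating an index that already exists raises an error, which gets in the way when a script is run more than once. One possible way to make the step repeatable is sketched below; ensure_index is a hypothetical helper, and client is assumed to be the opensearch-py client created earlier.

```python
def ensure_index(client, index_name, index_body):
    """Create the index only if it does not exist yet.

    Returns True if the index was created, False if it was already there.
    """
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name, body=index_body)
    return True
```

With this in place, ensure_index(client, index_name, index_body) can replace the direct create call in setup scripts.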
2. Adding Documents to the OpenSearch Index
document = {
    "name": "Wireless Keyboard",
    "price": 39.99,
    "in_stock": True
}

client.index(index=index_name, body=document)
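Two details are worth knowing at this step: without an explicit ID, OpenSearch generates one for each document, and a newly indexed document only becomes searchable after the next index refresh. The sketch below, with the hypothetical helper index_product, passes an ID and requests an immediate refresh, which is convenient for demos but adds overhead under heavy write load.

```python
def index_product(client, index_name, product_id, document):
    """Index a document under an explicit ID and make it searchable right away."""
    response = client.index(
        index=index_name,
        id=product_id,   # reusing an ID overwrites the existing document
        body=document,
        refresh=True,    # refresh immediately so the next search can see it
    )
    return response["result"]  # "created" for a new ID, "updated" otherwise
```

For example, index_product(client, "products", "kb-1", document) would store the keyboard under the ID kb-1 (an ID chosen here purely for illustration).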
Searching Data in OpenSearch
Once data is indexed, you can run queries to retrieve it.
1. Running a Basic Search Query
query = {
    "query": {
        "match": {"name": "keyboard"}
    }
}

response = client.search(index=index_name, body=query)
print(response)
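Printing the whole response shows a lot of metadata. As a rough sketch of pulling out just the matches, the hypothetical helper summarize_hits below reads the total count and the name and score of each hit. The sample_response is made up, but follows the standard search-result layout.

```python
def summarize_hits(response):
    """Return the total hit count and a list of (name, score) pairs."""
    total = response["hits"]["total"]["value"]
    matches = [
        (hit["_source"]["name"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]
    return total, matches


# Made-up response shaped like the result of client.search(...)
sample_response = {
    "took": 3,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "products",
                "_id": "kb-1",
                "_score": 0.29,
                "_source": {"name": "Wireless Keyboard", "price": 39.99, "in_stock": True},
            }
        ],
    },
}

total, matches = summarize_hits(sample_response)
print(total, matches)
```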
2. Applying Filters to Refine Search Results
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "keyboard"}}
            ],
            "filter": [
                {"range": {"price": {"gte": 30, "lte": 50}}}
            ]
        }
    }
}

response = client.search(index=index_name, body=query)
print(response)
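When filters depend on user input, writing the nested dictionary by hand gets repetitive. One way to assemble the same bool query is sketched below; build_product_query and its parameters are hypothetical, and the field names come from the products mapping used in this guide. Clauses under filter narrow the results without affecting the relevance score.

```python
def build_product_query(text, min_price=None, max_price=None, in_stock=None):
    """Build a bool query: full-text match on name plus optional filters."""
    filters = []

    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})

    if in_stock is not None:
        filters.append({"term": {"in_stock": in_stock}})

    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": filters,
            }
        }
    }


query = build_product_query("keyboard", min_price=30, max_price=50)
print(query)
```

The resulting dictionary can be passed to client.search(index=index_name, body=query) exactly like the handwritten version.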
Performance Optimization Tips
If you're working with large-scale data, performance optimization is key. Here are some best practices:
- Use bulk indexing – Instead of indexing documents one by one, use the bulk API to send batches of documents.
- Optimize queries – Avoid wildcard queries and excessive aggregations.
- Shard wisely – Too many or too few shards can impact performance. Monitor and adjust based on your workload.
- Cache results – Use OpenSearch’s built-in caching mechanisms for frequently queried data.
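To illustrate the bulk indexing tip, the sketch below turns rows into bulk actions lazily, so a large dataset never has to sit in memory as one list. The helper generate_actions and the sku field used as the document ID are assumptions made for this example.

```python
def generate_actions(rows, index_name):
    """Yield one bulk action per row, using the row's sku as the document ID."""
    for row in rows:
        source = {key: value for key, value in row.items() if key != "sku"}
        yield {"_index": index_name, "_id": row["sku"], "_source": source}


# Made-up rows; in practice these might be streamed from a file or database.
rows = [
    {"sku": "m-1", "name": "Mouse", "price": 25.99, "in_stock": True},
    {"sku": "d-1", "name": "Monitor", "price": 199.99, "in_stock": True},
]

actions = list(generate_actions(rows, "products"))
print(actions)
```

The generator can be handed straight to the bulk helper from opensearchpy.helpers, for example bulk(client, generate_actions(rows, "products"), chunk_size=500), where chunk_size controls how many documents go into each request.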
How to Handle Security and Authentication
OpenSearch includes built-in security features such as TLS encryption and authentication.
1. Enabling Secure Connections with TLS and Authentication
To ensure secure communication between your Python client and OpenSearch, use HTTPS and enable certificate verification:
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{'host': 'localhost', 'port': 9200}],
    http_auth=('admin', 'admin'),  # Replace with secure credentials
    use_ssl=True,
    verify_certs=True
)
- use_ssl=True ensures that SSL/TLS is enabled.
- verify_certs=True enforces certificate validation to prevent man-in-the-middle attacks.
For production deployments, consider using proper SSL certificates instead of self-signed ones and restrict access using firewall rules.
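If the cluster's certificate is signed by a private or internal certificate authority, verification only succeeds when the client knows where to find that CA. A minimal sketch follows, assuming a PEM bundle path that you supply; tls_client_kwargs is a hypothetical helper, while ca_certs is the opensearch-py option for pointing at a CA bundle.

```python
def tls_client_kwargs(host, port=9200, ca_certs=None):
    """Build keyword arguments for OpenSearch(...) with certificate checks on."""
    kwargs = {
        "hosts": [{"host": host, "port": port}],
        "use_ssl": True,
        "verify_certs": True,
    }
    if ca_certs:
        # Path to a PEM file holding the CA certificate(s) that signed the server cert
        kwargs["ca_certs"] = ca_certs
    return kwargs


# Hypothetical host and path, shown only to illustrate the resulting arguments.
kwargs = tls_client_kwargs("search.internal.example", ca_certs="/etc/ssl/certs/internal-ca.pem")
print(kwargs)
```

The result would be used as OpenSearch(http_auth=(user, password), **kwargs), keeping the TLS settings in one place.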
2. Using API Keys or IAM for Secure Authentication on AWS
If you're running OpenSearch on AWS, you should use IAM-based authentication or API keys instead of basic authentication for improved security.
Using IAM Authentication (AWS SigV4 Signing)
You can authenticate your Python client using AWS IAM roles with the requests-aws4auth library:
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

region = 'us-east-1'  # Change to your OpenSearch region
service = 'es'  # Amazon OpenSearch Service; OpenSearch Serverless uses 'aoss'
credentials = boto3.Session().get_credentials()
aws_auth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

client = OpenSearch(
    hosts=[{'host': 'your-opensearch-domain.us-east-1.es.amazonaws.com', 'port': 443}],
    http_auth=aws_auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection  # AWS4Auth is a requests-style auth object
)
- boto3.Session().get_credentials() retrieves temporary credentials for IAM authentication.
- AWS SigV4 signing ensures authenticated and authorized access to OpenSearch on AWS.
Using API Key Authentication
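If API key authentication is enabled for your deployment, you can use it instead of IAM or basic authentication. How the key is sent depends on your provider; the example below passes it as basic-auth credentials, which you should verify against your deployment's documentation: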
client = OpenSearch(
    hosts=[{'host': 'your-opensearch-domain', 'port': 443}],
    http_auth=('apikey', 'your-api-key'),
    use_ssl=True,
    verify_certs=True
)
- API keys should be stored securely (e.g., using environment variables or a secrets manager).
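As a small illustration of the environment-variable approach, the sketch below reads a secret at startup and fails loudly when it is missing, rather than falling back to a hardcoded value. The helper load_secret and the variable name OPENSEARCH_API_KEY are assumptions made for this example.

```python
import os


def load_secret(name, env=os.environ):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"OPENSEARCH_API_KEY": "example-key-not-real"}
api_key = load_secret("OPENSEARCH_API_KEY", env=demo_env)
print(len(api_key))
```

In a real script you would call load_secret("OPENSEARCH_API_KEY") and pass the result to the client instead of a literal string.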
Sample OpenSearch Programs Using Python
This section provides sample programs demonstrating how to interact with OpenSearch using Python clients.
1. Bulk Indexing Multiple Documents Efficiently
from opensearchpy.helpers import bulk

docs = [
    {"_index": "products", "_source": {"name": "Mouse", "price": 25.99, "in_stock": True}},
    {"_index": "products", "_source": {"name": "Monitor", "price": 199.99, "in_stock": True}},
]

bulk(client, docs)
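By default the bulk helper raises an exception when any document fails. Called with raise_on_error=False, it instead returns a success count and a list of error items, which can then be reported. The sketch below assumes that call style; summarize_bulk_errors is a hypothetical helper and sample_errors is made-up data in the shape those error items usually take.

```python
def summarize_bulk_errors(errors):
    """Turn bulk error items into short, readable messages."""
    messages = []
    for item in errors:
        # Each item maps the action type (e.g. "index") to its failure details
        for action, details in item.items():
            error = details.get("error", {})
            reason = error.get("reason", str(error)) if isinstance(error, dict) else str(error)
            messages.append(
                f"{action} {details.get('_id')}: status {details.get('status')}, {reason}"
            )
    return messages


# Made-up error item for a document whose price could not be parsed.
sample_errors = [
    {
        "index": {
            "_index": "products",
            "_id": "d-1",
            "status": 400,
            "error": {"type": "mapper_parsing_exception", "reason": "failed to parse field [price]"},
        }
    }
]

messages = summarize_bulk_errors(sample_errors)
print(messages)
```

A typical call would look like success, errors = bulk(client, docs, raise_on_error=False), followed by summarize_bulk_errors(errors).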
2. Running Aggregation Queries for Insights
query = {
    "aggs": {
        "average_price": {"avg": {"field": "price"}}
    }
}

response = client.search(index="products", body=query)
print(response)
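The aggregation result sits under the aggregations key of the response, next to the regular hits. A minimal sketch of reading it follows; format_average_price is a hypothetical helper and sample_response is made up. An avg aggregation over zero documents reports a null value, which the helper handles.

```python
def format_average_price(response):
    """Read the average_price aggregation and format it for display."""
    value = response["aggregations"]["average_price"]["value"]
    if value is None:  # no documents matched, so there is no average
        return "n/a"
    return f"{value:.2f}"


# Made-up response shaped like the result of the aggregation query.
sample_response = {
    "hits": {"total": {"value": 3, "relation": "eq"}, "hits": []},
    "aggregations": {"average_price": {"value": 88.66}},
}

formatted = format_average_price(sample_response)
print(formatted)
```

If only the aggregation matters, adding "size": 0 to the query body tells OpenSearch not to return the matching documents themselves.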
Wrapping Up
OpenSearch is a powerful tool for search and analytics, and Python makes it easy to work with. You should now be comfortable setting up OpenSearch with Python, indexing and searching data, and optimizing performance.