Building a high-performance Elasticsearch cluster requires careful planning and optimization. In this article, we’ll explore essential strategies for sharding, replication, index management, and cluster optimization to ensure your Elasticsearch deployment is both fast and scalable.
Introduction
Performance optimization in Elasticsearch involves multiple aspects, from hardware configuration to index design and query optimization. Understanding these components and their interactions is crucial for building efficient search solutions.
Cluster Architecture
Node Types and Roles
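# elasticsearch.yml
# Node roles are static, per-node settings. They cannot be changed through
# the cluster settings API and take effect after a node restart.
node.roles: [ data, ingest, master ]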
Recommended Node Configuration
- Master Nodes
  - Dedicated nodes for cluster management
  - Minimum 3 nodes for high availability
  - Moderate CPU and memory
- Data Nodes
  - High memory and storage capacity
  - Multiple nodes for horizontal scaling
  - Fast storage (SSD recommended)
- Ingest Nodes
  - High CPU for data processing
  - Moderate memory requirements
  - Optional for small clusters
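The "minimum 3 master nodes" recommendation follows from majority voting: a master can only be elected while more than half of the master-eligible nodes are reachable. The sketch below shows that arithmetic only. The function names are illustrative, and Elasticsearch manages its actual voting configuration itself, so treat this as a simplified model rather than the cluster's exact behavior.

```python
def master_quorum(master_eligible: int) -> int:
    """Votes needed for a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority remains."""
    return master_eligible - master_quorum(master_eligible)


for nodes in (1, 2, 3, 5):
    print(nodes, master_quorum(nodes), tolerated_master_failures(nodes))
```

With one or two master-eligible nodes, no failure can be tolerated; three is the smallest count that survives the loss of one.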
Sharding Strategy
Shard Size Optimization
PUT /logs-2024.04.06
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 3
  }
}
Best Practices for Sharding
- Shard Size
  - Target 20-50GB per shard
  - Monitor shard size growth
  - Adjust based on data volume
- Shard Count
  - Consider node capacity
  - Plan for future growth
  - Balance query performance
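To turn the size target into numbers, here is a rough sizing sketch. It assumes a 40GB target (the middle of the 20-50GB range above) and uses hypothetical helper names. It also checks how many data nodes an index needs once a per-node shard limit such as total_shards_per_node is applied.

```python
import math


def primary_shards_for(expected_gb: float, target_shard_gb: float = 40.0) -> int:
    """Primary shards needed to keep each shard near the target size."""
    if expected_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_gb / target_shard_gb))


def min_nodes_for(primaries: int, replicas: int, shards_per_node: int) -> int:
    """Data nodes needed so every shard copy fits under the per-node limit."""
    total_copies = primaries * (1 + replicas)
    # Copies of the same shard never share a node, so at least
    # (replicas + 1) nodes are needed regardless of the limit.
    return max(replicas + 1, math.ceil(total_copies / shards_per_node))


print(primary_shards_for(200))   # 200GB of data at ~40GB per shard
print(min_nodes_for(5, 1, 3))    # 5 primaries, 1 replica, at most 3 shards per node
```

For the index settings shown earlier (5 primaries, 1 replica, at most 3 shards per node), the second call shows that fewer than 4 data nodes would leave some shards unassigned.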
Replication Strategy
Replica Configuration
PUT /products
{
  "settings": {
    "number_of_replicas": 2,
    "index.routing.allocation.include._tier_preference": "data_hot,data_warm,data_cold"
  }
}
Replication Factors
- Production: 1-2 replicas
- High availability: 2-3 replicas
- Disaster recovery: 3+ replicas
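Each replica is a full copy of the primary data, so the replica count drives both storage cost and the number of data nodes required. A minimal sketch of that trade-off, with illustrative function names:

```python
def total_storage_gb(primary_gb: float, replicas: int) -> float:
    """Disk needed for the primaries plus every replica copy."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return primary_gb * (1 + replicas)


def min_data_nodes(replicas: int) -> int:
    """Copies of one shard are never placed on the same node,
    so all copies can only be assigned with replicas + 1 data nodes."""
    return replicas + 1


for replicas in (1, 2, 3):
    print(replicas, total_storage_gb(100, replicas), min_data_nodes(replicas))
```

With 100GB of primary data, going from 1 to 2 replicas raises the footprint from 200GB to 300GB and requires at least 3 data nodes before the cluster can report green.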
Index Lifecycle Management (ILM)
ILM Policy Example
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
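To reason about where an index should be under this policy, the sketch below maps an index's age to the phase it is expected to be in, using the min_age values from the policy above. The helper name is hypothetical. For rolled-over indices, age counts from the rollover time, and ILM evaluates policies periodically, so real transitions can lag slightly behind these thresholds.

```python
from datetime import timedelta

# Phases ordered from latest to earliest, with min_age taken from logs_policy.
PHASE_MIN_AGE = [
    ("delete", timedelta(days=90)),
    ("cold", timedelta(days=30)),
    ("warm", timedelta(days=7)),
    ("hot", timedelta(0)),
]


def expected_phase(age: timedelta) -> str:
    """Return the latest phase whose min_age the index has reached."""
    for phase, min_age in PHASE_MIN_AGE:
        if age >= min_age:
            return phase
    raise ValueError("age must not be negative")


for days in (3, 7, 45, 90):
    print(days, expected_phase(timedelta(days=days)))
```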
Index Settings Optimization
Performance Settings
PUT /high_performance_index
{
  "settings": {
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
    "index.merge.scheduler.max_thread_count": 1,
    "index.number_of_routing_shards": 30
  }
}
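These settings trade freshness and durability for indexing throughput. The sketch below makes the cost explicit: a longer refresh interval delays when new documents become searchable, and async translog durability means writes acknowledged since the last sync can be lost if a node crashes. The helper names are illustrative, and the parser handles only the simple time units used here.

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value: str) -> float:
    """Parse a simple Elasticsearch time value such as '30s' or '500ms'."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def write_tradeoffs(settings: dict) -> dict:
    """Summarize the worst-case delays implied by the index settings."""
    result = {"max_search_delay_s": to_seconds(settings["index.refresh_interval"])}
    if settings.get("index.translog.durability") == "async":
        # Acknowledged writes since the last translog sync are at risk on a crash.
        interval = settings.get("index.translog.sync_interval", "5s")
        result["max_unsynced_window_s"] = to_seconds(interval)
    else:
        result["max_unsynced_window_s"] = 0
    return result


SETTINGS = {
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
}
print(write_tradeoffs(SETTINGS))
```

If losing a few seconds of acknowledged writes is not acceptable for an index, keep the default request-level durability and tune only the refresh interval.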
Memory Management
PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%"
  }
}
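Circuit breaker limits are percentages of the JVM heap, so their absolute size depends on how much heap each node has. A small sketch that converts the percentages above into bytes, assuming a 10GiB heap purely for illustration:

```python
GIB = 1024 ** 3

BREAKER_PERCENT = {
    "indices.breaker.total.limit": 70,
    "indices.breaker.fielddata.limit": 40,
    "indices.breaker.request.limit": 60,
}


def breaker_limits_bytes(heap_bytes: int, percents: dict) -> dict:
    """Convert percentage-of-heap breaker limits into absolute byte limits."""
    return {name: heap_bytes * pct // 100 for name, pct in percents.items()}


limits = breaker_limits_bytes(10 * GIB, BREAKER_PERCENT)  # assumed 10GiB heap
for name, limit in limits.items():
    print(f"{name}: {limit / GIB:.1f} GiB")
```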
Query Optimization
Search Settings
PUT /search_optimized
{
  "settings": {
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms"
  }
}
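The slow log records a query at the most severe level whose threshold it exceeds. The sketch below mirrors the thresholds configured above to show which level a given query duration would land in. The function name is illustrative, and the exact behavior at the boundary value is an assumption.

```python
# Thresholds in milliseconds, ordered from most to least severe.
SLOWLOG_THRESHOLDS_MS = [
    ("warn", 10_000),
    ("info", 5_000),
    ("debug", 2_000),
    ("trace", 500),
]


def slowlog_level(took_ms: int):
    """Return the slow log level for a query duration, or None if not logged."""
    for level, threshold_ms in SLOWLOG_THRESHOLDS_MS:
        if took_ms > threshold_ms:
            return level
    return None


for took in (200, 800, 3_000, 7_000, 12_000):
    print(took, slowlog_level(took))
```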
Query Performance Tips
- Filter Context
  GET /products/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "term": { "status": "active" } },
          { "range": { "price": { "gte": 100 } } }
        ]
      }
    }
  }
- Pagination Optimization
  GET /products/_search
  {
    "from": 0,
    "size": 10,
    "track_total_hits": false
  }
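from/size pagination gets more expensive the deeper you page, and by default a request is rejected once from + size exceeds index.max_result_window (10,000). The sketch below builds the pagination body shown above for a given page and fails early when a page would cross that limit. The helper name is hypothetical.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_body(page: int, size: int = 10,
              max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> dict:
    """Build a from/size search body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    offset = (page - 1) * size
    if offset + size > max_result_window:
        raise ValueError(
            "page is beyond index.max_result_window; "
            "consider search_after for deep pagination"
        )
    # Skipping the exact hit count avoids extra work on every page.
    return {"from": offset, "size": size, "track_total_hits": False}


print(page_body(1))
print(page_body(3, size=25))
```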
Monitoring and Maintenance
Cluster Health Monitoring
GET _cluster/health?pretty
GET _nodes/stats?pretty
GET _cat/indices?v
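The health response is only useful if something acts on it. Here is a minimal sketch of a check you could run against the parsed JSON from GET _cluster/health. It reads the status, number_of_nodes, and unassigned_shards fields of that response. The function name and the sample values are illustrative, and fetching the response is left out.

```python
def health_alerts(health: dict, expected_nodes: int) -> list:
    """Return human-readable warnings for a parsed _cluster/health response."""
    alerts = []
    if health["status"] != "green":
        alerts.append(f"cluster status is {health['status']}")
    if health["number_of_nodes"] < expected_nodes:
        missing = expected_nodes - health["number_of_nodes"]
        alerts.append(f"{missing} node(s) missing")
    if health["unassigned_shards"] > 0:
        alerts.append(f"{health['unassigned_shards']} unassigned shard(s)")
    return alerts


# Made-up sample response for illustration only.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
for alert in health_alerts(sample, expected_nodes=3):
    print(alert)
```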
Regular Maintenance Tasks
- Force Merge (only on indices that no longer receive writes, such as rolled-over log indices)
  POST /logs-*/_forcemerge?max_num_segments=1
- Cache Clear
  POST /_cache/clear
Scaling Strategies
Horizontal Scaling
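- Add Data Nodes
  - Monitor shard distribution
  - Balance cluster load
  - Update replica settings
- Split Indices
  - Block writes on the source index first (index.blocks.write: true)
  - Choose a target shard count that is a multiple of the source shard count
  POST /large_index/_split/split_index
  {
    "settings": {
      "index.number_of_shards": 10
    }
  }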
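Not every shard count is a valid split target. The target must be a multiple of the source shard count and also a factor of the index's number_of_routing_shards. The sketch below lists the valid targets, assuming for illustration a source index with 5 primary shards and number_of_routing_shards set to 30. The function name is hypothetical.

```python
def valid_split_targets(source_shards: int, routing_shards: int) -> list:
    """List shard counts an index can be split into.

    A target must be a larger multiple of the source shard count
    and must divide number_of_routing_shards evenly.
    """
    if source_shards < 1 or routing_shards % source_shards != 0:
        raise ValueError("routing shards must be a multiple of source shards")
    return [
        target
        for target in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % target == 0
    ]


print(valid_split_targets(5, 30))
```

Under those assumptions, a 5-shard index can be split to 10, 15, or 30 shards, so the request above targeting 10 shards would be accepted.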
Vertical Scaling
- Memory Optimization
  - JVM heap settings
  - Field data cache
  - Query cache
- Storage Optimization
  - Use SSDs
  - RAID configuration
  - File system settings
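For JVM heap settings, the usual guidance is to give Elasticsearch no more than half of a node's RAM, leaving the rest for the filesystem cache, and to stay below the point where the JVM stops using compressed object pointers. A rough sketch of that rule follows. The 30GB ceiling is an assumption, since the exact threshold varies by JVM and platform, and the function name is illustrative.

```python
def recommended_heap_gb(ram_gb: float, ceiling_gb: float = 30.0) -> float:
    """Suggest a heap size: half of RAM, capped below the assumed
    compressed-object-pointer threshold."""
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(ram_gb / 2, ceiling_gb)


for ram in (16, 32, 64, 128):
    print(ram, recommended_heap_gb(ram))
```

Past roughly 64GB of RAM, extra memory under this rule goes to the filesystem cache rather than the heap, which is one reason adding nodes often pays off more than adding memory.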
Best Practices
- Index Design
  - Use appropriate mappings
  - Optimize field types
  - Consider data lifecycle
- Query Design
  - Use filters effectively
  - Optimize pagination
  - Monitor slow queries
- Cluster Management
  - Regular monitoring
  - Capacity planning
  - Backup strategies
Common Issues and Solutions
Performance Problems
- Monitor slow logs
- Check resource usage
- Optimize queries
Scaling Issues
- Review shard strategy
- Adjust node roles
- Update cluster settings
Next Steps
After optimizing performance:
- Implement monitoring
- Set up alerts
- Plan for growth
- Document procedures
Conclusion
Building a high-performance Elasticsearch cluster requires:
- Proper sharding strategy
- Effective replication
- Optimized index settings
- Regular maintenance
Remember to:
- Monitor performance metrics
- Plan for scalability
- Follow best practices
- Document configurations
Stay tuned for our next article on Elasticsearch security and access control.