Learn Elasticsearch: Performance and Best Practices

Building a high-performance Elasticsearch cluster requires careful planning and optimization. In this article, we’ll explore essential strategies for sharding, replication, index management, and cluster optimization to ensure your Elasticsearch deployment is both fast and scalable.

Introduction

Performance optimization in Elasticsearch involves multiple aspects, from hardware configuration to index design and query optimization. Understanding these components and their interactions is crucial for building efficient search solutions.

Cluster Architecture

Node Types and Roles

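Node roles are a static, per-node setting. They are configured in each node's elasticsearch.yml and take effect after a restart; they cannot be changed through the cluster settings API. A node that holds all three roles, as below, suits small clusters; larger clusters usually give each node a narrower set of roles.

# elasticsearch.yml
node.roles: [ data, ingest, master ]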

Recommended Node Configuration

Master Nodes
- Dedicated nodes for cluster management
- At least 3 master-eligible nodes for high availability, so a majority remains if one fails
- Moderate CPU and memory
Data Nodes
- High memory and storage capacity
- Multiple nodes for horizontal scaling
- Fast storage (SSD recommended)
Ingest Nodes
- High CPU for data processing
- Moderate memory requirements
- Optional for small clusters
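
The three-node minimum for master nodes follows from majority voting: a master can only be elected while more than half of the master-eligible nodes are reachable. The sketch below works through that arithmetic. The function names are illustrative and not part of any Elasticsearch API, and it ignores the finer points of how Elasticsearch manages its voting configuration.

```python
def master_quorum(master_eligible: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible - master_quorum(master_eligible)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 5):
        print(
            f"{nodes} master-eligible: quorum={master_quorum(nodes)}, "
            f"tolerates {tolerated_master_failures(nodes)} failure(s)"
        )
```

Under this simple model two master-eligible nodes tolerate no more failures than one, which is why three is the usual starting point.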

Sharding Strategy

Shard Size Optimization

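The settings below create a daily log index with 5 primary shards and 1 replica, which is 10 shard copies in total. Because total_shards_per_node caps each node at 3 shards of this index, the cluster needs at least 4 data nodes; with fewer, some copies stay unassigned.

PUT /logs-2024.04.06
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 3
  }
}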

Best Practices for Sharding

Shard Size
- Target 20-50GB per shard
- Monitor shard size growth
- Adjust based on data volume
Shard Count
- Consider node capacity
- Plan for future growth
- Balance query performance: more shards allow more parallel work per search, but every shard adds fixed overhead
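
As a rough illustration of the shard-size guidance above, the following sketch derives a primary shard count from an expected data volume. The 40 GB default is an assumption picked from inside the 20-50 GB range, and primary_shards_for is a hypothetical helper, not an Elasticsearch API.

```python
import math


def primary_shards_for(expected_gb: float, target_shard_gb: float = 40.0) -> int:
    """Number of primary shards that keeps each shard near the target size."""
    if expected_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Always keep at least one primary shard, even for tiny indices.
    return max(1, math.ceil(expected_gb / target_shard_gb))


if __name__ == "__main__":
    for volume_gb in (10, 200, 201, 1000):
        print(f"{volume_gb} GB -> {primary_shards_for(volume_gb)} primary shard(s)")
```

The result is a starting point only; actual shard sizes should still be monitored as data grows.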

Replication Strategy

Replica Configuration

PUT /products
{
  "settings": {
    "number_of_replicas": 2,
    "index.routing.allocation.include._tier_preference": "data_hot,data_warm,data_cold"
  }
}

Replication Factors

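- Production: 1-2 replicas
- High availability: 2-3 replicas
- Disaster recovery: 3+ replicas

Replicas protect against the loss of nodes inside one cluster. They do not replace snapshots or cross-cluster replication, which are what protect against the loss of the whole cluster.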
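
Each replica is a full copy of its primary shard, so the replica count multiplies both the number of shards and the disk space used. A replica is also never allocated to the same node as its primary, which puts a floor on the number of data nodes. The sketch below makes that cost explicit; replication_footprint is an illustrative helper with made-up example numbers.

```python
def replication_footprint(primaries: int, replicas: int, primary_data_gb: float) -> dict:
    """Shard copies, storage, and minimum data nodes for a replica setting."""
    if primaries < 1 or replicas < 0:
        raise ValueError("primaries must be >= 1 and replicas >= 0")
    copies_per_shard = 1 + replicas  # the primary plus its replicas
    return {
        "shard_copies": primaries * copies_per_shard,
        "storage_gb": primary_data_gb * copies_per_shard,
        # Copies of the same shard must live on different nodes.
        "min_data_nodes": copies_per_shard,
    }


if __name__ == "__main__":
    for replica_count in (1, 2, 3):
        print(replica_count, replication_footprint(5, replica_count, 100.0))
```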

Index Lifecycle Management (ILM)

ILM Policy Example

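The policy below rolls the write index over at 50 GB or 7 days, force-merges and shrinks it in the warm phase, and deletes it 90 days after rollover. Two caveats: max_size measures the combined size of all primary shards, so newer versions offer max_primary_shard_size to bound each shard instead; and the freeze action is deprecated and has no effect in Elasticsearch 8.x.

PUT _ilm/policy/logs_policy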
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
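
A policy only takes effect once indices reference it, which is typically done through an index template. The sketch below builds such a template body as a plain dictionary. The template name logs_template, the pattern logs-*, and the alias logs are assumptions for illustration; index.lifecycle.name and index.lifecycle.rollover_alias are the index settings that link an index to a policy and to its rollover alias.

```python
import json


def lifecycle_template(pattern: str, policy: str, rollover_alias: str) -> dict:
    """Index template body that attaches an ILM policy to matching indices."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.lifecycle.name": policy,
                "index.lifecycle.rollover_alias": rollover_alias,
            }
        },
    }


if __name__ == "__main__":
    body = lifecycle_template("logs-*", "logs_policy", "logs")
    # Print the request in console form; sending it is left to the reader.
    print("PUT _index_template/logs_template")
    print(json.dumps(body, indent=2))
```

The rollover action also needs a write alias (or a data stream) pointing at the current index, which this sketch does not create.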

Index Settings Optimization

Performance Settings

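These settings trade freshness and durability for indexing throughput. New documents become searchable up to 30 seconds after indexing, and with async translog durability, writes acknowledged since the last 5-second sync can be lost if a node crashes. A max_thread_count of 1 is intended for spinning disks; on SSDs the default is usually the better choice.

PUT /high_performance_index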
{
  "settings": {
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
    "index.merge.scheduler.max_thread_count": 1,
    "index.number_of_routing_shards": 30
  }
}
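
To put a number on the durability trade-off of async translog syncing, the sketch below estimates how many acknowledged writes sit between two syncs at a given indexing rate. It is a back-of-the-envelope model only: the helper names and the example rate of 2,000 writes per second are assumptions, and the parser handles just the ms, s, and m suffixes.

```python
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def to_seconds(interval: str) -> float:
    """Convert a time value such as '500ms', '5s' or '1m' to seconds."""
    # Check 'ms' before 's' because both end in the letter s.
    for suffix in ("ms", "s", "m"):
        if interval.endswith(suffix):
            return float(interval[: -len(suffix)]) * _UNIT_SECONDS[suffix]
    raise ValueError(f"unsupported time value: {interval!r}")


def max_unsynced_writes(writes_per_second: float, sync_interval: str = "5s") -> float:
    """Upper bound on acknowledged writes not yet fsynced to the translog."""
    return writes_per_second * to_seconds(sync_interval)


if __name__ == "__main__":
    print(max_unsynced_writes(2000, "5s"))
    print(max_unsynced_writes(2000, "500ms"))
```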

Memory Management

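Circuit breaker limits are percentages of the JVM heap. The fielddata and request values below match the Elasticsearch defaults; the parent (total) limit defaults to 95% when the real-memory breaker is enabled and to 70% otherwise, so setting it to 70% is stricter than the default on most clusters.

PUT _cluster/settings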
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%"
  }
}
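
Because the limits are relative to the heap, it can help to translate them into absolute sizes when reasoning about a particular node. The sketch below does that for the three percentages used above; the 16 GiB heap is an assumed example value.

```python
BREAKER_LIMITS_PCT = {"total": 70, "fielddata": 40, "request": 60}


def breaker_limits_bytes(heap_bytes: int) -> dict:
    """Absolute circuit breaker limits for a given JVM heap size."""
    if heap_bytes <= 0:
        raise ValueError("heap size must be positive")
    # Integer arithmetic avoids floating point rounding surprises.
    return {name: heap_bytes * pct // 100 for name, pct in BREAKER_LIMITS_PCT.items()}


if __name__ == "__main__":
    gib = 1024 ** 3
    for name, limit in breaker_limits_bytes(16 * gib).items():
        print(f"{name}: {limit / gib:.1f} GiB")
```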

Query Optimization

Search Settings

PUT /search_optimized
{
  "settings": {
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "5s",
    "index.search.slowlog.threshold.query.debug": "2s",
    "index.search.slowlog.threshold.query.trace": "500ms"
  }
}
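
The four thresholds above form a ladder: a slow query is logged at the most severe level whose threshold it exceeds. The sketch below mirrors that ladder for a duration in milliseconds. It is an approximation of the behaviour, not Elasticsearch code; note that the slow log measures the query phase on each shard rather than the whole request, and that treating a duration exactly equal to a threshold as not logged is an assumption.

```python
from typing import Optional

# Thresholds from the index settings above, most severe first.
THRESHOLDS_MS = [("warn", 10_000), ("info", 5_000), ("debug", 2_000), ("trace", 500)]


def slowlog_level(took_ms: int) -> Optional[str]:
    """Log level a shard-level query duration would be recorded at, if any."""
    for level, threshold_ms in THRESHOLDS_MS:
        if took_ms > threshold_ms:
            return level
    return None  # faster than every threshold: not logged


if __name__ == "__main__":
    for took in (100, 600, 2_500, 6_000, 12_000):
        print(took, slowlog_level(took))
```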

Query Performance Tips

Filter Context

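Clauses in filter context only decide whether a document matches. They skip relevance scoring, and frequently used filters can be cached, so exact-match and range conditions belong here rather than in must.

GET /products/_search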
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "price": { "gte": 100 } } }
      ]
    }
  }
}

Pagination Optimization

GET /products/_search
{
  "from": 0,
  "size": 10,
  "track_total_hits": false
}
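
Setting track_total_hits to false skips the exact hit count, which saves work when only the first results matter. For deep pagination, from and size become expensive and are capped by index.max_result_window (10,000 by default); search_after is the usual alternative. The sketch below builds search_after request bodies as dictionaries. The sort fields price and product_id are hypothetical, with product_id assumed to be a unique tiebreaker.

```python
from typing import Optional, Sequence


def search_after_body(
    size: int,
    sort_fields: Sequence[str],
    last_sort_values: Optional[Sequence] = None,
) -> dict:
    """Request body for one page of a search_after based pagination."""
    body = {
        "size": size,
        # A deterministic sort order is required for search_after.
        "sort": [{field: "asc"} for field in sort_fields],
        "track_total_hits": False,
    }
    if last_sort_values is not None:
        if len(last_sort_values) != len(sort_fields):
            raise ValueError("need one sort value per sort field")
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(last_sort_values)
    return body


if __name__ == "__main__":
    first_page = search_after_body(10, ["price", "product_id"])
    next_page = search_after_body(10, ["price", "product_id"], [199.0, "p-42"])
    print(first_page)
    print(next_page)
```

Each following page passes the sort values of the last hit from the previous response, so the cost per page stays roughly constant however deep the client goes.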

Monitoring and Maintenance

Cluster Health Monitoring

GET _cluster/health?pretty
GET _nodes/stats?pretty
GET _cat/indices?v

Regular Maintenance Tasks

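Force Merge

Force-merge only indices that no longer receive writes, such as rolled-over log indices. Merging an index that is still being written down to one segment can leave very large segments that regular merges will not clean up, and a pattern like logs-* also matches the current write index.

POST /logs-*/_forcemerge?max_num_segments=1

Cache Clear

Clearing caches is rarely needed in normal operation and slows queries down until the caches warm up again, so treat it as a diagnostic tool rather than routine maintenance.

```
POST /_cache/clear
```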

Scaling Strategies

Horizontal Scaling

Add Data Nodes
- Monitor shard distribution
- Balance cluster load
- Update replica settings

Split Indices

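Before an index can be split, it must be made read-only for writes (index.blocks.write set to true), and the target shard count must be a multiple of the source index's primary shard count.

POST /large_index/_split/split_index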
{
  "settings": {
    "index.number_of_shards": 10
  }
}
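
Which target shard counts a split accepts depends on the source shard count and on index.number_of_routing_shards: the target must be a multiple of the former and a factor of the latter. The sketch below enumerates the valid targets under that rule; valid_split_targets is an illustrative helper, not an Elasticsearch API.

```python
def valid_split_targets(source_shards: int, routing_shards: int) -> list:
    """Shard counts an index can be split to, given its routing shards."""
    if source_shards < 1 or routing_shards % source_shards != 0:
        raise ValueError("routing shards must be a multiple of the source shard count")
    return [
        target
        # Step through multiples of the source count, skipping the source itself.
        for target in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % target == 0
    ]


if __name__ == "__main__":
    print(valid_split_targets(5, 30))   # [10, 15, 30]
    print(valid_split_targets(1, 30))   # [2, 3, 5, 6, 10, 15, 30]
```

With 5 primary shards and 30 routing shards, for example, a split to 10 is valid but a split to 20 is not.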

Vertical Scaling

Memory Optimization
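- JVM heap settings (no more than half of the node's RAM, and below the compressed object pointer threshold of roughly 32 GB)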
- Field data cache
- Query cache
Storage Optimization
- Use SSDs
- RAID configuration
- File system settings
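
The common rule of thumb for JVM heap sizing is to give Elasticsearch at most half of a node's memory, leaving the rest to the operating system's file cache, and to stay below the point where the JVM stops using compressed object pointers. The sketch below encodes that rule; the 31 GB cap is a conservative assumption, since the exact threshold varies by JVM and platform.

```python
def recommended_heap_gb(ram_gb: float, compressed_oops_cap_gb: float = 31.0) -> float:
    """Heap size following the 'half of RAM, below ~32 GB' rule of thumb."""
    if ram_gb <= 0:
        raise ValueError("RAM size must be positive")
    return min(ram_gb / 2, compressed_oops_cap_gb)


if __name__ == "__main__":
    for ram in (8, 16, 64, 128):
        heap = recommended_heap_gb(ram)
        # Xms and Xmx are set to the same value so the heap never resizes.
        print(f"{ram} GB RAM -> -Xms{heap:g}g -Xmx{heap:g}g")
```

Recent Elasticsearch versions size the heap automatically from the node's roles and memory, so an explicit setting mainly matters when overriding that default.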

Best Practices

Index Design
- Use appropriate mappings
- Optimize field types
- Consider data lifecycle
Query Design
- Use filters effectively
- Optimize pagination
- Monitor slow queries
Cluster Management
- Regular monitoring
- Capacity planning
- Backup strategies

Common Issues and Solutions

Performance Problems

- Monitor slow logs
- Check resource usage
- Optimize queries

Scaling Issues

- Review shard strategy
- Adjust node roles
- Update cluster settings

Next Steps

After optimizing performance:

- Implement monitoring
- Set up alerts
- Plan for growth
- Document procedures

Conclusion

Building a high-performance Elasticsearch cluster requires:

- Proper sharding strategy
- Effective replication
- Optimized index settings
- Regular maintenance

Remember to:

- Monitor performance metrics
- Plan for scalability
- Follow best practices
- Document configurations

Stay tuned for our next article on Elasticsearch security and access control.
