Mapping and text analysis are fundamental concepts in Elasticsearch that determine how your data is processed, stored, and searched. In this article, we’ll explore how to define mappings and configure analyzers for optimal search results.
Introduction
Elasticsearch’s power comes from its ability to understand and process data intelligently. This intelligence is configured through mappings and analyzers, which define how your data is interpreted and indexed.
Understanding Mappings
Mappings define the schema for your documents, specifying how fields should be stored and indexed.
Basic Mapping Types
PUT /products
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "created_at": { "type": "date" },
      "tags": { "type": "keyword" },
      "in_stock": { "type": "boolean" }
    }
  }
}
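The practical difference between keyword and text shows up at query time: tags supports exact term matches and aggregations, while name is analyzed for full-text search. A quick illustration (the document values here are hypothetical):

PUT /products/_doc/1
{
  "id": "p-1",
  "name": "Wireless Keyboard",
  "price": 49.99,
  "tags": ["electronics"],
  "in_stock": true
}

GET /products/_search
{
  "query": {
    "bool": {
      "must": { "match": { "name": "wireless" } },
      "filter": { "term": { "tags": "electronics" } }
    }
  }
}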
Field Data Types
Text Fields

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Numeric Fields

PUT /metrics
{
  "mappings": {
    "properties": {
      "cpu_usage": { "type": "float" },
      "memory_total": { "type": "long" },
      "request_count": { "type": "integer" },
      "price": { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}
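The scaled_float type stores each value internally as a long multiplied by the scaling factor, so with scaling_factor: 100 a price of 19.99 is persisted as 1999. This trades precision beyond two decimal places for better compression. Indexing a document against this mapping (sample values are illustrative only):

PUT /metrics/_doc/1
{
  "cpu_usage": 73.5,
  "memory_total": 16777216000,
  "request_count": 1024,
  "price": 19.99
}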
Text Analysis Components
Analyzers
An analyzer consists of three components, applied in order:
- Character filters (zero or more): transform the raw text before tokenization
- Tokenizer (exactly one): splits the text into tokens
- Token filters (zero or more): modify, add, or remove tokens
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  }
}
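You can verify the whole chain with the _analyze API. With html_strip removing the tags, the standard tokenizer splitting on word boundaries, and the lowercase, stop, and snowball filters applied, a snippet like the one below should come out as stemmed, lowercased tokens with stop words dropped (roughly "quick" and "fox"):

GET /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>The Quick Foxes</p>"
}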
Built-in Analyzers
Standard Analyzer

POST _analyze
{
  "analyzer": "standard",
  "text": "The quick brown fox jumps over the lazy dog!"
}

Simple Analyzer

POST _analyze
{
  "analyzer": "simple",
  "text": "The QUICK brown FOX!"
}
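The two differ mainly in how they split text: the standard analyzer tokenizes on word boundaries and lowercases, while the simple analyzer splits on any non-letter character and lowercases. Running the second request should return a tokens array roughly like this (offsets, types, and positions omitted for brevity):

{
  "tokens": [
    { "token": "the" },
    { "token": "quick" },
    { "token": "brown" },
    { "token": "fox" }
  ]
}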
Custom Analyzers
PUT /blog_posts
{
  "settings": {
    "analysis": {
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => happy",
            ":( => sad"
          ]
        }
      },
      "analyzer": {
        "blog_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons", "html_strip"],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding"
          ]
        }
      }
    }
  }
}
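Because the emoticons character filter runs before tokenization, the substituted words are then tokenized, lowercased, and folded like any other text. Testing it against a sample string should yield tokens along the lines of "great", "post", and "happy":

GET /blog_posts/_analyze
{
  "analyzer": "blog_analyzer",
  "text": "Great post :)"
}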
Tokenizers
Types of Tokenizers
Standard Tokenizer

POST _analyze
{
  "tokenizer": "standard",
  "text": "The quick.brown_fox jumped!"
}

N-gram Tokenizer

Note that min_gram and max_gram belong on a tokenizer definition, not on the analyzer, and that the gap between them is capped by index.max_ngram_diff (default 1):

PUT /autocomplete
{
  "settings": {
    "index": { "max_ngram_diff": 8 },
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer"
        }
      }
    }
  }
}
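The ngram tokenizer emits every substring whose length falls between min_gram and max_gram, which is what makes match-anywhere autocomplete possible. You can inspect the output directly by passing an inline tokenizer definition to _analyze:

POST _analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 2,
    "max_gram": 3
  },
  "text": "fox"
}

This produces the grams fo, fox, and ox.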
Advanced Mapping Features
Multi-fields
PUT /users
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" },
          "english": { "type": "text", "analyzer": "english" }
        }
      }
    }
  }
}
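Each sub-field is addressed with dot notation: query name for standard full-text matching, name.english for stemmed English matching, and use name.raw for exact matches, sorting, or aggregations. For example:

GET /users/_search
{
  "query": { "match": { "name.english": "running" } },
  "sort": [ { "name.raw": "asc" } ]
}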
Dynamic Mapping
PUT /dynamic_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type": "text" },
      "created": { "type": "date" },
      "metadata": {
        "type": "object",
        "dynamic": true
      }
    }
  }
}
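With "dynamic": "strict", a document containing an unmapped top-level field is rejected with a strict_dynamic_mapping_exception, while unknown fields nested under metadata are added to the mapping automatically. The first request below succeeds; the second fails:

PUT /dynamic_index/_doc/1
{
  "title": "Hello",
  "created": "2024-01-01",
  "metadata": { "source": "rss" }
}

PUT /dynamic_index/_doc/2
{
  "title": "Hello",
  "unexpected": "value"
}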
Mapping Parameters
Common Parameters
Note that null_value and doc_values are not supported on text fields; they apply to keyword (and numeric) types, as shown on the sku field below:

PUT /products
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "index": true,
        "store": false,
        "copy_to": "all_fields"
      },
      "sku": {
        "type": "keyword",
        "doc_values": true,
        "null_value": "N/A"
      },
      "all_fields": { "type": "text" }
    }
  }
}
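Values copied with copy_to become searchable on the target field, but the original _source document is unchanged, so the combined field never appears in retrieved results. A search against the copy target (the query term here is just an illustration):

GET /products/_search
{
  "query": {
    "match": { "all_fields": "durable" }
  }
}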
Use Cases and Examples
Search Optimization
Product Search

PUT /e-commerce
{
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "exact": { "type": "keyword" },
          "suggest": { "type": "text", "analyzer": "simple" }
        }
      }
    }
  }
}

Multi-language Support

PUT /multi_lang_posts
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "english": { "type": "text", "analyzer": "english" },
          "spanish": { "type": "text", "analyzer": "spanish" }
        }
      }
    }
  }
}
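A multi_match query can then target whichever language sub-fields are relevant, letting the best-scoring analyzer win (the query string below is just a sample):

GET /multi_lang_posts/_search
{
  "query": {
    "multi_match": {
      "query": "corriendo",
      "fields": ["title.english", "title.spanish"]
    }
  }
}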
Best Practices
Mapping Design
- Plan your mapping before indexing
- Use appropriate field types
- Consider future requirements

Analyzer Selection
- Choose analyzers based on use case
- Test analyzer output with the _analyze API
- Consider language-specific needs

Performance Optimization
- Use doc_values appropriately
- Limit the number of fields
- Monitor mapping size
Common Issues and Solutions
Mapping Explosion
- Set limits on field count (index.mapping.total_fields.limit)
- Use nested objects carefully; each nested object is indexed as a separate document
- Monitor mapping size
Analysis Issues
- Test analyzers before deployment
- Use appropriate tokenizers
- Consider edge cases
Next Steps
After understanding mapping and analyzers:
- Learn about search templates
- Explore index aliases
- Implement reindexing strategies
- Master index lifecycle management
Conclusion
Proper mapping and text analysis are crucial for:
- Accurate search results
- Efficient data storage
- Optimal performance
- Flexible querying
Remember to:
- Plan your mappings carefully
- Test analyzers thoroughly
- Monitor performance
- Update mappings when needed
Stay tuned for our next article on search templates and index patterns in Elasticsearch.