Mapping and text analysis are fundamental concepts in Elasticsearch that determine how your data is processed, stored, and searched. In this article, we’ll explore how to define mappings and configure analyzers for optimal search results.

Introduction

Elasticsearch does not search your raw documents; it searches an index built from them. Mappings and analyzers control how that index is built: mappings assign each field a data type, and analyzers turn text into the terms that queries are matched against.

Understanding Mappings

Mappings define the schema for your documents, specifying how fields should be stored and indexed.

Basic Mapping Types

PUT /products
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "created_at": { "type": "date" },
      "tags": { "type": "keyword" },
      "in_stock": { "type": "boolean" }
    }
  }
}

Field Data Types

  1. Text Fields

    PUT /articles
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
    
  2. Numeric Fields

    PUT /metrics
    {
      "mappings": {
        "properties": {
          "cpu_usage": { "type": "float" },
          "memory_total": { "type": "long" },
          "request_count": { "type": "integer" },
          "price": { "type": "scaled_float", "scaling_factor": 100 }
        }
      }
    }
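
To make the scaled_float entry concrete: Elasticsearch multiplies the value by scaling_factor at index time and stores the result rounded to the closest long. The sketch below models that arithmetic in plain Python. The helper names are hypothetical, and the handling of exact halves is not modeled.

```python
def encode_scaled_float(value: float, scaling_factor: int) -> int:
    """Model of index-time encoding: multiply, then round to the nearest integer."""
    return round(value * scaling_factor)


def decode_scaled_float(stored: int, scaling_factor: int) -> float:
    """Model of reading the value back, e.g. for sorting or aggregations."""
    return stored / scaling_factor


stored = encode_scaled_float(19.994, 100)
print(stored)                            # 1999
print(decode_scaled_float(stored, 100))  # 19.99
```

With a scaling factor of 100, anything beyond two decimal places is lost, which is usually acceptable for prices and saves disk space compared with a float.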
    

Text Analysis Components

Analyzers

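An analyzer runs text through three stages, in order:

  1. Character filters (zero or more), which rewrite the raw character stream
  2. Tokenizer (exactly one), which splits the stream into tokens
  3. Token filters (zero or more), which modify, add, or remove tokens

PUT /my_index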
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  }
}
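
The order of the stages matters, so here is a minimal pure-Python model of the pipeline above. It is a sketch, not Elasticsearch's implementation: the regular expressions only approximate html_strip and the standard tokenizer, the stop-word set is my understanding of the default English list, and the snowball stemming step is left out.

```python
import re

# Assumed default English stop words for the "stop" filter.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def html_strip(text: str) -> str:
    """Character filter stage: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Tokenizer stage: rough stand-in for the standard tokenizer."""
    return re.findall(r"\w+", text)


def analyze(text: str) -> list[str]:
    text = html_strip(text)                            # 1. character filters
    tokens = tokenize(text)                            # 2. tokenizer
    tokens = [t.lower() for t in tokens]               # 3a. lowercase filter
    return [t for t in tokens if t not in STOP_WORDS]  # 3b. stop filter


print(analyze("<p>The Quick <b>Brown</b> Fox</p>"))  # ['quick', 'brown', 'fox']
```

Note that lowercase runs before stop: the stop list is lowercase, so "The" would survive if the two filters were swapped.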

Built-in Analyzers

  1. Standard Analyzer

    POST _analyze
    {
      "analyzer": "standard",
      "text": "The quick brown fox jumps over the lazy dog!"
    }
    
  2. Simple Analyzer

    POST _analyze
    {
      "analyzer": "simple",
      "text": "The QUICK brown FOX!"
    }
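
To show what the two requests above should return, and where the analyzers differ, the following sketch approximates both in Python. The function names are hypothetical. The regular expressions are close enough for plain ASCII text like these examples, but they diverge from the real tokenizers on input such as apostrophes or dots inside words.

```python
import re


def standard_like(text: str) -> list[str]:
    """Approximation of `standard`: split on word boundaries, lowercase, keep digits."""
    return re.findall(r"\w+", text.lower())


def simple_like(text: str) -> list[str]:
    """Approximation of `simple`: split on every non-letter, then lowercase."""
    return re.findall(r"[^\W\d_]+", text.lower())


print(standard_like("The quick brown fox jumps over the lazy dog!"))
print(simple_like("The QUICK brown FOX!"))  # ['the', 'quick', 'brown', 'fox']
print(standard_like("Order 66 shipped"))    # ['order', '66', 'shipped']
print(simple_like("Order 66 shipped"))      # ['order', 'shipped']
```

Neither analyzer removes stop words by default, so "the" stays in the output. The practical difference is that simple discards digits, which makes it a poor fit for fields containing model numbers or versions.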
    

Custom Analyzers

PUT /blog_posts
{
  "settings": {
    "analysis": {
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => happy",
            ":( => sad"
          ]
        }
      },
      "analyzer": {
        "blog_analyzer": {
          "type": "custom",
          "char_filter": ["emoticons", "html_strip"],
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "asciifolding"
          ]
        }
      }
    }
  }
}
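
The emoticon rules have to be a character filter rather than a token filter, because the tokenizer discards punctuation such as ":)" before any token filter sees it. The sketch below illustrates this with simplified stand-ins. The helper names are hypothetical, the tokenizer is a regex approximation, and the accent folding is only a rough model of asciifolding.

```python
import re
import unicodedata

EMOTICONS = {":)": "happy", ":(": "sad"}  # same rules as the "emoticons" char filter


def map_emoticons(text: str) -> str:
    """Mapping char filter: plain substring replacement on the raw text."""
    for pattern, replacement in EMOTICONS.items():
        text = text.replace(pattern, replacement)
    return text


def ascii_fold(token: str) -> str:
    """Rough stand-in for asciifolding: decompose, then drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list[str]:
    """Approximate tokenizer plus lowercase filter."""
    return re.findall(r"\w+", text.lower())


raw = "Café opening today :)"
print(tokenize(raw))  # the emoticon is gone: no ':)' token survives
print([ascii_fold(t) for t in tokenize(map_emoticons(raw))])
# ['cafe', 'opening', 'today', 'happy']
```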

Tokenizers

Types of Tokenizers

  1. Standard Tokenizer

    POST _analyze
    {
      "tokenizer": "standard",
      "text": "The quick.brown_fox jumped!"
    }
    
  2. N-gram Tokenizer

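    PUT /autocomplete
    {
      "settings": {
        "index": { "max_ngram_diff": 8 },
        "analysis": {
          "tokenizer": {
            "autocomplete_tokenizer": {
              "type": "ngram",
              "min_gram": 2,
              "max_gram": 10,
              "token_chars": ["letter", "digit"]
            }
          },
          "analyzer": {
            "autocomplete": {
              "type": "custom",
              "tokenizer": "autocomplete_tokenizer",
              "filter": ["lowercase"]
            }
          }
        }
      }
    }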
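
An n-gram tokenizer emits every substring whose length falls between min_gram and max_gram, which is why the gap between the two is capped by the index.max_ngram_diff setting (default 1) unless you raise it. The function below is a small sketch of that expansion for a single word. The name ngrams is hypothetical, and character-class handling such as token_chars is not modeled.

```python
def ngrams(text: str, min_gram: int, max_gram: int) -> list[str]:
    """Every substring of length min_gram..max_gram, ordered by start position."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


print(ngrams("fox", 2, 10))                 # ['fo', 'fox', 'ox']
print(len(ngrams("elasticsearch", 2, 10)))  # 72
```

One 13-character word produces 72 terms with these settings, so a wide n-gram range can inflate index size considerably.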
    

Advanced Mapping Features

Multi-fields

PUT /users
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          },
          "english": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
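
A multi-field indexes the same source value several ways, and queries pick the variant they need through the dotted path. The sketch below models two of the three variants. The helper names are hypothetical, the text analysis is a regex approximation, and name.english is omitted because stemming is not modeled here.

```python
import re


def index_value(value: str) -> dict[str, list[str]]:
    """Return the terms each field would hold for one source value (simplified)."""
    return {
        "name": re.findall(r"\w+", value.lower()),  # text: analyzed into tokens
        "name.raw": [value],                        # keyword: one exact term
    }


def term_matches(terms: dict[str, list[str]], field: str, query: str) -> bool:
    """Term-level lookup: the query is compared to stored terms without analysis."""
    return query in terms[field]


terms = index_value("Ada Lovelace")
print(term_matches(terms, "name", "lovelace"))          # True
print(term_matches(terms, "name.raw", "lovelace"))      # False
print(term_matches(terms, "name.raw", "Ada Lovelace"))  # True
```

This is why name.raw suits sorting, aggregations, and exact filters, while name suits full-text search.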

Dynamic Mapping

PUT /dynamic_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type": "text" },
      "created": { "type": "date" },
      "metadata": {
        "type": "object",
        "dynamic": true
      }
    }
  }
}
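
With this mapping, a document that introduces an unknown top-level field is rejected, while anything under metadata is accepted and mapped on the fly. The following sketch models that decision in Python. The function check_document is hypothetical, and Elasticsearch itself reports the rejection as a strict dynamic mapping exception rather than a ValueError.

```python
MAPPING = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "created": {"type": "date"},
        "metadata": {"type": "object", "dynamic": True},
    },
}


def check_document(doc: dict, mapping: dict, inherited: object = True, path: str = "") -> None:
    """Raise ValueError for any unmapped field that sits under a strict object."""
    dynamic = mapping.get("dynamic", inherited)  # inner objects inherit the setting
    properties = mapping.get("properties", {})
    for key, value in doc.items():
        full_name = f"{path}{key}"
        if key not in properties:
            if dynamic == "strict":
                raise ValueError(f"unmapped field [{full_name}] rejected by strict mapping")
            continue  # dynamic object: the field would be added to the mapping
        if isinstance(value, dict):
            check_document(value, properties[key], dynamic, f"{full_name}.")


check_document({"title": "Hello", "metadata": {"source": "rss"}}, MAPPING)  # accepted
try:
    check_document({"title": "Hello", "author": "kim"}, MAPPING)
except ValueError as err:
    print(err)  # unmapped field [author] rejected by strict mapping
```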

Mapping Parameters

Common Parameters

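Not every parameter applies to every field type. A text field accepts neither doc_values nor null_value, so the example below sets those on a keyword field and declares the copy_to target explicitly.

PUT /products
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "index": true,
        "store": false,
        "copy_to": "all_fields"
      },
      "tags": {
        "type": "keyword",
        "doc_values": true,
        "null_value": "N/A"
      },
      "all_fields": { "type": "text" }
    }
  }
}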
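
Of these parameters, copy_to is the one whose effect is least obvious: the value becomes searchable under the target field, but the target never appears in the stored _source. A minimal sketch of that behavior, using a hypothetical index_document helper and ignoring analysis entirely:

```python
def index_document(source: dict, copy_rules: dict[str, str]) -> dict[str, list]:
    """Return searchable values per field; `source` itself is left untouched."""
    indexed: dict[str, list] = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        target = copy_rules.get(field)
        if target is not None:
            # copy_to: the same value is also searchable under the target field
            indexed.setdefault(target, []).append(value)
    return indexed


source = {"description": "Waterproof hiking boots"}
indexed = index_document(source, {"description": "all_fields"})
print(indexed["all_fields"])   # ['Waterproof hiking boots']
print("all_fields" in source)  # False
```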

Use Cases and Examples

Search Optimization

  1. Product Search

    PUT /e-commerce
    {
      "mappings": {
        "properties": {
          "product_name": {
            "type": "text",
            "analyzer": "english",
            "fields": {
              "exact": {
                "type": "keyword"
              },
              "suggest": {
                "type": "text",
                "analyzer": "simple"
              }
            }
          }
        }
      }
    }
    
  2. Multi-language Support

    PUT /multi_lang_posts
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "fields": {
              "english": {
                "type": "text",
                "analyzer": "english"
              },
              "spanish": {
                "type": "text",
                "analyzer": "spanish"
              }
            }
          }
        }
      }
    }
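
Language sub-fields only pay off if the query targets them. One possible approach, sketched below, is to build a multi_match request body that searches every language variant at once. The function name is hypothetical and the choice of the most_fields type is an assumption. The resulting body would be sent to the index's _search endpoint.

```python
import json


def multilingual_query(text: str, languages: list[str]) -> dict:
    """Build a multi_match body that targets one title.<language> sub-field per language."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"title.{lang}" for lang in languages],
                "type": "most_fields",  # assumption: combine scores from all matching sub-fields
            }
        }
    }


body = multilingual_query("running shoes", ["english", "spanish"])
print(json.dumps(body, indent=2))
```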
    

Best Practices

  1. Mapping Design

    • Plan your mapping before indexing
    • Use appropriate field types
    • Consider future requirements
  2. Analyzer Selection

    • Choose analyzers based on use case
    • Test analyzer output
    • Consider language-specific needs
  3. Performance Optimization

    • Disable doc_values on fields you never sort, aggregate, or script on
    • Limit number of fields
    • Monitor mapping size

Common Issues and Solutions

Mapping Explosion

  • Cap the field count with the index.mapping.total_fields.limit setting (default 1000)
  • Use nested objects carefully
  • Monitor mapping size
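
Monitoring mapping size can start with something as simple as counting fields in the mapping returned by the get-mapping API. The sketch below walks a mapping dictionary and counts properties, object properties, and multi-fields. The function is hypothetical, and Elasticsearch's own accounting against the field limit may differ in detail.

```python
def count_fields(mapping: dict) -> int:
    """Count each property, nested object property, and multi-field in a mapping."""
    total = 0
    for definition in mapping.get("properties", {}).values():
        total += 1                                  # the field (or object) itself
        total += len(definition.get("fields", {}))  # multi-fields such as title.keyword
        total += count_fields(definition)           # properties of object fields
    return total


mapping = {
    "properties": {
        "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "metadata": {
            "properties": {
                "source": {"type": "keyword"},
                "lang": {"type": "keyword"},
            }
        },
    }
}
print(count_fields(mapping))  # 5
```

A count that keeps growing between checks is usually a sign that user-controlled keys are being mapped dynamically.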

Analysis Issues

  • Test analyzers before deployment
  • Use appropriate tokenizers
  • Consider edge cases

Next Steps

After understanding mapping and analyzers:

  1. Learn about search templates
  2. Explore index aliases
  3. Implement reindexing strategies
  4. Master index lifecycle management

Conclusion

Proper mapping and text analysis are crucial for:

  • Accurate search results
  • Efficient data storage
  • Optimal performance
  • Flexible querying

Remember to:

  • Plan your mappings carefully
  • Test analyzers thoroughly
  • Monitor performance
  • Add new fields as requirements change, and reindex when an existing field's type or analyzer must change

Stay tuned for our next article on search templates and index patterns in Elasticsearch.