
Elasticsearch Full-Text Search: Mappings, Queries, and Relevance Tuning

Build powerful search with Elasticsearch: index mappings, the Query DSL, bool queries, aggregations, highlighting, fuzzy search, and performance tuning.

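Elasticsearch Core Concepts

  • Index: a collection of documents (roughly comparable to a database table)
  • Document: a JSON object (roughly comparable to a row)
  • Mapping: the schema definition (field types and how each field is analyzed)
  • Shard: a partition of an index, used for horizontal scaling
  • Replica: a copy of a shard, used for high availability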


Index Mapping

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": {
        "type": "text",
        "analyzer": "product_analyzer",
        "fields": {
          "keyword": { "type": "keyword" },
          "suggest": { "type": "completion" }
        }
      },
      "description": { "type": "text", "analyzer": "product_analyzer" },
      "price": { "type": "double" },
      "category": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "rating": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "created_at": { "type": "date" }
    }
  }
}

Search Queries

const { Client } = require('@elastic/elasticsearch')
const client = new Client({ node: 'http://localhost:9200' })

// Full-text search with filters
const result = await client.search({
  index: 'products',
  body: {
    query: {
      bool: {
        must: [
          {
            multi_match: {
              query: 'wireless headphones',
              fields: ['name^3', 'description'],  // matches in name count 3x
              type: 'best_fields',
              fuzziness: 'AUTO',
            },
          },
        ],
        filter: [
          { term: { in_stock: true } },
          { range: { price: { gte: 20, lte: 200 } } },
          { terms: { category: ['electronics', 'audio'] } },
        ],
        should: [
          { range: { rating: { gte: 4.0, boost: 2 } } },
        ],
      },
    },
    sort: [
      { _score: 'desc' },
      { rating: 'desc' },
    ],
    highlight: {
      fields: {
        name: { pre_tags: ['<mark>'], post_tags: ['</mark>'] },
        description: { fragment_size: 150, number_of_fragments: 2 },
      },
    },
    aggs: {
      categories: { terms: { field: 'category', size: 10 } },
      price_ranges: {
        range: {
          field: 'price',
          ranges: [
            { to: 25 }, { from: 25, to: 50 },
            { from: 50, to: 100 }, { from: 100 },
          ],
        },
      },
      avg_rating: { avg: { field: 'rating' } },
    },
    from: 0,
    size: 20,
  },
})
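
In practice the filters usually come from optional user input, so the bool query is assembled rather than hard-coded. The following is a minimal sketch of that idea; `buildProductSearch` and its option names are hypothetical, not part of the Elasticsearch client, and the field names assume the products mapping above.

```javascript
// Hypothetical helper: builds a search body from optional user input.
// Scoring clauses go in `must`; yes/no restrictions go in `filter`.
function buildProductSearch({
  text,
  minPrice,
  maxPrice,
  categories = [],
  inStockOnly = true,
  page = 0,
  pageSize = 20,
} = {}) {
  const filter = []
  if (inStockOnly) filter.push({ term: { in_stock: true } })

  // Only add the range clause when at least one bound is given.
  if (minPrice != null || maxPrice != null) {
    const price = {}
    if (minPrice != null) price.gte = minPrice
    if (maxPrice != null) price.lte = maxPrice
    filter.push({ range: { price } })
  }
  if (categories.length > 0) filter.push({ terms: { category: categories } })

  // Without search text, fall back to match_all so the filters still apply.
  const must = text
    ? [{
        multi_match: {
          query: text,
          fields: ['name^3', 'description'],
          type: 'best_fields',
          fuzziness: 'AUTO',
        },
      }]
    : [{ match_all: {} }]

  return {
    query: { bool: { must, filter } },
    from: page * pageSize,
    size: pageSize,
  }
}
```

The result can be passed as the `body` of `client.search({ index: 'products', body: ... })`, in the same way as the hand-written query above.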


Autocomplete with the Completion Suggester

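// Index a document. name.suggest is a multi-field of name, so it is
// populated automatically from the value of name. It cannot be written to
// directly: a 'name.suggest' key in the document is rejected because name
// is mapped as text, not as an object. Separate input lists or weights
// would need a dedicated top-level completion field in the mapping.
await client.index({
  index: 'products',
  body: {
    name: 'Wireless Headphones',
  },
})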

// Query for autocomplete suggestions
const suggestions = await client.search({
  index: 'products',
  body: {
    suggest: {
      product_suggest: {
        prefix: 'wire',
        completion: {
          field: 'name.suggest',
          size: 5,
          fuzzy: { fuzziness: 1 },
        },
      },
    },
  },
})
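
The suggester response nests its options under the name given in the request (`product_suggest` here). As a rough sketch of turning that response into a flat list for a dropdown, something like the following could be used; `extractSuggestions` is a hypothetical helper, and it accepts both the 7.x client shape (payload under `body`) and the 8.x shape (payload returned directly).

```javascript
// Hypothetical helper: flattens a completion-suggester response into
// a de-duplicated list of suggestion texts, preserving the returned order.
function extractSuggestions(response, suggesterName = 'product_suggest') {
  // 7.x clients wrap the payload in `body`; 8.x clients return it directly.
  const payload = response.body ?? response
  const entries = payload.suggest?.[suggesterName] ?? []

  const seen = new Set()
  const texts = []
  for (const entry of entries) {
    for (const option of entry.options ?? []) {
      if (!seen.has(option.text)) {
        seen.add(option.text)
        texts.push(option.text)
      }
    }
  }
  return texts
}
```

With the query above, `extractSuggestions(suggestions)` would return the matched suggestion strings, or an empty array when nothing matches.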

Aggregations for Faceted Search

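// Fetch facets for the search results
const facets = await client.search({
  index: 'products',
  body: {
    size: 0,  // no documents needed, only the aggregation results
    aggs: {
      categories: { terms: { field: 'category', size: 20 } },
      // brand must be mapped as keyword; the products mapping above does not define it
      brands: { terms: { field: 'brand', size: 20 } },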
      price_stats: { stats: { field: 'price' } },
      rating_histogram: {
        histogram: { field: 'rating', interval: 0.5 },
      },
    },
  },
})
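
Bucket aggregations (terms, range, histogram) come back as a `buckets` array of `key` / `doc_count` pairs, while metric aggregations such as stats come back as a plain object. One possible way to reshape that for a facet sidebar is sketched below; `toFacets` and its output shape are assumptions made for illustration.

```javascript
// Hypothetical helper: converts an aggregations response into
// { facetName: [{ value, count }] } for bucket aggregations and passes
// metric aggregations (e.g. stats) through unchanged.
function toFacets(response) {
  // 7.x clients wrap the payload in `body`; 8.x clients return it directly.
  const aggs = (response.body ?? response).aggregations ?? {}

  const facets = {}
  for (const [name, agg] of Object.entries(aggs)) {
    if (Array.isArray(agg.buckets)) {
      facets[name] = agg.buckets
        .filter((bucket) => bucket.doc_count > 0) // hide empty buckets
        .map((bucket) => ({ value: bucket.key, count: bucket.doc_count }))
    } else {
      facets[name] = agg
    }
  }
  return facets
}
```

Applied to the response above, `toFacets(facets).categories` would be a list of category values with their document counts.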


Index Templates and Aliases

// Template for time-series indices
PUT /_index_template/logs
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": { "number_of_shards": 1 },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text" },
        "service": { "type": "keyword" }
      }
    }
  }
}

// Swap the alias in one atomic request to switch indices with zero downtime
POST /_aliases
{
  "actions": [
    { "add": { "index": "products-v2", "alias": "products" } },
    { "remove": { "index": "products-v1", "alias": "products" } }
  ]
}
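
The same alias swap can be issued from Node.js. A minimal sketch, assuming the versioned index names used above; `buildAliasSwap` is a hypothetical helper that only builds the request body.

```javascript
// Hypothetical helper: builds the body for an atomic alias swap.
// Both actions are sent in one request, so the alias never points at
// zero indices or at both indices from a searcher's point of view.
function buildAliasSwap(alias, fromIndex, toIndex) {
  if (fromIndex === toIndex) {
    throw new Error('fromIndex and toIndex must differ')
  }
  return {
    actions: [
      { add: { index: toIndex, alias } },
      { remove: { index: fromIndex, alias } },
    ],
  }
}

// Usage with the client from the earlier snippets (not executed here):
// await client.indices.updateAliases({
//   body: buildAliasSwap('products', 'products-v1', 'products-v2'),
// })
```

Because application code queries the `products` alias rather than a concrete index, nothing else has to change when the new index goes live.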

Performance Tips

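  • Set "doc_values": false on keyword, numeric, and date fields that are never sorted on, aggregated, or read from scripts (text fields do not support doc_values; for text fields that do not need scoring, "norms": false is the comparable saving)
  • Use filter context rather than query context for yes/no checks that should not affect scoring; filter clauses skip score calculation and frequently used ones can be cached
  • Use the keyword type for IDs and categories (exact matching)
  • Set refresh_interval: -1 during bulk indexing and restore it once the load finishes
  • Index through the bulk API rather than sending one request per document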
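
The last two tips can be combined into a single load routine. The sketch below only shows the batching step, which is the part that can be checked without a cluster; `toBulkBatches` and the batch size of 500 are assumptions, and the client calls are given as commented usage in the same `body` style as the snippets above.

```javascript
// Hypothetical helper: splits documents into bulk request bodies.
// Each document becomes two entries: an action line and the source.
function toBulkBatches(index, docs, batchSize = 500) {
  const batches = []
  for (let i = 0; i < docs.length; i += batchSize) {
    const operations = []
    for (const doc of docs.slice(i, i + batchSize)) {
      // If doc.id is undefined the _id key is dropped on serialization
      // and Elasticsearch generates an ID.
      operations.push({ index: { _index: index, _id: doc.id } })
      operations.push(doc)
    }
    batches.push(operations)
  }
  return batches
}

// Usage with the client from the earlier snippets (not executed here):
// await client.indices.putSettings({
//   index: 'products',
//   body: { index: { refresh_interval: '-1' } },
// })
// for (const operations of toBulkBatches('products', docs)) {
//   await client.bulk({ body: operations })
// }
// await client.indices.putSettings({
//   index: 'products',
//   body: { index: { refresh_interval: '1s' } }, // or the previous value
// })
```

A bulk request can succeed overall while individual items fail, so the `errors` flag in each bulk response is worth checking before moving on to the next batch.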