Document Mapping

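A Mapping is similar to a schema definition in a database. It does the following:

  • Defines the names of the fields in an index
  • Defines each field's data type, such as string, number, or boolean
  • Configures how each field is indexed in the inverted index (for example, its Analyzer)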

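Mappings in ES are either dynamic or static.

Dynamic mapping: when a document is written to Elasticsearch, the type of each field is detected automatically from the field's value.

Static mapping: the mapping is defined in Elasticsearch ahead of time, including each field's type, analyzer, and so on.

With dynamic mapping, a wrongly detected type stops some features from working correctly, for example Range queries.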
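
To see why a wrongly detected type breaks Range queries, consider a number that was written as a JSON string. Unless numeric detection is enabled, dynamic mapping treats it as a string, and ranges over string terms are compared lexicographically. The Python sketch below imitates that comparison with plain strings. It does not talk to Elasticsearch, and the sample ages are made up for illustration.

```python
# Ages written as JSON strings ("32") instead of JSON numbers (32).
ages_as_strings = ["9", "32", "100"]
ages_as_numbers = [9, 32, 100]

# "Range query" age >= 10 evaluated on string terms (lexicographic order).
string_hits = [age for age in ages_as_strings if age >= "10"]

# The same range evaluated on numeric values.
numeric_hits = [age for age in ages_as_numbers if age >= 10]

print(string_hits)   # "9" is included because "9" sorts after "10"
print(numeric_hits)  # 9 is correctly excluded
```

Mapping the field as a numeric type ahead of time avoids the string comparison.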

Parameters and APIs covered in this section:

  • analyzer ik_max_word
  • dynamic true|false|strict
  • _reindex POST source|dest
  • _alias PUT /user2/_alias/user
  • index "index": false
  • index_options docs|freqs|positions|offsets
  • null_value "null_value": "NULL"
  • copy_to "copy_to": "full_address"
  • _bulk PUT /address/_bulk
  • Index Template PUT /_template/template_test
  • Dynamic Template dynamic_templates
  • path_match/path_unmatch dynamic_templates.path_match
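# Annotated overview of the parameters above. Remove the // comments if your client requires strict JSON.
PUT /user
{
  "mappings": {
    // dynamic: true | false | strict (default true).
    // true: the Mapping is updated as soon as a document with a new field is written.
    // false: the Mapping is not updated; the new field is not indexed but still appears in _source.
    // strict: the write fails and an exception is thrown.
    "dynamic": "true",
    "properties": {
      "name": {
        "type": "keyword",
        "null_value": "NULL" // lets explicit null values be searched; not supported by text fields
      },
      "address": {
        "type": "object",
        "dynamic": "true",
        "properties": {
          "province": {
            "type": "keyword",
            "index": false, // index: whether the field is indexed (default true); false makes the field unsearchable
            "copy_to": "full_address" // copies the value into the target field for specific search needs; the target field does not appear in _source
          },
          "city": {
            "type": "text",
            "copy_to": "full_address",
            "analyzer": "ik_max_word", // IK analyzer (requires the IK plugin)
            // index_options: docs = doc id; freqs = + term frequencies; positions = + term positions; offsets = + character offsets.
            // text fields default to positions, other fields default to docs.
            "index_options": "offsets"
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}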
POST _reindex
{
    "source": {
        "index": "user"
    },
    "dest": {
        "index": "user2"
    }
}
# An alias cannot share its name with an existing index, so delete /user first
DELETE /user
PUT /user2/_alias/user

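Question: can a field's type in the Mapping be changed later?

  1. Adding a new field
    With dynamic set to true, the Mapping is updated as soon as a document containing the new field is written
    With dynamic set to false, the Mapping is not updated; the new field's data is not indexed, but it still appears in _source
    With dynamic set to strict, the write fails and an exception is thrown
  2. For an existing field, the field definition can no longer be changed once data has been written
    The inverted index built by Lucene cannot be modified once it has been generated
    To change a field's type, rebuild the index with the reindex API
    Procedure:
  • 1) Create a new index whose static mapping replaces the existing one
  • 2) Import the data from the old index into the new index
  • 3) Delete the old index
  • 4) Give the new index an alias that is the old index's name

Reason:
If a field's data type were changed in place, data that has already been indexed could no longer be searched.
Adding a new field has no such effect.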
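
The three dynamic settings can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch's implementation: the mapping is a flat dict of field name to type name, the helper `index_document` and the exception class are hypothetical names, and "type detection" is just the Python type name.

```python
class StrictDynamicMappingError(Exception):
    """Stands in for Elasticsearch's strict_dynamic_mapping_exception."""


def index_document(mapping, doc, dynamic):
    """Return (new_mapping, searchable_fields, source) for one document.

    mapping: dict of field name -> type name (toy stand-in for "properties").
    dynamic: "true", "false" or "strict".
    """
    new_mapping = dict(mapping)
    searchable = []
    for field, value in doc.items():
        if field in new_mapping:
            searchable.append(field)
        elif dynamic == "true":
            # New field: extend the mapping and index the value.
            new_mapping[field] = type(value).__name__  # toy type detection
            searchable.append(field)
        elif dynamic == "false":
            # New field: ignored by the index, kept only in _source.
            continue
        elif dynamic == "strict":
            # New field: reject the whole document.
            raise StrictDynamicMappingError(field)
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    # _source always holds the document exactly as it was sent.
    return new_mapping, searchable, dict(doc)
```

Calling `index_document({"name": "text"}, {"name": "fox", "age": 32}, "false")` leaves the mapping unchanged and reports only `name` as searchable, while `age` is still present in the returned source.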

Demo: adding a new field to a static mapping (dynamic: strict, then true)

  1. Static mapping with dynamic set to "strict"
PUT /user
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}
  2. Add the new field age
    request
PUT /user/_doc/1
{
  "name":"fox",
  "age":32,
  "address":{
    "province":"湖南",
    "city":"长沙"
  }
}

response

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
  },
  "status" : 400
}
  3. Change dynamic to true
PUT /user/_mapping
{
  "dynamic":true
}

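Demo: changing an existing field

Procedure:

  • 1) Create a new index whose static mapping replaces the existing one
  • 2) Import the data from the old index into the new index
  • 3) Delete the old index
  • 4) Give the new index an alias that is the old index's name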
PUT /user
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "text"
      }
    }
  }
}


PUT /user/_doc/1
{
  "name":"fox",
  "age":32,
  "address": "测试地址"
}

GET /user/_search
{
  "query": {
    "term": {
      "address": "测试"
    }
  }
}

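response (no hits: without an analyzer setting the field uses the standard analyzer, which indexes 测试地址 as single characters, so the term 测试 is not in the inverted index)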
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
  1. Create a new index with a static mapping
PUT /user2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
  2. Import the data from the old index into the new index
POST _reindex
{
    "source": {
        "index": "user"
    },
    "dest": {
        "index": "user2"
    }
}
  3. The same query now finds the document in the new index
GET /user2/_search
{
  "query": {
    "term": {
      "address": "测试"
    }
  }
}

{
  "took" : 694,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "user2",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "fox",
          "age" : 32,
          "address" : "测试地址"
        }
      }
    ]
  }
}
  4. Delete the original index
DELETE /user
  5. Give the new index an alias that is the original index's name
PUT /user2/_alias/user

GET /user

GET /user/_search
{
  "query": {
    "term": {
      "address": "测试"
    }
  }
}

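Note: with these steps the index is migrated with almost no interruption, and clients keep using the original name. There is still a short window between deleting the old index and creating the alias in which the name user does not exist, and documents written to the old index after the reindex started are not copied.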
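
The gap between DELETE /user and PUT /user2/_alias/user can be closed with the `_aliases` API, which applies a list of actions atomically. The Python sketch below only builds the request body for `POST /_aliases`. Sending it is left out, and the helper name `build_alias_swap` is hypothetical.

```python
import json


def build_alias_swap(old_index, new_index):
    """Build the body for POST /_aliases that makes old_index's name an
    alias of new_index and removes old_index in the same atomic step."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": old_index}},
            {"remove_index": {"index": old_index}},
        ]
    }


body = build_alias_swap("user", "user2")
print("POST /_aliases")
print(json.dumps(body, indent=2))
```

Run the reindex first, then send this single request in place of the separate delete and alias steps.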

Common Mapping parameters

1. index: controls whether the field is indexed. The default is true. When set to false, the field cannot be searched
DELETE /user

PUT /user
{
  "mappings" : {
      "properties" : {
        "address" : {
          "type" : "text",
          "index": false
        },
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
}

PUT /user/_doc/1
{
  "name":"fox",
  "address":"广州白云山公园",
  "age":30
}


GET /user

GET /user/_search
{
  "query": {
    "match": {
      "address": "广州"
    }
  }
}

response
{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: Cannot search on field [address] since it is not indexed.",
        "index_uuid" : "AlWZrE-XT4iwIJsd8V9IfQ",
        "index" : "user"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "user",
        "node" : "rEYg9XpfS_uCtGpHpeoSCw",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: Cannot search on field [address] since it is not indexed.",
          "index_uuid" : "AlWZrE-XT4iwIJsd8V9IfQ",
          "index" : "user",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Cannot search on field [address] since it is not indexed."
          }
        }
      }
    ]
  },
  "status" : 400
}
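2. index_options: four settings control what the inverted index records
  • docs: records the doc id
  • freqs: records the doc id and term frequencies
  • positions: records the doc id, term frequencies, and term positions
  • offsets: records the doc id, term frequencies, term positions, and character offsets

text fields record positions by default; other field types default to docs. The more that is recorded, the more storage is used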
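
What each level adds can be illustrated with a toy inverted index in Python. This is a sketch under simplifying assumptions, not Lucene's data structure: tokens are split on single spaces, and the function name `build_postings` and the entry layout are made up for illustration.

```python
from collections import defaultdict

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, index_options):
    """Build a toy inverted index: term -> {doc_id: recorded info}.

    docs: dict of doc_id -> text, tokenized by splitting on single spaces.
    index_options: one of LEVELS; each level records everything below it.
    """
    level = LEVELS.index(index_options)
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        offset = 0
        for position, term in enumerate(text.split(" ")):
            entry = postings[term].setdefault(doc_id, {})  # "docs": doc id only
            if level >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:
                entry.setdefault("positions", []).append(position)
            if level >= 3:
                entry.setdefault("offsets", []).append((offset, offset + len(term)))
            offset += len(term) + 1  # skip the separating space
    return dict(postings)


sample = {1: "to be or not to be"}
print(build_postings(sample, "docs")["to"])
print(build_postings(sample, "offsets")["to"])
```

The growing entry per term is why higher levels use more storage.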

DELETE /user

PUT /user
{
  "mappings" : {
      "properties" : {
        "address" : {
          "type" : "text",
          "index_options": "offsets"
        },
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
}
3. null_value: lets explicit null values be searched. text fields do not support null_value, and the value must have the same data type as the field
DELETE /user

PUT /user
{
  "mappings" : {
      "properties" : {
        "address" : {
          "type" : "keyword",
          "null_value": "NULL"
        },
        "age" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
}

PUT /user/_doc/1
{
  "name":"fox",
  "age":32,
  "address":null
}

GET /user/_search
{
  "query": {
    "match": {
      "address": "NULL"
    }
  }
}
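
The effect of null_value can be sketched in Python: the replacement is what gets indexed, while _source keeps the original null. This is a toy model with a hypothetical helper name, `index_field`, not how Elasticsearch is implemented.

```python
def index_field(value, null_value=None):
    """Return (indexed_terms, source_value) for one keyword field value.

    An explicit null is indexed as null_value when one is configured;
    otherwise nothing is indexed. The stored source value is never changed.
    """
    if value is None:
        indexed = [] if null_value is None else [null_value]
    else:
        indexed = [value]
    return indexed, value


print(index_field(None, null_value="NULL"))  # searchable as "NULL", source stays None
print(index_field(None))                     # nothing to search for
```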

4. copy_to: copies the field's value into a target field to serve specific search needs. The copy_to target field does not appear in _source

# Configure copy_to
DELETE /address
PUT /address
{
  "mappings" : {
      "properties" : {
        "province" : {
          "type" : "keyword",
          "copy_to": "full_address"
        },
        "city" : {
          "type" : "text",
          "copy_to": "full_address"
        }
      }
    },
    "settings" : {
        "index" : {
            "analysis.analyzer.default.type": "ik_max_word"
        }
    }
}

PUT /address/_bulk
{ "index": { "_id": "1"} }
{"province": "湖南","city": "长沙"}
{ "index": { "_id": "2"} }
{"province": "湖南","city": "常德"}
{ "index": { "_id": "3"} }
{"province": "广东","city": "广州"}
{ "index": { "_id": "4"} }
{"province": "湖南","city": "邵阳"}

GET /address/_search
{
  "query": {
    "match": {
      "full_address": {
        "query": "湖南常德",
        "operator": "and"
      }
    }
  }
}
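
Why the query on full_address matches although no document contains that field can be sketched in Python. This is a toy model: the helper `index_with_copy_to` is a hypothetical name, and values are stored whole instead of being analyzed into terms.

```python
def index_with_copy_to(doc, copy_to):
    """Return (indexed_fields, source) for one document.

    copy_to: dict of source field -> target field, as declared in the mapping.
    Copied values are added to the index only; the source is left untouched.
    """
    indexed = {field: [value] for field, value in doc.items()}
    for field, target in copy_to.items():
        if field in doc:
            indexed.setdefault(target, []).append(doc[field])
    return indexed, dict(doc)


indexed, source = index_with_copy_to(
    {"province": "湖南", "city": "常德"},
    {"province": "full_address", "city": "full_address"},
)
print(indexed["full_address"])     # both values are searchable together
print("full_address" in source)    # the target field is not part of _source
```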

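5. Index Template

Index Templates let you define Mappings and Settings that are applied automatically, by pattern-matching rules, to newly created indices

  • A template only takes effect when an index is newly created. Changing a template does not affect existing indices
  • You can define several index templates; their settings are merged together
  • You can set an "order" value to control the merging process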
PUT /_template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

PUT /_template/template_test
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  }
}

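How Index Templates work

When an index is newly created:

  • The default Elasticsearch settings and mappings are applied
  • The settings from Index Templates with a lower order value are applied
  • The settings from Index Templates with a higher order value are applied, overriding the earlier ones
  • The Settings and Mappings given by the user when creating the index are applied, overriding the template settings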
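
The precedence above can be sketched as a series of dict updates in Python. This is a simplified model, not Elasticsearch's merge logic: only flat settings are merged, index patterns are matched with `fnmatchcase` as an approximation of Elasticsearch's wildcard matching, and `resolve_settings` is a hypothetical helper name. The two templates mirror template_default and template_test.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, defaults, templates, request_settings):
    """Merge settings in order: defaults, matching templates by ascending
    "order", then the settings given in the create-index request."""
    merged = dict(defaults)
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t["order"]):
        merged.update(template.get("settings", {}))
    merged.update(request_settings)
    return merged


templates = [
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 2, "number_of_replicas": 1}},
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
]

# Matches both templates; the higher order (test*) wins.
print(resolve_settings("testtemplate", {}, templates, {}))
# Matches only "*".
print(resolve_settings("orders", {}, templates, {}))
```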
# View template information
GET /_template/template_default
GET /_template/temp*


PUT /testtemplate/_doc/1
{
  "orderNo": 1,
  "createDate": "2022/01/01"
}

GET /testtemplate/_mapping
GET /testtemplate/_settings


PUT /testmy
{
  "mappings": {
    "date_detection": true
  }
}

PUT /testmy/_doc/1
{
  "orderNo": 1,
  "createDate": "2022/01/01"
}

GET /testmy/_mapping

6. Dynamic Template

A Dynamic Template is defined in the Mapping of a specific index

# Dynamic Mapping based on the detected type and the field name
DELETE my_index
PUT my_index/_doc/1
{
  "firstName":"Ruan",
  "isVIP":"true"
}

GET my_index/_mapping
DELETE my_index
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_boolean": {
          "match_mapping_type":   "string",
          "match":"is*",
          "mapping": {
            "type": "boolean"
          }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type":   "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
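
Dynamic templates are evaluated in order and the first match wins, which is why strings_as_boolean must come before the catch-all strings_as_keywords. The Python sketch below models only the match_mapping_type and match conditions, using `fnmatchcase` as an approximation of the wildcard matching. `pick_mapping` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def pick_mapping(field_name, detected_type, dynamic_templates):
    """Return the "mapping" of the first template that matches, else None."""
    for entry in dynamic_templates:
        for _name, rule in entry.items():
            wanted_type = rule.get("match_mapping_type")
            if wanted_type is not None and wanted_type != detected_type:
                continue
            if not fnmatchcase(field_name, rule.get("match", "*")):
                continue
            return rule["mapping"]
    return None


dynamic_templates = [
    {"strings_as_boolean": {"match_mapping_type": "string", "match": "is*",
                            "mapping": {"type": "boolean"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]

print(pick_mapping("isVIP", "string", dynamic_templates))
print(pick_mapping("firstName", "string", dynamic_templates))
```

A field that matches no template falls back to the regular dynamic mapping rules, which the sketch represents as None.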

7. Combining with field paths (path_match / path_unmatch)

PUT /my_test_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name":{
          "path_match": "name.*",
          "path_unmatch": "*.middle",
          "mapping":{
            "type": "text",
            "copy_to": "full_name"
          }
        }
      }
    ]
  }
}

PUT /my_test_index/_doc/1
{
  "name":{
    "first": "John",
    "middle": "Winston",
    "last": "Lennon"
  }
}


GET /my_test_index/_search
{
  "query": {
    "match": {
      "full_name": "John"
    }
  }
}
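
Which sub-fields the full_name template applies to can be checked with a small Python sketch. It uses `fnmatchcase` as an approximation of the wildcard matching on the full dotted path, and `path_rule_applies` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def path_rule_applies(full_path, path_match, path_unmatch=None):
    """True when the dotted field path matches path_match and is not
    excluded by path_unmatch."""
    if not fnmatchcase(full_path, path_match):
        return False
    if path_unmatch is not None and fnmatchcase(full_path, path_unmatch):
        return False
    return True


paths = ["name.first", "name.middle", "name.last"]
copied = [p for p in paths if path_rule_applies(p, "name.*", "*.middle")]
print(copied)
```

Only name.first and name.last are copied to full_name, so the search for John matches, while a search on full_name for Winston would not.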