$$9.246874 = 2.2 \times 1 \times 0.66753393 \times 6.2964954$$

When searching, set the parameter `explain=true`. Each hit is then returned with an `_explanation` field that shows how its score was computed, including every intermediate result (for the top hit, this is how `max_score` was obtained):

GET /test_index/_search?explain=true
{
"query": {
"match": {
"test_field": "测试用query"
}
}
}

The example query above returns:

{
... # other fields omitted
"_explanation" : {
"value" : 9.246874,
"description" : "sum of:",
"details" : [
{
"value" : 9.246874,
"description" : "weight(test_field:升级 in 398) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 9.246874,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 6.2964954,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 813,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.66753393,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 9.088561,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}
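The numbers in `_explanation` nest: a node whose description ends in `product of:` should equal the product of its children, a `sum of:` node their sum, and a `result of:` node its single child. As a sanity check on the output above, the minimal Python sketch below walks a hand-copied, trimmed version of that tree (the `idf` and `tf` nodes are kept as leaves) and recomputes each combining node. The helpers `recompute` and `check` are hypothetical, not part of any Elasticsearch client, and only the three description suffixes seen above are handled. Note that the node labelled `boost` is 2.2 although the query sets no boost; for Elasticsearch 7.x this is usually explained as BM25's k1 + 1 = 2.2 folded into the default query boost of 1.0, which is an assumption about the version that produced this output.

```python
import math

# Trimmed, hand-copied "_explanation" tree from above; idf and tf are kept as leaves.
explanation = {
    "value": 9.246874,
    "description": "sum of:",
    "details": [{
        "value": 9.246874,
        "description": "weight(test_field:升级 in 398) [PerFieldSimilarity], result of:",
        "details": [{
            "value": 9.246874,
            "description": "score(freq=1.0), product of:",
            "details": [
                {"value": 2.2, "description": "boost", "details": []},
                {"value": 6.2964954, "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details": []},
                {"value": 0.66753393, "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:", "details": []},
            ],
        }],
    }],
}


def recompute(node):
    """Recompute a node from its children according to its description suffix.

    Nodes that are not "product of:", "sum of:" or single-child "result of:"
    (leaves and "computed as" formula nodes) keep their reported value.
    """
    children = node.get("details", [])
    desc = node["description"]
    if desc.endswith("product of:"):
        return math.prod(recompute(c) for c in children)
    if desc.endswith("sum of:"):
        return sum(recompute(c) for c in children)
    if desc.endswith("result of:") and len(children) == 1:
        return recompute(children[0])
    return node["value"]


def check(node, rel_tol=1e-6):
    """True if this node and all descendants agree with their recomputed values.

    A relative tolerance is needed because the reported values are rounded
    32-bit floats.
    """
    ok = math.isclose(recompute(node), node["value"], rel_tol=rel_tol)
    return ok and all(check(c, rel_tol) for c in node.get("details", []))


print(recompute(explanation))  # about 9.2468735, i.e. 2.2 * 6.2964954 * 0.66753393
print(check(explanation))      # True
```

The same walk works on a full `_explanation` copied from a real response, as long as it only uses these combining descriptions; anything else is treated as a leaf.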
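Let's now examine each item in this output.

Computing tf

tf (Term Frequency): how often each term produced by analyzing the search text occurs in the matched field of the document being scored. The more often it occurs, the more relevant the document and the higher the score, although in BM25 each extra occurrence adds less than the previous one (saturation, controlled by k1) and the value is normalized by the field's length (controlled by b).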
{
"value" : 0.66753393,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 2.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 9.088561,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
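| Parameter | Meaning |
|---|---|
| freq | Number of times the term occurs in this document's field (1.0 here) |
| k1 | Term saturation parameter: how quickly extra occurrences stop raising tf (1.2 here) |
| b | Length normalization parameter: balances how much the field's length affects the score (0.75 here) |
| dl | Length of the field in this document, in terms (2.0 here) |
| avgdl | Average field length: the total number of terms in this field across all documents that have it, divided by the number of those documents (9.088561 here) |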
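The tf formula printed in the output can be evaluated directly with the parameter values shown above. In the minimal Python sketch below, `bm25_tf` is a hypothetical helper whose defaults k1 = 1.2 and b = 0.75 are the values reported in this output; with freq = 1, dl = 2 and avgdl = 9.088561 it gives about 0.6675339, matching the reported tf up to floating-point rounding. The two loops then vary one input at a time to show length normalization and saturation.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf as described in the explain output:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))
    """
    length_norm = k1 * (1 - b + b * dl / avgdl)  # grows with the field length dl
    return freq / (freq + length_norm)


AVGDL = 9.088561  # avgdl reported in the explain output

# The example document: the term occurs once in a 2-term field.
print(bm25_tf(freq=1.0, dl=2.0, avgdl=AVGDL))  # about 0.6675339

# Same frequency, longer field: tf drops (length normalization via b).
for dl in (2.0, 9.0, 40.0):
    print("dl =", dl, "tf =", round(bm25_tf(1.0, dl, AVGDL), 4))

# Same length, more occurrences: tf rises but saturates below 1 (via k1).
for freq in (1.0, 2.0, 5.0, 20.0):
    print("freq =", freq, "tf =", round(bm25_tf(freq, 2.0, AVGDL), 4))
```

Because the length term is always positive, this form of tf stays strictly below 1 no matter how often the term repeats.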
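This is essentially the classic tf from natural language processing, with saturation and field-length normalization applied.

Computing idf

idf (Inverse Document Frequency): a measure of how rare each analyzed search term is across the documents in the index. The more documents contain the term, the lower its idf, so the less the term says about relevance and the lower the score it contributes.
{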
"value" : 6.2964954,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 1,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 813,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
}
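The idf formula can be checked the same way. In the minimal sketch below, `bm25_idf` is a hypothetical helper, and the logarithm is assumed to be the natural log (which is what reproduces the reported value). It evaluates the formula for n = 1 and N = 813 from the output, then for increasingly common terms.

```python
import math


def bm25_idf(n, N):
    """idf as described in the explain output: log(1 + (N - n + 0.5) / (n + 0.5)).

    n: number of documents containing the term
    N: total number of documents that have the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


N = 813  # from the explain output

print(bm25_idf(1, N))  # about 6.296495, the value reported above

# The more documents contain the term, the smaller its idf; it stays
# positive even when every document contains the term.
for n in (1, 10, 100, 400, 813):
    print("n =", n, "idf =", round(bm25_idf(n, N), 4))
```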
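Likewise, this is essentially the classic idf with smoothing applied: the 0.5 terms and the 1 + inside the logarithm keep the value finite and positive even for terms that appear in every document.

boost (query weight)

The query-time weight `boost` only has a practical effect when several clauses are combined, for example several terms matched against the same field: it sets how much each clause contributes to the relevance score. A single boost applied to the whole query scales every document's score by the same factor and leaves the ranking unchanged.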
Example query:

GET /test_index/_search?explain=true
{
"query": {
"bool": {
"should": [{
"match": {
"test_field": {
"query": "xxx",
"boost": 1
}
}
},
{
"match": {
"test_field": {
"query": "yyy",
"boost": 2
}
}
},
{
"match": {
"test_field": {
"query": "zzz",
"boost": 3
}
}
}
]
}
}
}
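When relevance is computed for the search above, a document's match on `xxx` is weighted with `boost=1`, a match on `yyy` with `boost=2`, and a match on `zzz` with `boost=3`. The `bool` query adds up the weighted scores of the `should` clauses a document matches, and the highest resulting `_score` among the hits is reported as `max_score`.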
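A minimal sketch of how the three boosts interact, under two stated assumptions: a clause's score is its unboosted BM25 score multiplied by its boost (the BM25 score is linear in the boost), and a `bool` query adds up the scores of the `should` clauses a document matches. The documents, their unboosted scores, and the helpers `bool_should_score` and `rank` are made up for illustration.

```python
# Hypothetical unboosted clause scores for two documents (made-up numbers):
# doc_a matches only the "xxx" clause, doc_b matches only the "zzz" clause.
BASE_SCORES = {
    "doc_a": {"xxx": 3.0},
    "doc_b": {"zzz": 1.5},
}


def bool_should_score(clause_scores, boosts):
    """Sum of boost * unboosted score over the should clauses a document matched."""
    return sum(boosts[term] * score for term, score in clause_scores.items())


def rank(boosts):
    """Document ids ordered by descending bool score."""
    scores = {doc: bool_should_score(cs, boosts) for doc, cs in BASE_SCORES.items()}
    return sorted(scores, key=scores.get, reverse=True)


no_boost = {"xxx": 1, "yyy": 1, "zzz": 1}
query_boosts = {"xxx": 1, "yyy": 2, "zzz": 3}   # boosts from the query above
uniform_boost = {"xxx": 5, "yyy": 5, "zzz": 5}  # same boost on every clause

print(rank(no_boost))       # ['doc_a', 'doc_b']  (3.0 vs 1.5)
print(rank(query_boosts))   # ['doc_b', 'doc_a']  (3.0 vs 4.5)
print(rank(uniform_boost))  # ['doc_a', 'doc_b']  (15.0 vs 7.5)
```

Different boosts per clause can reorder the results, while giving every clause the same boost only rescales them.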