Intro to BM25
- 5. document: d = (tf_1, ..., tf_{|M1|})
query: q = (qtf_1, ..., qtf_{|M2|})
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
TF-IDF = tf(qt, d) × log(N/n)
1. 重庆 (Chongqing): 1×0, 2×0
2. 老火锅 (old-style hotpot): 1×log 2, 0×?
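The two-term example above can be reproduced with a minimal Python sketch. It assumes a toy corpus of two already-segmented documents, reconstructed only from the term counts on the slide (the real documents are not given), and takes N as the number of documents and n as the number of documents containing the term. The function name `tf_idf` is hypothetical.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF = tf(term, doc) * log(N / n), using the natural logarithm."""
    tf = doc_tokens.count(term)                  # raw term frequency in this document
    n = sum(1 for d in corpus if term in d)      # documents containing the term
    if n == 0:
        return 0.0                               # term never seen: score it as 0
    return tf * math.log(len(corpus) / n)

# Assumed toy corpus: "重庆" occurs once in doc 1 and twice in doc 2,
# "老火锅" occurs once in doc 1 only.
corpus = [
    ["重庆", "老火锅"],
    ["重庆", "重庆"],
]

for term in ("重庆", "老火锅"):
    print(term, [tf_idf(term, d, corpus) for d in corpus])
```

Because 重庆 appears in every document, log(N/n) = log(2/2) = 0 and the term contributes nothing regardless of its frequency, while 老火锅 gets the weight log 2 in the one document that contains it.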
- 10. calculate_idf: IDF, avgdl, N, n(tf)
bm25cal: (query, document) => BM25 value
bm25server: (query, list<string>) => list<double>
bm25seg
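As a rough illustration of the (query, document) => BM25 value mapping, here is a minimal Python sketch of the textbook Okapi BM25 formula. It is not the actual bm25cal code: the function name `bm25`, the `idf` dictionary, and the `default_idf` fallback for terms missing from the IDF table are assumptions, and both inputs are taken to be already segmented into tokens. The defaults for `k`, `b`, and `avg_length` mirror the values in bm25.toml.

```python
import math

def bm25(query_tokens, doc_tokens, idf,
         avg_length=26.74, k=2.0, b=0.75, default_idf=0.0):
    """Score one document against one query with Okapi BM25.

    idf maps a term to its precomputed IDF; default_idf (an assumption)
    is used for terms that are not in that table.
    """
    dl = len(doc_tokens)
    # Length normalisation: longer-than-average documents are penalised.
    norm = k * (1.0 - b + b * dl / avg_length)
    score = 0.0
    for term in query_tokens:          # repeated query terms count repeatedly
        tf = doc_tokens.count(term)
        if tf == 0:
            continue                   # term absent from the document
        score += idf.get(term, default_idf) * tf * (k + 1.0) / (tf + norm)
    return score

# Usage with the two-term example and an assumed IDF table.
idf = {"重庆": 0.0, "老火锅": math.log(2)}
print(bm25(["重庆", "老火锅"], ["重庆", "老火锅"], idf, avg_length=2.0))
```

A list-scoring service in the style of bm25server could then be approximated by calling this function once per document in the list and returning the resulting scores in order.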
- 12. bm25cal < input_file
Configuration: bm25.toml
servers = ["dataapp-seg01:9075","dataapp-seg02:9075"] # array of word-segmentation server addresses
idf = "idfs" # IDF file
port = "19090" # port of the BM25 scoring server
idfAvg = 0.3146483766 # average IDF value
separator = "\t"
queryColumn = 1 # the first column of the input stream is the query string
documentColumns = [2] # array of numbers (document columns)
avgLength = 26.74 # average document length
k = 2.0
b = 0.75
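To make the column settings concrete, the following Python sketch shows one way a line of the input stream could be split into a query and its documents. It is only an illustration, not the tool's real parser: the function name `parse_line` is hypothetical, and it assumes the separator is a tab character and that column numbers are 1-based, as the queryColumn comment suggests.

```python
def parse_line(line, separator="\t", query_column=1, document_columns=(2,)):
    """Split one input line into (query, [documents]) using 1-based columns."""
    fields = line.rstrip("\n").split(separator)
    query = fields[query_column - 1]                      # 1-based -> 0-based
    documents = [fields[c - 1] for c in document_columns]
    return query, documents

# Example: the query is in column 1 and a single document in column 2.
query, documents = parse_line("重庆老火锅\t重庆 火锅 店\n")
print(query, documents)
```

Each (query, document) pair produced this way would then be segmented and scored, with `k`, `b`, and `avgLength` taken from the configuration.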