Juejin · Backend · 2021-07-09 11:57

Full-Text Search with MongoDB

Full-text search in MongoDB is implemented with its text index type.

First, create the collection:

db.createCollection("syzh_event")

Three fields are designed in this collection (alongside the automatic _id):

{
    "_id": ObjectId("60e6a2c3ee1a3f20f6f21971"),
    "eventid": NumberLong("610527999155245056"),
    "event_content": "日常巡查时发现小区道路上有居民乱停放面包车。",
    "search_text": "日常 巡查 时 发现 小区 道路 上 有 居民 乱 停放 面包车 。",
}
The event_content field is run through a Chinese word segmenter and the result is stored in search_text.
The spaces in that data are critical for search: MongoDB's text index tokenizes on whitespace, which is what makes pre-segmented Chinese searchable.

Create the index:

db.syzh_event.createIndex({search_text: "text"})

Insert data, then search it through the index:

db.syzh_event.find(
    {$text: {$search: '红 草'}},
    {score: {$meta: 'textScore'}}
).sort(
    {score: {$meta: 'textScore'}}
)
例子中的"红草" 会被分成两个字,然后进行搜索,
其中一个字命中就会得0.5分,两个字命中就是1.0分,如果字数多的话,按分词效果,类推
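To inspect the scores from Java, the same query can be issued through Spring Data. A minimal sketch, assuming a configured MongoTemplate and the SearchEventPojo entity defined later in this article:

TextQuery query = TextQuery.queryText(new TextCriteria().matching("红 草")).sortByScore();
query.includeScore(); // projects {$meta: "textScore"} into the score property
for (SearchEventPojo doc : mongoTemplate.find(query, SearchEventPojo.class, "syzh_event")) {
    System.out.println(doc.getScore() + " -> " + doc.getEvent_content());
}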

Chinese word segmentation in Java


<dependency>
    <groupId>org.ansj</groupId>
    <artifactId>ansj_seg</artifactId>
    <version>5.1.6</version>
</dependency>


String searchText = IndexAnalysis.parse(search).toStringWithOutNature(" ");
IndexAnalysis is the analyzer intended for index-time segmentation.
For the other analyzers, search online or browse the project's open-source repository on GitHub and read the source.
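As a quick standalone check of the segmenter, a minimal sketch (the sample sentence is the one from the document example above; the class name is arbitrary):

import org.ansj.splitWord.analysis.IndexAnalysis;

public class SegmentDemo {
    public static void main(String[] args) {
        // IndexAnalysis produces index-oriented tokens; joining them with
        // spaces yields exactly the string stored in search_text.
        String segmented = IndexAnalysis.parse("日常巡查时发现小区道路上有居民乱停放面包车。")
                .toStringWithOutNature(" ");
        System.out.println(segmented);
    }
}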

Implementing the search in Java

The MongoDB dependency is the official Spring Boot starter:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>

The Java code:

The entity class:

public class SearchEventPojo {
    private String id;
    private Long eventid;
    private String event_content;
    private String search_text;
    private Double score;
// getters and setters omitted
}
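If mapping by the property name alone does not populate score, Spring Data also offers the @TextScore annotation to bind the projected text score explicitly; a sketch of the annotated variant:

import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.TextScore;

@Document("syzh_event")
public class SearchEventPojo {
    private String id;
    private Long eventid;
    private String event_content;
    private String search_text;
    @TextScore // filled from the {$meta: "textScore"} projection; not persisted
    private Double score;
    // getters and setters omitted
}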

The endpoints:

@Autowired
private MongoTemplate mongoTemplate;

@PostMapping("searchData")
public Map<String, Object> searchData(@RequestBody JSONObject req) {
    String search = req.getString("search");
    // The query text must be segmented the same way the stored data was
    String searchText = IndexAnalysis.parse(search).toStringWithOutNature(" ");
    TextQuery textQuery = new TextQuery(new TextCriteria().matching(searchText));
    textQuery.includeScore(); // the text score is mapped onto the score property of each result
    textQuery.sortByScore();
    List<SearchEventPojo> syzh_eventList = mongoTemplate.find(textQuery, SearchEventPojo.class, "syzh_event");
    Map<String, Object> map = new LinkedHashMap<>();
    map.put("count", syzh_eventList.size());
    map.put("data", syzh_eventList);
    return map;
}

@PostMapping("insertTestData")
public void insertTestData(@RequestBody JSONObject req){
    SearchEventPojo searchEventPojo = JSONUtils.toBean(req, SearchEventPojo.class);
    /* 开始分词  */
    String text = IndexAnalysis.parse(searchEventPojo.getEvent_content()).toStringWithOutNature(" "); // 这里的空格很重要,全靠这空格实现mongo的中文分词搜索了,(夸张了一些)
    searchEventPojo.setSearch_text(text);
    searchEventPojo.setEventid(SFlakeUtils.getUUID()); // 自己随意赋值即可
    mongoTemplate.insert(searchEventPojo, "syzh_event");
    // 自行处理异常
}
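The text index can also be ensured from the application instead of the shell. A minimal sketch, equivalent to the db.syzh_event.createIndex({search_text: "text"}) command above:

import org.springframework.data.mongodb.core.index.TextIndexDefinition;

// e.g. in a @PostConstruct method of a configuration bean
mongoTemplate.indexOps("syzh_event").ensureIndex(
        new TextIndexDefinition.TextIndexDefinitionBuilder()
                .onField("search_text")
                .build());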