Summary of Mathematics doctoral thesis: Some searching techniques for entities based on implicit semantic relations and context aware query suggestions

Document information:

The thesis researches and builds an entity search technique based on implicit semantic relations, using clustering methods to improve search efficiency. It applies context-aware techniques to build a vertical search engine that uses context awareness within its own knowledge-base domain (aviation data). It also proposes a combinatorial similarity measure for the context-aware query suggestion problem to improve suggestion quality.
Content extracted from the document:
MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
----------------------------
Tran Lam Quan

SOME SEARCHING TECHNIQUES FOR ENTITIES BASED ON IMPLICIT SEMANTIC RELATIONS AND CONTEXT-AWARE QUERY SUGGESTIONS

Major: Mathematical Theory of Informatics
Code: 9.46.01.10

SUMMARY OF MATHEMATICS DOCTORAL THESIS

Hanoi - 2020

The thesis was completed at: Graduate University of Science and Technology - Vietnam Academy of Science and Technology.
Scientific supervisor: Dr. Vũ Tất Thắng
Reviewer 1: …
Reviewer 2: …
Reviewer 3: …
The thesis will be defended before the Academy-level doctoral thesis evaluation council, meeting at the Graduate University of Science and Technology - Vietnam Academy of Science and Technology, at … o'clock ..’, on day … month … year 202….
The thesis can be consulted at:
- Library of the Graduate University of Science and Technology
- National Library of Vietnam

INTRODUCTION

1. The necessity of the thesis

In the big data era, when new data is generated incessantly, the search engine has become a useful tool for finding information. According to statistics, approximately 71% of web search queries include entity names [7], [8]. Consider a query that contains only entity names: Vietnam, Hanoi, France. Intuitively, we see the underlying semantics behind this query: the relationship between the entity pair Vietnam: Hanoi is similar to the relationship between the entity pair France: ?. This reflects a natural human ability, namely inferring unknown information or knowledge by analogy. Given the above query, a human can answer immediately, but a search engine (SE) can only find documents containing the aforementioned keywords; it cannot directly give the answer Paris.
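The analogy query above (Vietnam : Hanoi :: France : ?) can be illustrated with a small sketch. It is only a toy under stated assumptions, not the technique developed in the thesis: the corpus, the entity list, and the helper names `extract_patterns`, `jaccard`, and `answer_analogy` are all hypothetical, and the relation between two entities is approximated by the words that appear between them in a sentence. The candidate whose linking patterns best overlap those of the known pair is returned.

```python
from collections import defaultdict

# Hypothetical toy corpus; a real system would mine a large text collection.
CORPUS = [
    "Hanoi is the capital of Vietnam",
    "Paris is the capital of France",
    "Tokyo is the capital of Japan",
    "Lyon is a large city in France",
    "Fansipan is the highest mountain in Vietnam",
]

# Hypothetical entity list; a real system would use named-entity recognition.
ENTITIES = {"Hanoi", "Vietnam", "Paris", "France", "Tokyo", "Japan", "Lyon", "Fansipan"}


def extract_patterns(corpus, entities):
    """Map each ordered entity pair to the set of lexical patterns linking it."""
    patterns = defaultdict(set)
    for sentence in corpus:
        tokens = sentence.split()
        positions = [i for i, tok in enumerate(tokens) if tok in entities]
        for i in positions:
            for j in positions:
                if i < j:
                    middle = " ".join(tokens[i + 1:j])
                    # Store the pattern for both orderings of the pair.
                    patterns[(tokens[i], tokens[j])].add(f"X {middle} Y")
                    patterns[(tokens[j], tokens[i])].add(f"Y {middle} X")
    return patterns


def jaccard(a, b):
    """Jaccard overlap of two pattern sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer_analogy(a, b, c, patterns):
    """Rank candidates d so that the relation (c, d) resembles the relation (a, b)."""
    source = patterns.get((a, b), set())
    scored = [
        (jaccard(source, pats), d)
        for (x, d), pats in patterns.items()
        if x == c and d != c
    ]
    scored.sort(reverse=True)
    return [(d, score) for score, d in scored if score > 0]


patterns = extract_patterns(CORPUS, ENTITIES)
print(answer_analogy("Vietnam", "Hanoi", "France", patterns))  # [('Paris', 1.0)]
```

Lyon also co-occurs with France in the toy corpus, but its linking pattern ("is a large city in") shares nothing with the Vietnam/Hanoi pattern, so it is filtered out. A keyword match on "France" alone could not make that distinction.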
The same happens in the real world with questions such as: If Fansipan is the highest mountain in Vietnam, which one is the highest in Tibet? or If you know Elizabeth as Queen of England, who is the Japanese monarch? For queries built on such similar relationships, a keyword search engine has difficulty giving answers, while humans can easily make the analogical inference.

Figure 1.1: The list returned by a keyword SE for the query = Việt Nam, Hà Nội, Pháp.

Researching and simulating the human ability to infer from a familiar semantic domain (Vietnam, Hanoi) to an unfamiliar semantic domain (France, ?) is the purpose of the first problem.

The second problem concerns query suggestions. Also according to statistics, the queries that users enter are often short, ambiguous, and polysemous [1-6]. In a search session, many results are returned, but most of them do not match the user's search intent¹. Therefore, many research directions have been proposed to improve results and assist searchers. These include query suggestion, query rewriting, query expansion, personalized recommendation, ranking/re-ranking of search results, etc.

¹ https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf

Research on query suggestion often applies traditional techniques such as clustering and similarity measurement of queries [9], [10]. However, traditional techniques have three disadvantages. First, they can only suggest queries that are similar or related to the most recently entered query (the current query), and the suggestions are not guaranteed to be better than the current query. Second, they cannot capture the trend of what the majority of users typically ask after the current query. Third, these approaches do not treat the user's queries as a seamless sequence in order to capture the user's search intent. For example, on a keyword SE, type two consecutive queries, q1: Who is Joe Biden and q2: How old is he; q1 and q2 are semantically related.
However, the result sets returned for q1 and q2 are very different. This shows the disadvantage of keyword search.

Figure 1.2: The answer lists from the SE corresponding to q1 and q2.

By capturing the query sequence seamlessly, in other words by capturing the search context, the SE can understand the user's search intent. Moreover, by capturing the query sequence, the SE can suggest queries; this suggestion str ...