Scientific report: MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval

Document information:

In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval, which mines bilingual terminologies and aligned documents directly from the set of comparable corpora to be searched by users. Extracting bilingual terminologies and aligning bilingual documents with similar content prior to the search process provides more accurate translated terms for the in-domain data and supports multilingual retrieval even without the use of a translation tool at retrieval time...
Content extracted from the document:
MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval

Lianhau Lee, Aiti Aw, Thuy Vu, Sharifah Aljunied Mahani, Min Zhang, Haizhou Li
Institute for Infocomm Research
1 Fusionopolis Way, #21-01 Connexis, Singapore 138632
{lhlee, aaiti, tvu, smaljunied, mzhang, hli}@i2r.a-star.edu.sg

Abstract

In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval, which mines bilingual terminologies and aligned documents directly from the set of comparable corpora to be searched by users. Extracting bilingual terminologies and aligning bilingual documents with similar content prior to the search process provides more accurate translated terms for the in-domain data and supports multilingual retrieval even without the use of a translation tool at retrieval time. The system includes a user-friendly graphical user interface designed to provide navigation and retrieval of information in browse mode and search mode respectively.

1 Introduction

Query translation is an important step in cross-language information retrieval (CLIR). Currently, most CLIR systems rely on various kinds of dictionaries for query translation, for example WordNets (Luca and Nurnberger, 2006; Ranieri et al., 2004). Although dictionaries can provide effective translations of common words or even phrases, they are always limited in coverage. Hence, there is a need to expand the existing collections of bilingual terminologies through various means.

Recently, more and more research work has focused on bilingual terminology extraction from comparable corpora. Some promising results have been reported making use of statistics, linguistics (Sadat et al., 2003), transliteration (Udupa et al., 2008), date information (Tao and Zhai, 2005) and a document alignment approach (Talvensaari et al., 2007).

In this paper, we introduce our Multilingual Access and Retrieval System (MARS), which addresses the query translation issue by using in-domain bilingual terminologies extracted directly from the comparable corpora to be accessed by users. At the same time, bilingual documents are paired up prior to the search process based on their content similarity, to overcome the limitation of traditional keyword matching based on translated terms. This provides a better retrieval experience: not only are more accurate in-domain translated terms used to retrieve the documents, but the approach also offers a new perspective on multilingual information retrieval by moving the time-consuming multilingual document matching to the backend.

The following sections of this paper describe the system architecture and the proposed functionalities of the MARS system.

2 MARS System

The MARS system is designed to enhance query translation and document retrieval by mining the underlying multilingual structures of comparable corpora via a pivot language. There are three reasons for using a pivot language. Firstly, it is appropriate to use a universal language among potential users of different native languages. Secondly, it reduces the backend data processing cost, since only the pair-wise relationships between the pivot language and each of the other languages need to be considered. Lastly, dictionary resources between the pivot language and each of the other languages are more likely to be available than resources between two arbitrary languages.

There are two main parts in this system, namely data processing and the user interface. The data processing is an offline process to mine the underlying multilingual structure of the compa-...

21 Proceedings of ...
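The second reason for the pivot language given in Section 2, the reduced backend processing cost, can be illustrated with a small counting sketch. It is not taken from the paper: the language codes and the function names `all_pairs` and `pivot_pairs` are hypothetical, and the excerpt does not say which languages MARS covers. The sketch only shows that with n languages a pivot design needs n - 1 language pairs to be processed, whereas pairing every language with every other needs n(n - 1)/2.

```python
from itertools import combinations


def all_pairs(languages):
    """Every unordered language pair: n * (n - 1) / 2 of them."""
    return list(combinations(languages, 2))


def pivot_pairs(languages, pivot):
    """Only the pairs that include the pivot language: n - 1 of them."""
    return [(pivot, lang) for lang in languages if lang != pivot]


# Hypothetical language set, chosen for illustration only.
languages = ["en", "zh", "ms", "vi", "th"]

direct = all_pairs(languages)
via_pivot = pivot_pairs(languages, pivot="en")

print(len(direct))     # 10 pairs without a pivot
print(len(via_pivot))  # 4 pairs with "en" as the pivot
```

The gap widens quickly as languages are added: the pivot count grows linearly, while the all-pairs count grows quadratically.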
