Scientific report: MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval
Document information:
In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval. The system mines bilingual terminologies and aligned documents directly from the set of comparable corpora that users will search. Extracting bilingual terminologies and aligning bilingual documents with similar content before the search process provides more accurate translated terms for the in-domain data and supports multilingual retrieval even without a translation tool at retrieval time.
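The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term table, the alignment map, the document ids, and the functions `translate_query` and `search` are all hypothetical, and the toy English/Malay data is invented purely for illustration. The sketch assumes both resources were already mined offline, and it shows that alignment alone can surface other-language documents when no term translation is applied.

```python
# Toy data, all hypothetical: a real system would mine these offline
# from the comparable corpora being searched.
TERM_TABLE = {"flood": ["banjir"], "relief": ["bantuan"]}  # pivot term -> translations
ALIGNED = {"en-1": ["ms-7"]}  # pivot document id -> aligned document ids
DOCS = {
    "en-1": "flood relief efforts continue",
    "en-2": "stock markets rally",
    "ms-7": "usaha bantuan banjir diteruskan",
    "ms-9": "banjir melanda bandar",
}


def translate_query(query):
    """Expand each query term with its mined translations, keeping the original."""
    terms = []
    for term in query.lower().split():
        terms.append(term)
        terms.extend(TERM_TABLE.get(term, []))
    return terms


def search(query, use_term_table=True):
    """Return sorted ids of matching documents plus their aligned partners."""
    if use_term_table:
        terms = set(translate_query(query))
    else:
        terms = set(query.lower().split())
    # Plain keyword matching on whitespace-separated tokens.
    hits = {doc_id for doc_id, text in DOCS.items() if terms & set(text.split())}
    # Follow the precomputed alignments; no translation is needed for this step.
    for doc_id in list(hits):
        hits.update(ALIGNED.get(doc_id, []))
    return sorted(hits)


print(search("flood"))
print(search("flood", use_term_table=False))
```

With the term table, the query also matches documents that contain the translated term. Without it, the only other-language document returned is the one paired with a matching pivot-language document.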
Content extracted from the document:
MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval

Lianhau Lee, Aiti Aw, Thuy Vu, Sharifah Aljunied Mahani, Min Zhang, Haizhou Li
Institute for Infocomm Research
1 Fusionopolis Way, #21-01 Connexis, Singapore 138632
{lhlee, aaiti, tvu, smaljunied, mzhang, hli}@i2r.a-star.edu.sg

Abstract

In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval. The system mines bilingual terminologies and aligned documents directly from the set of comparable corpora that users will search. Extracting bilingual terminologies and aligning bilingual documents with similar content before the search process provides more accurate translated terms for the in-domain data and supports multilingual retrieval even without a translation tool at retrieval time. The system includes a user-friendly graphical user interface designed to provide navigation and retrieval of information in browse mode and search mode respectively.

1 Introduction

Query translation is an important step in cross-language information retrieval (CLIR). Currently, most CLIR systems rely on various kinds of dictionaries for query translation, for example WordNets (Luca and Nurnberger, 2006; Ranieri et al., 2004). Although dictionaries can provide effective translations of common words or even phrases, their coverage is always limited. Hence, there is a need to expand the existing collections of bilingual terminologies through various means.

Recently, more and more research work has focused on bilingual terminology extraction from comparable corpora. Some promising results have been reported that make use of statistics, linguistics (Sadat et al., 2003), transliteration (Udupa et al., 2008), date information (Tao and Zhai, 2005) and a document alignment approach (Talvensaari et al., 2007).

In this paper, we introduce our Multilingual Access and Retrieval System (MARS), which addresses the query translation issue by using in-domain bilingual terminologies extracted directly from the comparable corpora that users will access. At the same time, bilingual documents are paired up before the search process based on their content similarities, to overcome the limitation of traditional keyword matching based on translated terms. Together these provide a better retrieval experience: more accurate in-domain translated terms are used to retrieve the documents, and the time-consuming multilingual document matching is moved to the backend, which offers a new perspective on multilingual information retrieval.

The following sections of this paper describe the system architecture and the proposed functionalities of the MARS system.

2 MARS System

The MARS system is designed to enhance query translation and document retrieval by mining the underlying multilingual structures of comparable corpora via a pivot language. There are three reasons for using a pivot language. Firstly, it is appropriate to use a universal language among potential users of different native languages. Secondly, it reduces the backend data processing cost, because only the pair-wise relationships between the pivot language and each other language are considered. Lastly, dictionary resources between the pivot language and all the other languages are more likely to be available than resources between other language pairs.

There are two main parts in this system, namely data processing and user interface. The data processing is an offline process to mine the underlying multilingual structure of the compa-

21 Proceedings of ...
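The second reason given for a pivot language, the reduced backend processing cost, can be made concrete with a small counting sketch. The excerpt states only that the cost is reduced; the formulas below are standard combinatorics under the assumption that, without a pivot, every unordered pair of languages would have to be processed. The function names are hypothetical.

```python
def all_pairs(num_languages):
    """Language pairs to process if every language is paired with every other."""
    return num_languages * (num_languages - 1) // 2


def pivot_pairs(num_languages):
    """Language pairs to process if each language is paired only with the pivot.

    num_languages counts the pivot language itself.
    """
    return max(num_languages - 1, 0)


for n in (2, 5, 10):
    print(n, all_pairs(n), pivot_pairs(n))
```

Under this assumption the pivot design grows linearly with the number of languages, while the all-pairs design grows quadratically. Ten languages need 9 pairings through the pivot instead of 45.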