Scientific report: Experiments in Semantic Classification
Document information:
It is argued that a thesaurus, or semantic classification, may be required in the resolution of multiple meaning for machine translation and allied purposes. The problem of constructing a thesaurus is then considered; this involves a method for defining the meanings or uses of words, and a procedure for classifying them.
Content extracted from the document:
[Mechanical Translation and Computational Linguistics, vol.8, nos.3 and 4, June and October 1965]

Experiments in Semantic Classification
by K. Sparck Jones, Cambridge Language Research Unit, Cambridge, England

It is argued that a thesaurus, or semantic classification, may be required in the resolution of multiple meaning for machine translation and allied purposes. The problem of constructing a thesaurus is then considered; this involves a method for defining the meanings or uses of words, and a procedure for classifying them. It is suggested that word uses may be defined in terms of their semantic relations with other words, and that the classification may be based on these relations; the paper then shows how the uses of words may be defined by synonyms to give rows or sets of synonymous word uses, which can then be grouped by their common words, to give thesauric classes. A discussion of the role of synonymy in language is followed by an examination of the way in which multiple meaning may be resolved by the use of a thesaurus of the kind described.

The work described below has arisen from the Cambridge Language Research Unit’s original ideas about the use of a thesaurus for machine translation.1 Their argument, put simply, was that most words (and not just some awkward words) have ranges of uses, or, as it is sometimes put, have different meanings, or express different ideas, on different occasions. In discourse, any individual word considered by itself is thus potentially ambiguous because it can be used in different ways. This ambiguity is resolved, and the correct use of each word specified, by the surrounding context. This is because a piece of discourse is concerned with, or expresses, a particular idea or set of related ideas. Discourse does not consist of a sequence of semantically unconnected sentences (it would be very hard to understand if it did), but of sentences in which the same key concepts are repeated. The appropriate uses of ambiguous words are therefore picked out because they express the idea or ideas that recur; or, to put it the other way round, the recurring idea or ideas specify the appropriate uses of ambiguous words. The argument is therefore that discourse is essentially repetitive, because without repetition there would be too much ambiguity.

This argument may be correct, but it is too vague as it stands; for machine translation something more definite is required. It was therefore suggested that a precise model of this situation could be constructed by ...

... been thesaurically classified, we can resolve ambiguity by looking for recurring heads. That is, we replace the words in a piece of discourse by the sets of heads defining the uses of each word, and we carry out a set-intersection procedure.

Small-scale experiments on this basis were carried out in the C.L.R.U., using an existing thesaurus, the Penguin edition of the Roget’s Thesaurus of English Words and Phrases,2 published by Longmans. These experiments were only moderately successful, and it was clear that this was due mainly to the defects of the Thesaurus. A number of words did not occur in it at all, and others were under-classified, that is, they were not listed in enough heads to distinguish all their uses. As it seemed that most existing thesauri would be inadequate for the purpose of machine translation, the question of constructing a better thesaurus, specifically for machine translation, was considered. This would involve
i) better analysis of word uses
ii) checking the headings.

The Problems of Thesaurus Construction

Much of the thesaurus research that has been carried out in the C.L.R.U. has been concerned with the second problem, namely, with the investigation of ...
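The set-intersection procedure mentioned in the text (replace each word by the set of heads defining its uses, then look for recurring heads) can be illustrated with a minimal sketch. This is not the C.L.R.U. implementation: the toy thesaurus, the head names, the function names, and the `min_count` threshold are all assumptions made for illustration, and "recurring" is read here as "listed for at least two distinct words in the passage".

```python
from collections import Counter

# Hypothetical toy thesaurus: each word maps to the set of heads
# under which it is listed. Words and heads are invented for illustration.
THESAURUS = {
    "bank": {"MONEY", "RIVER", "STORE"},
    "deposit": {"MONEY", "SEDIMENT"},
    "interest": {"MONEY", "CURIOSITY"},
}

def recurring_heads(words, thesaurus, min_count=2):
    """Return the heads listed for at least `min_count` distinct words."""
    counts = Counter()
    for word in set(words):  # count each distinct word once
        counts.update(thesaurus.get(word, set()))
    return {head for head, n in counts.items() if n >= min_count}

def resolve(words, thesaurus, min_count=2):
    """Keep, for each word, only those of its heads that recur in the passage."""
    shared = recurring_heads(words, thesaurus, min_count)
    return {word: thesaurus.get(word, set()) & shared for word in words}

if __name__ == "__main__":
    print(resolve(["bank", "deposit", "interest"], THESAURUS))
```

In this sketch a word that is missing from the thesaurus, or that shares no head with its neighbours, is left with an empty set, which mirrors the two defects the text reports: words absent from the Thesaurus and words listed under too few heads.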