Scientific report: Automatic Determination of Parts of Speech of English Words
Pages: 15
File type: pdf
File size: 582.97 KB
Document information:
The classifying of words according to syntactic usage is basic to language handling; this paper describes an algorithm for automatically classifying words according to thirteen commonly used parts of speech: noun, adjective, verb, past verb, adverb, preposition, conjunction, pronoun, interjection, present participle, past participle, auxiliary verb, and plural or collective noun.
Content extracted from the document:
[Mechanical Translation and Computational Linguistics, vol. 10, nos. 3/4, September and December 1967]

Automatic Determination of Parts of Speech of English Words
by Lois L. Earl,* Lockheed Palo Alto Research Laboratory, Palo Alto, California

The classifying of words according to syntactic usage is basic to language handling; this paper describes an algorithm for automatically classifying words according to thirteen commonly used parts of speech: noun, adjective, verb, past verb, adverb, preposition, conjunction, pronoun, interjection, present participle, past participle, auxiliary verb, and plural or collective noun. The algorithm was derived by a computerized study of the words in The Shorter Oxford English Dictionary. In its operation it utilizes a prepared dictionary of around nine hundred words to assign parts of speech to special or exceptional words. Other words are split into affix and kernel parts and assigned a part of speech on the basis of the part-of-speech implications of the affixes and the length of the remaining kernel. An accuracy of 95 per cent is achieved from the point of view of inclusive part of speech, where inclusive part of speech is defined as that string which contains all the parts of speech attributed to the word by the dictionary but which may also contain one or two more parts of speech.

Introduction

This paper describes the development and details of a procedure for automatically assigning part-of-speech characteristics to English words, largely from graphemic considerations.

The development of the algorithm began with the observation of Dolby and Resnikoff¹ that the parts of speech associated with one-syllable words are frequently noun (or noun and adjective) and verb, while the parts of speech associated with multisyllable words are usually noun and adjective only. Development of a working part-of-speech algorithm required the study of exceptions to this general rule so that analytical subrules and exception lists sufficient to identify automatically all such exceptions could be derived. Two analyses were utilized for the isolation and study of exceptions: (1) Exhaustive sorts of a 73,582-word dictionary on magnetic tape were used to separate words consistent with the general rule from those words that were not and to classify them. (2) Computer analysis of possible part-of-speech implications of affixes was carried out on the same dictionary. ...

...formation from The Shorter Oxford English Dictionary (SOX)² and Webster's Third New International Dictionary (MW3).³ The tape dictionary is reliable in most respects, since it was made from punched cards transcribed directly from the dictionaries, verified by different personnel, and spot-checked periodically during the process. Nevertheless, errors did occur, particularly in the recording of part-of-speech information, which was not always understood by the keypunchers. The parts of speech recorded are as follows:

Noun       N     Adverb       AV    Pronoun       PN
Adjective  AJ    Preposition  PR    Interjection  IJ
Verb       VB    Conjunction  CJ    Past verb     PV

In addition, the category other (OT) was used whenever the dictionary gave some part of speech other than the nine listed above. Participles, numerals, articles, and collective nouns mainly comprise OT.

The algorithm was designed to assign these same nine parts of speech (excluding OT) with the addition of four more which were unfortunately subsumed ...
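
The flow outlined in the abstract (look the word up in a small exception dictionary, otherwise split it into affix and kernel, and fall back on the Dolby and Resnikoff syllable observation) can be illustrated with a minimal Python sketch. This is not Earl's algorithm: the exception entries, the suffix table, the minimum kernel length of 3, and the vowel-group syllable count are all assumptions invented for illustration, and the function names are hypothetical. Only the tag codes (N, AJ, VB, AV, ...) and the definition of an "inclusive" match come from the excerpt.

```python
import re

# ASSUMPTION: a toy exception dictionary. The paper's prepared dictionary has
# around nine hundred entries; these four are invented for illustration.
EXCEPTIONS = {"and": {"CJ"}, "of": {"PR"}, "she": {"PN"}, "went": {"PV"}}

# ASSUMPTION: a toy suffix table; the paper's actual affix rules are not
# given in the excerpt.
SUFFIX_RULES = [("ness", {"N"}), ("tion", {"N"}), ("ous", {"AJ"}), ("ly", {"AV"})]

# ASSUMPTION: the shortest kernel for which a suffix rule is trusted.
MIN_KERNEL_LENGTH = 3


def rough_syllable_count(word):
    """Approximate syllables as runs of vowel letters (a crude stand-in)."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def assign_parts_of_speech(word):
    """Return a set of part-of-speech codes for one word."""
    word = word.lower()
    # Step 1: special or exceptional words come from the prepared dictionary.
    if word in EXCEPTIONS:
        return set(EXCEPTIONS[word])
    # Step 2: split into kernel + suffix and use the suffix's implication,
    # provided the remaining kernel is long enough.
    for suffix, tags in SUFFIX_RULES:
        kernel = word[: -len(suffix)]
        if word.endswith(suffix) and len(kernel) >= MIN_KERNEL_LENGTH:
            return set(tags)
    # Step 3: general rule reported in the excerpt: one-syllable words tend
    # to be noun and verb, multisyllable words noun and adjective.
    if rough_syllable_count(word) == 1:
        return {"N", "VB"}
    return {"N", "AJ"}


def is_inclusive_match(assigned, dictionary_tags, max_extra=2):
    """True if `assigned` contains every dictionary tag plus at most
    `max_extra` additional tags (the excerpt's 'inclusive' criterion)."""
    covers_all = dictionary_tags <= assigned
    extras = assigned - dictionary_tags
    return covers_all and len(extras) <= max_extra
```

Under these made-up rules, `assign_parts_of_speech("quickly")` yields `{"AV"}` through the suffix table, while "fly" is too short to split and falls through to the one-syllable rule. `is_inclusive_match` shows why the reported 95 per cent figure is a lenient measure: an assignment of `{"N", "VB"}` still counts as correct for a word the dictionary lists only as a noun.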