Scientific report: The Use of Statistics in Language Research
Number of pages: 7
File type: pdf
File size: 149.07 KB
Document information:
The literature concerning the application of statistics to linguistic problems and in particular to mechanical translation is reviewed. The conclusion is that much of the work done is of little direct use for mechanical translation, and that some of it is based on a misapprehension of what statistical techniques can in fact do.
Content extracted from the document:
The Use of Statistics in Language Research
[Mechanical Translation, vol. 5, no. 2, November 1958; pp. 67-73]

A. F. Parker-Rhodes, Cambridge Language Research Unit, Cambridge, England

The literature concerning the application of statistics to linguistic problems and in particular to mechanical translation is reviewed. The conclusion is that much of the work done is of little direct use for mechanical translation, and that some of it is based on a misapprehension of what statistical techniques can in fact do. Statistical methods can play a useful part in the development of mechanical translation procedures once these have been well established, but have little to contribute at the present stage of the work.

THERE ARE many ways in which statistical techniques might be pressed into the service of language research, and in particular the theory of mechanical translation and information retrieval. Most of these have had their advocates. The purpose of this paper is to review briefly the literature of the subject, and to draw conclusions as to how much of this work can be regarded as a legitimate use of statistics, and as to how relevant it is to the progress of language-processing technology.

There appear to be five main topics covered. First, I shall enumerate these, and then I shall refer seriatim to the works available in the C.L.R.U. library upon each of them.

1) Lexicography: this includes the methods and techniques of compiling lexical information, whether this takes the form of a dictionary of a more or less conventional character, or a thesaurus.
2) Approximative Methods: these are methods of machine translation which aim to rely on keeping errors below a preconceived threshold of tolerance; they use statistics mainly to predict how little work need be done to achieve this.
3) Economics: included here are applications of statistics to ascertain the size of computers needed, the time taken to operate programs, etc.
4) Coding: the problems of coding of information have a statistical aspect whenever code-compression is employed.
5) Cryptography: a peripheral subject, but perhaps worth inclusion.

Applications to Lexicography

A good deal of theoretical work has been done on statistical techniques of a kind which could or might be applied to the study of word frequency. The general problems are of a kind of frequent occurrence in biology, and so have received some attention from that quarter. Of this general kind is the work of Good [1]. More specifically concerned with language problems are the contributions of Mandelbrot [2,3] on word-frequencies. This author points out that a knowledge of word-frequency distributions could be useful to the lexicographer, but he is not himself concerned to make this application. In fact, no one seems to have done so, except Koutsoudas [4], who in fact concludes that the so-called Zipf and Joos laws are insufficient to give reliable predictions of the size of dictionaries needed in machine translation, and consequently recommends the accumulation of further empirical material with this end specifically in view.

[1] I. J. Good and G. H. Toulmin, The number of new species and the population coverage, when a sample is increased, Biometrika, 43, pp. 45-63 (1956).
[3] B. Mandelbrot, Structure formelle des textes et communication, Word, 10, pp. 1-27 (1954).
[4] A. M. Koutsou ...
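As an illustration of the kind of empirical word-frequency material discussed under Applications to Lexicography, the following minimal Python sketch counts word frequencies in a sample and reports what fraction of the running text the most frequent words cover, which is one rough way of relating dictionary size to coverage. It is not taken from the paper: the function names `rank_frequency` and `coverage`, and the toy sample, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return the word counts sorted from most to least frequent."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def coverage(tokens, dictionary_size):
    """Fraction of running words covered by the most frequent word types.

    dictionary_size is the number of distinct words kept (an assumed,
    simplified stand-in for the size of a translation dictionary).
    """
    freqs = rank_frequency(tokens)
    if not freqs:
        return 0.0
    return sum(freqs[:dictionary_size]) / sum(freqs)


# Hypothetical toy sample; a real study would need a large corpus.
sample = "the cat saw the dog and the dog saw a cat".split()

for size in (1, 2, 4, 6):
    print(size, round(coverage(sample, size), 3))
```

A curve of this kind, measured on real text, is an example of the empirical material that the review says Koutsoudas recommends accumulating, since the Zipf and Joos laws alone are reported to be insufficient for predicting dictionary size.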