A Concise Introduction to Data Compression - P7
The first enhancement improves compression for small alphabets. In Unicode, most small alphabets start on a 128-code-point boundary, although an alphabet may contain more than 128 symbols. This suggests computing a difference not between the current and previous code values, but between the current code value and the value in the middle of the 128-code-point segment that contains the previous code value. Specifically, the difference is computed by subtracting a base value from the current code point. The base value is obtained from the previous code point as follows. If the previous code value lies in the interval xxxx00₁₆ to xxxx7F₁₆ (i.e., its low byte is between 00₁₆ and 7F₁₆), the base value is set to xxxx40₁₆ (the middle of the lower half of the segment), and if the previous code point lies in the range xxxx80₁₆ to xxxxFF₁₆ (its low byte is between 80₁₆ and FF₁₆), the base value is set to xxxxC0₁₆ (the middle of the upper half). This way, if the current code point is within 128 positions of the base value, the difference falls in the range [−128, +127], which makes it fit in one byte.

The second enhancement has to do with remote symbols. A document in a non-Latin alphabet (where the code points are very different from the ASCII codes) may use spaces between words. The code point for a space is the ASCII code 20₁₆, so any pair of code points that includes a space results in a large difference. BOCU therefore computes a difference by first computing the base values of the three previous code points and then subtracting the smallest base value from the current code point.

BOCU-1 is the version of BOCU that is commonly used in practice [BOCU-1 02]. It differs from the original BOCU method by using a different set of byte-value ranges and by encoding the ASCII control characters U+0000 through U+0020 with byte values 0 through 20₁₆, respectively. These features make BOCU-1 suitable for compressing input files that are MIME (text) media types.
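The base-value rule of the first enhancement can be sketched in a few lines of Python (a minimal illustration; the function name and the Cyrillic example are ours, not part of any BOCU specification):

```python
def base_value(prev):
    """Base value derived from the previous code point: the middle of the
    lower half (xxxx40) or the upper half (xxxxC0) of the 128-code-point
    segment that contains it."""
    # Clearing the seven low bits lands on xxxx00 or xxxx80;
    # adding 0x40 then yields xxxx40 or xxxxC0, respectively.
    return (prev & ~0x7F) + 0x40

# Cyrillic example: previous code point is 'О' (U+041E), current is 'Д' (U+0414).
prev, cur = 0x041E, 0x0414
base = base_value(prev)   # 0x0440, the middle of the segment 0x0400..0x047F
diff = cur - base         # -44, fits in one signed byte
print(hex(base), diff)
```

Because successive letters of a small alphabet fall in the same 128-code-point segment, their differences from this base stay within [−128, +127], so each encodes in a single byte.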
Il faut avoir beaucoup étudié pour savoir peu (it is necessary to study much in order to know little).
—Montesquieu (Charles de Secondat), Pensées diverses

Chapter Summary

This chapter is devoted to data compression methods and techniques that are not based on the approaches discussed elsewhere in this book. The following algorithms illustrate some of these original techniques:

The Burrows–Wheeler method (Section 7.1) starts with a string S of n symbols and scrambles (i.e., permutes) them into another string L that satisfies two conditions: (1) any region of L tends to have a concentration of just a few symbols; (2) it is possible to reconstruct the original string S from L. Since its inception in the early 1990s, this unexpected method has been the subject of much research.

The technique of symbol ranking (Section 7.2) uses context, rather than probabilities, to rank symbols.

Sections 7.3 and 7.3.1 describe two algorithms, SCSU and BOCU-1, for the compression of Unicode-based documents.

Chapter 8 of [Salomon 07] discusses other methods, techniques, and approaches to data compression.

Self-Assessment Questions

1. The term "fractals" appears early in this chapter. One of the applications of fractals is to compress images, and it is the purpose of this note to encourage the reader to search for material on fractal compression and study it.

2. The Burrows–Wheeler method has been the subject of much research and of attempts to speed up its decoding and improve it. Using the paper at [JuergenAbel 07] as your starting point, try to gain a deeper understanding of this interesting method.

3. The term "lexicographic order" appears in Section 7.1. This is an important term in computer science in general, and the conscientious reader should make sure this term is fully understood.

4.
Most Unicode code points are 16 bits long, but the standard has provisions for longer codes. Use [Unicode 07] as a starting point to learn more about Unicode and how codes longer than 16 bits are structured.

In comedy, as a matter of fact, a greater variety of methods were discovered and employed than in tragedy.
—T. S. Eliot, The Sacred Wood (1920)

Bibliography

Ahmed, N., T. Natarajan, and R. K. Rao (1974) "Discrete Cosine Transform," IEEE Transactions on Computers, C-23:90–93.

Bell, Timothy C., John G. Cleary, and Ian H. Witten (1990) Text Compression, Englewood Cliffs, NJ, Prentice Hall.

BOCU (2001) is http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html.

BOCU-1 (2002) is h ...