Data Mining P2
1. In English text files, common words (e.g., "is", "are", "the") or similar patterns of character strings (e.g., "ze", "th", "ing") are usually used repeatedly. It is also observed that the characters in an English text occur in a well-documented distribution, with the letter e and the space character being the most frequent.
2. In numeric data files, we often observe runs of similar numbers or predictable interdependency among the numbers.
3. The neighboring pixels in a typical image are highly correlated with each other, with the pixels in a smooth region of an image having similar values.
4. Two consecutive frames in a video are often mostly identical when motion in the scene is slow.
5. Some audio data beyond the human audible frequency range are useless for all practical purposes.

Data compression is the technique of reducing the redundancies in data representation in order to decrease data storage requirements and, hence, communication costs when the data are transmitted through a communication network [24, 25]. Reducing the storage requirement is equivalent to increasing the capacity of the storage medium. If the compressed data are properly indexed, compression may also improve the performance of mining data in a large compressed database. This is particularly useful when interactivity is involved with a data mining system. Thus the development of efficient compression techniques, particularly those suitable for data mining, will continue to be a design challenge for advanced database management systems and interactive multimedia applications.

Depending upon the application criteria, data compression techniques can be classified as lossless or lossy. In lossless methods we compress the data in such a way that the decompressed data are an exact replica of the original data. Lossless compression techniques are applied to compress text, numeric, or character strings in a database, typically medical data, etc.
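To make the lossless case concrete, here is a minimal sketch that is not taken from the source. It assumes Python's standard `zlib` module (a DEFLATE codec, picked only for illustration) and uses hypothetical sample inputs and helper names (`compression_ratio`, `roundtrip`). It shows that redundant data, such as repeated words or runs of equal numbers, compress well, that random bytes barely compress, and that decompression returns an exact replica in every case.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size; larger means more redundancy was removed."""
    return len(data) / len(zlib.compress(data, 9))


def roundtrip(data: bytes) -> bytes:
    """Compress and then decompress; a lossless codec must return the input unchanged."""
    return zlib.decompress(zlib.compress(data, 9))


# Hypothetical inputs chosen only to illustrate the kinds of redundancy listed above.
english_like = b"the cat is on the mat and the dog is in the house " * 200
numeric_runs = bytes([7] * 5000 + [9] * 5000)   # long runs of identical values
random_bytes = os.urandom(10000)                # essentially no redundancy to exploit

for name, data in [("text", english_like), ("runs", numeric_runs), ("random", random_bytes)]:
    # Lossless property: the decompressed data is an exact replica of the original.
    assert roundtrip(data) == data
    print(f"{name}: compression ratio {compression_ratio(data):.2f}")
```

The ratios depend on the codec and the input, so the printed values should be read as indicative only. The point of the sketch is the contrast between redundant and non-redundant data under an exactly reversible transformation.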
On the other hand, there are application areas where we can compromise on the accuracy of the decompressed data and can, therefore, afford to lose some information. For example, typical image, video, and audio compression techniques are lossy, since the approximation of the original data during reconstruction is good enough for human perception.

In our view, data compression is a field that has so far been neglected by the data mining community. The basic principle of data compression is to reduce the redundancies in data representation, in order to generate a shorter representation of the data and so conserve data storage. In earlier discussions, we emphasized that data reduction is an important preprocessing task in data mining. The need for a reduced representation of data is crucial for the success of very large multimedia database applications and the associated economical usage of data storage. Multimedia databases are typically much larger than, say, business or financial databases, simply because a single attribute in a multimedia database could be a high-resolution digital image. Hence the storage and subsequent access of thousands of high-resolution images, which are possibly interspersed with other data types as attributes, is a challenge. Data compression offers advantages in the storage management of such huge data. Although data compression has been recognized in the literature as a potential area for data reduction [13], not much work has been reported so far on how data compression techniques can be integrated into a data mining system.

Data compression can also play an important role in data condensation. One approach for dealing with the intractable problem of learning from huge databases is to select a small subset of the data as representatives for learning. Large data can be viewed at varying degrees of detail in different regions of the feature space, thereby providing adequate importance depending on the underlying probability density [26].
However, these condensation techniques are useful only when the structure of the data is well organized. Multimedia data, being not so well structured in its raw form, leads to a big bottleneck in the application of existing data mining principles. In order to avoid this problem, one approach could be to store some predetermined feature set of the multimedia data as an index at the header of the compressed file, and subsequently use this condensed information for the discovery of information or data mining.

We believe that the integration of data compression principles and techniques in data mining systems will yield promising results, particularly in the age of multimedia information and its growing usage on the Internet. Soon there will arise the need to automatically discover or access information from such multimedia data domains, rather than from well-organized business and financial data only. Keeping this goal in mind, we intend to devote significant discussion to data compression techniques and their principles in the multimedia data domain involving text, numeric and non-numeric data, images, etc.

We have elaborated on the fundamentals of data compression and image compression principles, along with some popular algorithms, in Chapter 3. Then we have described, in Chapter 9, how some data compression principles can improve the efficiency of information retrieval, particularly for multimedia data mining.

1.4 INFORMATION RETRIEVAL

Users approach large information spaces like the Web with different motives, namely, to (i) search for a specific piece of information or topic, (ii) gain familiarity with, or an overview of, some general topic or domain, and (iii) locate something that might be of interest, without a clear prior notion of what "interesting" should look like. The field of information retrieval d ...