High-Performance Parallel Database Processing and Grid Databases- P3
Số trang: 50
Loại file: pdf
Dung lượng: 407.16 KB
Lượt xem: 5
Lượt tải: 0
Xem trước 5 trang đầu tiên của tài liệu này:
Thông tin tài liệu:
High-Performance Parallel Database Processing and Grid Databases- P3: Parallel databases are database systems that are implemented on parallel computingplatforms. Therefore, high-performance query processing focuses on queryprocessing, including database queries and transactions, that makes use of parallelismtechniques applied to an underlying parallel computing platform in order toachieve high performance.
Nội dung trích xuất từ tài liệu:
High-Performance Parallel Database Processing and Grid Databases- P380 Chapter 4 Parallel Sort and GroupBymay also be used. Apart from these basic functions, most commercial relationaldatabase management systems (RDBMS) also include other advanced functions,such as advanced statistical functions, etc. From a query processing point of view,these functions take a set of records (i.e., a table) as their input and produce a singlevalue as the result.4.1.3 GroupByAn example of a GroupBy query is “retrieve number of students for each degree”.The student records are grouped according to specific degrees, and for each groupthe number of records is counted. These numbers will then represent the numberof students in each degree program. The SQL and a sample result of this query aregiven below.Query 4.5: Select Sdegree, COUNT(*) From STUDENT Group By Sdegree; It is also worth mentioning that the input table may have been filtered by usinga Where clause (in both scalar aggregate and GroupBy queries), and additionallyfor GroupBy queries the results of the grouping may be further filtered by using aHaving clause.4.2 SERIAL EXTERNAL SORTING METHODSerial external sorting is external sorting in a uniprocessor environment. The mostcommon serial external sorting algorithm is based on sort-merge. The underlyingprinciple of sort-merge algorithm is to break the file up into unsorted subfiles, sortthe subfiles, and then merge the sorted subfiles into larger and larger sorted subfilesuntil the entire file is sorted. Note that the first stage involves sorting the first lot ofsubfiles, whereas the second stage is actually the merging phase. In this scenario,it is important to determine the size of the first lot of subfiles that are to be sorted.Normally, each of these subfiles must be small enough to fit into the main memory,so that sorting of these subfiles can be done in the main memory with any internalsorting technique. In other words, the size of these subfiles is usually determinedby the buffer size in main memory, which is to be used for sorting each subfileinternally. A typical algorithm for external sorting using B buffers is presented inFigure 4.1. The algorithm presented in Figure 4.1 is divided into two phases: sort andmerge. The merge phase consists of loops and each run in the outer loop is calleda pass; subsequently, the merge phase contains i passes, where i D 1; 2; : : : . Forconsistency, the sort phase is named pass 0. To explain the sort phase, consider the following example. Assume the size ofthe file to be sorted is 108 pages and we have 5 buffer pages available (B D 5 4.2 Serial External Sorting Method 81Algorithm: Serial External Sorting// Sort phase Pass 01. Read B pages at a time into memory2. Sort them, and Write out a sub-file3. Repeat steps 1-2 until all pages have been processed// Merge phase Pass i D 1, 2, : : :4. While the number of sub-files at end of previous pass is > 15. While there are sub-files to be merged from previous pass6. Choose B-1 sorted sub-files from the previous pass7. Read each sub-file into an input buffer page at a time8. Merge these sub-files into one bigger sub-file9. Write to the output buffer one page at a timeFigure 4.1 External sorting algorithm based on sort-mergepages). First read 5 pages from the file, sort them, and write them as one subfileinto the disk. Then read, sort, and write another 5 pages. In the last run, read, sort,and write 3 pages only. As a result of this sort phase, d108=Be D 22 subfiles, wherethe first 21 subfiles are of size 5 pages each and the last subfile is only 3 pages long. Once the sorting of subfiles is completed, the merge phase starts. Continuing theexample above, we will use B 1 buffers (i.e., 4 buffers) for input and 1 bufferfor output. The merging process is as follows. In pass 1, we first read 4 sortedsubfiles that are produced in the sort phase. Then we perform a 4-way merg-ing (because only 4 buffers are used as input). This 4-way merging is actually ak-way merging, and in this case k D 4, since the number of input buffers is 4 (i.e.,B 1 buffers D 4 buffers). An algorithm for a k-way merging is explained inFigure 4.2. The above 4-way merging is repeated until all subfiles (e.g., 22 subfiles frompass 0) are processed. This process is called pass 1, and it produces d22=4e D 6subfiles of 20 pages each, except for the last run, which is only 8 pages long. The next pass, pass 2, repeats the 4-way merging to merge the 6 subfiles pro-duced in pass 1. We then first read 4 subfiles of 20 pages long and perform a 4-waymerge. This results in a subfile 80 pages long. Then we read the last 2 subfiles, oneof which is 20 pages long while the other is only 8 pages long, and merge them tobecome the second subfile in this pass. So, as a result, pass 2 produces ...
Nội dung trích xuất từ tài liệu:
High-Performance Parallel Database Processing and Grid Databases- P380 Chapter 4 Parallel Sort and GroupBymay also be used. Apart from these basic functions, most commercial relationaldatabase management systems (RDBMS) also include other advanced functions,such as advanced statistical functions, etc. From a query processing point of view,these functions take a set of records (i.e., a table) as their input and produce a singlevalue as the result.4.1.3 GroupByAn example of a GroupBy query is “retrieve number of students for each degree”.The student records are grouped according to specific degrees, and for each groupthe number of records is counted. These numbers will then represent the numberof students in each degree program. The SQL and a sample result of this query aregiven below.Query 4.5: Select Sdegree, COUNT(*) From STUDENT Group By Sdegree; It is also worth mentioning that the input table may have been filtered by usinga Where clause (in both scalar aggregate and GroupBy queries), and additionallyfor GroupBy queries the results of the grouping may be further filtered by using aHaving clause.4.2 SERIAL EXTERNAL SORTING METHODSerial external sorting is external sorting in a uniprocessor environment. The mostcommon serial external sorting algorithm is based on sort-merge. The underlyingprinciple of sort-merge algorithm is to break the file up into unsorted subfiles, sortthe subfiles, and then merge the sorted subfiles into larger and larger sorted subfilesuntil the entire file is sorted. Note that the first stage involves sorting the first lot ofsubfiles, whereas the second stage is actually the merging phase. In this scenario,it is important to determine the size of the first lot of subfiles that are to be sorted.Normally, each of these subfiles must be small enough to fit into the main memory,so that sorting of these subfiles can be done in the main memory with any internalsorting technique. In other words, the size of these subfiles is usually determinedby the buffer size in main memory, which is to be used for sorting each subfileinternally. A typical algorithm for external sorting using B buffers is presented inFigure 4.1. The algorithm presented in Figure 4.1 is divided into two phases: sort andmerge. The merge phase consists of loops and each run in the outer loop is calleda pass; subsequently, the merge phase contains i passes, where i D 1; 2; : : : . Forconsistency, the sort phase is named pass 0. To explain the sort phase, consider the following example. Assume the size ofthe file to be sorted is 108 pages and we have 5 buffer pages available (B D 5 4.2 Serial External Sorting Method 81Algorithm: Serial External Sorting// Sort phase Pass 01. Read B pages at a time into memory2. Sort them, and Write out a sub-file3. Repeat steps 1-2 until all pages have been processed// Merge phase Pass i D 1, 2, : : :4. While the number of sub-files at end of previous pass is > 15. While there are sub-files to be merged from previous pass6. Choose B-1 sorted sub-files from the previous pass7. Read each sub-file into an input buffer page at a time8. Merge these sub-files into one bigger sub-file9. Write to the output buffer one page at a timeFigure 4.1 External sorting algorithm based on sort-mergepages). First read 5 pages from the file, sort them, and write them as one subfileinto the disk. Then read, sort, and write another 5 pages. In the last run, read, sort,and write 3 pages only. As a result of this sort phase, d108=Be D 22 subfiles, wherethe first 21 subfiles are of size 5 pages each and the last subfile is only 3 pages long. Once the sorting of subfiles is completed, the merge phase starts. Continuing theexample above, we will use B 1 buffers (i.e., 4 buffers) for input and 1 bufferfor output. The merging process is as follows. In pass 1, we first read 4 sortedsubfiles that are produced in the sort phase. Then we perform a 4-way merg-ing (because only 4 buffers are used as input). This 4-way merging is actually ak-way merging, and in this case k D 4, since the number of input buffers is 4 (i.e.,B 1 buffers D 4 buffers). An algorithm for a k-way merging is explained inFigure 4.2. The above 4-way merging is repeated until all subfiles (e.g., 22 subfiles frompass 0) are processed. This process is called pass 1, and it produces d22=4e D 6subfiles of 20 pages each, except for the last run, which is only 8 pages long. The next pass, pass 2, repeats the 4-way merging to merge the 6 subfiles pro-duced in pass 1. We then first read 4 subfiles of 20 pages long and perform a 4-waymerge. This results in a subfile 80 pages long. Then we read the last 2 subfiles, oneof which is 20 pages long while the other is only 8 pages long, and merge them tobecome the second subfile in this pass. So, as a result, pass 2 produces ...
Tìm kiếm theo từ khóa liên quan:
bảo mật cơ sở dữ liệu Oracle cơ bản giáo trình cơ sở dữ liệu cơ sở dữ liệu Mysql giáo trình sqlGợi ý tài liệu liên quan:
-
62 trang 402 3 0
-
Giáo trình Cơ sở dữ liệu: Phần 2 - TS. Nguyễn Hoàng Sơn
158 trang 293 0 0 -
Giáo trình Cơ sở dữ liệu: Phần 2 - Đại học Kinh tế TP. HCM
115 trang 176 0 0 -
Giáo trình Cơ sở dữ liệu: Phần 1 - Sở Bưu chính Viễn Thông TP Hà Nội
48 trang 170 1 0 -
Giáo Trình về Cơ Sở Dữ Liệu - Phan Tấn Quốc
114 trang 118 1 0 -
Giáo trình cơ sở dữ liệu quan hệ_3
26 trang 106 0 0 -
Giáo trình Cơ sở dữ liệu (Ngành: Công nghệ thông tin - Trung cấp) - Trường Cao đẳng Xây dựng số 1
49 trang 100 0 0 -
54 trang 69 0 0
-
134 trang 62 1 0
-
0 trang 56 0 0