High-Performance Parallel Database Processing and Grid Databases- P3

Số trang: 50 Loại file: pdf Dung lượng: 407.16 KB Lượt xem: 5 Lượt tải: 0

tailieu_vip

Phí tải xuống: 15,000 VND

Xem trước 5 trang đầu tiên của tài liệu này:

Thông tin tài liệu:

High-Performance Parallel Database Processing and Grid Databases- P3: Parallel databases are database systems that are implemented on parallel computingplatforms. Therefore, high-performance query processing focuses on queryprocessing, including database queries and transactions, that makes use of parallelismtechniques applied to an underlying parallel computing platform in order toachieve high performance.
Nội dung trích xuất từ tài liệu:
High-Performance Parallel Database Processing and Grid Databases- P380 Chapter 4 Parallel Sort and GroupBymay also be used. Apart from these basic functions, most commercial relationaldatabase management systems (RDBMS) also include other advanced functions,such as advanced statistical functions, etc. From a query processing point of view,these functions take a set of records (i.e., a table) as their input and produce a singlevalue as the result.4.1.3 GroupByAn example of a GroupBy query is “retrieve number of students for each degree”.The student records are grouped according to speciﬁc degrees, and for each groupthe number of records is counted. These numbers will then represent the numberof students in each degree program. The SQL and a sample result of this query aregiven below.Query 4.5: Select Sdegree, COUNT(*) From STUDENT Group By Sdegree; It is also worth mentioning that the input table may have been ﬁltered by usinga Where clause (in both scalar aggregate and GroupBy queries), and additionallyfor GroupBy queries the results of the grouping may be further ﬁltered by using aHaving clause.4.2 SERIAL EXTERNAL SORTING METHODSerial external sorting is external sorting in a uniprocessor environment. The mostcommon serial external sorting algorithm is based on sort-merge. The underlyingprinciple of sort-merge algorithm is to break the ﬁle up into unsorted subﬁles, sortthe subﬁles, and then merge the sorted subﬁles into larger and larger sorted subﬁlesuntil the entire ﬁle is sorted. Note that the ﬁrst stage involves sorting the ﬁrst lot ofsubﬁles, whereas the second stage is actually the merging phase. In this scenario,it is important to determine the size of the ﬁrst lot of subﬁles that are to be sorted.Normally, each of these subﬁles must be small enough to ﬁt into the main memory,so that sorting of these subﬁles can be done in the main memory with any internalsorting technique. In other words, the size of these subﬁles is usually determinedby the buffer size in main memory, which is to be used for sorting each subﬁleinternally. A typical algorithm for external sorting using B buffers is presented inFigure 4.1. The algorithm presented in Figure 4.1 is divided into two phases: sort andmerge. The merge phase consists of loops and each run in the outer loop is calleda pass; subsequently, the merge phase contains i passes, where i D 1; 2; : : : . Forconsistency, the sort phase is named pass 0. To explain the sort phase, consider the following example. Assume the size ofthe ﬁle to be sorted is 108 pages and we have 5 buffer pages available (B D 5 4.2 Serial External Sorting Method 81Algorithm: Serial External Sorting// Sort phase Pass 01. Read B pages at a time into memory2. Sort them, and Write out a sub-ﬁle3. Repeat steps 1-2 until all pages have been processed// Merge phase Pass i D 1, 2, : : :4. While the number of sub-ﬁles at end of previous pass is > 15. While there are sub-ﬁles to be merged from previous pass6. Choose B-1 sorted sub-ﬁles from the previous pass7. Read each sub-ﬁle into an input buffer page at a time8. Merge these sub-ﬁles into one bigger sub-ﬁle9. Write to the output buffer one page at a timeFigure 4.1 External sorting algorithm based on sort-mergepages). First read 5 pages from the ﬁle, sort them, and write them as one subﬁleinto the disk. Then read, sort, and write another 5 pages. In the last run, read, sort,and write 3 pages only. As a result of this sort phase, d108=Be D 22 subﬁles, wherethe ﬁrst 21 subﬁles are of size 5 pages each and the last subﬁle is only 3 pages long. Once the sorting of subﬁles is completed, the merge phase starts. Continuing theexample above, we will use B 1 buffers (i.e., 4 buffers) for input and 1 bufferfor output. The merging process is as follows. In pass 1, we ﬁrst read 4 sortedsubﬁles that are produced in the sort phase. Then we perform a 4-way merg-ing (because only 4 buffers are used as input). This 4-way merging is actually ak-way merging, and in this case k D 4, since the number of input buffers is 4 (i.e.,B 1 buffers D 4 buffers). An algorithm for a k-way merging is explained inFigure 4.2. The above 4-way merging is repeated until all subﬁles (e.g., 22 subﬁles frompass 0) are processed. This process is called pass 1, and it produces d22=4e D 6subﬁles of 20 pages each, except for the last run, which is only 8 pages long. The next pass, pass 2, repeats the 4-way merging to merge the 6 subﬁles pro-duced in pass 1. We then ﬁrst read 4 subﬁles of 20 pages long and perform a 4-waymerge. This results in a subﬁle 80 pages long. Then we read the last 2 subﬁles, oneof which is 20 pages long while the other is only 8 pages long, and merge them tobecome the second subﬁle in this pass. So, as a result, pass 2 produces ...