High-Performance Parallel Database Processing and Grid Databases - P4
Pages: 50
File type: pdf
File size: 391.11 KB
Views: 5
Downloads: 0
Document information:
High-Performance Parallel Database Processing and Grid Databases - P4: Parallel databases are database systems that are implemented on parallel computing platforms. Therefore, high-performance query processing focuses on query processing, including database queries and transactions, that makes use of parallelism techniques applied to an underlying parallel computing platform in order to achieve high performance.
Content extracted from the document:
Ri and Si in the cost equation indicate the fragment size of both tables in each processor.

- Receiving records cost is: (Ri/P + Si/P) × mp

Both the data transfer and receiving costs look similar, as also mentioned above for the divide and broadcast cost. However, for disjoint partitioning the size of Ri and Si in the data transfer cost is likely to be different from that in the receiving cost. The reason is as follows. Following the example in Figures 5.14 and 5.16, Ri and Si in the data transfer cost are the size of each fragment of both tables in each processor. Again, assuming that the initial data placement is done with round-robin or any other equal partitioning, each fragment size will be equal. Therefore, Ri and Si in the data transfer cost are obtained simply by dividing the total table size by the number of available processors.

However, Ri and Si in the receiving cost are most likely skewed (as already mentioned in Chapter 2 on analytical models). As shown in Figures 5.14 and 5.16, the spread of the fragments after the distribution is not even. Therefore, the skew model must be taken into account, and consequently the values of Ri and Si in the receiving cost differ from those in the data transfer cost.

Finally, the last phase is data storing, which involves storing all records received by each processor.

- Disk cost for storing the result of data distribution is: (Ri/P + Si/P) × IO

5.4.3 Cost Models for Local Join

For the local join, since a hash-based join is the most efficient join algorithm, it is assumed that a hash-based join is used in the local join. The cost of the local join with a hash-based join comprises three main phases: data loading in each processor, the joining process (hashing and probing), and result storing in each processor.

The data loading consists of scan costs and select costs. These are identical to those of the disjoint partitioning costs, which are:

- Scan cost = (Ri/P + Si/P) × IO
- Select cost = (|Ri| + |Si|) × (tr + tw)

It has been emphasized that (|Ri| + |Si|) as well as (Ri/P + Si/P) correspond to the values in the receiving and disk costs of the disjoint partitioning.

The join process itself basically incurs hashing and probing costs, which are as follows:

- Join cost involves reading, hashing, and probing: (|Ri| × (tr + th)) + (|Si| × (tr + th + tj))

The process basically reads each record of R and hashes it into a hash table. After all records of R have been processed, the records of S are read, hashed, and probed. If they match, the matching records are written out to the query result.
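To make the bookkeeping concrete, the distribution and loading/joining components above can be written out as a small calculation. The following Python sketch is illustrative only, not code from the book: the function and parameter names are assumptions chosen to mirror the symbols in the formulas (mp, IO, tr, tw, th, tj as unit costs, |Ri| and |Si| as record counts), and P is read here as a page size so that a fragment size divided by P gives a page count.

```python
# Minimal sketch of the cost components above. Assumptions: R_frag and S_frag
# are fragment sizes at one processor, P is the page size, R_records and
# S_records are record counts (|Ri|, |Si|), and mp, IO, tr, tw, th, tj are the
# per-page / per-record unit costs used in the formulas.

def distribution_costs(R_frag, S_frag, P, mp, IO):
    """Receiving cost and disk cost of the data-distribution phase."""
    pages = R_frag / P + S_frag / P
    receiving_cost = pages * mp   # (Ri/P + Si/P) * mp
    storing_cost = pages * IO     # (Ri/P + Si/P) * IO
    return receiving_cost, storing_cost


def local_join_load_and_join_costs(R_frag, S_frag, R_records, S_records,
                                   P, IO, tr, tw, th, tj):
    """Scan, select, and join (hash/probe) costs of the local hash-based join."""
    scan_cost = (R_frag / P + S_frag / P) * IO          # (Ri/P + Si/P) * IO
    select_cost = (R_records + S_records) * (tr + tw)   # (|Ri| + |Si|) * (tr + tw)
    # R is read and hashed; S is read, hashed, and probed against the hash table.
    join_cost = R_records * (tr + th) + S_records * (tr + th + tj)
    return scan_cost, select_cost, join_cost
```

Under a skewed distribution, R_frag, S_frag, R_records, and S_records would be taken from the skew model rather than from an even division of the tables, as the text above notes for the receiving cost.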
The hashing process is very much determined by the size of the hash table that can fit into main memory. If the memory size is smaller than the hash table size, we normally partition the hash table into multiple buckets whereby each bucket can fit into main memory. All but the first bucket are spooled to disk. Based on this scenario, we must include the I/O cost for reading and writing overflow buckets, which is as follows.

- Reading/writing of overflow buckets cost is the I/O cost associated with the limited ability of main memory to accommodate the entire hash table. This cost includes the costs for reading and writing records not processed in the first phase of hashing:

(1 − min(H/|Si|, 1)) × (Si/P) × 2 × IO

Here 1 − min(H/|Si|, 1) is the fraction of the S records that does not fit into the main-memory hash table (H being the hash table's capacity in records), and the factor 2 accounts for writing the overflow buckets to disk and reading them back.

Although this looks similar to the overflow-bucket overhead mentioned in other chapters, there are two significant differences. One is that only Si is included in the cost component, because only the table S is hashed; the second is that the projection and selection variables are not included, because all records of S are hashed.

The final cost is the query result storing cost, consisting of the generating result cost and the disk cost.

- Generating result records cost is the number of selected records multiplied by the writing unit cost: |Ri| × σj × |Si| × tw

Note that the cost is reduced by the join selectivity factor σj: the smaller the selectivity factor, the lower the number of records produced by the join operation.

- Disk cost for storing the final result is the number of pages needed to store the final query result times the disk unit cost, which is: (πR × Ri × σj × πS × Si / P) × IO
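Continuing the same illustrative sketch (again with assumed names, not the book's notation made executable by its authors), the overflow-bucket and result-storing components can be computed as below; H is taken here to be the in-memory hash table's capacity in records, sigma_j the join selectivity, and pi_R and pi_S the projectivity ratios of R and S.

```python
# Sketch of the remaining local-join cost components, under the assumptions
# stated in the lead-in. Only S appears in the overflow term because only S
# is hashed in this local join.

def overflow_bucket_cost(S_frag, S_records, H, P, IO):
    """I/O cost of writing hash-table overflow buckets to disk and reading them back."""
    overflow_fraction = 1 - min(H / S_records, 1)      # share of S that does not fit
    return overflow_fraction * (S_frag / P) * 2 * IO   # factor 2: write + read back


def result_storing_costs(R_frag, S_frag, R_records, S_records,
                         sigma_j, pi_R, pi_S, P, IO, tw):
    """Generating-result cost and disk cost for writing the join result."""
    generating_cost = R_records * sigma_j * S_records * tw
    disk_cost = (pi_R * R_frag * sigma_j * pi_S * S_frag / P) * IO
    return generating_cost, disk_cost
```

Summing these two functions with the distribution and loading/joining components from the earlier sketch gives the total estimated cost of the disjoint-partitioning parallel join at one processor under this model.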
Related keywords: database security, basic Oracle, database textbook, MySQL database, SQL textbook