Full text loading...
-
Research on MapReduce Heuristic Multi Table Join Algorithm Based on Binary Optimization and Pancake Parallel Strategy
- Source: Recent Patents on Engineering, Volume 17, Issue 6, Nov 2023, p. 49 - 62
-
- 01 Nov 2023
Abstract
Background: With the development of technology, the data amount has increased significantly. In data processing, the multi table query is the most frequent operation. Because the join keys cannot correspond one by one, there will be much redundant data transmission, resulting in a waste of network bandwidth. Objective: In order to solve the problems of network overhead and low efficiency, this paper proposes a heuristic multi table join optimization method. By sharing information, the unconnected tuples are eliminated so as to reduce the amount of data transmitting. This shortens response time and improves execution performance. Methods: Firstly, the join key information of one table is compressed by the algorithm to make the filtered information for sharing. Then, the concurrent execution is controlled according to the pancake parallel strategy. Finally, the selection strategy of multi table join order is proposed. Results/Discussion: The experiments show that the proposed algorithm can filter a large amount of useless data and improve query efficiency. At the same time, the proposed algorithm reduces a lot of network overhead, improves the algorithm performance, and better solves the problem of low efficiency of multi table join. Conclusion: This paper introduces the heuristic strategy to optimize the algorithm, so that it can perform the join tasks in parallel, which further improves the performance of multi table join. The algorithm creatively combines heuristic data filtering, which greatly improves the quality of data processing. The algorithm is worth popularizing and applying.