Volume 15, Issue 4
  • ISSN: 2352-0965
  • E-ISSN: 2352-0973

Abstract

Background: As science advances rapidly, ever more data becomes available. The storage and computation of big data have therefore become a focus of research. MapReduce performs well in big data processing, but it is prone to data skew, which degrades the overall efficiency of the processing cluster.

Objective: To address the low efficiency of data joins in MapReduce, this paper proposes an intelligent load balancing algorithm for data joins based on dynamic programming. The algorithm introduces data sampling and partitioning algorithms. Because dynamic programming performs well on data constraint problems, it is used to solve the data skew problem.

Methods: First, the causes of data skew are analyzed and the data partition method is improved. The algorithm introduces a data sampling method. In the task allocation stage, a multidimensional knapsack algorithm is used: keys are distributed evenly across the computing nodes according to their load cost. Finally, the performance of the improved algorithm is verified experimentally.

Results: The experimental results show that, compared with the traditional load balancing algorithm and existing improved algorithms, the new algorithm improves data processing efficiency, reduces data skew, and better resolves data load imbalance.

Conclusion: A load balancing algorithm for two-table equi-joins based on key cost is proposed. The algorithm combines dynamic programming with intelligent data sampling, which greatly improves the efficiency and quality of data processing. The algorithm is worthy of popularization and application.
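
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: estimate a per-key join cost from sampled keys, then use dynamic programming to spread keys across nodes. Everything here is an assumption for illustration: the function names (`key_costs`, `best_subset`, `assign_keys`) are hypothetical, the cost model (matching row counts multiplied together) is a guess at what "key cost" might mean, and a simple subset-sum knapsack applied node by node stands in for the paper's multidimensional knapsack formulation.

```python
from collections import Counter


def key_costs(r_keys, s_keys):
    """Estimate the join cost of each key from two (sampled) key lists.

    Assumed cost model: rows of R with key k times rows of S with key k.
    Keys that appear in only one table produce no join output and are skipped.
    """
    r_counts, s_counts = Counter(r_keys), Counter(s_keys)
    return {k: r_counts[k] * s_counts[k] for k in r_counts if k in s_counts}


def best_subset(costs, capacity):
    """0/1 knapsack in subset-sum form, solved by dynamic programming.

    Returns the keys whose total cost is as large as possible
    without exceeding capacity.
    """
    reachable = {0: []}  # total cost -> one set of keys reaching that total
    for key, cost in costs.items():
        # Iterate over a snapshot so each key is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + cost
            if new_total <= capacity and new_total not in reachable:
                reachable[new_total] = chosen + [key]
    return reachable[max(reachable)]


def assign_keys(costs, num_nodes):
    """Fill nodes one at a time, each up to its fair share of the remaining cost."""
    remaining = dict(costs)
    assignment = []
    for node in range(num_nodes):
        nodes_left = num_nodes - node
        if nodes_left == 1:
            chosen = list(remaining)  # last node takes whatever is left
        else:
            fair_share = -(-sum(remaining.values()) // nodes_left)  # ceiling
            chosen = best_subset(remaining, fair_share)
            if not chosen and remaining:
                # Every remaining key exceeds the fair share (a hot key):
                # give the heaviest one a node of its own.
                chosen = [max(remaining, key=remaining.get)]
        assignment.append(chosen)
        for k in chosen:
            del remaining[k]
    return assignment


if __name__ == "__main__":
    costs = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}
    for node, keys in enumerate(assign_keys(costs, 2)):
        print(node, keys, sum(costs[k] for k in keys))
```

In this sketch a hot key whose cost exceeds a node's fair share is simply isolated on its own node; splitting such a key across nodes would need a different partitioning scheme, which the abstract does not describe.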

/content/journals/raeeng/10.2174/2352096515666220603164248
2022-06-01
2025-06-22