Volume 19, Issue 1
  • ISSN: 1872-2121
  • E-ISSN: 2212-4047

Abstract

Background

Cloud services have become a popular way to support a wide range of activities efficiently. Predicting hardware failures in a cloud data center can minimize downtime and make the system more reliable and fault-tolerant.

Objective

This research aims to analyze a machine-learning-based predictive hardware failure model that anticipates the remediations required for undiagnosed failures in a cloud computing system serving multiclass requests.

Methods

The model is tested on a cloud data center designed to categorize incoming requests as web, compute, storage, and dedicated server requests. To demonstrate improved reliability, a test case is run on ReliaCloud-NS, a simulator for creating a cloud computing system (CCS) and computing its reliability.
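
The comparison described above can be illustrated with a toy Monte Carlo sketch. This is not ReliaCloud-NS and not the authors' prediction model: the function name `simulate`, the pool sizes, the failure probability, and the `recall` parameter (the fraction of failures assumed to be predicted and remediated in time) are all hypothetical values chosen only for illustration.

```python
import random

# The four request classes named in the abstract; one server pool per class.
REQUEST_CLASSES = ("web", "compute", "storage", "dedicated")


def simulate(trials=10_000, servers_per_pool=5, min_up=4,
             p_fail=0.1, recall=0.8, seed=0):
    """Estimate system reliability without and with failure prediction.

    All parameters are illustrative assumptions:
      p_fail  - chance that a server suffers a hardware failure in a trial
      recall  - chance that a failure is predicted and remediated in time
      min_up  - servers a pool needs for its request class to be served
    The system counts as reliable in a trial only if every pool is served.
    """
    rng = random.Random(seed)
    ok_plain = ok_pred = 0
    for _ in range(trials):
        plain_up = pred_up = True
        for _cls in REQUEST_CLASSES:
            up_plain = up_pred = 0
            for _ in range(servers_per_pool):
                fails = rng.random() < p_fail
                caught = rng.random() < recall  # drawn every time, used only on failure
                if not fails:
                    up_plain += 1
                    up_pred += 1
                elif caught:
                    # Failure was anticipated, so the server stays available.
                    up_pred += 1
            plain_up = plain_up and up_plain >= min_up
            pred_up = pred_up and up_pred >= min_up
        ok_plain += plain_up
        ok_pred += pred_up
    return ok_plain / trials, ok_pred / trials


if __name__ == "__main__":
    baseline, predicted = simulate()
    print(f"reliability without prediction: {baseline:.3f}")
    print(f"reliability with prediction:    {predicted:.3f}")
```

Because both scenarios are scored on the same random draws, the estimate with prediction can never fall below the baseline in this sketch; the size of the gap depends entirely on the assumed `p_fail` and `recall`.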

Results

Running the test case with the model yielded considerably higher reliability for the cloud computing system than running it without the model.

Conclusion

Although various patented estimation methods exist for evaluating the system reliability of a cloud computing network, this study focused on improving the reliability of request-segregated clouds when hardware resources such as CPU, memory, bandwidth, and hard disk fail. The prediction model could also be extended to other system resources such as GPUs, software, and database packages.

/content/journals/eng/10.2174/0118722121260903231009104614
2025-01-01
2024-11-26