Volume 18, Issue 2
  • ISSN: 2352-0965
  • E-ISSN: 2352-0973

Abstract

Background

Water quality directly or indirectly affects human health and environmental well-being. Water quality data can be evaluated using a Water Quality Index (WQI). Computing the WQI is a quick and affordable way to accurately summarise the quality of water.

Objective

The objective of this study is to identify data preparation strategies for categorizing a water quality dataset from two remote Indian villages in different geographic locations, to predict the quality of water, and to identify low-quality water before it is made available for human consumption.

Methods

To accomplish this task, four water quality features that are crucial for human consumption (Nitrate, pH, Residual Chlorine, and Total Dissolved Solids) are considered to determine the quality of water. The method comprises five steps: (1) data pre-processing with min-max normalization; (2) computing the WQI; (3) using feature correlation to identify the importance of each parameter to the WQI; (4) applying supervised machine learning regression models, namely Random Forest (RF), Multiple Linear Regression (MLR), Gradient Boosting (GB), and Support Vector Machine (SVM), for WQI prediction; and (5) ensembling several machine learning classification models, including K-Nearest Neighbour (KNN), Support Vector Classifier (SVC), and Multi-layer Perceptron (MLP), with Logistic Regression (LR) acting as the meta-learner, to create a stack ensemble classifier that predicts the Water Quality Class (WQC) more accurately.
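As an illustration of the pre-processing and correlation steps, the following is a minimal sketch in plain Python, not the authors' implementation. The function names and the sample readings are hypothetical and chosen only to make the arithmetic easy to follow; the WQI values are placeholders, since the abstract does not state the WQI formula used.

```python
from math import sqrt


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1].

    A constant column has no spread, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical Total Dissolved Solids readings and placeholder WQI values.
tds = [120.0, 300.0, 480.0, 210.0]
wqi = [20.0, 50.0, 80.0, 35.0]

tds_scaled = min_max_normalize(tds)   # [0.0, 0.5, 1.0, 0.25]
r = pearson(tds_scaled, wqi)          # correlation of the feature with WQI
```

Min-max scaling is a linear transformation with a positive slope, so it leaves the Pearson correlation with the WQI unchanged; its purpose is to put features measured in different units on a common scale before model fitting.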

Results

Testing of the models showed that the RF and MLR regression algorithms performed best in predicting the WQI, with mean absolute errors (MAE) of 0.003 and 0.001, respectively. The mean square error (MSE), root mean square error (RMSE), R squared (R2), and Explained Variance Score (EVS) were 0.002, 0.005, 0.988, and 0.998 for RF, and 0.001, 0.031, 0.999, and 0.999 for MLR. For predicting the WQC, the stack ensemble classifier showed the best performance, with an accuracy of 0.936, F1 score of 0.93, and Matthews Correlation Coefficient (MCC) of 0.893 on the Lalpura dataset, and an accuracy of 0.991, F1 score of 0.991, and MCC of 0.981 on the Heingang dataset.
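For readers unfamiliar with the reported metrics, the sketch below shows how they are conventionally defined, using plain Python. It is an illustration under standard textbook definitions, not the authors' evaluation code; the function names and sample values are hypothetical. The MCC function uses the usual multiclass generalisation, which reduces to the binary MCC when there are two classes.

```python
from collections import Counter
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2 and EVS for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mean_true = sum(y_true) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_total
    # EVS compares the variance of the errors with the variance of the targets,
    # so unlike R2 it ignores a constant bias in the predictions.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    evs = 1.0 - var_err / (ss_total / n)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "evs": evs}


def multiclass_mcc(labels_true, labels_pred):
    """Matthews Correlation Coefficient for two or more classes."""
    total = len(labels_true)
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    classes = set(true_counts) | set(pred_counts)
    cross = sum(true_counts[k] * pred_counts[k] for k in classes)
    denom = sqrt(
        (total ** 2 - sum(c * c for c in pred_counts.values()))
        * (total ** 2 - sum(c * c for c in true_counts.values()))
    )
    if denom == 0:
        return 0.0  # undefined case, conventionally reported as 0
    return (correct * total - cross) / denom


# Hypothetical values, chosen so the arithmetic is easy to check by hand.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
mcc = multiclass_mcc(["good", "good", "poor", "poor"],
                     ["good", "good", "poor", "good"])
```

By these definitions RMSE is always the square root of MSE, and EVS equals R2 whenever the prediction errors average to zero.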

Conclusion

This study explores a method for predicting water quality that combines simple, feasible water quality measurements with machine learning. The stack ensemble classifier performed best for multiclass classification. The findings should motivate researchers to investigate the root causes of water quality variations so that the highest quality of water can be supplied throughout the year.

/content/journals/raeeng/10.2174/0123520965267326231115071849
2024-01-17
2025-07-07