Volume 12, Issue 5
  • ISSN: 2210-3279
  • E-ISSN: 2210-3287

Abstract

With the availability of inexpensive storage devices and data sensors, collecting and storing data is now simpler than ever. Sources of data include biotechnology, pharmacy, business, online marketing websites, Twitter, Facebook, and blogs. Understanding this data is crucial today, because every kind of organization benefits from it, private or public, from hospitals to mega marts. However, the sheer volume of data makes it almost impossible to analyze manually. In 2022, we are creating 2.5 quintillion bytes of data per day, and one quintillion bytes is one billion gigabytes. Approximately 90% of all existing data was created in the last two years. An automatic technique for analyzing data is therefore a necessity, and data mining is performed with the help of machine learning tools to analyze and understand it. Because data mining and machine learning depend heavily on statistical tools and techniques, machine learning is sometimes called “Statistical Learning”. Many machine learning techniques exist in the literature, and improving them is a continuous process, since no model is perfect. This paper examines the influence of variance, a statistical concept, on various machine learning approaches and explores how this concept can be used to improve their performance.
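The abstract does not say how variance enters the models it studies. As one well-known illustration (not the authors' method), the minimal Python sketch below shows how differences in feature variance can change a distance-based learner such as KNN. When one feature has a much larger variance, it dominates the Euclidean distance. Rescaling every feature to unit variance can then change which point is the nearest neighbour. The toy data `POINTS` and the helpers `feature_variances`, `standardize`, and `nearest_neighbor` are hypothetical names chosen for this sketch.

```python
import math
import statistics

# Hypothetical toy data: feature 0 has a small spread, feature 1 a large one.
POINTS = [
    (0.0, 0.0),
    (1.0, 200.0),
    (9.0, 100.0),
    (10.0, 300.0),
]


def feature_variances(rows):
    """Population variance of each feature (column)."""
    return [statistics.pvariance(col) for col in zip(*rows)]


def standardize(rows):
    """Rescale every feature to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def nearest_neighbor(rows, i):
    """Index of the point closest to rows[i] under Euclidean distance (1-NN)."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


if __name__ == "__main__":
    print("variances:", feature_variances(POINTS))  # [20.5, 12500.0]
    # On raw data the high-variance feature 1 dominates the distance.
    print("raw 1-NN of point 0:", nearest_neighbor(POINTS, 0))  # 2
    # After standardization both features contribute comparably.
    print("standardized 1-NN of point 0:", nearest_neighbor(standardize(POINTS), 0))  # 1
```

On the raw data, point 0's nearest neighbour is point 2, which is closest on the high-variance feature. After standardization it becomes point 1, which is closest on the low-variance feature. Standardization is only one of several ways to account for variance, and the sketch makes no claim about which approach the paper evaluates.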



DOI: 10.2174/2210327912666220617153359
2022-06-01
2025-01-10



  • Article Type: Review Article
Keyword(s): data mining; k-distance; KNN; machine learning; Statistical learning; variance