Volume 17, Issue 1
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Background: Data quality is crucial to the success of big data analytics. Outliers degrade both data quality and the analyses built on it, so detecting and removing them effectively can improve data quality and yield more accurate analytical insights. Data uncertainty poses a significant challenge for outlier detection methods, which need further refinement in the era of big data.

Objective: Unsupervised outlier detection that combines clustering with an outlier scoring scheme is a current research hotspot. However, hard clustering fails on abnormal patterns whose behavior is uncertain and unexpected, whereas rough boundaries help identify cluster structures more accurately. This article therefore extends clustering with an uncertainty-aware soft clustering based on rough set theory and designs scoring schemes to capture abnormal instances, addressing outlier detection in uncertain, nonlinear, complex data.

Methods: This paper proposes an outlier detection algorithm based on Kernel Rough Clustering and compares its detection accuracy with that of five popular existing methods on synthetic and real-world datasets.

Results: The proposed method improved detection precision and recall, and its detection accuracy was higher than that of the popular methods compared, indicating that it is effective at identifying outliers.

Conclusion: Compared with popular methods, the proposed method has a slight advantage in detection accuracy and is an effective option for outlier detection.
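The general idea of pairing a rough (lower/upper approximation) cluster assignment with an outlier score can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes cluster centers are already available from some clustering step, uses an RBF kernel-induced distance, and introduces hypothetical parameters `gamma` (kernel width), `eps` (ratio threshold for ambiguity), and `penalty` (score inflation for points outside every lower approximation).

```python
import math

def rbf_dist2(x, c, gamma):
    """Squared feature-space distance under an RBF kernel: 2 - 2*k(x, c)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return 2.0 - 2.0 * math.exp(-gamma * sq)

def rough_assign(points, centers, gamma=0.05, eps=1.3):
    """Return (lower, upper): per-cluster lists of point indices.

    A point joins the upper approximation of every cluster whose distance
    is within a factor eps of its nearest cluster. It joins a lower
    approximation only if exactly one cluster qualifies.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for i, x in enumerate(points):
        d = [rbf_dist2(x, c, gamma) for c in centers]
        best = min(range(k), key=d.__getitem__)
        near = [j for j in range(k) if d[j] <= eps * d[best]]  # includes best
        for j in near:
            upper[j].append(i)
        if len(near) == 1:
            lower[best].append(i)
    return lower, upper

def outlier_scores(points, centers, lower, gamma=0.05, penalty=1.5):
    """Score = distance to nearest center, inflated for boundary points.

    The penalty factor is an illustrative assumption, not a published scheme.
    """
    certain = {i for members in lower for i in members}
    scores = []
    for i, x in enumerate(points):
        d = min(rbf_dist2(x, c, gamma) for c in centers)
        scores.append(d if i in certain else penalty * d)
    return scores

# Toy data (made up for illustration): two tight groups, one point midway
# between them, and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (3, 3), (12, -4)]
centers = [(0.5, 0.5), (5.5, 5.5)]  # assumed output of a prior clustering step

lower, upper = rough_assign(points, centers)
scores = outlier_scores(points, centers, lower)
ranking = sorted(range(len(points)), key=scores.__getitem__, reverse=True)
print("most outlying point:", points[ranking[0]])
```

In this sketch, points that fall in the boundary region of several clusters receive a higher score than points that belong with certainty to one cluster, which is one simple way to let cluster uncertainty inform the ranking.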

/content/journals/rascs/10.2174/2666255816666230912153541
2024-01-01
2025-01-06