Improving Data Quality on Big and High-Dimensional Data

April 2013
American Scientific Publishers

Poor data quality is a crucial open issue that leads to poor decisions and suboptimal processes. This is especially critical in big data sets, where information is generally assembled quickly, at multiple scales, from heterogeneous sources, and with little concern for quality. In this article, we survey data quality mining and management approaches. We discuss the use of incremental learning approaches to make large-scale computation more efficient, since only the most recent batches of data are required to keep the inferred data quality models up to date. We also present a framework intended to process big and heterogeneous streams of biomedical data.

Rajendra Akerkar (2013): Improving Data Quality on Big and High-Dimensional Data. In: Journal of Bioinformatics and Intelligent Control, Vol. 1, 1-08, 2013