In a big data environment, the notion of data quality as "fit for purpose" is important. For some types of data science and analytics, raw, messy data is exactly what users want. Yet even in this case, users need to know the data's flaws and inconsistencies so that the unexpected insights they seek are based on knowledge, not ignorance. Spotting data quality problems can also help organizations improve how frontline processes record the data collected into the data lake or data hub.
As organizations grow dependent on the data stored in their big data repositories, or in the cloud, for a wider range of business decisions, they need data quality management to improve the data so that it is fit for each desired purpose. Without data quality management, the massive quantities of data that organizations are ingesting will not provide the anticipated benefits and can even do harm if used to drive faulty business decisions.