Data Quality 30 years on - What's changed?
My second full-time job was a data quality one back in 1989 (yep, I am that old). Timberland, the footwear company, had massively diversified in the 1980s, adding clothing lines and opening their first branded store outside the USA - in London on Bond Street. The UK operation was run by two Australians who were bold enough to take a punt on this 20-year-old, fresh-off-the-plane from Aotearoa with some basic "tech" skills. The job, as I recall their description, was to sort out their stock control system. Turns out it was a data quality role even though none of us knew it at the time. (* I have added a wee footnote about the job at the end).
Roll forward to today and data quality projects are really in vogue as organisations look to leverage the value of fit-for-purpose datasets. Our team here is often engaged directly in data quality (DQ) projects or involved in phases focused on DQ (I got sick of typing data quality already) within a data migration, data cleansing or data integration context.
So what's changed in 30 years?
As described in A Brief History of Data Governance, the era in which I entered the DQ space was the tail end of the application era in data governance. Data was still a by-product of business functions, often neglected, virtually always siloed and not seen to hold any value for a business. Many organisations in 1989 were operating without computerised processing functions, so had no data assets at all - hard to get our heads around today.
Roll forward to 2019 and we still see variable levels of maturity in the data governance space. The key change to emerge this decade has been the acceptance that data is an asset. For larger organisations this has involved adoption of data governance strategies and frameworks that now form aspects of their operational processes.
The other key driver of data governance adoption more recently has been data breaches. 2019 alone has seen a litany of them occur - Docker, T-Mobile, Web.com, Macy's, Disney, Air NZ, NAB, Australian National University - to name just a few. Data breaches are affecting the share prices of listed companies; one small study indicates an average loss of 7.27%. Data breaches are becoming so common and material that they can no longer sit in the domain of firewalls and security teams alone. Protection of data has matured to become an element of data governance no organisation can afford to ignore.
Brackstone's Data Quality Dimensions
The real revolution in DQ occurred in 1999 when Gordon Brackstone of Statistics Canada published "Managing Data Quality in a Statistical Agency". While his perspective was grounded in the context of a national statistics office, the introduction of defined dimensions for DQ was applicable to all contexts.
This table changed my life in 2000 - I was distracted with Y2K like everyone else in the data field in 1999 - so it took me a wee while to catch up. Having a framework like this to work with brought both maturity and a much-needed common language to the DQ space.
Somewhere along the line someone - I don't know the lineage, so it might have been Brackstone himself - replaced the words "statistical information" with the word "data" and paraphrased the above. Then wow! The 6 Dimensions of Data Quality became the thing we have all come to love.
Below: Brackstone's dimensions, modified for general context usage
Tools, Tools, Tools
Type "best data quality tool" into your search engine of choice and you will see 300M+ results. Back in 1989 SQL was my only tool in the DQ battle. During the 1990s Oracle, SAS and IBM tools all entered my life. Taking a look at the latest Gartner Magic Quadrant on Data Quality (thanks to Informatica, who are the leader and posted a version), it's kinda sad to see the same "big boys" dominating the market here.
There are new players and open-source options emerging, which is great. As a consulting business we tend to use the tool our customers have already invested in, but do love to put others through their paces. Having read loads of papers on the topic over the years, we have honed a shorter list of features to start assessing DQ tools against:
- Data profiling functions
- Data quality functions like cleansing, standardization, parsing, de-duplication, matching, hierarchy management, identity resolution
- User-specific interfaces/workflow support
- Integration capability
- Data cleansing, enrichment and removal functions
- Data distribution and synchronization with data stores
- Definition of metrics, monitoring components
- Data Lifecycle Management
- Reporting components, dashboarding
- Versioning functionality for datasets, issue tracking, collaboration
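To make a few of those features concrete, here is a minimal sketch in plain Python of three of them: standardisation, profiling (as a simple completeness measure) and de-duplication. It is an illustration under stated assumptions, not how any particular DQ tool works: the sample records, field names and the function names `standardise`, `profile` and `deduplicate` are all hypothetical, and real tools use far more sophisticated matching than exact comparison.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"sku": "TB-1001", "name": "Yellow Boot", "size": "9"},
    {"sku": "TB-1001", "name": "yellow boot ", "size": "9"},
    {"sku": "TB-2040", "name": "Field Jacket", "size": None},
    {"sku": "TB-3300", "name": "", "size": "M"},
]


def standardise(record):
    """Trim whitespace, lower-case text values and treat blanks as missing."""
    cleaned = {}
    for field, value in record.items():
        if isinstance(value, str):
            value = value.strip().lower()
        cleaned[field] = value or None
    return cleaned


def profile(rows):
    """Return, per field, the share of rows that hold a non-missing value."""
    filled = Counter()
    for row in rows:
        for field, value in row.items():
            if value is not None:
                filled[field] += 1
    return {field: filled[field] / len(rows) for field in rows[0]}


def deduplicate(rows):
    """Keep the first occurrence of each distinct row (exact match only)."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by field name so the key does not depend on field order.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


cleaned = [standardise(r) for r in records]
completeness = profile(cleaned)
unique_rows = deduplicate(cleaned)

print(completeness)
print(len(unique_rows), "unique rows out of", len(cleaned))
```

With this sample data the first two records only become duplicates once they have been standardised, which is why cleansing usually runs before matching. Completeness comes out at 100% for `sku` and 75% for `name` and `size`, and three unique rows remain.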
What will we see in the next 30 years?
There is a raft of information out there predicting how the future of data including data quality will develop. Given how much this space has changed in 30 years and the rapid pace of change in technology it is difficult to see beyond a shorter horizon so this is all pure speculation on my part:
Data quality operationalised into every aspect of data management - it will become "the magic" that just happens without separate interventions - automated cleansing, monitoring and profiling for instance.
Metadata management maturity - beyond the domain of IT departments and the scope of technical metadata into the hands of the business. Business metadata is already emerging but will become mainstream, designed to enhance insights, improve processes, gain competitive advantage and increase productivity / profits.
The elimination of the Human Error factor - this is a bold prediction, I know! The simple fact is that as software matures there should be no room for humans to make typos. Whether data entry is replaced by voice entry as some predict (and therefore the machines translate, so no typos) or sophisticated automation mechanisms are deployed, in 30 years data quality will be the domain of merging and transformation rather than fixing human-introduced "mistakes".
I guess the key here is that Data Quality as a function is going to become more important than ever, and will evolve and transform rapidly from here. I look forward to watching this space with interest.
Ngā mihi, Vic
*The Timberland job was amazing. I learnt so much, met famous people like the Pet Shop Boys and Neneh Cherry, got free samples and discounted clothing, and worked with an amazing team. It was a great start to my OE, data and software wise as well.
Victoria spends much of her time focusing on Digital Inclusion, Digital Literacy and Digital Rights. You can read her OptimalBI blogs here, or connect with her on LinkedIn. Reposted with kind permission.