Post: You Want Data Quality, But You’re Not Using Quality Data

Teasers

When did we first see data?

  1. Mid-twentieth century
  2. As a successor to the Vulcan, Spock
  3. 18,000 BC
  4. Who knows?  

As far back as we can go in recorded history, we find humans using data. Interestingly, data even precedes written numbers. Some of the earliest examples of storing data are from around 18,000 BC, when our ancestors on the African continent used marks on sticks as a form of bookkeeping. Answers 2 and 4 will also be accepted. It was the mid-twentieth century, though, when Business Intelligence was first defined as we understand it today, and BI didn’t become widespread until nearly the turn of the 21st century.

The benefits of data quality are obvious. 

  • Trust. Users will better trust the data. “75% of Executives Don’t Trust Their Data.”
  • Better decisions. You’ll be able to use analytics against the data to make smarter decisions.  Data quality is one of the two biggest challenges facing organizations adopting AI.  (The other being staff skill sets.)
  • Competitive Advantage.  The quality of data affects operational efficiency, customer service, marketing and the bottom line – revenue.
  • Success.  Data quality is linked heavily to business success.

 

6 Key Elements of Data Quality

If you can’t trust your data, how can you respect its advice?

 

Today, the quality of data is critical to the validity of decisions businesses make with BI tools, analytics, machine learning, and artificial intelligence. At its simplest, data quality means data that is valid and complete. You may have seen data quality problems in the headlines.

In some ways – even well into the third decade of Business Intelligence – achieving and maintaining data quality is more difficult than ever. Some of the challenges that contribute to this constant struggle include:

  • Mergers and acquisitions which try to bring together disparate systems, processes, tools and data from multiple entities. 
  • Internal silos of data without the standards to reconcile the integration of data.            
  • Cheap storage has made the capture and retention of large amounts of data easier.  We capture more data than we can analyze.
  • The complexity of data systems has grown.  There are more touchpoints between the system of record where data is entered and the point of consumption, whether that be the data warehouse or cloud.

What aspects of data are we talking about? What properties of the data contribute to its quality? There are six elements which contribute to data quality. Each of these is an entire discipline.

  • Timeliness
    • Data is ready and usable when it is needed.
    • The data is available for end-of-month reporting within the first week of the following month, for example.
  • Validity
    • The data has the correct data type in the database.  Text is text, dates are dates and numbers are numbers.
    • Values are within expected ranges. For example, while 212 degrees Fahrenheit is an actual measurable temperature, it is not a valid value for a human body temperature.
    • Values have the correct format.  1.000000 does not have the same meaning as 1.
  • Consistency
    • The data is internally consistent.
    • There are no duplicate records.
  • Integrity
    • Relationships between tables are reliable.
    • Data is not unintentionally changed. Values can be traced to their origins.
  • Completeness
    • There are no “holes” in the data. All of the elements of a record have values.
    • There are no NULL values.
  • Accuracy
    • Data in the reporting or analytic environment – the data warehouse, whether on-prem or in the cloud – reflects the source systems, or systems of record.
    • Data is from verifiable sources.
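To make a few of these elements concrete, here is a minimal sketch of what validity, consistency, and completeness checks might look like in Python. The record fields, ranges, and check functions are all invented for illustration; a real data quality program would run checks like these in a dedicated tool or pipeline.

```python
# Hypothetical patient-visit records; field names and values are invented.
from datetime import date

records = [
    {"id": 1, "visit_date": date(2024, 1, 3), "temp_f": 98.6},
    {"id": 2, "visit_date": date(2024, 1, 4), "temp_f": 212.0},  # out-of-range value
    {"id": 2, "visit_date": date(2024, 1, 4), "temp_f": 98.2},   # duplicate id
    {"id": 3, "visit_date": None,             "temp_f": 97.9},   # missing value
]

def check_validity(recs):
    # Validity: values fall within an expected range (human body temperature).
    return [r["id"] for r in recs
            if r["temp_f"] is not None and not 90.0 <= r["temp_f"] <= 110.0]

def check_consistency(recs):
    # Consistency: no duplicate record identifiers.
    seen, dupes = set(), []
    for r in recs:
        if r["id"] in seen:
            dupes.append(r["id"])
        seen.add(r["id"])
    return dupes

def check_completeness(recs):
    # Completeness: every element of every record has a value (no NULLs).
    return [r["id"] for r in recs if any(v is None for v in r.values())]

print(check_validity(records))      # [2]
print(check_consistency(records))   # [2]
print(check_completeness(records))  # [3]
```

Each function returns the ids of offending records, which is the useful output in practice: a quality check that only says "pass/fail" doesn't tell you which rows to fix.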

We agree, then, that the challenge of data quality is as old as data itself; the problem is ubiquitous and vital to resolve. So, what do we do about it? Consider your data quality program a long-term, never-ending project.

The quality of data reflects how accurately that data represents reality. To be honest, some data is more important than other data. Know what data is critical to solid business decisions and the success of the organization. Start there. Focus on that data.

As Data Quality 101, this article is a Freshman-level introduction to the topic:  the history, current events, the challenge, why it’s a problem and a high-level overview of how to address data quality within an organization. Let us know if you’re interested in taking a deeper look into any of these topics in a 200-level or graduate-level article.  If so, we’ll dive deeper into the specifics in the coming months.   

As the BI space evolves, organizations must account for the bottom-line cost of amassing analytics assets.
The more assets you have, the greater the cost to your business. There are the hard costs of keeping redundant assets, such as cloud or server capacity. Accumulating multiple versions of the same visualization not only takes up space; BI vendors are also moving to capacity pricing, so companies now pay more if they have more dashboards, apps, and reports. Earlier, we spoke about dependencies. Keeping redundant assets increases the number of dependencies and therefore the complexity. This comes with a price tag.
The implications of asset failures differ; the repercussions for the business can be minimal or drastic.
Different industries have distinct regulatory requirements to meet. The impact may be minimal if a report the sales or marketing department uses for an end-of-year close has a mislabeled column. On the other hand, if a healthcare or financial report does not meet the needs of a HIPAA or SOX compliance report, the company and its C-suite may face severe penalties and reputational damage. Another example is a report that is shared externally: during an update of the report specs, row-level security was incorrectly applied, which gave people access to personal information.
The complexity of assets influences their likelihood of encountering issues.
The last thing a business wants is for a report or app to fail at a crucial moment. If you know a report is complex and has many dependencies, then the probability of failure caused by IT changes is high, and a change request should be taken into account. Dependency graphs become important. If it is a straightforward report that simply shows sales by salesperson and account, changes do not have the same impact, even if the report fails. BI operations should treat these reports differently during change.
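The triage described above can be sketched as a simple rule: the more upstream dependencies a report has, the more scrutiny a change deserves. The report names, dependency lists, and threshold below are all invented for illustration.

```python
# Hypothetical mapping of report -> upstream data dependencies (names invented).
dependencies = {
    "finance_consolidation": ["gl", "ap", "ar", "fx_rates", "hr_costs"],
    "sales_by_rep": ["crm"],
    "marketing_leads": ["crm", "web_analytics"],
}

def review_priority(deps, threshold=3):
    # Reports with many dependencies are more likely to break when IT changes
    # an upstream system, so they warrant a formal change request.
    # Returns flagged reports, most dependencies first.
    return sorted(
        (name for name, d in deps.items() if len(d) >= threshold),
        key=lambda name: -len(deps[name]),
    )

print(review_priority(dependencies))  # ['finance_consolidation']
```

A real implementation would walk the full dependency graph (dependencies of dependencies) rather than counting direct inputs, but even this crude count separates the finance consolidation report from the straightforward sales report.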
Not all reports and dashboards fail the same; some reports may lag, definitions might change, or data accuracy and relevance could wane. Understanding these variations aids in better risk anticipation.

Marketing uses several reports for its campaigns – standard analytic assets often delivered through marketing tools. Finance has very complex reports converted from Excel to BI tools while incorporating different consolidation rules. The marketing reports have a different failure mode than the financial reports. They, therefore, need to be managed differently.

It’s time for the company’s monthly business review. The marketing department proceeds to report on leads acquired per salesperson. Unfortunately, half the team has left the organization, and the data fails to load accurately. While this is an inconvenience for the marketing group, it isn’t detrimental to the business. However, a failure in financial reporting for a human resources consulting firm with thousands of contractors – reporting that contains critical and complex calculations about sickness, fees, hours, etc. – has major implications and needs to be managed differently.

Acknowledging that assets transition through distinct phases allows for effective management decisions at each stage. When new visualizations are first released, relevant information drives broad use and adoption.
Think back to the start of the pandemic. COVID dashboards were quickly put together and released to the business, showing pertinent information: how the virus was spreading, the demographics affected, the risks to the business, etc. At the time, it was relevant and served its purpose. As we moved past the pandemic, COVID-specific information became obsolete, and the reporting was integrated into regular HR reporting.
Reports and dashboards are crafted to deliver valuable insights for stakeholders. Over time, though, the worth of assets changes.
When a company opens its first store in a certain area, there are many elements it needs to understand – other stores in the area, traffic patterns, pricing of products, what products to sell, etc. Once the store has been operational for some time, those specifics are not as important, and it can adopt the standard reporting. The tailor-made analytic assets become irrelevant and no longer add value to the store manager.