Data: It takes a village, but the buck has to stop somewhere
I’ve said many times (for example, in this post here) that, too often, an existing function implicitly assumes data responsibilities in organizations that struggle with data management. Usually, this is either the technology function or the analytics function, and whichever one it is takes the role on only reluctantly.
I mean “organization” rather loosely. At the fundamental level, this applies even to the entire information services profession; the idea is still the same. Also, as usual, I mean “analytics” broadly to include applied statistics, data science, business intelligence, machine learning, AI, business analytics, etc.
So, exactly how does this misalignment of responsibilities happen?
Scenario 1: The technology function assumes data responsibilities
Invariably, this is simply because they are the custodians. Obviously, they are responsible for the technology that generates and/or houses the data. So as a consequence, the contents become their responsibility by default. However, often there is nothing explicit about data contents in their official job descriptions.
There is a point that almost always gets lost all around. The technology perspective of data is different from the data user perspective of data. This has little to do with technical proficiency; it applies to even the most advanced data science developers.
Instead, it has everything to do with the purpose of the technology function. Its focus is on the environment and the platforms in which the data lives and moves, on the tools used to care for the data, on the rules and logic to avoid technical errors—not on the data content. How often do technology people look at data when all the rules are met and it’s error-free?
The problem is that the rules cannot address all of the standard data quality dimensions. They cannot address questions like “is the data a reasonable reflection of the reality?” You do that only by looking at the data contents. Technology people have too many tasks in their true scope of responsibilities to be derailed by looking at data content.
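To make the gap concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records, the field names, and the single rule are made up for illustration, not taken from any real system. It shows a dataset in which every record passes the technical rule, yet a quick look at the contents suggests a placeholder value has been standing in for reality.

```python
from collections import Counter
from datetime import date

# Hypothetical customer records; field names and values are made up.
records = [
    {"customer_id": 1, "birth_date": date(1900, 1, 1)},
    {"customer_id": 2, "birth_date": date(1900, 1, 1)},
    {"customer_id": 3, "birth_date": date(1985, 6, 14)},
    {"customer_id": 4, "birth_date": date(1900, 1, 1)},
]

def passes_rules(record):
    """Technical rule: birth_date is present, is a date, and is not in the future."""
    value = record.get("birth_date")
    return isinstance(value, date) and value <= date.today()

def most_common_share(rows, field):
    """Content check: which value is most frequent, and what share of rows has it?"""
    counts = Counter(row[field] for row in rows)
    value, count = counts.most_common(1)[0]
    return value, count / len(rows)

# Every record satisfies the rule...
all_valid = all(passes_rules(r) for r in records)

# ...but three out of four share the same birth date, which is unlikely to be real.
top_value, share = most_common_share(records, "birth_date")

print(all_valid)
print(top_value, share)
```

The rule is doing its job here. It simply was never designed to ask whether 1 January 1900 is a plausible birthday for most of your customers, and that question only comes up when someone looks at the contents.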
Scenario 2: The analytics function assumes data responsibilities
Analytics practitioners often tacitly end up taking on the responsibilities for data. In the vast majority of these cases, this happens as a seemingly natural and logical consequence. After all, they are indeed close to the data contents, often more than anyone else in the organization. And they have the requisite hard skills.
This is simply a misuse of the fact that looking closely at the data contents is a necessary pre-condition for good data analysis. I’ve already said elsewhere that they are not data management professionals versed in all the industry practices. But the key gap with analytics-led data management is that you never know what your next data problem will be.
To analytics practitioners, data quality is a means to an end. They run into data quality issues only when they get data for specific analysis, making data management completely reactive. These are data problems you just happen to come across.
It is no small matter that a typical data analysis effort sees only a very, very small portion of your entire available data. What other risks are out there that you are not even aware of? With every data problem, people lose trust in your data, and lost trust in data is incredibly difficult to regain. In the worst case, one of these risks leads to something catastrophic, by which time it’s too late. Ignorance is not bliss.
Lack of appropriate data ownership = nothing important gets done
Someone has to be ultimately accountable—not just responsible—for everything data, somewhere. When no one is accountable, nothing important gets done; when multiple people are “accountable,” nothing important gets done just the same.
As I mentioned, the technology perspective of data is different from the data user perspective. As a consequence, data documentation from the technology perspective is different from data documentation from the data user perspective. This distinction is much like the difference between the manufacturers’ internal documentation about their cars and the owner’s manual.
The ultimate data owner’s job is to look after the interests of the data producers as well as the data users. I have come across so many organizations with very good systems documentation without any data user documentation. Why does this matter? The former may document what one expects to see in the data, but the latter documents what one actually sees in the data.
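As a rough illustration of expected versus actual, the sketch below compares a documented code list with the values found in an extract. The status codes and the extract are hypothetical, and a real comparison would cover far more than one field.

```python
from collections import Counter

# Hypothetical: what the systems documentation says the field may contain.
documented_status_codes = {"A": "active", "C": "closed", "S": "suspended"}

# Hypothetical: what an extract of the data actually contains.
observed_statuses = ["A", "A", "C", "X", "A", "", "C", "X"]

observed_counts = Counter(observed_statuses)

# Values that appear in the data but not in the documentation.
undocumented = sorted(set(observed_counts) - set(documented_status_codes))

# Values the documentation promises but the data never shows.
never_seen = sorted(set(documented_status_codes) - set(observed_counts))

print(undocumented)
print(never_seen)
```

Both lists are the kind of thing data user documentation would record. Systems documentation, by design, describes only the first dictionary.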
At least in my experience, the scenario of absolutely no documentation whatsoever is rare enough. In practice, the worst case is when there is only incomplete documentation of any sort, systems or otherwise. More commonly, documentation exists but not for the data user audience, leaving the users to navigate the systems documentation. Or data user documentation exists but no one knows where. As I mentioned earlier, difficulty in locating data documentation is a clear sign of data management issues. Those issues are bigger than just analytics or technology—they are issues at the organization level as a whole.
“But we don’t have data”
Your organization may obtain most of your data from third parties or have a federated data arrangement with other organizations. You are still not immune—there is data to be managed until it dies and beyond. That you adopted it or share custody of it doesn’t mean you don’t feed, nurture, and care for it.
You may think your organization does not produce data. This is very unlikely today—even I generate proprietary data as a solo consultant. In fact, I cannot think of a situation in which an organization produces no data at all.
Keep in mind that data does not have to be digital. This is an oft-lost fact in today’s push toward digitization.
Where do we go from here?
Every time I discuss this with a group of technology and/or analytics practitioners, their reaction is one of relief. They have been suffering, and finally, it all makes sense for the first time.
So, how do we fix this? What are the responsibilities for those not in data management?
First, advocate for establishing a proper data function if one does not exist. Work with the leadership and HR. Start by defining the ultimate owner of everything data. You need a dedicated or at least an indisputably designated role accountable for looking after data. Then, protect that role from being pulled toward other, more tangible or even sexier priorities.
This does not mean we get to wash our hands of any data responsibilities. As stakeholders, we may not be accountable in the long run. But we are all responsible for contributing to the well-being of data. We are also responsible just in general for doing the right things for the greater data good. It does take a village to raise a data child.
So, do exercise diligence with the data you do see. Specifically:
- If you are a technology practitioner: Learn as much as you can about the data content and how it relates to reality from the users’ perspective. Don’t assume that reality follows intent, especially when it comes to data.
- If you are an analytics practitioner: Audit every project’s data as soon as you receive it. Don’t wait until you run into problems along the way. Document and communicate the results. Every project data audit you do becomes partial documentation of data quality. And learn analytics project data audit methodologies.*
- If you are a consumer of information, that is, a business leader: Resist the temptation to assign data accountability to the technology or analytics function.
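For the analytics practitioners, an audit on receipt can start very small. The sketch below is one possible starting point, not a methodology: the `audit` function, the sample rows, and the choice of checks (row count, missing values, distinct values, duplicate keys) are all assumptions made for illustration.

```python
def audit(rows, key):
    """Return a small profile of a dataset given as a list of dicts."""
    columns = sorted({col for row in rows for col in row})
    profile = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        profile["columns"][col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    # Count rows whose key value repeats an earlier row's key.
    keys = [row.get(key) for row in rows]
    profile["duplicate_keys"] = len(keys) - len(set(keys))
    return profile

# Hypothetical project extract.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "", "amount": 75.5},
    {"order_id": 2, "region": "West", "amount": None},
    {"order_id": 3, "region": "East", "amount": 40.0},
]

profile = audit(rows, key="order_id")
print(profile)
```

Saving the resulting profile alongside the project, and sending it to whoever supplied the data, is what turns a one-off check into the partial documentation mentioned above.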
Serious about being “data-driven” (whatever that means)? Data deserves more than a half-assed assignment of accountability. I can always spot lip service from a mile away!
P.S. I run a data audit methodology workshop for analytics practitioners from time to time. Follow me on social media or sign up here for email updates.