Warning: Data May be Messier Than it Appears

Tableau did a great thing for its community by creating the Tableau Superstore. The Superstore is a sample data set users tap into as a common starting point for feature requests, customer support, training, experimenting, etc.

Much like high school chemistry, however, it makes necessary assumptions to ease the learning curve. Let’s explore the most important one and what it means for your organization:

Most of our data is not in a single clean and perfect Superstore. It’s in a bunch of Stores in various shapes and sizes, and some of them aren’t that “super.” 

Much of the organizational data we need to analyze lives in enterprise systems, growing and shrinking as need be, and changing as the organization and its systems evolve. A common analogy is various streams flowing independently that need to be brought together into a man-made lake. Warning: you will hear many water analogies in this world, and “lakes” and “streams” are actually loaded terms (even if we do fall prey to them ourselves here; we are Data Plumbers after all).

The key is that you need smart plumbing to bring all those data streams together before you can start getting insights. Every organization needs to do this data preparation, but it is often overlooked at the outset. The goal of this preparation work is simply to end up with a nice rectangle of data that is easy to slice and dice for insights. Lego bricks and rectangles are also a nice analogy for structured data; think of a simple table in a spreadsheet.
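To make the "rectangle" idea concrete, here is a minimal sketch in Python of bringing two streams together into one flat table. The source names, fields, and values are all made up for illustration; real systems will be messier, and a real pipeline would likely use a database or a data-prep tool rather than hand-written loops.

```python
# Hypothetical raw extracts from two separate systems (all names and values are invented).
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5},
    {"order_id": 3, "customer_id": "C9", "amount": 10.0},  # customer missing from the other system
]
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
]


def build_rectangle(orders, customers):
    """Join orders to customers, producing one flat row per order."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    rows = []
    for order in orders:
        rows.append({
            "order_id": order["order_id"],
            "customer_id": order["customer_id"],
            # None flags an order whose customer the second stream does not know about.
            "region": region_by_customer.get(order["customer_id"]),
            "amount": order["amount"],
        })
    return rows


rectangle = build_rectangle(orders, customers)
for row in rectangle:
    print(row)
```

Every row has the same columns, which is what makes the result easy to slice and dice. The unmatched order is kept with an empty region rather than silently dropped, so the mess stays visible to whoever analyzes it.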

Here are three tips to optimize your data preparation efforts so you can really start getting insights from all the information you gather.

1. Give end users of data access to raw data

When putting together data infrastructure, technical folks make assumptions and decisions that affect how information is represented. Much of this is necessary, as some formats and systems may not be intelligible or accessible to end users. But many of these choices would be made quite differently by someone with a feel for how the data will ultimately be used.

It’s fine to have to make those decisions without end users in the room, but then it’s imperative that the end users get a view of the raw data from each source. When they hit challenges using it, revisiting some of those assumptions and making an alternate decision could save countless hours on workarounds or even unlock key insights.

2. Tools that give business users a seat at the table

Number 1 above says that technical folks should make decisions as needed, as long as they make the raw data available. Even so, there is a balance, and it is struck when business users take a seat at the data curation table.

Yes, this means learning about the trade-offs between various technical solutions and tools, and getting into the details of where the data all comes from. But learning is good for you! It will make you a better user of the data, and you can have a say in the tools and workflow chosen. Without that, you can expect a toolset and workflow that has you relying on a lot of technical assistance.

If that isn’t motivation enough, just remember, everyone around you is only getting more technically savvy.

3. Capability to iterate

Agile is all the rage in software development but much of the data layer is still “Final Sale. No Refunds”. Once a nice unified rectangle of data is produced, users usually don’t have a way of independently iterating through a different set of translations or rules to curate alternative views.

That kinda sucks.

Software developers have the luxury of less formal requirements due to the fast-paced iterations of modern software development methodologies. But data users, who inherently generate requirements as they conduct analysis, are even better suited to, and in dire need of, a “Try Again” button. It kills me to hear of teams who spend weeks putting data together only to discover new requirements at the first glance of a nice common rectangle.

Expect to have to try again no matter how smart you are.
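One way to picture a “Try Again” button is to leave the raw rows untouched and express each set of curation rules as a small, swappable function. The sketch below is only an illustration under that assumption; the data, the function names (`rules_v1`, `rules_v2`, `curate`, `totals_by_region`), and the rules themselves are hypothetical.

```python
# Hypothetical raw rows, kept exactly as they arrived from the source systems.
raw_rows = [
    {"region": "east ", "amount": "120.00"},
    {"region": "EAST", "amount": "75.50"},
    {"region": "West", "amount": "40"},
]


def rules_v1(row):
    """First attempt: trust the region label as it arrives."""
    return {"region": row["region"], "amount": float(row["amount"])}


def rules_v2(row):
    """Second attempt: normalize region labels after seeing the first rectangle."""
    return {"region": row["region"].strip().title(), "amount": float(row["amount"])}


def curate(rows, rules):
    """Apply one set of rules to the raw rows without modifying them."""
    return [rules(row) for row in rows]


def totals_by_region(rows):
    """Sum amounts per region in a curated rectangle."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


first_try = totals_by_region(curate(raw_rows, rules_v1))
second_try = totals_by_region(curate(raw_rows, rules_v2))
print(first_try)
print(second_try)
```

The first pass splits East across two labels, which is exactly the kind of requirement that only shows up once someone looks at the rectangle. Because the raw rows were never altered, trying again is a matter of swapping in a new rule function rather than rebuilding the data from scratch.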

The above is a good backbone for the upstream efforts before you can start gaining insights from visualizing data. If you get it right, your investment in reports will pay off handsomely.
