The Paradox of Self-Serve Analytics
In today's data-driven world, teams are increasingly reliant on self-serve data tools to explore and leverage data without the help of the data team. However, a paradox emerges in the constant quest for an interesting insight. The practical reality is that “interesting” insights often bring more doubt than clarity.
This post is part of a series exploring the practical limitations of self-serve analytics. I’ll dive deeper into the technical, organizational, and psychological barriers to success and examine how different perspectives and advances in AI might help address these challenges. Subscribe to follow along. 👇
The Reality of an “Interesting” Insight
When the data shows something significantly different from our expectations, the first reaction is often doubt. This is especially true in self-serve analytics, where users may not be comfortable navigating the myriad of filters, toggles, and metric definitions. They are certainly not comfortable dissecting the complex SQL queries generated under the hood. It’s not just doubt; it’s usually self-doubt.
As a user, you have a choice: Do you engage the data team to validate your findings? If it turns out to be human error, not only have you wasted their time on a task they dislike, but you also look incompetent for misusing the tool and (even worse) for believing that this “really surprising” finding could be based in reality.
No one wants to look like they don’t have basic common sense. When someone finds themselves in this position, the easiest course of action is to trust their instincts – which tell them that the “interesting” insight is most likely the result of human error.
This is the problem with an “interesting” insight. It can’t be so surprising that it causes doubt, but it must be interesting enough to stand out from what we’d otherwise expect.
In practice, this balance is incredibly difficult to achieve. Something slightly different from the norm doesn't immediately register as an “ah-ha” moment.
For instance, in the typical self-serve analytics experience, a user starts with an existing plot from a dashboard and slices and dices it by the provided dimensions to see what stands out. When a user segments retention rate by product type, they might see that customers who bought product bundles have a 34% retention rate versus 31% for non-bundle buyers.1
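For the curious, here’s a minimal sketch of what that slice amounts to under the hood, using a hypothetical pandas DataFrame of first-time buyers (the column names and data are illustrative, not from any particular tool):

```python
import pandas as pd

# Hypothetical customer-level data: one row per first-time buyer,
# with a flag for whether their first order was a bundle and whether
# they ever came back for a repeat purchase.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "bought_bundle": [True, True, False, False, True, False],
    "repeat_purchase": [True, False, False, True, True, False],
})

# Retention rate segmented by bundle vs. non-bundle buyers,
# the same slice a dashboard produces behind the scenes.
retention_by_segment = (
    customers
    .groupby("bought_bundle")["repeat_purchase"]
    .mean()
    .rename("retention_rate")
)
print(retention_by_segment)
```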
Is that “interesting”? Maybe… Kind of… Not really.
Is it something that can be used to guide the merchandising strategy? Yes, absolutely!
But a marketing manager who isn’t spending their days in the data might not recognize its value. The impact is subtle and marginally different from the norm, falling short of the typical expectation of an “insight” being eye-opening or game-changing. To the marketing manager, this isn’t “insightful,” so they move on.
Self-serve tools are in the impossible position of delivering something that doesn’t exist: a surprising trend that nonetheless aligns with our expectations. By definition, that’s not possible.
How did we get here?
As data folks, some of us will instinctively point to “data literacy” as the problem, but I believe that this is a lazy excuse, equivalent to blaming the user for misusing the tool. Yes, I said what I said.
Even junior data analysts will mindlessly slice and dice datasets, hoping for something that “jumps out,” and overlook potentially valuable insights because they aren’t “interesting” enough. I’m not trying to pick on anyone here. Self-serve analytics tools were always designed and marketed for this behavior. The problem is that the expectation of uncovering a groundbreaking insight is unrealistic and sets the user up for failure. If we really want to get value from our data in a self-serve manner, we need to rethink the design of these products to align with behaviors that lead to useful (not necessarily “interesting”) outcomes.
“...in my experience, human error usually is a result of poor design: it should be called system error. Humans err continually; it is an intrinsic part of our nature. System design should take this into account. Pinning the blame on the person may be a comfortable way to proceed, but why was the system ever designed so that a single act by a single person could cause calamity? Worse, blaming the person without fixing the root, underlying cause does not fix the problem: the same error is likely to be repeated by someone else.”
― Donald A. Norman, The Design of Everyday Things
An Alternative Approach: Hypothesis-First Data Analysis
So, how do we move beyond this limitation? Instead of starting with the available data and trying to brute-force our way to an “insight”, we should design products that push the user to start with hypotheses.
Business teams, such as those in growth, customer success, and sales, often have a great qualitative understanding of their customers. They talk to customers, monitor social media chatter, observe website activity, and develop strong intuitions about their customers and their behavior. When these teams use data to confirm or deny their intuitions, it can lead to actionable insights.
With this framing, the user starts with a hunch. They suspect something is true and can leverage the data to inform their position. Now, this is when those marginal differences become “interesting”.
For example, a marketing manager might suspect that product bundles are better at converting first-time buyers into repeat customers. They reason that bundles expose customers to more products, increasing the chances that customers fall in love with at least one of them. In this scenario, a 34% vs 31% retention rate for bundle vs non-bundle buyers is interesting. It’s interesting because it aligns with the hunch they already had. And, more importantly, it’s useful because they can leverage it for future merchandising decisions.
Now, let’s play out the opposite scenario, where there’s no difference: a 32% retention rate for both bundle and non-bundle customers. That is also interesting. It’s interesting because it doesn’t align with their expectation, but the data isn’t so out-of-the-realm-of-reality that they doubt its validity and dismiss it. Even better, it prompts further curiosity about why their intuition was wrong and leads to more hypotheses and more exploration.
Now, that is useful.
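As an aside, the footnote below asks us to take statistical significance on faith. If the marketing manager (or the tool itself) wanted to sanity-check the 34% vs 31% gap, a two-proportion z-test is one quick way to do it. Here’s a minimal sketch with made-up sample sizes (the counts are purely illustrative):

```python
from statsmodels.stats.proportion import proportions_ztest

# Illustrative numbers only: suppose 2,000 bundle buyers and 6,000
# non-bundle buyers, retained at 34% and 31% respectively.
retained = [680, 1860]   # retained customers per segment
buyers = [2000, 6000]    # total customers per segment

# Two-proportion z-test: is the difference in retention rates
# larger than we'd expect from random chance alone?
z_stat, p_value = proportions_ztest(count=retained, nobs=buyers)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```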
We see this pattern play out in other data products too. Look at A/B testing tools like Optimizely and VWO. These products are widely adopted because they offer reliable self-serve analytics: they allow product teams to determine, with reasonable certainty, whether their expectations hold true. Their popularity speaks to the effectiveness and value of that approach. You’ll notice that these solutions always start with a hypothesis.
A New Approach to Self-Serve Analytics
If the data industry embraced this hypothesis-first approach, it would bring about a substantial change in the value delivered to decision makers. Instead of starting from a data-centric perspective—first asking what data we have, then figuring out how to make it useful—we should design products from a hypothesis-first perspective. This means first leaning into the expertise and intuition of the business user, and then helping them find the data that can validate or challenge their views.
The best data analysts already use this approach. Before looking at the data, they’ll ask, “What decisions are you trying to make?” and “What hypotheses do you have that could inform those decisions?” Only then do they think about what data could be used to inform the strategy. This approach ensures that insights are relevant, actionable, and aligned with business goals. We need to build this framework into our self-serve tools.
In the end, the most valuable insights are the ones that either confirm our hunches or challenge our expectations. When paired with solid business knowledge, these insights can significantly impact decision-making. Imagine if all self-serve analytics tools allowed users to test their ideas directly, rather than just presenting raw data to sort through.
Switching from a data-centric to a hypothesis-first approach would make these tools easier to use and more effective. This could help users uncover insights that are genuinely useful and aligned with their goals. It’s time to rethink the design of self-serve analytics to better support decision-making and improve outcomes. Let’s make data work harder for us.
Don’t come @ me - let’s assume this result is statistically significant and not due to random chance 😎
Seems like a good opportunity to leverage LLMs to generate hypotheses and help the business identify blind spots.