How to lie with data: A satirical take on data misrepresentation

In an age where data is often regarded as the new gold, understanding how to interpret and present that data has never been more critical.

Data is everywhere. From business decisions to political policies, numbers influence our perception of reality. But what if those numbers don’t tell the full story?

The manipulation of statistics and data visualization techniques can easily create a façade that can shift perceptions and even mislead audiences.

This article explores some common data misrepresentation tactics, particularly in the End-User Computing (EUC) space, and showcases the importance of clear and accurate data reporting.

The Power of Data in Decision-Making

The increasing reliance on data for decision-making across sectors is evident. From corporate strategies to government policies, data helps to shape our understanding of reality. However, with great power comes great responsibility. Misusing data can lead to misguided decisions that impact individuals, organizations, and entire economies. As data professionals, we must recognize that numbers, while objective, can be interpreted subjectively depending on how they are presented.

The misrepresentation of data is not a new phenomenon. From the early days of statistics, figures have been used to bolster arguments, sometimes at the expense of truth. One historical example includes the manipulation of census data to favor certain political agendas. In modern times, the digital age has amplified the potential for data misuse, as advanced visualization tools and analytics software have made it easier than ever to present data in persuasive yet misleading ways.

Let’s explore some common tactics used to manipulate data—and how to spot them.

Skewing the Y-Axis

With the right manipulation, a simple graph can tell two very different stories. Let’s explore how something as basic as adjusting the Y-axis can dramatically change a reader’s perception.

One of the oldest tricks in the book is to manipulate the y-axis of a graph. By altering the scale, for example by stretching or compressing the axis, you can make a minor change look monumental, or vice versa.

Another way to misrepresent the data is to truncate the y-axis by having a chart start just under the minimum data point instead of having the y-axis start at 0. This exaggerates differences, misleading the audience into perceiving a larger disparity than actually exists.

By truncating the y-axis, minor differences appear disproportionately large and are, in essence, exaggerated. This technique exploits the viewer’s assumption that the visual gap reflects the actual magnitude.

For charts dealing with percentages in particular, it is common practice to run the scale over the full possible range, most commonly from 0 to 100%. The second chart in the first example shows the ‘real’ difference between orange and blue.
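To see how much a truncated axis distorts the picture, here is a minimal Python sketch. The ‘orange’ and ‘blue’ values are hypothetical, chosen only to illustrate the effect:

```python
# Sketch (hypothetical values): how a truncated y-axis exaggerates
# the visual gap between two nearly identical percentages.

def apparent_ratio(a, b, axis_min):
    """Ratio of drawn bar heights when the y-axis starts at axis_min."""
    return (a - axis_min) / (b - axis_min)

orange, blue = 74.0, 72.0  # two very similar percentages

# With a full 0-100% axis, the bars look almost identical.
honest = apparent_ratio(orange, blue, axis_min=0)

# Starting the axis just below the smallest value makes the orange
# bar appear three times as tall as the blue one.
truncated = apparent_ratio(orange, blue, axis_min=71)

print(f"full axis: {honest:.2f}x, truncated axis: {truncated:.2f}x")
```

The underlying numbers never change; only the mapping from value to pixel height does, which is exactly what the viewer’s eye cannot verify without reading the axis.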

And it can get even more deceptive. What if we completely remove the Y-axis labels? Without labels, readers lose crucial context, making it impossible to gauge the true scale of differences. This ambiguity allows for even greater manipulation of perceptions.

Removing the labels also removes the most obvious clue that data is being misrepresented. Without this reference, it’s impossible to determine whether the chart is accurately represented or deliberately misleading.

A chart without labels leaves the viewer guessing, making it easy to manipulate perceptions. Whether intentional or not, unlabeled charts are a sign that something might be amiss. Always be skeptical of visualizations that lack proper context.

When analyzing charts, be on the lookout for excluded data or abrupt start/end points. If possible, compare the presented range with the full dataset to check for missing information.

Always read the axis descriptions carefully. Pay close attention to the scale, as well as the start and endpoints, to ensure you’re seeing the full context.

Sometimes, even correctly represented charts fail to tell the whole story. Consider a chart comparing different encoding techniques: the Y-axis represents encoding speed, while the X-axis shows the size of the encoded images. At first glance, the chart may seem complete—but without a third dimension, image quality, we can’t fully evaluate which technique is best. These omissions aren’t always intentional but can still mislead conclusions.

At GO-EUC we always label our axes, and when percentages are used, our charts represent the whole range, from 0 to 100%. Where applicable, each chart includes a footnote indicating whether higher or lower values are better.

Source: Evaluating the Performance Impact of Microsoft AppLocker

The example above shows a chart representing VM CPU utilization during a research run. The Y-axis represents CPU usage, and you’ll notice that the scale runs from 0 to 100%, even though the highest recorded usage is 72%. This ensures an accurate representation of the data without exaggerating differences.

The X-axis represents time, and importantly, the timescale has not been truncated, preserving the full dataset context. In this case, lower CPU utilization is better, and we clearly indicate this in the chart’s footnote to avoid ambiguity.
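As an illustration, conventions like these can be encoded as a simple automated check. The spec format and field names below are hypothetical, not GO-EUC’s actual tooling:

```python
# Hypothetical sketch: checking a chart specification against simple
# charting conventions (labeled axes, full 0-100% range for percentages,
# and a footnote stating whether higher or lower is better).

def check_chart(spec):
    """Return a list of convention violations for a chart spec dict."""
    problems = []
    if not spec.get("y_label"):
        problems.append("missing y-axis label")
    if spec.get("unit") == "%" and spec.get("y_range") != (0, 100):
        problems.append("percentage chart must span 0-100%")
    if spec.get("direction") not in ("lower is better", "higher is better"):
        problems.append("footnote should state whether higher or lower is better")
    return problems

cpu_chart = {
    "y_label": "CPU usage",
    "unit": "%",
    "y_range": (0, 100),
    "direction": "lower is better",
}
print(check_chart(cpu_chart))  # an empty list: no violations
```

A check like this catches the truncated-axis and missing-label tricks described earlier before a chart ever reaches a reader.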

By maintaining consistent and transparent visualizations, we ensure data integrity and prevent misleading interpretations.

Source: Evaluating the Performance Impact of Microsoft AppLocker

There are cases, however, where the upper limit is rather arbitrary, as in the example here. Here, we ensured that even a small spike in disk writes per second at the start of the test is clearly indicated in the chart.

By transparently displaying the full range of values, we prevent misleading interpretations and ensure that short-lived anomalies are properly contextualized rather than exaggerated or ignored.

Misaligned Axes, Correlation, and Causality

Beyond individual charts, data manipulation can occur by pairing unrelated metrics, leading viewers to assume a connection where none exists. This is a classic example of how correlation can be mistaken for causation.

One common tactic is the use of dual Y-axes without proper context. By misaligning axes, a chart can create the illusion of a strong relationship between two unrelated variables—potentially misleading decision-makers into drawing false conclusions.

Let’s explore how these techniques distort data interpretation and what to watch out for.

Here’s an example using hypothetical data on network latency and user satisfaction over four months:

Month      Network Latency (ms)   User Satisfaction (Scale 1-10)
January    120                    6
February   125                    7
March      130                    8
April      140                    9

By aligning these metrics on separate axes, one could falsely suggest a strong correlation between increasing latency and user satisfaction, as shown in the graph below.
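The strength of that illusion is easy to quantify. A short Python sketch computing the Pearson correlation coefficient for the hypothetical table above shows the two series moving almost in lockstep:

```python
from math import sqrt

# Hypothetical data from the table above.
latency = [120, 125, 130, 140]       # ms
satisfaction = [6, 7, 8, 9]          # scale 1-10

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(latency, satisfaction)
print(f"Pearson r = {r:.3f}")  # close to 1: near-perfect correlation
```

A coefficient this close to 1 looks like overwhelming evidence of a relationship, yet it tells us nothing about whether one variable drives the other.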

A correlation between variables, however, does not automatically mean that a change in one variable causes the change in the other. Causation indicates that one event is the result of the occurrence of the other event; there is a causal relationship between the two.

Correlation occurs when two variables change in sync—one might increase while the other decreases, or both might move in the same direction. Causation means that one variable directly influences another. For example, an increase in CPU usage might cause a drop in performance.

Imagine interviewing 1,000 people who have played Russian roulette—and all of them survived. Based on this data, you might conclude that Russian roulette is completely safe. Clearly, this is a flawed assumption, as it ignores the cases where people didn’t survive.

This is an example of survivorship bias: a conclusion drawn only from the cases that made it into the dataset. Together with mistaking correlation for causation, it is one of the most common ways to reach misguided conclusions in data analysis.
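The survivor-only survey can be sketched as a small simulation. The parameters are made up for illustration: a six-chamber revolver and 1,000 rounds played:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulate 1,000 rounds of six-chamber Russian roulette.
# True means the player survived (5 chances in 6).
outcomes = [random.randrange(6) != 0 for _ in range(1000)]

# Interviewing only the survivors: apparent survival rate is 100%.
survivors = [o for o in outcomes if o]
survey_rate = sum(survivors) / len(survivors)

# The full dataset tells a very different story (roughly 5 out of 6).
true_rate = sum(outcomes) / len(outcomes)

print(f"survivor survey: {survey_rate:.0%}, full dataset: {true_rate:.0%}")
```

The survey is not wrong about the people it sampled; it is wrong because the sampling itself silently excluded every contradicting data point.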

Cherry picking data or data points

“Cherry picking, or the fallacy of incomplete evidence, is the act of pointing to individual cases or data that seem to confirm a particular position while ignoring a significant portion of related and similar cases or data that may contradict that position.”

The term cherry picking originates from fruit harvesting: a picker is expected to select only the ripe, fully grown cherries. An observer who sees only the harvest might conclude that most or all of the fruit on the plant is in similar condition, which gives a false impression of the overall quality of the fruit.

The metaphor evolved to describe selectively choosing the most appealing data to support an argument or desired outcome while ignoring or discarding the data that does not.

Unfortunately, cherry picking data points is a very common malpractice.

Take the following example: a line graph shows average CPU usage for the past week, highlighting an impressive drop after optimization. However, the chart excludes the peak usage times during a critical business period when performance issues persisted.

Another example is more subtle. Imagine a research project in which ten tests were performed. In half of the tests, the data showed favorable results for whatever we were testing; in the other half, the results were less promising. With these ten results, we could show both sides of the story, or we could focus only on the favorable results, or vice versa.

What we should do is add more tests to the research to determine whether they influence the overall results, or present the facts as they are: inconclusive.

How to Avoid Cherry-Picking Bias:

  • Increase sample size: Conduct more tests to see if results hold across different conditions.
  • Present the full dataset: Show all relevant results, not just the ones that support a specific conclusion.
  • Acknowledge uncertainties: If data is inconclusive, be transparent rather than selectively filtering results.
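As a minimal sketch with made-up numbers, here is how reporting only the favorable half of the ten test runs flips an inconclusive result into a convincing one:

```python
# Hypothetical results of ten test runs: change in average CPU usage (%)
# after an optimization. Negative means usage dropped (favorable).
results = [-4.8, -3.9, -4.2, -5.1, -4.5,   # favorable runs
            3.7,  4.4,  3.1,  5.0,  4.3]   # unfavorable runs

def mean(xs):
    return sum(xs) / len(xs)

# Cherry picking: report only the runs where usage dropped.
cherry_picked = [r for r in results if r < 0]

print(f"all ten runs:   {mean(results):+.2f}%")        # essentially zero
print(f"favorable only: {mean(cherry_picked):+.2f}%")  # looks like a big win
```

The full dataset averages out to roughly no change, while the cherry-picked subset suggests a consistent 4-5% improvement, a conclusion the data simply does not support.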

By being aware of cherry picking and insisting on full data transparency, we can avoid misleading conclusions and make truly informed decisions.

Conclusion

This is a bit of a deviation from our normal articles at GO-EUC, and while the gist of this article is a tad satirical in nature, the consequences of such practices are all too real. Not only in the EUC space but basically everywhere, in IT and in our daily work, data-driven decisions can impact businesses, technology development, and end users. Understanding how to assess and interpret data is essential for anyone seeking to make informed, data-driven decisions.

We’ve explored how simple manipulations of charts and data, like truncating axes, cherry-picking data, and misaligning metrics, can drastically change perception. These techniques are used everywhere, sometimes fully intentionally and other times not intentionally at all, but either way they influence our perception, and possibly our decisions, in a significant way.

As a reader, you hold the power to spot misrepresentations and demand accountability. The next time you see a chart or statistic, don’t take it at face value. Ask yourself: is this the whole picture? Stay curious, be skeptical, and demand transparency in data reporting.

This article should give you some insight into commonly used tactics, as well as unintentional errors, in data representation. We hope this information gives you the tools to spot these fallacies. When reading articles, blog posts, and research, it is always best practice to ask yourself the following questions:

  • Is this a reputable source?
  • Are there other sources or studies that confirm or contradict these findings?
  • Are the axes properly labeled, and does the visualization show the full context of the data?
  • Are there peer-reviewed studies backing the claims made?

In academic and professional circles, for example, peer review exists precisely to prevent the propagation of misleading or incomplete conclusions. While not all online resources undergo such a review process, we urge you to be curious and diligently sceptical.

At GO-EUC, we are committed to transparency and accuracy in our research, and in how we represent our data and findings in charts and conclusions.

We have internal research guidelines on how to perform research, how to collect and analyse data, and of course how to represent data in charts. Data analysis, and drawing conclusions from it, is never an easy task. Therefore, every article is peer reviewed before publication. You can see who peer reviewed an article at the top right of each article.

Misrepresentation thrives on complacency. If you encounter any fallacies, incorrect data, or other erroneous conclusions, please let us know, and we will do our best to fix the data, correct the article, or update the conclusion.

Photo by Markus Winkler on Unsplash