How to Lie with Statistics

Understand the terminology, lie with mean averages, lie with median averages, lie with mode averages, and lie with representational numbers.

Step-by-Step Guide

  1. Step 1: Understand the terminology.

    The word “average” gets thrown around an awful lot when statistical data are being discussed.

    At first glance, the term sounds straightforward enough: the average is the amount that falls roughly in the middle.

    However, there are actually a few different types of averages, all of which can be misleading if not properly understood.

    The mean average is reached by adding up all the numbers in a data set and dividing the sum by the number of entries in the set.

    In other words, if you have the numbers 3, 3, 5, 4, and 7, the mean average can be reached by adding them together (to get 22) and then dividing the sum by 5 (since there are 5 numbers in the set).

    In this example, the mean average is 4.4.

    The median average is the number in a data set that falls midway between the lower numbers and the higher numbers.

    Using the same data as before, sorted from lowest to highest (3, 3, 4, 5, and 7), the median average is 4, since 2 of the numbers are lower and 2 are higher.

    The mode average is the most common number in the data set.

    Using our example set, the mode average is 3, since it appears twice.
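
The three definitions above can be written out in a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the only data used is the example set from the text.

```python
from collections import Counter

def mean_average(values):
    # Add everything up, then divide by how many entries there are.
    return sum(values) / len(values)

def median_average(values):
    # Sort first; the median is the middle entry of the sorted list.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even number of entries: take the mean of the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode_average(values):
    # most_common(1) returns [(value, count)] for the most frequent value.
    # If several values tie, the one that appears first in the data wins.
    return Counter(values).most_common(1)[0][0]

data = [3, 3, 5, 4, 7]
print(mean_average(data))    # 4.4
print(median_average(data))  # 4
print(mode_average(data))    # 3
```

Python's standard `statistics` module offers `mean`, `median`, and `mode` functions that do the same job; the hand-written versions are shown only to make each step visible.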
  2. Step 2: Lie with mean averages.

    The mean average might seem like the most foolproof of all the methods described above, but that actually isn't the case.

    This is because abnormally high or low numbers in the data set can significantly swing the average.

    To lie with a mean average, include outlying data in your calculation.

    For example, imagine you survey 50 households in a neighborhood for their income.

    Most households make between $40,000 and $60,000 a year, but one household makes $5 million a year.

    When you compute the mean average, the number will be significantly higher than the “real” average income in that area, because the $5 million number is so much bigger than the others.

    In a similar way, if you had data showing that 9 people each had $1,000 in their bank accounts, but a tenth person had only $1, the mean average would work out to $900.10, almost 10% less than the most common amount.
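
The bank-account figures can be checked directly. The sketch below uses Python's standard `statistics` module and only the numbers given in the example; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Figures from the example: nine accounts hold $1,000 and one holds $1.
balances = [1000] * 9 + [1]

mean_balance = mean(balances)      # pulled down by the single $1 account
median_balance = median(balances)  # middle of the sorted balances
mode_balance = mode(balances)      # most common balance

# How far the mean falls below the most common amount.
shortfall = (mode_balance - mean_balance) / mode_balance

print(f"mean:   ${mean_balance:,.2f}")    # mean:   $900.10
print(f"median: ${median_balance:,.2f}")  # median: $1,000.00
print(f"mode:   ${mode_balance:,.2f}")    # mode:   $1,000.00
print(f"mean is {shortfall:.2%} below the mode")
```

Only the mean moves in response to the one unusual account; the median and the mode both stay at $1,000.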

    Reputable surveys often throw out the very highest and very lowest numbers before computing the mean average.
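
Throwing out the extremes before averaging is often called a trimmed mean. The sketch below applies the idea to the 50-household example. The text gives no individual incomes, so the list here is invented for illustration (49 households spread evenly between $40,000 and $59,200, plus one at $5 million), and `trimmed_mean` is a hypothetical helper rather than a standard library function.

```python
def trimmed_mean(values, drop=1):
    """Mean after discarding the `drop` lowest and `drop` highest values."""
    if drop < 0 or len(values) <= 2 * drop:
        raise ValueError("not enough values left after trimming")
    ordered = sorted(values)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)

# Invented incomes, not survey data: 49 ordinary households and one outlier.
incomes = [40_000 + 400 * i for i in range(49)] + [5_000_000]

raw_mean = sum(incomes) / len(incomes)
print(raw_mean)               # 148608.0
print(trimmed_mean(incomes))  # 49800.0
```

With these made-up numbers, the single $5 million household roughly triples the mean, while the trimmed figure lands back inside the range where most households actually fall.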

    However, not every survey you see in the news is reputable.

    Unless you either have access to the entire data set yourself or see a written assurance that the outliers were removed, it's safer to assume they weren't.
  3. Step 3: Lie with median averages.

    The median average is actually the toughest number to “lie” with, because it can never be far higher or lower than most of the values in the data set.

    It must lie in the center by necessity.

    However, you can use the median average to hide a very large or small number.

    For instance, if your data set is 1, 1, 2, 3, 4, 5, 3000, the median average is 3, which gives no hint that the 3000 exists.

    When you have an even number of entries, you can reach the median average by finding the mean of the two entries in the middle.

    This still doesn't account for outliers.

    Beware of median averages being used to describe changes over time.

    A company that has raised the price of its services by 3% every year could raise it by 20% this year and hide the jump by presenting a median average increase of 3% over the last 9 years.
  4. Step 4: Lie with mode averages.

    In some cases, mode averages are almost impossible to lie with: the average number of tickets purchased per person for a ball game, for example, is almost always going to be accurately reflected by the mode.

    Nevertheless, mode averages, too, can exclude important data, especially in smaller data sets.

    For instance, if you have a data set of all the whole numbers from 1 to 100, but the number 1 is included 3 times, 1 will be the mode average of the set, even though the mean (and in this case, more sensible) average is much closer to 50.

    Any survey that rates on a broad scale can be manipulated to emphasize the mode.

    If you survey 100 people on a scale of 1 to 10 about their feelings on a subject, and more people rate it “10” than any other number, then 10 is the mode average, even if only one more person gave a 10 rating than gave a 1 rating.
  5. Step 5: Lie with representational numbers.

    If you have a set of data that's defined by abstract rather than concrete numbers (for example, a customer satisfaction survey), it's almost frighteningly easy to lie with that set.

    If you ask people to rate their satisfaction on a scale from 1 to 3, that doesn't necessarily prove that customers who chose 3 are three times as happy as those who chose 1.

    This fact is used to skew mean averages in particular, but it can also be applied to median and sometimes even mode averages.
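
The median, mode, and rating-scale examples can be recomputed with a short script. This is a sketch under stated assumptions: the first three data sets come straight from the figures in the text, while the 100 survey answers and the `share_of_top_score` helper are invented for illustration.

```python
from statistics import mean, median, mode

# Median example from the text: the 3000 never shows up in the median.
skewed = [1, 1, 2, 3, 4, 5, 3000]
print(median(skewed), round(mean(skewed), 2))  # 3 430.86

# Price rises: eight years at 3% followed by one year at 20%.
price_rises = [3] * 8 + [20]
print(median(price_rises))  # 3

# Mode example: every whole number from 1 to 100, with 1 appearing 3 times.
repeated_one = list(range(1, 101)) + [1, 1]
print(mode(repeated_one), round(mean(repeated_one), 2))  # 1 49.53

# Rating scale: hypothetical answers from 100 customers (invented data).
answers = ["unhappy"] * 40 + ["neutral"] * 20 + ["happy"] * 40

def share_of_top_score(answers, scores):
    """Mean score divided by the highest score the scale allows."""
    return mean([scores[a] for a in answers]) / max(scores.values())

# Two equally arbitrary ways of turning the same labels into numbers.
scale_1_to_3 = {"unhappy": 1, "neutral": 2, "happy": 3}
scale_0_to_2 = {"unhappy": 0, "neutral": 1, "happy": 2}
print(f"{share_of_top_score(answers, scale_1_to_3):.0%}")  # 67%
print(f"{share_of_top_score(answers, scale_0_to_2):.0%}")  # 50%
```

The last two lines use identical answers. Only the numbers attached to the labels change, yet the headline "satisfaction score" moves from 67% to 50%, which is why a mean of abstract ratings is so easy to steer.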


About the Author

Jacob Ryan

Experienced content creator specializing in pet care guides and tutorials.
