Thursday, April 17, 2008

Lowering the bar

I was helping my daughter with some homework the other night. She had been asked to use a spreadsheet program to produce a bar chart. I believe the numbers were densities (g/cm³) and they were something like:
92.5, 91, 93.5, 92
And here's what Excel produced:
The vertical axis starts at 89.5, so the height of each bar represents the density minus 89.5, which means ... ??
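To put a number on that, here is a small Python sketch using the approximate values quoted above. The helper names are hypothetical and only for illustration. It compares the bar heights as drawn from an axis starting at 89.5 with the heights as drawn from zero.

```python
# Approximate densities from the post (g/cm^3); the real values were "something like" these.
densities = [92.5, 91.0, 93.5, 92.0]

def drawn_heights(values, axis_start):
    """Height of each bar as drawn when the vertical axis starts at axis_start."""
    return [v - axis_start for v in values]

def tallest_to_shortest(heights):
    """Ratio of the tallest bar to the shortest bar."""
    return max(heights) / min(heights)

truncated = drawn_heights(densities, 89.5)  # the default axis described above
from_zero = drawn_heights(densities, 0.0)   # axis starting at zero

print(truncated)                                 # [3.0, 1.5, 4.0, 2.5]
print(round(tallest_to_shortest(truncated), 2))  # 2.67
print(round(tallest_to_shortest(from_zero), 2))  # 1.03
```

With the truncated axis the tallest bar is drawn about 2.7 times as high as the shortest, although the densities themselves differ by less than 3%.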

Junk Charts quotes Naomi Robbins, author of Creating More Effective Graphs, thus: "all bar charts must include zero". Indeed, otherwise what do the bar heights represent? That Excel's defaults violate this rule is, ahem, unfortunate. (I've tried this using Excel 2000 and Excel on a Mac, but perhaps it's been fixed in newer versions? Maybe?)

Excel can be coerced into starting its vertical axis at 0, but it takes a fair bit of clicking and navigating. The result is:
Relative to a density of zero, there's very little variation. But perhaps this hides the message in these numbers. Doesn't that just bring us back to the first bar chart? Well ... no.


This graph shows the data, with the vertical axis zoomed in to where the action is. Unlike the original bar chart, it doesn't show bars with arbitrary heights.

Again from Junk Charts:
The "start-at-0" rule says that the vertical axis of any graph ought to start at value 0. The rule was mentioned in Huff's classic booklet, "How to Lie with Statistics": as the name implies, the rule is intended to eradicate mischievous graphs that exaggerate small differences by not starting at 0, which is to say, by choosing a misleading scale.

Others, like Tufte and Wainer, have long realized that the start-at-0 rule is not absolute ... My own "anti-rule" stipulates that if all data appearing in a chart are far from 0, then don't start at 0.

If, on the other hand, some of the plotted data are close to 0, then it is essential to start at 0.
This isn't too far from my view, but it doesn't address bar charts, which are a special case because they emphasize the heights of the bars, rather than the position of the tops of bars. Bar charts are only appropriate for variables that are measured on ratio scales. For such variables, there is a non-arbitrary zero, which means that you can calculate a meaningful ratio; e.g. for weight: one thing might weigh twice as much as another. But some variables aren't like that; e.g. IQ: an IQ of zero is meaningless, and so it doesn't make sense to say that someone with an IQ of 100 is twice as intelligent as someone with an IQ of 50. For variables of this kind bar charts make no sense at all.

So, if your variable isn't ratio scaled (in other words, there isn't a meaningful zero), don't use a bar chart. If it is ratio scaled and you decide to use a bar chart, make sure your axis starts at zero.
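The rule above can be sketched as a small Python helper that picks the vertical axis limits. This is only an illustration under my own assumptions: the function name, the "bar"/"dot" labels and the 10% padding are hypothetical choices, not part of any charting library.

```python
def axis_limits(values, chart, ratio_scaled, pad_fraction=0.1):
    """Return (lower, upper) limits for the vertical axis.

    chart is "bar" or "dot"; ratio_scaled says whether zero is meaningful.
    Hypothetical helper, for illustration only.
    """
    lo, hi = min(values), max(values)
    if chart == "bar":
        if not ratio_scaled:
            raise ValueError("no meaningful zero: don't use a bar chart")
        # Bars must include zero so that heights are proportional to the values.
        return (min(0.0, lo), max(0.0, hi))
    if chart == "dot":
        # Position, not height, carries the value, so zooming in is fine.
        pad = pad_fraction * (hi - lo)
        return (lo - pad, hi + pad)
    raise ValueError(f"unknown chart type: {chart!r}")

densities = [92.5, 91.0, 93.5, 92.0]  # approximate values from the post
print(axis_limits(densities, "bar", ratio_scaled=True))  # (0.0, 93.5)
print(axis_limits(densities, "dot", ratio_scaled=True))  # (90.75, 93.75)
```

If the bar limits flatten the variation you care about, that is the cue to switch to the dot-style display rather than to truncate the bars.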

Derek puts it well in a comment at Pictures of Numbers:
There is a circumstance in which the would-be grapher absolutely must start with zero, and that's when creating a bar graph. If that causes problems, it's time to consider abandoning the bar graph and adopting something which doesn't need a zero on the scale. I've seen bar graphs where the designer recognised the problem with zero, adopted and defended the solutions, but without getting rid of the bar graph format. Those wavy gaps are the least bad of the abortive compromises resorted to by people who won't give up their bars.
In case anyone thinks this really isn't much of an issue, here are some examples I found quite easily:






11 Comments:

Blogger Zeno said...

Nice one, Nick. More people need to take seriously how graphs can distort as well as depict.

12:19 AM, April 18, 2008  
Blogger Nick Barrowman said...

Thanks, Zeno.

The bar chart is so simple and yet even it offers opportunities for misrepresentation. When it comes to more complex displays—and these are becoming increasingly common—the situation may well be worse.

7:59 AM, April 18, 2008  
Blogger Raywat Deonandan said...

Then there's the Pac Man Chart

9:34 AM, April 18, 2008  
Blogger Nuclear Mom said...

Great post. I was always trying to pound notions like this into my undergraduates' heads (back when I taught).

Now I just get to be a snot about it when I review journal articles! And now I have links to back my snotty self up. ;-)

I feel for my kids' future teachers... sort of.

1:13 PM, April 18, 2008  
Blogger Nick Barrowman said...

I went to a scientific talk today, and sure enough there was a bar chart just like "Figure 4" above. Zoinks! So as to avoid being too obnoxious, I waited until after the talk to point it out. But I'm afraid I was being a "snotistician".

6:56 PM, April 18, 2008  
Anonymous Anonymous said...

Just to give the author of Figure 4 a break, Excel will not allow you to put zero on an axis that is plotted on a log10-scale.

Personally, I have no problem with not starting at zero as long as the axis is labeled. My bigger issue is the lack of error bars on these plots - if the differences between the bars are meaningful, then highlighting those differences seems reasonable. If the differences are not meaningful, then it really is a problem.

1:49 PM, April 23, 2008  
Blogger Nick Barrowman said...

Excel is right not to allow a zero on an axis with logarithmic scaling. Zero is located infinitely far down the axis.
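A quick way to see this is to compute where values sit on a log10 axis as they shrink towards zero. The sketch below uses only Python's standard library; the helper name is hypothetical.

```python
import math

def log_axis_position(value):
    """Position of a value on a log10-scaled axis, in decades relative to 1."""
    if value <= 0:
        # log10 is undefined here, so there is nowhere on the axis to put the value.
        raise ValueError("zero and negative values have no position on a log axis")
    return math.log10(value)

# Each factor of ten closer to zero moves the point one more decade down the axis.
for v in [1.0, 0.1, 0.001, 1e-6, 1e-12]:
    print(v, log_axis_position(v))
```

The positions keep falling without limit, so no finite axis ever reaches zero and a bar on such an axis has no natural base.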

I agree that error bars are important. In fact I blogged about this a couple of years ago. But I think error bars don't work too well with bar charts. Hmmmm ...

3:01 PM, April 23, 2008  
Blogger JSinger said...

I'm sorry, but framing this issue in such absolute terms is absurd. I use graphs to present information honestly to audiences who are capable of reading a y-axis and understanding what freaking error bars mean. I'm not going to make ugly, uninterpretable graphs because someone else uses non-zero intercepts misleadingly.

And if I get one of nuclearmom's papers to review, I'll be telling her I can't read her damn graphs!

5:11 PM, May 01, 2008  
Blogger Nick Barrowman said...

Interpretation of error bars is actually not all that easy. Typically error bars represent standard errors, but sometimes they represent standard deviations, and sometimes they represent confidence intervals. It's important to note that two confidence intervals can overlap even when there's a statistically significant difference. When samples are paired, statistical significance is even more tricky.
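To make the overlap point concrete, here is a sketch with made-up summary statistics for two independent groups. The means and standard errors are hypothetical, and I assume a normal approximation (a z-test) rather than any particular study design.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical means and standard errors for two independent groups.
mean_a, se_a = 0.0, 1.0
mean_b, se_b = 3.2, 1.0

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

def confidence_interval(mean, se):
    """95% confidence interval under the normal approximation."""
    return (mean - z_crit * se, mean + z_crit * se)

ci_a = confidence_interval(mean_a, se_a)
ci_b = confidence_interval(mean_b, se_b)
intervals_overlap = ci_a[1] > ci_b[0] and ci_b[1] > ci_a[0]

# Two-sided z-test for the difference between the two independent means.
se_diff = sqrt(se_a**2 + se_b**2)
z = (mean_b - mean_a) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(ci_a, ci_b, intervals_overlap)
print(z, p_value)
```

Here the two 95% intervals overlap, yet the p-value for the difference is below 0.05, because the standard error of the difference is smaller than the sum of the two individual standard errors.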

Although it seems reasonable to assume that when readers encounter a graph in print they will read the axis labels and figure caption, they may be much less likely to do so during a presentation.

But when it comes to a bar chart, the height of the bars should convey the magnitude of what they are depicting. In the words of Edward Tufte, "The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented."

I believe that it's possible to make beautiful, interpretable graphs while following Tufte's advice.

9:58 PM, May 01, 2008  
Anonymous Mike Dickison said...

Thanks for the shout-out to Pictures of Numbers: glad someone's reading it. I particularly liked the bar graph on a log scale above (which of course can't have a zero point, so the bar could be made any length you like, including infinite...).

Also thanks for the term "snotistician". I will have to pre-emptively use it in my next workshop.

2:40 PM, October 13, 2008  
Blogger Nick Barrowman said...

Thanks, Mike. Your work is really impressive. I hope I'll get a chance to attend one of your workshops some time!

5:13 PM, October 13, 2008  
