Monday, May 05, 2008

The memory of a moment of happiness



The painting is by Ottawa artist Patty Woodyard (pwoodyard@primus.ca). And the title of this post is from Song of The Flower by Khalil Gibran:
I am a kind word uttered and repeated
By the voice of Nature;
I am a star fallen from the
Blue tent upon the green carpet.
I am the daughter of the elements
With whom Winter conceived;
To whom Spring gave birth; I was
Reared in the lap of Summer and I
Slept in the bed of Autumn.

At dawn I unite with the breeze
To announce the coming of light;
At eventide I join the birds
In bidding the light farewell.

The plains are decorated with
My beautiful colors, and the air
Is scented with my fragrance.

As I embrace Slumber the eyes of
Night watch over me, and as I
Awaken I stare at the sun, which is
The only eye of the day.

I drink dew for wine, and hearken to
The voices of the birds, and dance
To the rhythmic swaying of the grass.

I am the lover's gift; I am the wedding wreath;
I am the memory of a moment of happiness;
I am the last gift of the living to the dead;
I am a part of joy and a part of sorrow.

But I look up high to see only the light,
And never look down to see my shadow.
This is wisdom which man must learn.


Thursday, April 17, 2008

Lowering the bar

I was helping my daughter with some homework the other night. She had been asked to use a spreadsheet program to produce a bar chart. I believe the numbers were densities (g/cm³) and they were something like:
92.5, 91, 93.5, 92
And here's what Excel produced:
The vertical axis starts at 89.5, so the height of each bar represents the density−89.5, which means ... ??

Junk Charts quotes Naomi Robbins, author of Creating More Effective Graphs thus: "all bar charts must include zero". Indeed—otherwise what do the bar heights represent? That Excel's defaults violate this rule is, ahem, unfortunate. (I've tried this using Excel 2000 and Excel on a Mac, but perhaps it's been fixed in newer versions? Maybe?)
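To put rough numbers on this, here is a minimal sketch in Python using the four densities quoted above and the 89.5 axis start from the Excel chart. The helper names are mine, purely for illustration. It compares the ratio of the tallest to the shortest bar as drawn on the truncated axis with the same ratio when the bars start at zero.

```python
# Densities quoted in the post (described there as "something like" these).
densities = [92.5, 91, 93.5, 92]
baseline = 89.5  # where Excel started the vertical axis


def drawn_heights(values, axis_start):
    """Height of each bar as drawn, measured from the start of the axis."""
    return [v - axis_start for v in values]


def max_to_min_ratio(heights):
    """Ratio of the tallest bar to the shortest bar."""
    return max(heights) / min(heights)


truncated = drawn_heights(densities, baseline)  # bars on Excel's default axis
from_zero = drawn_heights(densities, 0)         # bars on an axis starting at 0

print(truncated)                                # [3.0, 1.5, 4.0, 2.5]
print(round(max_to_min_ratio(truncated), 2))    # 2.67
print(round(max_to_min_ratio(from_zero), 3))    # 1.027
```

On the truncated axis the tallest bar is drawn more than two and a half times as high as the shortest, although the largest density is less than 3% greater than the smallest.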

Excel can be coerced into starting its vertical axis at 0, but it takes a fair bit of clicking and navigating. The result is:
Relative to a density of zero, there's very little variation. But perhaps this hides the message in these numbers. Doesn't that just bring us back to the first bar chart? Well ... no.


This graph shows the data, with the vertical axis zoomed in to where the action is. Unlike the original bar chart, it doesn't show bars with arbitrary heights.

Again from Junk Charts:
The "start-at-0" rule says that the vertical axis of any graph ought to start at value 0. The rule was mentioned in Huff's classic booklet, "How to Lie with Statistics": as the name implies, the rule is intended to eradicate mischievous graphs that exaggerate small differences by not starting at 0, which is to say, by choosing a misleading scale.

Others, like Tufte and Wainer, have long realized that the start-at-0 rule is not absolute ... My own "anti-rule" stipulates that if all data appearing in a chart are far from 0, then don't start at 0.

If, on the other hand, some of the plotted data are close to 0, then it is essential to start at 0.
This isn't too far from my view, but it doesn't address bar charts, which are a special case because they emphasize the heights of the bars, rather than the position of the tops of bars. Bar charts are only appropriate for variables that are measured on ratio scales. For such variables, there is a non-arbitrary zero, which means that you can calculate a meaningful ratio; e.g. for weight: one thing might weigh twice as much as another. But some variables aren't like that; e.g. IQ: an IQ of zero is meaningless, and so it doesn't make sense to say that someone with an IQ of 100 is twice as intelligent as someone with an IQ of 50. For variables of this kind bar charts make no sense at all.

So, if your variable isn't ratio scaled (in other words, there isn't a meaningful zero), don't use a bar chart. If it is ratio scaled and you decide to use a bar chart, make sure your axis starts at zero.

Derek puts it well in a comment at Pictures of Numbers:
There is a circumstance in which the would-be grapher absolutely must start with zero, and that's when creating a bar graph. If that causes problems, it's time to consider abandoning the bar graph and adopting something which doesn't need a zero on the scale. I've seen bar graphs where the designer recognised the problem with zero, adopted and defended the solutions, but without getting rid of the bar graph format. Those wavy gaps are the least bad of the abortive compromises resorted to by people who won't give up their bars.
In case anyone thinks this really isn't much of an issue, here are some examples I found quite easily:






Saturday, April 12, 2008

Food for thought

The global price of food has risen sharply over the last 18 months. This is most acutely the case with cereals. The New York Times reports that wheat has reached its highest price in 28 years. The reasons for this phenomenon seem to be broadly accepted; see, for example, Paul Krugman's column or a recent presentation (pdf) by Joachim von Braun of the International Food Policy Research Institute.

Though the relative importance of the reasons is difficult to assess, the list itself seems clear: the price of oil; a growing middle class in China and India with an increasing demand for meat, which requires more grain for feed; droughts likely due to climate change; and Western government subsidies for biofuels like corn ethanol.

But I wonder if we shouldn't consider a different aspect of this. As the New York Times points out:
Even the poorest fifth of households in the United States spend only 16 percent of their budget on food. In many other countries, it is less of a given. Nigerian families spend 73 percent of their budgets to eat, Vietnamese 65 percent, Indonesians half.
What is wrong with our world that so many people are living so close to the edge? Hmmm ...









Update 14Apr2008: The graph below was produced using Technorati. It shows the number of blog posts (in "any language" on blogs with "some authority") containing "food crisis". Too bad most of us are at least 6 months late.


Friday, April 11, 2008

StatLinks

The Internet makes it possible to link a dispersed community of common interest. Now there are a number of blogs that focus entirely or in part on Statistics, but they seem not to be well connected.

So I've just set up a social bookmarking website just for applied statistics, data analysis, and visualization. It's called StatLinks.

It lists links that users submit, and allows other users to vote on their relevance. Links are listed in order of popularity (or in chronological order, if you prefer).

I encourage people to visit StatLinks, to submit links that are likely to be of interest, and to pass the word! I've put a few links in to get things started. (Hat tip to Slinkset whose technology made it a breeze to set this up.)


Thursday, April 10, 2008

Could you keep my place in line?

Line-ups are both eminently civilized and—really annoying! The first in first out (FIFO) principle is inherently egalitarian and respect for it is a sign of social order. But there's something crazy about using our bodies as place keepers in a queue, sometimes for hours on end.

Inevitably, after waiting some time in a lineup, someone will need to step out for a while. Rather than lose one's priority in the sequence, the convention is to ask someone (a complete stranger if need be), "Could you keep my place in line?"

The language here is metaphorical and indirect. The request is not really about keeping a place. It's about promising that, when the person returns, you will vouch to any potential challengers that this person was indeed in line at this particular point in the sequence.

The fact is, complete strangers generally do agree to "keep your place in line". And that's a further sign of civil behaviour. Maybe line ups aren't so bad after all!

I bet there are lots of good stories about line-ups. I'd love to hear some. Then we could publish a book (I'm trying to think of a queued name for it ...)

P.S. I've tried to give equal time to the different spellings lineup / line-up / line up. I really don't know which is correct. Those who wish to correct me should form an orderly line.


Tuesday, April 08, 2008

Nature vs. not sure

The perennial nature-vs-nurture debate just won't go away. This is particularly true with regard to gender differences, a subject of broad interest.

I'll acknowledge my biases up front. I have long been skeptical about biological determinism. This is partly because of its historical association with racism, sexism, classism, and the eugenics movement. But it's also because, particularly in recent years, there has been a tendency to overstate the importance of genetics in explaining human behaviour. Part of the explanation for this "genohype" may be the dramatic achievements of the Human Genome Project together with the rise of the biotechnology sector. Just as the success of Darwin's theory of natural selection led to Social Darwinism, today's molecular genetics revolution has put a new wind in the sails of biological determinism.

In the scientific world, the nature-vs-nurture debate is generally accepted to be an ill-posed problem. Because the environment affects the expression of genes, it is not a question of nature versus nurture, but of nature vis-à-vis nurture. Nevertheless, the ways in which and the extent to which nature and nurture influence human behaviour remain controversial. And beliefs about this can have profound consequences.

But one thing's for certain, and that's uncertainty. Despite the way results from studies of gender differences are often portrayed, we're usually left with more questions than answers. Here I want to comment briefly on two considerations that should be borne in mind.

Does the difference matter?

It's common to read reports stating that, for example, "women perform task X better than men". What this really means is "on average women perform task X better than men, and this effect was found to be statistically significant". The magnitude of the effect may be small or large. The degree of overlap between women and men may be small or large. (And of course the study may be flawed.)
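A rough way to see how much two groups can overlap despite a "significant" average difference is sketched below. The assumptions are mine, not from any particular study: scores in each group are taken to be normally distributed with the same spread, and the difference in means is expressed in standard deviation units (d). The values 0.2, 0.5, and 0.8 are the conventional benchmarks for "small", "medium", and "large" effects.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def overlap_fraction(d):
    """Shared area under two equal-variance normal curves whose means differ by d SDs."""
    return 2 * STANDARD_NORMAL.cdf(-abs(d) / 2)


def prob_superiority(d):
    """Chance that a random member of the higher-scoring group beats a random member of the other."""
    return STANDARD_NORMAL.cdf(d / 2 ** 0.5)


for d in (0.2, 0.5, 0.8):
    print(d, round(overlap_fraction(d), 2), round(prob_superiority(d), 2))
```

Under these assumptions, a difference of 0.2 standard deviations leaves the two distributions overlapping by more than 90%, and a randomly chosen member of the "better" group outscores a randomly chosen member of the other group only a little more often than a coin toss would suggest.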

To what can the difference be attributed?

Assuming the difference is real and meaningful, we're still left with the question of whether it represents an innate biological difference or an environmental (cultural) difference. For some reason it seems that people quickly jump to the conclusion that gender differences are innate. But in most cases it is extremely difficult to sort this out. Cultural effects can be extremely subtle. As has been pointed out (by ?), the concept of "wet" wouldn't mean much to a fish.

Grist for the mill

Here are three interesting articles that touch on some of these issues. First, a review by Viv Groskop of "The Sexual Paradox: Troubled Boys, Gifted Girls and the Real Difference Between the Sexes" by Susan Pinker. Next, an interview with professor of language and communication Deborah Cameron about her book "The Myth Of Mars And Venus". Finally, a New York Times article by Elizabeth Weil about the movement for single-sex public education based on gender differences.

I've really only scratched the surface of this issue (not to mention related ones), and there's lots of stuff out there (a Google search of "gender differences" gives 2,450,000 results). Comments?

Update 09Apr2008: It seems there's an almost unlimited number of links that could be added. Here's another review of Susan Pinker's book, from the New York Times. Here's an entertaining retort to an argument about gender differences based on evolutionary psychology. And here's a piece that argues: "Nowhere do scientific findings get more mangled than when they’re about the differences between men and women." Finally, here's a conservative view on gender differences.

Update 11Apr2008: Here's a response to some of the arguments about single-sex schooling.


Friday, March 28, 2008

Upping the anti (depressant)

A paper on antidepressants by Kirsch and co-authors published last month in PLoS Medicine has received a lot of attention. The antidepressants studied are the six most widely prescribed approved between 1987 and 1999: Prozac, Paxil, Effexor, Serzone, Zoloft, and Celexa.

The Editors' Summary explains:
The researchers obtained data on all the clinical trials submitted to the FDA ... They then used meta-analytic techniques to investigate whether the initial severity of depression affected the HRSD [Hamilton Rating Scale for Depression] improvement scores for the drug and placebo groups in these trials. They confirmed first that the overall effect of these new generation of antidepressants was below the recommended criteria for clinical significance. Then they showed that there was virtually no difference in the improvement scores for drug and placebo in patients with moderate depression and only a small and clinically insignificant difference among patients with very severe depression. The difference in improvement between the antidepressant and placebo reached clinical significance, however, in patients with initial HRSD scores of more than 28—that is, in the most severely depressed patients. Additional analyses indicated that the apparent clinical effectiveness of the antidepressants among these most severely depressed patients reflected a decreased responsiveness to placebo rather than an increased responsiveness to antidepressants.
The press simplified it further. The MSNBC headline was "Antidepressants may not help many patients". The Guardian announced: "Prozac, used by 40m people, does not work say scientists".

Reactions, adverse and otherwise

There were reactions to the effect that "we've known all along antidepressants don't work" and, at the other extreme, "nothing could ever convince me that antidepressants don't work."

A lot of reaction came from people who believe they have benefited from antidepressants. See, for example, the comments following a summary of the study at depression.about.com.

The blogosphere had plenty of reactions: FuturePundit, Action Potential (the Nature Neuroscience blog), The MindFields College Blog, and on and on.

And the journal itself, PLoS Medicine, had an enormous number of responses to the paper.

Betta check the meta

The heart of the findings in this paper is the meta-analysis itself, and when I examined it, two things jumped out immediately. The figure below shows them both.
There's a lot to look at in the figure. The red triangles represent the results of the patients who received the antidepressant. The bigger the triangles, the more weight they receive in the analysis. Similarly, the circles represent the placebo results. The solid red curve is a model fit to the antidepressant results. The dashed blue curve is a model fit to the placebo results. The green region shows where there is a clinically important difference between the curves.

First, look at the vertical axis, labeled "Improvement (d)" and ranging from 0 to 2. This is the mean improvement in the Hamilton Rating Scale for Depression (HRSD), but it has been divided by the standard deviation. Why divide it by the standard deviation? Well, this is what you might do if each study were using a different rating scale, in order to standardize things. But here it's not necessary. Each study used the HRSD, so it would be better not to standardize.
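To illustrate what standardizing does when every study already shares a scale, here is a small sketch. The two trial arms and their numbers are hypothetical, invented only for illustration; they are not taken from the paper.

```python
def standardized_improvement(mean_change, sd_change):
    """Mean improvement divided by its standard deviation (the d on the vertical axis)."""
    return mean_change / sd_change


# Hypothetical arms: the same raw HRSD improvement, but different variability.
arm_a = {"mean_change": 10.0, "sd_change": 8.0}
arm_b = {"mean_change": 10.0, "sd_change": 5.0}

d_a = standardized_improvement(**arm_a)
d_b = standardized_improvement(**arm_b)

print(d_a, d_b)  # 1.25 2.0
```

Both hypothetical arms improved by the same 10 HRSD points, yet their standardized values differ only because the spread of scores differs. When all studies use the HRSD, the raw points are directly comparable and carry the clinical meaning.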

Second, if triangles represent antidepressant results and circles represent placebo results from the studies, how do they pair up? Each study has two "arms": an antidepressant arm and a placebo arm, but on the figure you can't tell which triangle belongs with which circle. This points to an important problem: the authors meta-analyzed the antidepressant arms separately from the placebo arms. But the studies were randomized controlled trials, which means that within each study the two arms are comparable. Ignoring this can introduce bias. The standard approach in meta-analysis is to compute a contrast between the two arms within each study, and then meta-analyze these contrasts.
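The contrast-based approach can be sketched as follows. The trial figures are hypothetical, and fixed-effect inverse-variance pooling is just one common way of combining the contrasts; I am not claiming it is the method used in any of the analyses discussed here.

```python
from math import sqrt

# Hypothetical trials (illustration only), each as
# (drug mean, drug SD, drug n, placebo mean, placebo SD, placebo n),
# with means in raw HRSD improvement points.
trials = [
    (10.0, 8.0, 100, 8.0, 8.0, 100),
    (11.0, 7.5, 50, 8.5, 8.0, 50),
    (9.0, 8.5, 200, 7.5, 8.0, 200),
]


def contrast(drug_mean, drug_sd, drug_n, plc_mean, plc_sd, plc_n):
    """Within-trial difference (drug minus placebo) and the variance of that difference."""
    diff = drug_mean - plc_mean
    var = drug_sd ** 2 / drug_n + plc_sd ** 2 / plc_n
    return diff, var


def pool_fixed_effect(contrasts):
    """Inverse-variance weighted average of the contrasts, with its standard error."""
    weights = [1 / var for _, var in contrasts]
    total = sum(weights)
    pooled = sum(w * diff for w, (diff, _) in zip(weights, contrasts)) / total
    return pooled, sqrt(1 / total)


contrasts = [contrast(*t) for t in trials]
pooled_diff, pooled_se = pool_fixed_effect(contrasts)
print(round(pooled_diff, 2), round(pooled_se, 2))
```

Because each contrast is computed inside a single randomized trial, each drug arm is compared only with its own placebo arm. Differences between trials in patients, settings, or raters then cancel out of every contrast rather than leaking into the pooled result.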

But do either of these points make much of a difference? It turns out that they do. PJ Leonard took the trouble of rerunning the analyses using raw HRSD scores and the standard meta-analytic approach rather than the separate-arms analysis of Kirsch and co-authors, and obtained an effect about 50% larger than they did, and stronger evidence of clinical importance. Leonard also performed a regression analysis corresponding to the figure above.

Robert Waldmann has also done some interesting work on this.

Overcoming depression: there's no silver bullet

The evidence doesn't seem to support the notion that antidepressants "don't work". The overheated media response to this article was unfortunate. And that's a topic in itself.

Nonetheless, it seems that on average the effect of antidepressants is hardly overwhelming. So far there's no silver bullet for depression. Drugs can help, but so can other interventions. Including kindness and understanding.

Update 11Apr2008: Thanks to a post on The Home for Wayward Statisticians, I found a couple more interesting links. One is by Mark Liberman on Language Log. The other is an editorial in Nature.
