### I don't CRUNCH numbers!

As a statistician, I'm sometimes asked to "crunch the numbers". Now, I don't mean to sound sensitive, but I don't ... ahem ... "crunch" numbers. The onomatopoeic word CRUNCH suggests roughness, the application of brute force, perhaps in the form of raw computing power.

If you've seen the movie The Horse Whisperer, you'll remember how the character played by Robert Redford worked with horses. Instead of trying to "break" them, he tried to understand them and work with them. Maybe you can see where I'm going with this ...

A good data analysis requires care, patience, and understanding. It's a collaborative endeavour that should make use of subject-area knowledge wherever possible. Every number has a story to tell, and that story is not always immediately apparent. What did the researcher want to measure? How did they measure it? How did the measurement get turned into a number in a data set? And that's only the beginning, because a typical data set is the product of numerous different measurements, perhaps made on several occasions. Once the pedigree and provenance of each variable in a data set have been determined, the picture they form can be brought into focus, and the underlying patterns can be explored. Sensitivity is paramount: Are the modeling assumptions appropriate? Is something being overlooked? Would a different approach provide more relevant insights?

It seems that the term "crunching the numbers" is most commonly used to refer to what accountants do, and on this I can't comment. But for statistical data analysis, the metaphor is all wrong.

## 5 Comments:

Re: "I don't crunch numbers." Better to be accused of crunching numbers than of cooking them.

Do you "massage data?" ;-)

I whisper to them. (And when I'm lucky, they whisper back.)

Ah, but when I teach my students log and logit transformations for linear models, I tell them that "We do not just crunch the numbers, we torture them until they confess." Having just finished a series of lectures on ANOVA, I'm thinking that *dissect* might be a better verb.

Yeah, to me "crunch" sounds so *generic*. It seems to say that given sufficient brawn, you can *always* get the answers. (You can also use a hammer to insert a screw, but it's not ideal.)

I believe that torture is morally inexcusable, but aside from that, it is clear that it sometimes (often?) yields incorrect answers.

"Dissect" has a CSI feel to it, suggesting painstaking analysis. I'd go for that!

*I believe that torture is morally inexcusable, but aside from that, it is clear that it sometimes (often?) yields incorrect answers.*

I think you've hit on something useful in my tortured numbers metaphor. The verb came to me when I studied factor analysis, and read about the Procrustes rotation--talk about tortured numbers! But the more I use and think about the common practice of least squares regression using transformed variables (vs. nonlinear bases), the more the torture analogy holds: incorrect answers, and methodologically (if not ethically) inexcusable.

I'd like to think I spend my time dissecting (or maybe *cross-examining*) numbers, rather than torturing them. Of course, now that we've come up with the distinction, we need to develop ways to discriminate between data dissection and data torture.

One of my undergrad chemistry professors often used a hammer to install wood screws, quoting his father: "A screwdriver is used to take wood screws OUT."
