Tuesday, August 4, 2009

Means, Medians, and the Weather

Today, a little math.

Take the sequence of numbers: 11, 12, 13, 14, 15, 16, 17, 18, 19.

The average of these numbers is 15. So is the mid-point, or median. If these numbers were in a bag, and you were to draw one out, there is equal probability that the number you drew would be above or below 15 (and a 1/9 chance that it would be 15 exactly).
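For anyone who would rather check this than take my word for it, here is a minimal sketch using Python's standard library. The variable names are my own; the numbers are the ones in the bag above.

```python
from fractions import Fraction
from statistics import mean, median

bag = list(range(11, 20))  # 11, 12, ..., 19

print(mean(bag))    # 15
print(median(bag))  # 15

# Chance that a single draw lands above, below, or exactly on 15
above = Fraction(sum(1 for n in bag if n > 15), len(bag))
below = Fraction(sum(1 for n in bag if n < 15), len(bag))
exact = Fraction(sum(1 for n in bag if n == 15), len(bag))
print(above, below, exact)  # 4/9 4/9 1/9
```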

If you had a very large bag, with an equal number of copies of each number, and you were to draw enough times (say, 400 or so), you would likely end up with a very even distribution, about 44 occurrences of each (400 / 9 ≈ 44.4). The average would be unchanged.

Now, in that same bag, let’s remove all of the 11’s, and replace them with 1’s. We’ll do the same 400 draws, but what happens to the average and mid-point?

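The average drops to 13.9. However, the median remains unchanged at 15. (The sequence is now: 1, 12, 13, 14, 15, 16, 17, 18, 19.) 15 still has four values below it and four above it, and the median depends only on that ordering, not on how far away the end values sit.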
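The same check for the altered bag, again as a small sketch with names of my own choosing:

```python
from statistics import mean, median

skewed_bag = [1, 12, 13, 14, 15, 16, 17, 18, 19]  # the 11 replaced by a 1

print(round(mean(skewed_bag), 1))  # 13.9  (125 / 9)
print(median(skewed_bag))          # 15

# The median only cares about how many values fall on each side of it
below_median = sum(1 for n in skewed_bag if n < 15)
above_median = sum(1 for n in skewed_bag if n > 15)
print(below_median, above_median)  # 4 4
```

One outlier moved the average by more than a full point and left the median untouched.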

Which means that the probability of drawing a number greater than or less than 15 remains unchanged – for every number that you draw that turns out to be larger than 15, you will probably draw a counterpart that is less than 15.

But what if, instead of telling you the sequence of numbers in the bag and the median, I just told you that I have a bag of numbers in which the average, over all of the numbers in the bag, is 13.9? Then I invite you to draw 5.

The chance that any one number you draw is greater than 13.9 is very large – 6/9, or 2/3, to be exact. And by drawing only 5, there is a real chance (about 13%, if each draw is independent) that you won’t actually draw any number less than the average.
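Here is a short sketch of that arithmetic. It assumes the draws are independent, which holds if you put each number back before drawing again, or if the bag is large enough that removing a few numbers makes no difference. That assumption is mine.

```python
from fractions import Fraction

skewed_bag = [1, 12, 13, 14, 15, 16, 17, 18, 19]
average = Fraction(sum(skewed_bag), len(skewed_bag))  # 125/9, about 13.9

# Share of the bag that sits above the average
p_above = Fraction(sum(1 for n in skewed_bag if n > average), len(skewed_bag))
print(p_above)  # 2/3

# Chance that all 5 independent draws land above the average
draws = 5
p_all_above = p_above ** draws
print(p_all_above)                   # 32/243
print(round(float(p_all_above), 3))  # 0.132
```

So roughly one time in eight, five draws would show you nothing below the "average" at all.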

So, your perception of the range of numbers in the bag would be skewed. If you were to think about it, you would properly discern that there must be some numbers less than 13.9 in the bag, and if I were to continue to allow you to draw, you might even propose (or draw!) a very low number (the 1).

Now, to an application of exactly this phenomenon.

The average temperature (as published by the NWS) in Denver in late July / early August is 88 degrees.

So, what is the chance that any given day during this period is greater than 88 degrees? Is a 91 degree day ‘above normal’?

You get points if you realized that you can’t answer the question with the data given.

You get extra points if you realized that when our local weather people come on the TV and tell you that ‘it’s going to be a few degrees above normal today, with the expected high of 91 degrees’, they are being foolish, and possibly misinforming you.

I got to wondering, especially last year, when it seemed that a summer day of 90 degrees or more was more common than a day below 90. I contacted one of our local weathermen and asked what the spread of the first standard deviation was for our weather, figuring that it would tell us much more about what constitutes an abnormally hot (or cold) day. (For roughly bell-shaped data, the range within one standard deviation of the average encompasses about 2/3 of the data – if it ran from, say, 85 to 95 degrees, you could pretty accurately predict that most of the time the summer day-time high would be in that range, and an abnormally hot day, by this measure, would be one above 95 degrees, which would constitute just 1/6 of all summer days.)
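To put numbers on that parenthetical, here is a sketch that assumes the highs follow a normal (bell-shaped) distribution centered on 90 with a standard deviation of 5, which matches the made-up 85 to 95 range above. Both the normality and those two parameters are assumptions for illustration, not anything from the NWS.

```python
from statistics import NormalDist

# Hypothetical bell curve: average 90, one standard deviation = 5 degrees
highs = NormalDist(mu=90, sigma=5)

within_one_sd = highs.cdf(95) - highs.cdf(85)  # share of days from 85 to 95
hotter_than_95 = 1 - highs.cdf(95)             # share of days above 95

print(f"{within_one_sd:.1%}")   # 68.3%
print(f"{hotter_than_95:.1%}")  # 15.9%
```

The "2/3" and "1/6" figures are the usual rounded versions of these two numbers, and they only hold when the data really are bell-shaped.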

His reply astounded me: “We don’t get that information from the NWS – just the daily averages and extremes. BTW, the averages are computed every 10 years over the previous 30, so currently we are using the data collected from 1971 – 2000.”

Well. There is nothing left for an inquisitive person to do but to look at the data directly. Fortunately, the daily recorded highs and lows for the last 30 years can be had from the NWS site. So, I went to work.

And what I found was interesting. Our weather more closely resembles the second sequence than the first. Rather than a smooth distribution about the average, it has regular, but infrequent, extremely low temps (in the 70’s!). The median temperature, as a result, is about 3.5 degrees above the average. It works out that just shy of 2/3 of all summer daytime temps are above the published average!
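If you want to run the same kind of check on a list of daily highs, a sketch along these lines would do it. The function name and the sample values are hypothetical: the list below is invented purely to show the shape of the calculation and is not the NWS data I used.

```python
from statistics import mean, median

def summarize_highs(highs):
    """Return the mean, the median, and the share of days above the mean."""
    avg = mean(highs)
    share_above = sum(1 for t in highs if t > avg) / len(highs)
    return avg, median(highs), share_above

# HYPOTHETICAL values for illustration only, not NWS observations:
# mostly low-90s days plus a couple of cool outliers in the 70s
sample_highs = [74, 76, 89, 90, 91, 91, 92, 93, 94, 95]

avg, mid, share_above = summarize_highs(sample_highs)
print(f"mean={avg}, median={mid}, share above mean={share_above:.0%}")
```

Even in this toy list, two cool days are enough to pull the average below the median and leave most days "above normal".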

Imagine how this alters your perception of the region’s weather. If you knew that the median temperature was 91.4 degrees, you would know to expect that, given enough days and years, half of the summer day-time highs would be 92 or above, and you wouldn’t be surprised at a string of 93-95 degree days. (Conversely, you also wouldn’t be surprised by an equal string of 87-89 degree days.) And if you knew the full variability of the weather, you also wouldn’t be surprised by a string of 80-82 degree days, or the occasional string of 97-99 degree days.

There is a more important point buried in all of this. The computation and use of means and medians is elementary, in the sense that we all encounter it prior to secondary (9-12) school. Our teachers take the time to create examples like this to illustrate how they differ, although the real meaning (and mismeaning) is not fully explored until statistics, usually in college. But a rough understanding is vital to our accurate perception of the world around us, and of the data presented to us.

And we expect that people who have a vested interest in molding our perceptions (think politicians and lobbyists) would leave one or the other out.
But our weather people, and the NWS? What do they have to gain? What explains it, other than laziness? They are all college educated, they certainly had college statistics as part of their degree program, and yet – they think nothing of going on TV each and every evening and misleading us about the ‘normalness’ of today’s weather.

Which really illustrates just how vigilant we must be whenever numbers and terms like average, median, and deviation are used. As has been said, the worst lies are statistical lies, which we can see are often lies of omission – omission of large parts of the relevant data.
