Tuesday, November 01, 2005

I'll take Greek mathematicians for $100, Alex | by Jay

If you follow baseball at all, you've probably heard of stats-guru Bill James and the many formulas he devised for analyzing baseball. In 1980, James suggested that a baseball team's winning percentage could be predicted based on the number of runs they scored and the number of runs they allowed. He devised a formula to calculate this, and called it the Pythagorean Method due to its resemblance to the famous geometry theorem. In its basic form, it looks like this:

Expected Wins = G * PF² / (PF² + PA²)

G=Total Games; PF=Points (or Runs) For; PA=Points Against.

The method isn't perfect, but it works remarkably well. Over a 162-game baseball season, the Pythagorean method predicts the final record for most teams within three games of their actual performance.

Several statisticians have taken the Pythagorean Method and applied it to other sports, including football. However, because a football season has vastly fewer games than baseball, the statistical sample size is a lot smaller, and thus the results aren't as precise. Despite the higher margin for error, it's still an interesting benchmark to gauge a team's performance against its expected win total.

One of the things that James realized was that the value of the exponent shouldn't necessarily be fixed at 2. As you change that value in the formula, moving it up or down, you can decrease the error rate and bring the projections closer to the actual results. For Notre Dame football as a whole, the most accurate exponent seems to be 1.8.
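For anyone who wants to play along at home, here is a minimal Python sketch of the formula with the exponent as a parameter. The function names are my own invention for illustration, and the sample numbers are the 1931 season (9 games, 215 points for, 40 against) from the tables in this post.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.0):
    """Expected winning percentage from points scored and points allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def expected_wins(games, points_for, points_against, exponent=2.0):
    """Expected wins = games * expected winning percentage."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)


# 1931 season: 9 games, 215 PF, 40 PA. Try a few exponents to see
# how much the projection moves.
for exponent in (1.6, 1.8, 2.0):
    pct = pythagorean_win_pct(215, 40, exponent)
    wins = expected_wins(9, 215, 40, exponent)
    print(f"exponent {exponent}: {pct:.3f} expected win pct, {wins:.2f} expected wins")
```

With the exponent set to 1.8, this gives roughly .954 and 8.58 wins for 1931, which is what the table shows.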

I applied the Pythagorean Method against every Notre Dame football season from Rockne on. Hit the link for the complete table:


Some analysts like to explain the method as showing you which teams "over-achieved", but I'm not sure that nomenclature accurately reflects what the formula reveals. In my opinion, it's better to talk about teams as "over-performing" or "under-performing", because what the Pythagorean method really measures is how many games you were supposed to win based on a strict measurement of points scored and points given up; it's not a measurement of how good a team really is. Perhaps another way to talk about it is in terms of Fate: which teams were "luckiest", and which teams were snakebitten.

The funny thing is, you can spin this formula any number of ways. For instance, I'm sure no one's surprised to see Willingham's 2002 team on the Lucky end of things; that was a squad that won a lot of games with smoke and mirrors. Note, though, that Holtz's '88 team is even "luckier": wins of 19-17 and 31-30 probably had something to do with it. As you can see, the formula seems to imply less about the overall quality of the individual season, and more about the vagaries of lucky bounces, the agony of missed field goals, and the gray twilight of hard-fought, close games, where one team was just a little bit better that day.

One thing that the method does show is that teams that under-perform their point differential tend to get better the following year, and teams that over-perform tend to get worse. This doesn't always hold true, but it's pretty common. For instance, check out the ten most under-performing teams from the Rockne era onwards:

Season Coach Games PF PA Wins Win% Ex.W Ex.W% Diff. Next Yr Imprv.
1931 ANDERSON 9 215 40 6 .667 8.58 .954 -.287 .778 +.111
1965 PARSEGHIAN 10 270 73 7 .700 9.13 .913 -.213 .900 +.200
1981 FAUST 11 232 160 5 .455 7.27 .661 -.207 .545 +.091
1932 ANDERSON 9 255 31 7 .778 8.80 .978 -.200 .333 -.444
1925 ROCKNE 10 200 64 7 .700 8.86 .886 -.186 .900 +.200
1986 HOLTZ 11 299 219 5 .455 7.00 .637 -.182 .667 +.212
1922 ROCKNE 10 222 27 8 .800 9.78 .978 -.178 .900 +.100
1983 FAUST 12 316 177 7 .583 8.87 .739 -.156 .583 .000
1969 PARSEGHIAN 11 351 134 8 .727 9.35 .850 -.123 .909 +.182
1942 LEAHY 11 184 99 7 .636 8.29 .753 -.117 .900 +.264

Lots of improvement the following year. Likewise, the ten most over-performing, or "luckiest" teams usually suffered a letdown:

Season Coach Games PF PA Wins Win% Ex.W Ex.W% Diff. Next Yr Imprv.
1933 ANDERSON 9 32 80 3 .333 1.45 .161 +.172 .667 +.333
1988 HOLTZ 12 393 156 12 1.000 10.09 .841 +.159 .923 -.077
1993 HOLTZ 12 427 215 11 .917 9.30 .775 +.142 .500 -.417
2002 WILLINGHAM 13 290 217 10 .769 8.16 .628 +.142 .417 -.353
1939 LAYDEN 9 100 73 7 .778 5.74 .638 +.140 .778 .000
1998 DAVIE 12 320 248 9 .750 7.35 .613 +.137 .417 -.333
2000 DAVIE 12 353 267 9 .750 7.48 .623 +.127 .455 -.295
1954 BRENNAN 10 231 115 9 .900 7.78 .778 +.122 .800 -.100
1989 HOLTZ 13 427 189 12 .923 10.56 .813 +.110 .750 -.173
1990 HOLTZ 12 359 259 9 .750 7.71 .643 +.107 .769 +.019
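To put a rough number on that tendency, here is a small Python sketch that averages the "Imprv." column for each group. The values are copied straight from the two tables above; the helper name is just something I made up, and twenty hand-picked seasons are obviously a tiny sample.

```python
from statistics import mean

# "Imprv." column (next year's Win% minus this year's), from the tables above.
under_performers = [0.111, 0.200, 0.091, -0.444, 0.200,
                    0.212, 0.100, 0.000, 0.182, 0.264]
over_performers = [0.333, -0.077, -0.417, -0.353, 0.000,
                   -0.333, -0.295, -0.100, -0.173, 0.019]


def summarize(changes):
    """Return (mean change, seasons that improved, seasons that declined)."""
    improved = sum(1 for c in changes if c > 0)
    declined = sum(1 for c in changes if c < 0)
    return mean(changes), improved, declined


for label, changes in (("Under-performers", under_performers),
                       ("Over-performers", over_performers)):
    avg, improved, declined = summarize(changes)
    print(f"{label}: mean change {avg:+.3f}, "
          f"{improved} improved, {declined} declined")
```

By this count, eight of the ten under-performers improved the next year, while seven of the ten over-performers got worse.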

How lucky will Charlie be this year? Let's look at the season to date, and extrapolate scores of the remaining games.

Right now, our offense is scoring about 11 points more than our opponents usually give up, and our defense is allowing about 7 points fewer than they usually score. Applying those margins to our remaining opponents' season averages gives us this set of predictions:

Opponent Avg PA Avg PF ND Opp
Tennessee 16 16.14 27 9
Navy 25.71 29.86 37 22
Syracuse 24.13 15.63 35 8
Stanford 29.14 27.71 40 20
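The arithmetic behind those predictions can be sketched in a few lines of Python. The constants are the "about 11" and "about 7" figures from above, taken as exact for simplicity, and the names are made up for illustration. Because the real margins aren't round numbers, the output can land a point off the table.

```python
# Margins quoted in the post, treated as exact here (an assumption).
OFFENSE_EDGE = 11  # points ND scores above what an opponent usually allows
DEFENSE_EDGE = 7   # points ND holds an opponent below its usual output

# Opponent: (average points allowed, average points scored), from the table.
remaining = {
    "Tennessee": (16.00, 16.14),
    "Navy": (25.71, 29.86),
    "Syracuse": (24.13, 15.63),
    "Stanford": (29.14, 27.71),
}


def project_game(opp_avg_pa, opp_avg_pf):
    """Return (projected ND score, projected opponent score)."""
    nd_score = opp_avg_pa + OFFENSE_EDGE
    opp_score = max(opp_avg_pf - DEFENSE_EDGE, 0)  # a score can't go negative
    return nd_score, opp_score


for name, (avg_pa, avg_pf) in remaining.items():
    nd_score, opp_score = project_game(avg_pa, avg_pf)
    print(f"{name}: ND {nd_score:.1f}, Opp {opp_score:.1f}")
```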

So for the whole season, that gives us a total Points For of 401, and total Points Against of 231. Using those point totals, Pythagoras says we should expect 8 wins.

Season Coach Games PF PA Wins Win% Ex.W Ex.W% Diff.
2005 WEIS 11 401 231 9 .818 8.03 .730 +.089

Eight wins expected, but the projected scores have us winning 9. Pythagoras would say that Charlie was somewhat Lucky. Based on what I saw against Michigan State and Southern Cal, I might beg to differ.
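If you want to check the 2005 row yourself, here is a short Python sketch of how the Ex.W, Ex.W% and Diff. figures fall out of the projected totals, using the 1.8 exponent. The variable names are mine, and the 9 wins are a projection rather than a final record.

```python
EXPONENT = 1.8  # the exponent used throughout this post

# Projected 2005 totals from the table above.
games, points_for, points_against, wins = 11, 401, 231, 9

pf = points_for ** EXPONENT
pa = points_against ** EXPONENT

expected_pct = pf / (pf + pa)         # Ex.W%
expected_wins = games * expected_pct  # Ex.W
diff = wins / games - expected_pct    # Diff. = Win% - Ex.W%

print(f"Ex.W {expected_wins:.2f}, Ex.W% {expected_pct:.3f}, Diff. {diff:+.3f}")
```

A positive Diff. means more wins than the point totals would suggest, which is the "Lucky" side of the ledger.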

All told, I think the Pythagorean Method falls into the category of "amusing statistical confection" rather than hardcore analysis. I wouldn't count on this to tell you anything earth-shattering, but it is kind of fun to look at the formula and the numbers it spits out. If you're interested in some further reading on the Pythagorean Method, here are a few links.

The Football Project
Pigskin Pythagoras
Baseball Prospectus
Football Outsiders