Tuesday, August 21, 2007

The loss column (drive or fly?)

On the Baseball Today podcast, Peter Pascarelli frequently presents a team's position in the standings only in terms of loss column differential--as in, "The Dodgers are only five back in the loss column, so don't count them out."

Relying on the loss column alone is always strange. Unlike the sensible games-behind formula every newspaper uses, the loss column ignores what we usually think of as the most fundamental stat in baseball: team wins. At this point in the season, there's no reason at all to describe the standings by the loss column only, as the absurdity of saying a 60-20 team is tied with an 18-20 team easily demonstrates. It matters if you've already won a game.
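The difference between the two measures comes down to one line of arithmetic. Here is a minimal Python sketch, assuming the usual newspaper games-behind formula (half the gap in wins plus half the gap in losses). The `Record` class and the function names are my own, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    wins: int
    losses: int


def games_behind(leader: Record, team: Record) -> float:
    # Conventional games behind: average of the win gap and the loss gap.
    return ((leader.wins - team.wins) + (team.losses - leader.losses)) / 2


def loss_column_gap(leader: Record, team: Record) -> int:
    # Loss-column differential: counts losses only and ignores wins entirely.
    return team.losses - leader.losses


strong = Record(60, 20)
weak = Record(18, 20)
print(games_behind(strong, weak))     # 21.0
print(loss_column_gap(strong, weak))  # 0
```

By the loss column, the 18-20 team is "tied" with the 60-20 team; by games behind, it trails by 21.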

I assume that the whole loss column business got started as a way to describe pennant races at the very ends of seasons. If a team is 1.5 games behind but tied in the loss column, that team still controls its own destiny, as the saying goes, and we humans love that sense of control. (The well-known fact that almost everybody fears flying more than the far more dangerous activity of driving derives from the same illusion of control.)

It makes sense, then, that measuring races by the loss column at the end of the year became routine back when teams had to be good to compete for the playoffs. The grain of truth in the loss column idea lies in the fact that really good teams are likely to win any given game, so if the Yankees are three back of the Red Sox with five to play, the loss column standings could have a real effect on the probability of a comeback. The closer the two teams' winning percentages are to 1.000, the more the loss column matters. In today's less glorious divisional races, in which the Cardinals could realistically make the playoffs with a losing record, the conventional standings are a much better measure of a race.

Thursday, August 16, 2007

The Optical Revolution in Baseball Analysis

This Slate article, brought to my notice by John Smick, has given me a sense of what I have missed by moving to the periphery of baseball analysis over the last five years or so. When I slipped from analytical wakefulness into a fog of half-awareness, my fellow contributors on the once-proud rec.sport.baseball Usenet group had followed the lead of Bill James and worked out the basics of modern statheadedness: adjusting statistics for park, league and era; understanding the value of adjusted minor league numbers; separating players' performance from their teammates' influence; and so forth. Some of the r.s.bb writers would go on to create the Baseball Prospectus and other publications. After Moneyball popularized our long-held belief that baseball's conventional wisdom rested on flawed assumptions, a few of them even worked their way into jobs with major league teams.

We knew back then that we didn't understand how to evaluate fielding very well. Whereas pitching and hitting are largely individual efforts, the team effects of fielding make judging fielders more like judging football or basketball players, whose performance always depends on the actions of other players. But I thought I had a sense of where the breakthrough would lie: the statistics needed to be based on technological observations. With cameras or lasers or something (I'm, um, not an engineer), we could know exactly how players react to batted balls: how fast they break, how quickly and accurately they track the ball, how often they catch balls when they reach them. Given the amount of money floating around baseball, I wondered why some team hadn't already introduced this technology in a proprietary way.

At the end of the Slate article, author Nate DiMeo hints at "next year's rumored Gameday innovation--video cameras that cover the playing field. These cameras promise to yield important insights about the art and science of fielding." Apparently the dream becomes reality next year, and, wonderfully, the data will be public, not the jealously guarded property of certain teams, as I had imagined.

The other factor I hadn't imagined lies in the main point of DiMeo's piece: the whole-field cameras follow the introduction of similar cameras that analyze pitching. In the combination and extension of these developments lies the root of a real revolution in sports analysis, one that I predict will ultimately matter more than, and largely replace, the Jamesian statistical revolution.

This is the optical revolution.

The Jamesian revolution largely involved the proper adjustment of individual performance for context. Often the conventional wisdom had undervalued context: we needed to adjust statistics for park and league and era to get past the idea that a homer is a homer is a homer. Sometimes we needed to value context less: the Jamesian revolution led us to see a less fundamental difference between major and minor league numbers and to regard RBIs as essentially a measure of slugging percentage muddied by context.

While the Jamesian revolution generally refined the analysis of existing statistical observations, the optical revolution will change the means of observation and the nature of the observed. The result of the optical revolution will be the separation of performance and results; we will shift from analyzing the statistical record of games to analyzing players' performance in a hypothetical, parallel world.

In this parallel world, we will attempt to build links between players' actions and results. The optical revolution will attempt to imagine a world of baseball without chance.

We already see hints of this development in pitching statistics that have grown out of Voros McCracken's thesis that major league pitchers control walks, strikeouts, and home runs, while chance dictates the results of other batted balls. McCracken and others have used that insight to assign pitchers hypothetical ERAs, adjusted for luck and team defense; these statistics attempt to measure what should have happened rather than what did. This process goes beyond norming the stats for context; it involves evaluation that deliberately excludes results. These numbers suggest, for example, that Matt Morris's relatively effective pitching for the Giants during the first half of this season was largely the result of random variation. (More of his batted balls than usual found their way into fielders' gloves.) The Giants probably capitalized on that knowledge when they unloaded Morris into that repository of all other teams' failed projects and miscalculations, the roster of the Pirates.

The optical revolution will extend this line of thinking much further. We have always known that some balls are called strikes and vice versa; now we know which ones, and we will soon be able to adjust for those mistakes. We will be able to say how much of an excellent pitching performance arose from the pitcher, how much from defense, how much from the opposition, and how much from luck. We will be able to say how much setting up an outside curveball with an inside fastball changes the effectiveness of the curve. If a great hitter has a lousy playoff series, we will be able to say whether he simply faced great pitching or whether he failed to hit pitches he normally smokes. This last example illustrates the transformation of the Jamesian revolution: the Jamesians cast doubt on the notion of the clutch hitter by looking at aggregate statistics, but the optical revolution will enable us to see exactly how performance changes (or remains constant) as the season becomes the postseason.

The Jamesian revolution asked us to take our eyes off the ball and look at the numbers. The optical revolution will use technological eyes to follow the ball in unimagined ways.

Wednesday, August 15, 2007

Fire Joe Morgan on Fire

Here is a column I wish I'd written on A-Rod. Hat tip: David Archer.

Tuesday, August 14, 2007

The knee of the Tiger

Update, July 2008: I suspect that this post does not describe Tiger's now-famous knee injury, but I want nonetheless to acknowledge that yes, I realize how dumb it sounds now. So noted.

During Tiger Woods's victorious final round at the PGA Championship on Sunday, Woods stumbled awkwardly as he pumped his fist after holing a birdie putt on the eighth hole. I was watching the round with two friends, and none of us saw anything of concern in the stumble, but the CBS announcers immediately speculated that Tiger had injured his knee. (Come to think of it, they may have said ankle first, but they soon settled on knee.) For many holes afterward, they relentlessly attributed every bump in Tiger's road to victory to his allegedly hurt knee, even though no limps or grimaces bore out the theory. When Woods clinched the victory with spectacularly huge swings, the knee narrative disappeared. I found this an unusually clear demonstration of confirmation bias, and one that illustrated a more general problem with sports journalism: the announcers had more incentive to set up a dramatic narrative than to evaluate the evidence in front of them. With the narrative established, they supported it at every opportunity and then wordlessly abandoned it once it proved to be nonsense.