Thursday, August 16, 2007

The Optical Revolution in Baseball Analysis

This Slate article, brought to my notice by John Smick, has given me a sense of what I have missed by moving to the periphery of baseball analysis over the last five years or so. When I slipped from analytical wakefulness into a fog of half-awareness, my fellow contributors on the once-proud Usenet group had followed the lead of Bill James and worked out the basics of modern statheadedness: adjusting statistics for park, league and era; understanding the value of adjusted minor league numbers; separating players' performance from their teammates' influence; and so forth. Some of the writers would go on to create the Baseball Prospectus and other publications. After Moneyball popularized our long-held belief that baseball's conventional wisdom rested on flawed assumptions, a few of them even worked their way into jobs with major league teams.

We knew back then that we didn't understand how to evaluate fielding very well. Whereas pitching and hitting are largely individual efforts, the team effects of fielding make judging fielders like judging football or basketball players, whose performance always depends on the actions of other players. But I thought I had a sense of where the breakthrough would lie: the statistics needed to be based on technological observations. With cameras or lasers or something (I'm, um, not an engineer), we could know exactly how players react to batted balls--how fast the players break, how fast and accurately they track the balls, how often they catch balls when they reach them. Given the amount of money floating around baseball, I wondered why some team hadn't already introduced this technology in a proprietary way.

At the end of the Slate article, author Nate DiMeo hints at "next year's rumored Gameday innovation--video cameras that cover the playing field. These cameras promise to yield important insights about the art and science of fielding." Apparently, the dream becomes reality next year, but wonderfully, the data will be public, not the jealously guarded property of certain teams, as I had imagined.

The other factor I hadn't imagined lies in the main point of DiMeo's piece: the whole-field cameras follow the introduction of similar cameras that analyze pitching. In the combination and extension of these developments lies the root of a real revolution in sports analysis, one that I predict will ultimately mean more than the Jamesian statistical revolution and will largely replace it.

This is the optical revolution.

The Jamesian revolution largely involved the proper adjustment of individual performance for context. Often the conventional wisdom had undervalued context: we needed to adjust statistics for park and league and era to get past the idea that a homer is a homer is a homer. Sometimes we needed to value context less: the Jamesian revolution led us to see a less fundamental difference between major and minor league numbers and to regard RBIs as essentially a measure of slugging percentage muddied by context.

While the Jamesian revolution generally refined the analysis of existing statistical observations, the optical revolution will change the means of observation and the nature of the observed. The result of the optical revolution will be the separation of performance and results; we will shift from analyzing the statistical record of games to analyzing players' performance in a hypothetical, parallel world.

In this parallel world, we will attempt to build links between players' actions and results. The optical revolution will attempt to imagine a world of baseball without chance.

We already see hints of this development in pitching statistics that have grown out of Voros McCracken's thesis that Major League pitchers have control over walks, strikeouts, and home runs, while chance dictates the results of other batted balls. McCracken and others have used that insight to assign pitchers hypothetical ERAs, adjusted for luck and team defense--these statistics attempt to measure what should have happened rather than what did. This process goes beyond norming the stats for context; it involves evaluation that deliberately excludes results. These numbers suggest, for example, that Matt Morris's relatively effective pitching for the Giants during the first half of this season was largely the result of random variation. (More of his batted balls than usual found their way to fielders' gloves.) The Giants probably capitalized on that knowledge when they unloaded Morris into that repository of all other teams' failed projects and miscalculations, the roster of the Pirates.
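
For readers who want to see what such a hypothetical ERA looks like in practice, here is a rough sketch. It does not reproduce McCracken's own calculation; it uses Fielding Independent Pitching (FIP), a later and simpler formula built on the same idea, which looks only at home runs, walks, hit batsmen and strikeouts. The function name, the default constant of 3.10 and the sample stat line are my own assumptions for illustration. In real use the constant is recomputed each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Estimate a defense-independent ERA using the FIP formula.

    hr, bb, hbp, k: home runs, walks, hit batsmen and strikeouts allowed.
    ip: innings pitched as a real number (120 and a third is 120.333..., not 120.1).
    constant: league scaling constant; 3.10 is an assumed typical value.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Only outcomes the pitcher controls enter the formula;
    # balls in play (and the fielders behind them) are ignored.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical half-season line, invented for illustration (not Matt Morris's real stats)
estimate = fip(hr=10, bb=30, hbp=5, k=60, ip=120)
print(round(estimate, 2))
```

A pitcher whose actual ERA sits well below a number like this has probably had help from his fielders, from luck, or from both.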

The optical revolution will extend this line of thinking much further. We have always known that some balls are called strikes and vice versa; now we know which ones, and we will soon be able to adjust for those mistakes. We will be able to say how much of an excellent pitching performance arose from the pitcher, how much from defense, how much from the opposition, and how much from luck. We will be able to say how much setting up an outside curveball with an inside fastball changes the effectiveness of the curve. If a great hitter has a lousy playoff series, we will be able to say whether he simply faced great pitching or whether he failed to hit pitches he normally smokes. This last example illustrates the transformation of the Jamesian revolution: the Jamesians cast doubt on the notion of the clutch hitter by looking at aggregate statistics, but the optical revolution will enable us to see exactly how performance changes (or remains constant) as the season becomes the postseason.

The Jamesian revolution asked us to take our eyes off the ball and look at the numbers. The optical revolution will use technological eyes to follow the ball in unimagined ways.