Showing posts with label Bill James. Show all posts

Wednesday, July 23, 2008

What might have (previously) been: Socks Seybold and retrograde projection

Can you project a baseball player's performance into the past?

I have an unusual interest in Socks Seybold, who played for the Philadelphia A's in the first decade of the 20th century. He was my great-great-great (I think) uncle--by marriage, so he deserves no blame for my own limitations as a player.

My man Socks was an excellent hitter. After a brief stint with the Reds in 1899, he began playing full-time for the A's in 1901. According to Jack Kavanagh, Connie Mack brought Seybold with him to Philadelphia when the American League commenced operations. In his second full season, Seybold set an American League record for home runs with sixteen, a record that would be broken by Babe Ruth 17 years later.

Seybold hit very well from 1901 to 1907, all of his seven full seasons: he was in the league's top ten every year in slugging percentage, and in the first six of those seasons, he was also in the top ten in OPS and OPS+. His career OPS+ mark (probably as good a quick indicator of hitting quality as any) of 130--that is, 30% better than league average--ties him with Roberto Clemente and Wade Boggs, and puts him ahead of the likes of Dave Winfield, Carl Yastrzemski, Eddie Murray, and Jim Rice. Obviously, his was a short career, and Kavanagh writes that "an injury in 1908 ended his major league career."
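
For readers who haven't met it, OPS+ as Baseball-Reference describes it compares a hitter's on-base and slugging percentages with the league's (park-adjusted) figures and scales the result so that 100 is average. The sketch below assumes the park adjustment has already been folded into the league numbers passed in. The sample inputs are made up for illustration and are not Seybold's actual figures.

```python
def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """OPS+ = 100 * (OBP/lgOBP + SLG/lgSLG - 1).

    lg_obp and lg_slg are assumed to be already park-adjusted;
    this function does no park adjustment of its own.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# A league-average hitter scores exactly 100.
print(ops_plus(0.320, 0.380, 0.320, 0.380))

# Hypothetical hitter: .350 OBP and .450 SLG in a .320/.380 league.
print(round(ops_plus(0.350, 0.450, 0.320, 0.380), 1))
```

Because the formula adds two ratios, "30% better than league average" is a handy shorthand for a 130 rather than a literal comparison of OPS figures.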

Simple enough, right? A good career shortened by injury. So it goes.

But here's the key fact: when he incurred that injury in 1908, Seybold was the seventh-oldest player in the league! He was already 37 and almost certainly at the end of his playing days. Born in November of 1870, Seybold was on the old side of 30 when he began playing full-time, already in the typical declining years for a baseball player. I don't know why he didn't begin playing earlier: in those days, I imagine, he might simply have gone unnoticed as a local star for years, or he might have wanted to stay in a safer or more respectable trade until a solid professional opportunity came along.

We have lots of tools, such as similarity scores and Bill James's Favorite Toy, to look at a career cut short and project a hypothetical future. As far as I know, we have none that help us see what the missing first half of a career might have looked like. Eyeballing the thirtysomething seasons of the Hall of Famers I mentioned above, I'd say that only Clemente (definitely) and Winfield (possibly) look like they might have posted an OPS+ of 130 or better at the ages when Seybold played.
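
Bill James's Favorite Toy is a good example of the forward-looking kind of tool. As it is usually summarized, it works in four steps:

1. Estimate a player's remaining years as 24 minus 0.6 times his age.
2. Weight his last three seasons 3-2-1 to get an "established level."
3. Multiply the remaining years by the established level.
4. Subtract 0.5 from the ratio of that projection to what the player still needs.

This is a minimal sketch of that core only. James's own descriptions add further floors and caps that it leaves out, and clamping the result to the range 0 to 1 is my own simplification. The player in the example is hypothetical.

```python
def favorite_toy(need: float, age: float, two_years_ago: float,
                 last_year: float, this_year: float) -> float:
    """Rough chance (0 to 1) that a player reaches a counting-stat milestone.

    need must be positive. Core steps only, as commonly summarized;
    James's additional floors and caps are intentionally omitted.
    """
    years_remaining = 24 - 0.6 * age
    # Established level: weight recent seasons more heavily (3-2-1).
    established = (3 * this_year + 2 * last_year + two_years_ago) / 6
    projected_remaining = years_remaining * established
    chance = projected_remaining / need - 0.5
    # Simplification for this sketch: keep the result a valid probability.
    return max(0.0, min(1.0, chance))


# Hypothetical 30-year-old with 20, 22 and 25 homers in his last three
# seasons, needing 100 more to reach a milestone.
print(round(favorite_toy(100, 30, 20, 22, 25), 2))
```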

I wonder whether a retroprojection tool might be useful not only for cases similar to Seybold's but also for examining more recent players who might have been brought up from the minor leagues too late.
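
Nothing obvious stops us from running an aging curve backward, though. The sketch below works in three steps:

1. Divide each observed season by a curve factor to estimate the player's "peak-equivalent" level.
2. Average those estimates.
3. Multiply back through the curve at the younger ages he never played.

The aging curve is entirely hypothetical: made-up multipliers that peak at 27. The input, a flat 130 OPS+ at every age from 30 to 36, is a simplifying stand-in for Seybold's career mark and not his real season-by-season record.

```python
# Hypothetical aging curve: share of peak performance expected at each age.
# These multipliers are illustrative assumptions, not fitted to real data.
AGING_CURVE = {
    24: 0.94, 25: 0.96, 26: 0.98, 27: 1.00, 28: 1.00, 29: 0.99,
    30: 0.97, 31: 0.95, 32: 0.93, 33: 0.91, 34: 0.88, 35: 0.85, 36: 0.82,
}


def retroproject(observed: dict[int, float], curve: dict[int, float],
                 target_ages: list[int]) -> dict[int, int]:
    """Estimate performance at ages a player did not play.

    observed maps age -> rate stat (e.g. OPS+). Each season is scaled up
    to a peak-equivalent value, the estimates are averaged with equal
    weight, and the average is scaled back down to each target age.
    """
    peak_estimates = [value / curve[age] for age, value in observed.items()]
    peak = sum(peak_estimates) / len(peak_estimates)
    return {age: round(peak * curve[age]) for age in target_ages}


# Simplified stand-in for a late starter: OPS+ of 130 at every age 30-36.
late_starter = {age: 130.0 for age in range(30, 37)}
print(retroproject(late_starter, AGING_CURVE, list(range(24, 30))))
```

Equal weighting ignores playing time. The method also inherits every flaw of the curve it runs on, likely including survivor bias in curves built only from players who lasted long enough to be measured.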

Thursday, August 16, 2007

The Optical Revolution in Baseball Analysis

This Slate article, brought to my notice by John Smick, has given me a sense of what I have missed by moving to the periphery of baseball analysis over the last five years or so. When I slipped from analytical wakefulness into a fog of half-awareness, my fellow contributors on the once-proud rec.sport.baseball Usenet group had followed the lead of Bill James and worked out the basics of modern statheadedness: adjusting statistics for park, league and era; understanding the value of adjusted minor league numbers; separating players' performance from their teammates' influence; and so forth. Some of the r.s.bb writers would go on to create Baseball Prospectus and other publications. After Moneyball popularized our long-held belief that baseball's conventional wisdom rested on flawed assumptions, a few of them even worked their way into jobs with major league teams.

We knew back then that we didn't understand how to evaluate fielding very well. Whereas pitching and hitting were largely individual efforts, the team effects of fielding make judging fielders more like judging football or basketball players, whose performance always depends on the actions of other players. But I thought I had a sense of where the breakthrough would lie: the statistics needed to be based on technological observations. With cameras or lasers or something (I'm, um, not an engineer), we could know exactly how players react to batted balls--how fast the players break, how fast and accurately they track the balls, how often they catch balls when they reach them. Given the amount of money floating around baseball, I wondered why some team hadn't already introduced this technology in a proprietary way.

At the end of the Slate article, author Nate DiMeo hints at "next year's rumored Gameday innovation--video cameras that cover the playing field. These cameras promise to yield important insights about the art and science of fielding." Apparently, the dream becomes reality next year, but, wonderfully, the data will be public, not the jealously guarded property of certain teams, as I had imagined.

The other factor I hadn't imagined lies in the main point of DiMeo's piece: the whole-field cameras follow the introduction of similar cameras that analyze pitching. In the combination and extension of these developments lies the root of a real revolution in sports analysis, one that I predict will ultimately matter more than the Jamesian statistical revolution and largely replace it.

This is the optical revolution.

The Jamesian revolution largely involved the proper adjustment of individual performance for context. Often the conventional wisdom had undervalued context: we needed to adjust statistics for park and league and era to get past the idea that a homer is a homer is a homer. Sometimes we needed to value context less: the Jamesian revolution led us to see a less fundamental difference between major and minor league numbers and to regard RBIs as essentially a measure of slugging percentage muddied by context.
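
Park adjustment is the classic example of this kind of context correction. A simple run park factor compares scoring per game in a team's home games with scoring in its road games. Because a hitter plays only about half his games at home, a common shortcut applies half of that effect. The sketch below uses that textbook shortcut, which is not necessarily the exact method any particular analyst uses, and its numbers are invented.

```python
def park_factor(home_park_rpg: float, road_rpg: float) -> float:
    """Runs per game (both teams combined) in a team's home games divided
    by runs per game in its road games. Above 1.0 favors hitters."""
    return home_park_rpg / road_rpg


def park_adjust(stat: float, pf: float) -> float:
    """Neutralize a run-based stat for park. Only half the park effect is
    applied, since a player plays roughly half his games at home."""
    return stat / ((pf + 1) / 2)


pf = park_factor(5.5, 5.0)  # hypothetical hitter-friendly park
print(round(pf, 2))
print(round(park_adjust(110, pf), 1))  # hypothetical 110 runs produced there
```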

While the Jamesian revolution generally refined the analysis of existing statistical observations, the optical revolution will change the means of observation and the nature of the observed. The result of the optical revolution will be the separation of performance and results; we will shift from analyzing the statistical record of games to analyzing players' performance in a hypothetical, parallel world.

In this parallel world, we will attempt to build links between players' actions and results. The optical revolution will attempt to imagine a world of baseball without chance.

We already see hints of this development in pitching statistics that have grown out of Voros McCracken's thesis that Major League pitchers have control over walks, strikeouts, and home runs, while chance dictates the results of other batted balls. McCracken and others have used that insight to assign pitchers hypothetical ERAs, adjusted for luck and team defense--these statistics attempt to measure what should have happened rather than what did. This process goes beyond norming the stats for context; it involves evaluation that deliberately excludes results. These numbers suggest, for example, that Matt Morris's relatively effective pitching for the Giants during the first half of this season was largely the result of random variation. (More of his batted balls than usual found their way to fielders' gloves.) The Giants probably capitalized on that knowledge when they unloaded Morris into that repository of all other teams' failed projects and miscalculations, the roster of the Pirates.
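
Two formulas descended from McCracken's work show the idea in miniature:

- Batting average on balls in play (BABIP) isolates the results McCracken attributed largely to chance and defense.
- FIP (fielding independent pitching, usually credited to Tom Tango) builds an ERA-like number from only home runs, walks, hit batsmen, and strikeouts.

The FIP constant varies by season and is normally set so that league FIP matches league ERA. The 3.10 used here is a placeholder assumption, and the stat line is hypothetical.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: hits that stayed in the park
    divided by the batted balls that could have been fielded."""
    return (h - hr) / (ab - k - hr + sf)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding independent pitching. ip is true innings (6 2/3 -> 6.667,
    not the box-score notation 6.2). constant is a placeholder assumption;
    in practice it is set each season so league FIP equals league ERA."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher's season allowed: 150 H, 20 HR, 600 AB, 5 SF,
# with 100 K, 50 BB and 5 HBP over 150 innings.
print(round(babip(h=150, hr=20, ab=600, k=100, sf=5), 3))
print(round(fip(hr=20, bb=50, hbp=5, k=100, ip=150.0), 2))
```

A pitcher whose BABIP sits well below the league's will usually post an ERA better than his FIP, and the gap tends to close as his luck evens out.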

The optical revolution will extend this line of thinking much farther. We have always known that some balls are called strikes and vice versa; now we know which ones, and we will soon be able to adjust for those mistakes. We will be able to say how much of an excellent pitching performance arose from the pitcher, how much from defense, how much from the opposition, and how much from luck. We will be able to say how much setting up an outside curveball with an inside fastball changes the effectiveness of the curve. If a great hitter has a lousy playoff series, we will be able to say whether he simply faced great pitching or whether he failed to hit pitches he normally smokes. This last example illustrates the transformation of the Jamesian revolution: the Jamesians cast doubt on the notion of the clutch hitter by looking at aggregate statistics, but the optical revolution will enable us to see exactly how performance changes (or remains constant) as the season becomes the postseason.
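
Knowing "which ones" comes down to comparing each pitch's tracked location with the rulebook zone. The sketch below shows how such a check might look. It uses a hypothetical Pitch record whose fields are modeled loosely on the px, pz, sz_top and sz_bot values in PITCHf/x data: feet, measured horizontally from the center of the plate and vertically from the ground. It makes three simplifications:

- It treats the zone as a flat rectangle.
- It counts a pitch as a strike if any part of the ball touches that rectangle.
- It uses an approximate ball radius.

Real analyses are more careful about all three.

```python
from dataclasses import dataclass

PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # approximate radius of a baseball


@dataclass
class Pitch:
    px: float      # horizontal location, feet from center of plate
    pz: float      # height above the ground, feet
    sz_top: float  # top of this batter's strike zone, feet
    sz_bot: float  # bottom of this batter's strike zone, feet
    call: str      # umpire's call on a taken pitch: "strike" or "ball"


def in_zone(p: Pitch) -> bool:
    """True if any part of the ball passes through the flat zone."""
    return (abs(p.px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and p.sz_bot - BALL_RADIUS_FT <= p.pz <= p.sz_top + BALL_RADIUS_FT)


def miscalls(pitches: list[Pitch]) -> tuple[int, int]:
    """Count (strikes called balls, balls called strikes) among taken pitches."""
    strikes_called_balls = sum(1 for p in pitches if p.call == "ball" and in_zone(p))
    balls_called_strikes = sum(1 for p in pitches if p.call == "strike" and not in_zone(p))
    return strikes_called_balls, balls_called_strikes


sample = [
    Pitch(px=0.0, pz=2.5, sz_top=3.5, sz_bot=1.5, call="ball"),    # down the middle
    Pitch(px=1.5, pz=2.5, sz_top=3.5, sz_bot=1.5, call="strike"),  # well outside
    Pitch(px=0.2, pz=2.0, sz_top=3.5, sz_bot=1.5, call="strike"),  # correct call
]
print(miscalls(sample))
```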

The Jamesian revolution asked us to take our eyes off the ball and look at the numbers. The optical revolution will use technological eyes to follow the ball in unimagined ways.

Thursday, May 10, 2007

A blog is born

I started this blog after composing this little rant on May 10th, 2007, so I'll date this post to that day, call it a blog, and then try to explain myself.

I just realized something: Bill Simmons, the Sports Guy, is the anti-Bill James. Like James, Simmons is an excellent writer with a great knack for popularizing his ideas. I often enjoy Simmons's columns immensely; I am always reminding myself to cut blog subscriptions, and Simmons always makes the cut. But Simmons's reasoning is TERRIBLE. He's just a dreadful analyst in many ways, foremost among them his constant use of misleading statistics combined with reflexively anti-intellectual bashing of stat geeks (like me, it should be noted). Perhaps the next blog in my life needs to be sportsguytalkincrazyagain.blogspot.com.