Thursday, December 20, 2007

Shoeless Joe

In a recent article on the Mitchell report, Thomas Boswell, who should know better than to get baseball history from Field of Dreams, writes,

Shoeless Joe Jackson, an illiterate outfielder who hit like a demon in the 1919 World Series, but neglected to blow the whistle on his crooked teammates, died with his good name as black as their Sox.

Side point: why is "illiterate" in this sentence? Hypothesis: it's there to distract us from the plain facts of what Joe Jackson did in 1919.

May I volunteer an alternative version of Boswell's sentence? I may? Super!

Shoeless Joe Jackson, an outfielder who took money from gamblers to throw the 1919 World Series, performed dramatically differently in the straight and crooked games, and later described--under oath and in detail--how he contributed to throwing the Series, subsequently died.

Wednesday, November 7, 2007

Component ERA is on the traditional side for this guy?

I'm going to wind my way to a comment on the new GM of the Pirates, Neal Huntington.

I inherited a rooting interest in two baseball teams, the Giants and the Pirates, which are the hometown teams of my father and mother, respectively. I've always pulled for those teams aside from my one rebellious stage, the Great Yankee Apostasy of the late 70s, about which the less said the better.

The GYA ended during Game Four of the 1979 World Series, a game I watched as a six-year-old with my dad from the very last row of the upper deck in Three Rivers Stadium. The Pirates lost the game, but the crowd swept me into a durable affection for the team, even after the days of Willie Stargell (my favorite player for the rest of childhood) had long faded away.

The Pirates began and sustained their ongoing, record-shattering streak of losing seasons by combining a small-payroll strategy with traditionalist management. That is, they spent limited resources on the commodities that the market of the time most overvalued. Hence the signings of Pat Meares (twice), Derek Bell, Charlie Hayes, and others of their ilk. During the 1990s, the main online discussion board for Pirates fans turned into a place where statheaded fans gathered to savage then-GM Cam Bonifay and to imagine what a more enlightened GM would do with a player like Aramis Ramirez. Here is a typical conversation from 1999 in which I participated.

The Pirates were extremely slow to give Ramirez a job, and they never appreciated him. In 2003, they ended the Cubs' famous string of terrible third basemen by handing them a 25-year-old Ramirez for Jose Hernandez and a couple of minor-league nobodies. Oh, and the Pirates threw in Kenny Lofton. Ramirez has been a well above-average hitter ever since. Though he did play oddly well for the Dodgers the following year, Hernandez would never again play 100 games in a season.

In those days, the Pirates drove statheads crazy by failing to appreciate basic statheaded principles: the value of plate discipline, the value of minor league performance, the importance of a player's age, and so forth. Things have never gotten much better.

But now, lookee here. The new GM says,

We are going to utilize several objective measures of player performance to evaluate and develop players. We'll rely on the more traditional objective evaluations: OPS (on base percentage plus slugging percentage), WHIP (walks and hits per inning pitched), Runs Created, ERC (Component ERA), GB/FB (ground ball to fly ball ratio), K/9 (strikeouts per nine innings), K/BB (strikeouts to walks ratio), BB%, etc., but we'll also look to rely on some of the more recent variations: VORP (value over replacement player), Relative Performance, EqAve (equivalent average), EqOBP (equivalent on base percentage), EqSLG (equivalent slugging percentage), BIP% (balls put into play percentage), wOBA (weighted on base average), Range Factor, PMR (probabilistic model of range) and Zone Rating.

Zoikes! For the first time this century, I'm very interested in what the Pirates will do next.
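Most of the "traditional" measures in Huntington's list are simple ratios of counting stats. Here is a minimal sketch of four of them, using their standard definitions; the function names are mine, and the stat lines in the example are hypothetical:

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base over AB + BB + HBP + SF."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(h, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at-bat (singles derived from hits)."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS is just OBP plus SLG."""
    return obp(h, bb, hbp, ab, sf) + slg(h, doubles, triples, hr, ab)


def whip(bb, h, ip):
    """Walks plus hits per inning pitched (ip as true innings, e.g. 6 1/3 -> 6.333)."""
    return (bb + h) / ip


def k_per_9(k, ip):
    """Strikeouts per nine innings."""
    return 9 * k / ip


def k_per_bb(k, bb):
    """Strikeout-to-walk ratio."""
    return k / bb


if __name__ == "__main__":
    # Hypothetical hitter: 500 AB, 150 H (30 2B, 5 3B, 20 HR), 60 BB, 5 HBP, 5 SF
    print(round(ops(150, 30, 5, 20, 60, 5, 500, 5), 3))  # 0.877
    # Hypothetical pitcher: 200 IP, 180 H, 50 BB, 160 K
    print(whip(50, 180, 200), k_per_9(160, 200), k_per_bb(160, 50))
```

The newer items on the list (VORP, EqA, PMR and the rest) involve league baselines, park adjustments, or batted-ball models, so they don't reduce to one-liners like these.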

Sunday, November 4, 2007

Fixing revenue sharing

Michael "not the Moneyball guy" Lewis has a column in the New York Times proposing a reform for revenue sharing that would punish teams for lazy freeloading. Such a reform seems essential to me: the free rider problem has become grotesque in some cases, and not just in baseball. I wish Lewis had painted the problem a little more vividly, and it's hard to tell whether his formula gets the solution exactly right, but he certainly seems to be talking sense. I hope many more writers join in the conversation.

Wednesday, October 31, 2007

The power of Scott Boras

OK, one more post roughly related to Alex Rodriguez and his contract--

Tyler Cowen at Marginal Revolution wonders how Scott Boras might be able to command higher prices for his clients than other agents do. J. C. Bradbury is skeptical of this power. Such skepticism is to be expected from economists, who would be surprised to see a single actor fundamentally change the dynamics of a competitive market as Boras is supposed to do, but I don't think you can seriously dispute that Boras has fundamentally shifted prices at times, especially in the amateur draft.

Cowen lists some mechanisms by which Boras might beat his market, but he neglects what I consider the most interesting possibility: that Boras actually makes his players better. This recent story in ESPN the Magazine describes the ways in which Boras tries to increase the skill and durability of his players. An ability to increase durability seems plausible to me, and if it seems plausible to owners, it may well cause them to pay more for Boras's clients. In the case of A-Rod, durability is a crucial factor, arguably the crucial factor, even a decisive one: if he stays healthy, he will almost certainly become the home run king. I can easily imagine Boras's longtime management of Rodriguez's training regimen being worth millions of dollars to a team.

Monday, October 29, 2007

A-Rod's Contract Again

I usually agree with Rob Neyer, but I don't agree with this blog post, where Neyer contends that Alex Rodriguez isn't worth the money he's now earning or will earn. As evidence that A-Rod is overpaid, Neyer cites Nate Silver's study of the first years of A-Rod's present contract.

I see three problems with Neyer's case.

First, and most trivially, Neyer cites the contracts of Rodriguez, Mike Hampton, and Manny Ramirez as regrettable decisions by irrational owners: "All three franchises, within just a few years, regretted those deals. Terribly regretted those deals." Sure, Hampton's deal was a disaster, but doesn't it seem a little nuts to criticize Boston for the Ramirez deal in the very week that the team wins its second World Series in four years? I mean, I discount the meaning of postseason performance as much as anyone, but it's hard right now to imagine a better way for Boston to have spent that cash.

Second, the Neyer/Silver argument may be outdated. Neyer doesn't account for the increasing revenues in MLB. The increases may not be enough to change the big picture, but they need to be accounted for.

But the more fundamental problem with Neyer's argument is that he's making a case about a market that simply doesn't exist. Baseball owners don't get to sign free agents on the basis of Silver's calculations of their value. The asking price of free agents is (give or take) the amount of the richest competing offer plus a little bit. In such a market, top free agents will always and necessarily command more than their demonstrable value, while top young players in the present salary structure receive less. The owner who offers the Silver-Neyer price for free agents simply won't sign any of them. The rational price is above the median assessment of a player's demonstrable value. That is, the right price is what Neyer would wrongly call an irrational one.
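This argument is closely related to what auction theorists call the winner's curse. Here is a rough simulation, under assumptions that are mine rather than Neyer's or Silver's: each bidding team estimates a player's value with unbiased noise, and the player signs for the richest competing offer plus a little bit. Even though the typical team is right on average, the price lands well above the median assessment. The number of bidders, the noise level, and the increment are arbitrary choices:

```python
import random
import statistics


def simulate_market(true_value, n_bidders, noise_sd, trials, increment=0.5, seed=0):
    """Simulate a free-agent market in which every team's estimate is unbiased.

    Each bidder estimates the player's value as true_value plus normal noise.
    The player signs for the second-highest estimate plus a small increment
    (the richest competing offer "plus a little bit").
    Returns (average signing price, average median estimate).
    """
    rng = random.Random(seed)
    prices, medians = [], []
    for _ in range(trials):
        estimates = sorted(rng.gauss(true_value, noise_sd) for _ in range(n_bidders))
        prices.append(estimates[-2] + increment)
        medians.append(statistics.median(estimates))
    return statistics.mean(prices), statistics.mean(medians)


if __name__ == "__main__":
    # Hypothetical player worth $25M/year to the typical team, ten serious bidders
    price, typical = simulate_market(true_value=25.0, n_bidders=10, noise_sd=5.0, trials=5000)
    print(f"average price {price:.1f}, median estimate {typical:.1f}")
```

An owner who refuses to go above the median estimate never reaches the second-highest bid, which is the sense in which the "rational" price sits above the median assessment.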

The interesting quirk of this situation, however, is A-Rod's act of opting out of his present contract, which costs the Yankees $23 million. Avoiding that loss should be worth a lot to the Yankees; they could rationally pay, say, $20 million more than A-Rod's free agent price to extend him. The fact that A-Rod appears to be turning down a contract extension means a) he's bluffing, b) he really doesn't want to play for the Yankees anymore, or c) he and the Yankees are each betting on evaluating the free agent market better than the other. I'm guessing A-Rod wins that bet.

Friday, October 26, 2007

Undefeated seasons and aligning incentives

The Sports Guy has written recently about the relative probabilities of going undefeated in the NFL and in a given fantasy football league. Simmons skips the obvious historical approach--getting a fantasy stats service to tell him how many teams go undefeated and comparing the incidence with the NFL's history--but he offers good reasons for thinking the undefeated fantasy season the rarer achievement. I'll add a couple of thoughts about the role of incentives in the comparison.

Many of Simmons's points boil down to the simple fact that fantasy results are hard to control due to misaligned incentives. If the Patriots are winning by three touchdowns and your fantasy team needs Tom Brady to throw for two more, you're out of luck because Brady doesn't care what you need. His incentives are different from yours. Incidentally, this scenario demonstrates why I think fantasy baseball is a better pretend sport than fantasy football: in baseball, Manny Ramirez is going to try to hit well whenever he comes to the plate. His incentives are aligned with his fantasy owners' because there's no way to run out a clock.

(Side note: the latest Nobel prize in economics was awarded for work on mechanism designs that maximize incentive alignments. Here is one explanation of the work.)

OK, so the point is that misaligned incentives make fantasy football tougher to control. But there's also a contrary influence of incentives. In most fantasy football leagues, every team is trying to win a given year's championship. In the NFL, some teams are trying to win the Super Bowl, but many of them are looking at least partly to the future, some are in full rebuilding mode, and a few are coasting along on low salaries to soak up guaranteed profits through revenue sharing. Therefore, the NFL is guaranteed to have unbalanced resources, with a handful of really good teams standing in the way of any undefeated season. It would be much easier to sweep a league that disbanded every team each year.
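One piece of this can be made precise. If games are independent, the chance of a perfect season is the product of the per-game win probabilities, and for a fixed average win probability that product is largest when every opponent is equally beatable (the arithmetic-geometric mean inequality). So concentrating strength in a handful of opponents makes a sweep harder, before any incentive effects enter. A sketch with invented probabilities for a 16-game schedule, not estimates for any real team:

```python
import math


def p_undefeated(win_probs):
    """Chance of winning every game, assuming the games are independent."""
    return math.prod(win_probs)


if __name__ == "__main__":
    balanced = [0.8] * 16              # every opponent equally beatable
    lopsided = [0.9] * 12 + [0.5] * 4  # same 0.8 average, four strong opponents
    print(round(p_undefeated(balanced), 4), round(p_undefeated(lopsided), 4))  # 0.0281 0.0177
```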

How do these variously misaligned incentives shake out to answer Simmons's question? I don't know. I'd love to see some data.

Wednesday, October 24, 2007

The Rockies' momentum

I've loved watching the Rockies win game after game lately. I have not loved media speculation about whether the layoff between the team's last playoff game and the World Series will break their momentum. Momentum plays a big role in commentary about sports, but statistical analyses have found consistently that independence generally trumps momentum. Thomas Gilovich's wonderful debunking of the basketball "hot hand" in How We Know What Isn't So is a great early example.
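The core comparison in these studies is easy to state: if outcomes are independent, a team's (or a shooter's) success rate after a success should match its rate after a failure. A bare-bones sketch of that comparison follows; the function names are mine, and published work, Gilovich's included, adds significance tests and controls that this omits:

```python
def win_rate(results):
    """Share of 'W' entries in a sequence of results (NaN if empty)."""
    return results.count("W") / len(results) if results else float("nan")


def conditional_win_rates(results):
    """Return (win rate after a win, win rate after a loss) for a 'W'/'L' sequence.

    If games are independent, the two rates should differ only by chance;
    real momentum would show up as a consistently higher rate after wins.
    """
    pairs = list(zip(results, results[1:]))
    after_win = [cur for prev, cur in pairs if prev == "W"]
    after_loss = [cur for prev, cur in pairs if prev == "L"]
    return win_rate(after_win), win_rate(after_loss)


if __name__ == "__main__":
    print(conditional_win_rates("WWLWLLWW"))  # (0.5, 0.6666666666666666)
```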

The streak has demonstrated that the Rockies are a better team than almost anybody imagined a couple of months ago. Their real and Pythagorean winning percentages are both strong, and their terrific pitching staff seems to have been solidified by mid-season shifts in the roster and role assignments. The team will now face its toughest opponent and toughest arena. To think of momentum or its dissipation as the central issue of the Series is a distraction. The Series will be determined by the quality of the teams and the quirks of short-series baseball, not by whether the Rockies have continued to please the God of Momentum.
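Pythagorean winning percentage is Bill James's estimate of the record a team "should" have, given its runs scored and allowed. A sketch using James's original exponent of 2 (later refinements settle closer to 1.83); the run totals in the example are hypothetical, not the Rockies' actual figures:

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season of `games` games."""
    return games * pythagorean_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed
    print(round(pythagorean_pct(800, 700), 3))   # 0.566
    print(round(pythagorean_wins(800, 700), 1))  # 91.8
```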

Tuesday, October 23, 2007

Peer effects in golf

Tim Harford posts about a new piece of economics research finding no peer effects in professional golf tournaments. That is, according to the researchers, golfers aren't affected by the quality of their playing partners in tournaments.

I've long been skeptical of peer effects in golf. But the general finding doesn't (as far as I know) address the key specific question: does being paired with Tiger Woods on Sunday hurt other golfers? Just about everyone thinks so, but I'm skeptical because there's a simple explanation for the appearance of a Tiger Effect.

That explanation is this: intimidation aside, Tiger is the best golfer in the world. Therefore, if he and another player are contending to win a tournament, the other player's performance is by definition more of an aberration than Tiger's. (Any player who is tied or nearly tied with Tiger on Sunday has overachieved relative to Tiger.) Therefore, if Tiger and his playing partners do what we would normally expect of them, they would create the sense of Tiger intimidating the other players into Sunday collapses. Utterly ordinary expected performances would create the same Tiger Effect that golf analysts and fans now perceive.
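The no-intimidation explanation is easy to check with a toy simulation. Give "Tiger" a better expected round than his playing partner, draw every round independently (so Sunday is no different from Thursday), and keep only the tournaments in which the partner is tied with or ahead of Tiger after 54 holes. The scoring averages and spread below are illustrative assumptions, not estimates from real tour data:

```python
import random
import statistics


def simulate_sunday(tiger_mean=69.0, rival_mean=71.0, sd=3.0, trials=50_000, seed=1):
    """Simulate final rounds when a weaker player has caught Tiger after 54 holes.

    Every round is an independent normal draw (lower is better), so nothing
    about Sunday is special and no intimidation is modeled. Only tournaments
    where the rival is tied with or ahead of Tiger after three rounds are kept.
    Returns (rival's average round over the first 54 holes,
             rival's average Sunday round, Tiger's average Sunday round).
    """
    rng = random.Random(seed)
    rival_early, rival_sunday, tiger_sunday = [], [], []
    for _ in range(trials):
        tiger_54 = sum(rng.gauss(tiger_mean, sd) for _ in range(3))
        rival_54 = sum(rng.gauss(rival_mean, sd) for _ in range(3))
        if rival_54 <= tiger_54:  # rival has caught Tiger going into Sunday
            rival_early.append(rival_54 / 3)
            rival_sunday.append(rng.gauss(rival_mean, sd))
            tiger_sunday.append(rng.gauss(tiger_mean, sd))
    return (
        statistics.mean(rival_early),
        statistics.mean(rival_sunday),
        statistics.mean(tiger_sunday),
    )


if __name__ == "__main__":
    early, sunday, tiger = simulate_sunday()
    print(f"rival rounds 1-3: {early:.2f}, rival Sunday: {sunday:.2f}, Tiger Sunday: {tiger:.2f}")
```

In this setup the rival averages well under his usual 71 through three rounds (that overachievement is why he has caught Tiger at all) and then shoots about 71 on Sunday: an apparent wilt produced entirely by selection.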

Obviously, this reasoning does not disprove a real Tiger Effect. But any test of the effect should account for this explanation, the fact that courses generally get harder on Sunday, and other reasons why Tiger's playing partners may not be wilting but simply finding their level on Sundays.

Thursday, October 18, 2007

A guy who ought to know

ESPN the Magazine leads off an article in its issue of October 22 thusly:

Recent history says the team popping the bubbly this season won't be the best record. So is it all luck? The Mag's Buster Olney asks a guy who ought to know: Braves pitcher John Smoltz.

What amazes me about this and many similar formulations is that writers seem to go out of their way to say, in essence, "This is a question that falls squarely in the province of statistical analysis rather than observation"--in this case, to frame the question as one of the relationship between probability and uncertainty--and then tumble directly into personal anecdote.

In next month's magazine, I hope to see the same logic in the other direction:

What does it feel like to take the mound with your team's season on the line and tens of thousands of hostile fans burying you in boos? The Mag asks a guy who ought to know: Louisiana Tech Professor James J. Cochran.

Sunday, October 14, 2007

Livan Hernandez and myths of the postseason

I've been a Giants fan for a long time: my first sentence was "Go Giants, beat Reds." Therefore, I remember all too well the lesson I learned in 2002: when you ask the gods to do something, you'd better watch out for ironic compliance.

In 2002, I had been arguing for years that there was no reason to think that Barry Bonds, then widely regarded as a postseason choker, was a different player under pressure. Here is a post I wrote in 1997 to that effect; there were many others. In 2002, Barry got his big chance to play in the postseason again, and he was ridiculously great: excellent in two playoff series, then about as good as anyone had ever been in a World Series: .471/.700/1.294.

While the gods granted my request to demonstrate that we should base postseason expectations on regular season performance, they also had Livan Hernandez prove the point in the other direction. Hernandez then enjoyed a reputation as a tremendous postseason pitcher, and indeed, he had pitched two brilliant playoff games early in his career. But he had been declining as a pitcher, and though his postseason W-L record and ERA had held up, his supporting statistics had collapsed; the Livan pitching for the Giants was clearly not the Livan of 1997. In 2002, Livan pitched a solid game in the first series, a very shaky but lucky game in the second (6.1 IP, 10 baserunners, 0 strikeouts, 2 ER), and two utterly disastrous games in the World Series.

He pitched less than six innings total in the two games and gave up nine earned runs. Surely, thought I, this is the end of his reputation as a postseason force.

But it wasn't. As the Diamondbacks entered this postseason, the talk started again: don't pay attention to the regular season numbers, we heard, because Livan has another gear in October!

Indeed, Hernandez had a good postseason W-L record, but that was more a function of luck and run support than excellence: his regular season and postseason ERAs were nearly identical. And the best part of that postseason record came a full decade ago, when he was a much better pitcher in all situations than he is now.
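ERA comparisons like these depend on one bookkeeping quirk: box-score innings use a shorthand in which .1 and .2 mean one and two outs, not tenths. A small sketch (the helper names are mine) that converts the notation exactly and applies it to the second-series line from 2002 quoted above (6.1 IP, 2 ER):

```python
from fractions import Fraction


def parse_innings(ip_text):
    """Convert box-score innings notation ('6.1' = 6 1/3) into an exact Fraction."""
    whole, _, outs = ip_text.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"not valid innings notation: {ip_text!r}")
    return int(whole) + Fraction(outs, 3)


def era(earned_runs, ip_text):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / parse_innings(ip_text)


if __name__ == "__main__":
    # Reading "6.1" as a decimal would give 2.95 instead.
    print(round(float(era(2, "6.1")), 2))  # 2.84
```

By the same arithmetic, nine earned runs over a full six innings would already be a 13.50 ERA, so two World Series starts totaling fewer than six innings came in above that.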

But none of this stops Mark Simon from saying, in a blurb that can't be linked directly, that Livan is "one of baseball's best postseason pitchers":

It will be up to one of baseball's best postseason pitchers to try to cool off the Rockies in Game 3 of the NLCS on Sunday, with Livan Hernandez trying to get the Diamondbacks a desperately needed victory. Hernandez doesn't exactly have the best history at Coors Field, but he's been known to dial it up a notch when it counts.

Anything can happen in a small sample, but this is a myth that deserves to die.

Saturday, October 13, 2007

Sports Guy gets his stathead on

In response to a reader question about teams moving to a short rotation in the baseball playoffs, Bill Simmons writes,

SG: Graham, that's a fantastic question. I don't have an answer for you. The three-day rest thing only seems to work when you don't have another choice (like the Red Sox in 2004, for example). If it's a conscious decision, the results always seem to be brutal. But I have another question: Why is everyone always so confident that sinkerballers are better on three days rest? People just spout this out like it's a foregone conclusion -- oh, yeah, it's fine when Wang pitches on three days rest, he's a sinkerballer. It is? Who said? Do we have scientific proof that it's better for any pitcher (even someone with a specialty pitch like the sinkerball) to be more tired than less tired? I'm dying for them to tackle this on "MythBusters."

That's a great response from Simmons; I want only to add that this case offers an excellent demonstration of the power of the optical revolution in baseball analysis. We suddenly have the power to know exactly what happens to sinkerballers after three or four days of rest, to see whether any effect correlates with the amount of sink on pitches, and to do all of this with a direct measurement of the pitches' break instead of inferring that measurement from fly ball/ground ball ratios. Amazing.
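As a sketch of the kind of query this makes possible: given pitch-tracking records that carry a days-of-rest field and a vertical-movement measurement (PITCHf/x reports vertical movement as pfx_z, in inches, relative to a pitch thrown with no spin), group a pitcher's sinkers by rest and compare the averages. The dictionary layout is a hypothetical one, and the three pitches in the example are made up only to exercise the code:

```python
from collections import defaultdict
from statistics import mean


def movement_by_rest(pitches):
    """Average vertical movement (inches) of a pitcher's pitches, grouped by days of rest.

    `pitches` is an iterable of dicts with hypothetical keys 'days_rest' and
    'pfx_z'. Lower pfx_z means less spin-induced rise, i.e. more sink.
    """
    groups = defaultdict(list)
    for pitch in pitches:
        groups[pitch["days_rest"]].append(pitch["pfx_z"])
    return {rest: mean(values) for rest, values in sorted(groups.items())}


if __name__ == "__main__":
    toy = [
        {"days_rest": 3, "pfx_z": 4.0},
        {"days_rest": 3, "pfx_z": 5.0},
        {"days_rest": 4, "pfx_z": 3.0},
    ]
    print(movement_by_rest(toy))  # {3: 4.5, 4: 3.0}
```

A real answer to Simmons's question would also need adequate sample sizes and a significance test before concluding that sinkers behave differently on short rest.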

Tuesday, October 9, 2007

A-Rod's contract

Everybody seems to acknowledge that if Alex Rodriguez opts out of his current contract to negotiate a new one, he'll make more money. In other words, he is currently underpriced. Every story about his contract also includes the fact that the Rangers are scheduled to pay the Yankees $3M/year of that contract if A-Rod sticks with it. I haven't seen any story draw this conclusion: the Rangers made a trade that has them paying the Yankees to employ baseball's best player at a bargain price.

February 2004 is not that long ago, after all: "Texas will pay $67 million of the $179 million left on Rodriguez's $252 million, 10-year contract, the most cash included in a trade in major league history." Well played, Rangers! No wonder you're so good!
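The arithmetic in that quote is worth spelling out. Taking the seven seasons (2004 through 2010) left on the ten-year deal at the time of the trade, and treating the money as a flat average (the real payment schedule was not uniform), a quick sketch in millions of dollars:

```python
def split_remaining(total_left, paid_by_old_team, years_left):
    """Return (new team's total, new team's average per year, old team's average per year)."""
    new_team_total = total_left - paid_by_old_team
    return new_team_total, new_team_total / years_left, paid_by_old_team / years_left


if __name__ == "__main__":
    # Figures in millions, from the quote above
    yankees_total, yankees_per_year, rangers_per_year = split_remaining(179, 67, 7)
    print(yankees_total, yankees_per_year, round(rangers_per_year, 2))  # 112 16.0 9.57
```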

Monday, October 8, 2007

Shining when the lights are brightest

I was going to write recently about Livan Hernandez's undeserved reputation for postseason excellence, but I just heard an even crazier example. Indians GM Mark Shapiro just said on the Baseball Today podcast that Kenny Lofton is a proven postseason performer who shines when the lights are brightest and whatnot.

No. The only good thing you can say about Lofton in the postseason is that he's gotten to the playoffs a number of times. In fact, I hypothesize that any player who makes the playoffs with a bunch of different teams and manages not to be memorably awful will gain a reputation for clutch postseason play.

But here (scroll down) is the real story of Lofton's playoff performance: 19 series over 11 years, with a solid sample of 360 at-bats, producing at a clip of .253/.323/.353 including the current hot streak. He's been putrid.

Sunday, October 7, 2007

The last day of the regular season

The extraordinary lack of close series in the playoffs so far makes me recall one of the many components of the great drama at the end of the NL regular season: Jayson Stark wrote that the Philadelphia crowd gave the visiting scoreboard a standing ovation when it posted the Marlins' seven-run first inning against the Mets. On that Sunday afternoon, two crowds--those in Philadelphia and Colorado--had the extraordinary pleasure of watching the loss of the key rival go up on the scoreboard while the home team in front of them won the crucial last game. That's a tough combination to beat.

Torre's job

Many media reports today say that George Steinbrenner will fire Joe Torre if the Yankees lose their current series against the Indians. I dislike the Yankees anyway, for most of the usual reasons, but such stories add fuel to the fire. The result of a short series in baseball conveys almost no meaningful information about a manager's performance, especially when the manager in question has experienced great postseason success and disappointment by turns. I can think of a few legitimate reasons to fire Torre, but Steinbrenner's would be an especially stupid one.

Tuesday, October 2, 2007

Shape stats and consolidation stats

As has been widely noted in the sports media, Jimmy Rollins last weekend became only the fourth Major League Baseball player to join something called the 20-20-20-20 club: at least twenty doubles, triples, home runs, and stolen bases in a season. I think that's great. I've liked Rollins since I followed him through the Phillies system, including catching him alongside Pat Burrell at the wonderful minor league park in Reading, and the 20-20-20-20 thing offers a quick narrative-in-a-number describing the kind of player Rollins has become.

I would contend, however, that joining this quirky "club" should have no bearing at all on Rollins's ranking among players, including his candidacy for the MVP award. The 20-20-20-20 number is what I'll call a shape stat: it describes the shape of Rollins's production, the way his value manifested itself, rather than the amount of value he produced. Homers are worth more than triples, which are worth more than doubles, and all of those are much more valuable than stolen bases. 20 homers and 20 steals is not as valuable as, say, 27 homers and 2 steals; the former totals are just less common. For evaluative purposes, we need exactly the tools that most sportswriters gleefully ignore, the ones such as VORP that assign informed weights to each of these statistics and then sum up the player's performance.
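A rough illustration of the difference between shape and amount: weight each event by an approximate run value and add them up. The weights below are illustrative numbers in the neighborhood of classic linear-weights tables, my own choice rather than any official set; this is not VORP, which measures a player's total contribution against a replacement-level baseline:

```python
# Approximate run values per event, in the spirit of published linear-weights
# tables. Illustrative only: real weights vary by era and source, and this
# ignores caught stealing, outs made, and everything else a full metric counts.
RUN_VALUES = {"2B": 0.78, "3B": 1.09, "HR": 1.40, "SB": 0.30}


def weighted_runs(stat_line):
    """Sum each counting stat multiplied by its run value."""
    return sum(RUN_VALUES[event] * count for event, count in stat_line.items())


if __name__ == "__main__":
    shape = {"HR": 20, "SB": 20}
    power = {"HR": 27, "SB": 2}
    print(round(weighted_runs(shape), 1), round(weighted_runs(power), 1))  # 34.0 38.4
```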

For the purposes of description and narrative, I've got no problem with shape stats. But if we're talking about the MVP or other ways of ranking players, setting aside the shape stats is a good starting point for any serious discussion.

Wednesday, September 26, 2007

The low-payroll pennant race

I recently heard a friend claim that the Cubs don't deserve their underdog status because in spite of their history, they are today just another big-market behemoth stomping the true underdog: his beloved Brewers. That argument made me wonder whether you could argue that the current Brewers are doing about as well as a team can given Milwaukee's payroll.

I translated that curiosity into a testable question: do the Brewers have the best record of any team with their payroll or lower? According to Ben Fry's chart, the answer is no, but I found the stats interesting beyond the simple question, so I constructed the standings of the Brewers-or-lower-payroll teams:

Team          Record   Payroll   Games back
Cleveland     93-63    $62M      0
Arizona       88-69    $52M      5
San Diego     86-71    $58M      7.5
Colorado      85-72    $54M      8.5
Milwaukee     81-76    $71M      12.5
Texas         74-84    $68M      20
Cincinnati    71-86    $69M      22.5
Washington    71-87    $37M      23
Kansas City   68-89    $68M      25.5
Pittsburgh    67-90    $39M      26.5
Florida       67-90    $31M      26.5
Tampa Bay     65-92    $24M      28.5

Obviously, Cleveland has had a remarkable year. Milwaukee is just above the TEX-CIN median of the twelve. The selection criteria exclude a handful of higher-salary teams doing worse than the Brewers, so Milwaukee looks better in the context of the whole MLB, but still only a little above average for their payroll. I'm surprised by how competitive the mid-level payroll teams are. Arizona and (in the truly low-payroll division) Washington stand out as teams producing a lot of value for the money. And to go back to an old hobbyhorse of mine, Tampa Bay must be a very, very profitable team given that puny payroll and revenue sharing, and that is a great shame.
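The median comparison can be checked straight from the standings. A short sketch that ranks the twelve teams by winning percentage (records copied from the table above; payroll is left out because it plays no role in the ranking):

```python
from statistics import median

# (team, wins, losses), copied from the standings above
RECORDS = [
    ("Cleveland", 93, 63), ("Arizona", 88, 69), ("San Diego", 86, 71),
    ("Colorado", 85, 72), ("Milwaukee", 81, 76), ("Texas", 74, 84),
    ("Cincinnati", 71, 86), ("Washington", 71, 87), ("Kansas City", 68, 89),
    ("Pittsburgh", 67, 90), ("Florida", 67, 90), ("Tampa Bay", 65, 92),
]


def win_pct(wins, losses):
    return wins / (wins + losses)


def rank_and_median(records):
    """Return (teams ordered best to worst by winning percentage, median winning percentage)."""
    ranked = sorted(records, key=lambda rec: win_pct(rec[1], rec[2]), reverse=True)
    teams = [team for team, _, _ in ranked]
    return teams, median(win_pct(wins, losses) for _, wins, losses in records)


if __name__ == "__main__":
    order, mid_pct = rank_and_median(RECORDS)
    print(order.index("Milwaukee") + 1, round(mid_pct, 3))  # 5 0.46
```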

A bottom line: if the season ended today, these 12 teams at the bottom of the payroll standings would produce three of the eight playoff squads. Not bad.

Tuesday, September 4, 2007

Generality, meet example

Today's guest on the Baseball Today podcast made this sensible and important remark:

"Teams are never as good as they appear to be on hot streaks and never as bad as they appear to be when things are going badly."

Right on, brother!

One minute later, he made the case that the Padres are the best team going in the National League because "over their last dozen games or so," they're averaging about six runs per game. And "if they're going to hit the ball, they're going to be good." "I think the Padres, right now, might be the best team in the National League."

I leave it to you, reader, to put those two moments together and see what happens.

Tuesday, August 21, 2007

The loss column (drive or fly?)

On the Baseball Today podcast, Peter Pascarelli frequently presents a team's position in the standings only in terms of loss column differential--as in, "The Dodgers are only five back in the loss column, so don't count them out."

The use of the loss column alone is always strange. Unlike the sensible games-behind formula that every newspaper uses, the loss column stat ignores what we usually think of as the most fundamental stat in baseball: team wins. At this point in the season, there's no reason at all to speak of the standings in terms of the loss column only, as the absurdity of saying a 60-20 team is tied with an 18-20 team easily demonstrates. It matters if you've already won a game.
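The two measures differ only in whether wins count. A sketch of both (the function names are mine), applied to the 60-20 versus 18-20 example:

```python
def games_behind(leader, trailer):
    """Standard games behind: the average of the win gap and the loss gap.

    Records are (wins, losses) tuples.
    """
    (lead_w, lead_l), (trail_w, trail_l) = leader, trailer
    return ((lead_w - trail_w) + (trail_l - lead_l)) / 2


def loss_column_gap(leader, trailer):
    """Loss-column deficit: how many more games the trailing team has lost."""
    return trailer[1] - leader[1]


if __name__ == "__main__":
    print(games_behind((60, 20), (18, 20)), loss_column_gap((60, 20), (18, 20)))  # 21.0 0
```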

I assume that the whole loss column business got started as a way to describe pennant races at the very ends of seasons. If a team is 1.5 games behind but tied in the loss column, that team still controls its own destiny, as the saying goes, and we humans love that sense of control. (The famous fact that almost everybody fears flying more than the much more dangerous activity of driving derives from the same illusion of control.)

Therefore, it makes sense that measuring races by the loss column at the end of the year became routine back when teams had to be good to compete for the playoffs. The grain of truth in the loss column idea would lie in the fact that really good teams are likely to win any given game, so if the Yankees are three back of the Red Sox with five to play, the loss column standings could have a real effect on the probability of a comeback. The closer the two teams' winning percentages are to 1.000, the more the loss column matters. The less glorious divisional races of the twenty-first century, in which the Cardinals could realistically make the playoffs with a losing record, make the conventional standings a much better measure of a race.

Thursday, August 16, 2007

The Optical Revolution in Baseball Analysis

This Slate article, brought to my notice by John Smick, has given me a sense of what I have missed by moving to the periphery of baseball analysis over the last five years or so. When I slipped from analytical wakefulness into a fog of half-awareness, my fellow contributors on the once-proud Usenet group had followed the lead of Bill James and worked out the basics of modern statheadedness: adjusting statistics for park, league and era; understanding the value of adjusted minor league numbers; separating players' performance from their teammates' influence; and so forth. Some of the writers would go on to create the Baseball Prospectus and other publications. After Moneyball popularized our long-held belief that baseball's conventional wisdom rested on flawed assumptions, a few of them even worked their way into jobs with major league teams.

We knew back then that we didn't understand how to evaluate fielding very well. Whereas pitching and hitting were largely individual efforts, the team effects of fielding make judging fielders like judging football or basketball players, whose performance always depends on the actions of other players. But I thought I had a sense of where the breakthrough would lie: the statistics needed to be based on technological observations. With cameras or lasers or something (I'm, um, not an engineer), we could know exactly how players react to batted balls--how fast the players break, how fast and accurately they track the balls, how often they catch balls when they reach them. Given the amount of money floating around baseball, I wondered why some team hadn't already introduced this technology in a proprietary way.

At the end of the Slate article, author Nate DiMeo hints at "next year's rumored Gameday innovation--video cameras that cover the playing field. These cameras promise to yield important insights about the art and science of fielding." Apparently, the dream becomes reality next year, but wonderfully, the data will be public, not the jealously guarded property of certain teams, as I had imagined.

The other factor I hadn't imagined lies in the main point of DiMeo's piece: the whole-field cameras follow the introduction of similar cameras that analyze pitching. In the combination and extension of these developments lies the root of a real revolution in sports analysis, one that I predict will ultimately matter more than the Jamesian statistical revolution and largely replace it.

This is the optical revolution.

The Jamesian revolution largely involved the proper adjustment of individual performance for context. Often the conventional wisdom had undervalued context: we needed to adjust statistics for park and league and era to get past the idea that a homer is a homer is a homer. Sometimes we needed to value context less: the Jamesian revolution led us to see a less fundamental difference between major and minor league numbers and to regard RBIs as essentially a measure of slugging percentage muddied by context.
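As one concrete example of the context adjustments the Jamesians made routine, here is a simplified park adjustment: compute a park factor from total runs per game in a team's home games versus its road games, then deflate a hitter's numbers by half of that factor's departure from neutral, since he plays only half his games at home. This is a common textbook simplification; published park factors use multi-year samples and further corrections, and the numbers in the example are hypothetical:

```python
def park_factor(runs_in_home_games, home_games, runs_in_road_games, road_games):
    """Runs per game (both teams) in home games divided by runs per game on the road.

    1.0 is neutral; above 1.0 favors hitters.
    """
    return (runs_in_home_games / home_games) / (runs_in_road_games / road_games)


def park_adjust(stat, factor):
    """Adjust a hitter's counting stat for a park he plays in half the time."""
    return stat / ((factor + 1) / 2)


if __name__ == "__main__":
    # Hypothetical hitters' park: 12 runs per game at home, 10 on the road
    pf = park_factor(972, 81, 810, 81)
    print(round(pf, 2), round(park_adjust(110, pf), 1))  # 1.2 100.0
```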

While the Jamesian revolution generally refined the analysis of existing statistical observations, the optical revolution will change the means of observation and the nature of the observed. The result of the optical revolution will be the separation of performance and results; we will shift from analyzing the statistical record of games to analyzing players' performance in a hypothetical, parallel world.

In this parallel world, we will attempt to build links between players' actions and results. The optical revolution will attempt to imagine a world of baseball without chance.

We already see hints of this development in pitching statistics that have grown out of Voros McCracken's thesis that Major League pitchers have control over walks, strikeouts, and home runs, while chance dictates the results of other batted balls. McCracken and others have used that insight to assign pitchers hypothetical ERAs, adjusted for luck and team defense--these statistics attempt to measure what should have happened rather than what did. This process goes beyond norming the stats for context; it involves evaluation that deliberately excludes results. These numbers suggest, for example, that Matt Morris's relatively effective pitching for the Giants during the first half of this season was largely the result of random variation. (More of his batted balls than usual found their way to fielders' gloves.) The Giants probably capitalized on that knowledge when they unloaded Morris into that repository of all other teams' failed projects and miscalculations, the roster of the Pirates.
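The best-known descendant of this idea is FIP (fielding independent pitching), a formula popularized by Tom Tango that builds an ERA-like number only from home runs, walks, hit batters, and strikeouts; it is a later simplification, not McCracken's original DIPS calculation. Below is a sketch of FIP alongside BABIP, the batting average on balls in play that DIPS-style analysis treats as largely a matter of luck and defense. The FIP constant varies by season, so 3.10 is a placeholder assumption, and the stat line is hypothetical:

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching: (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    The constant is normally set each season so league FIP equals league ERA;
    3.10 here is only a placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h, hr, ab, k, sf):
    """Batting average on balls in play allowed: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


if __name__ == "__main__":
    # Hypothetical season: 200 IP, 780 AB against, 190 H, 20 HR, 50 BB, 5 HBP, 150 K, 5 SF
    print(round(fip(20, 50, 5, 150, 200), 3))     # 3.725
    print(round(babip(190, 20, 780, 150, 5), 3))  # 0.276
```

A pitcher whose ERA sits well below his FIP, with a BABIP far under league average, is the Morris pattern: results running ahead of the underlying performance.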

The optical revolution will extend this line of thinking much farther. We have always known that some balls are called strikes and vice versa; now we know which ones, and we will soon be able to adjust for those mistakes. We will be able to say how much of an excellent pitching performance arose from the pitcher, how much from defense, how much from the opposition, and how much from luck. We will be able to say how much setting up an outside curveball with an inside fastball changes the effectiveness of the curve. If a great hitter has a lousy playoff series, we will be able to say whether he simply faced great pitching or whether he failed to hit pitches he normally smokes. This last example illustrates the transformation of the Jamesian revolution: the Jamesians cast doubt on the notion of the clutch hitter by looking at aggregate statistics, but the optical revolution will enable us to see exactly how performance changes (or remains constant) as the season becomes the postseason.
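A called-strike audit of the kind described above could start as simply as this sketch. It assumes tracking data that gives each pitch's horizontal position (`px`) and height (`pz`) in feet as it crosses the plate, plus the top and bottom of the batter's zone; those field names and the sample pitches are hypothetical. Home plate is 17 inches wide, and a pitch is a strike if any part of the ball touches the zone, so the zone is widened by an approximate ball radius.

```python
from dataclasses import dataclass

PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # approximate; a baseball is roughly 2.9 in across


@dataclass
class Pitch:
    px: float            # horizontal location, feet from the middle of the plate (hypothetical field)
    pz: float            # height above the ground, feet (hypothetical field)
    sz_bot: float        # bottom of this batter's zone, feet
    sz_top: float        # top of this batter's zone, feet
    called_strike: bool  # what the umpire called


def in_zone(p: Pitch) -> bool:
    """True if any part of the ball passes through the rectangular zone."""
    return (abs(p.px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and p.sz_bot - BALL_RADIUS_FT <= p.pz <= p.sz_top + BALL_RADIUS_FT)


def missed_calls(pitches):
    """Pitches where the call disagrees with the measured location."""
    return [p for p in pitches if p.called_strike != in_zone(p)]


pitches = [
    Pitch(0.0, 2.5, 1.5, 3.5, called_strike=False),  # down the middle, called a ball
    Pitch(1.2, 2.5, 1.5, 3.5, called_strike=True),   # well outside, called a strike
    Pitch(0.5, 2.0, 1.5, 3.5, called_strike=True),   # correct strike
    Pitch(-0.3, 0.9, 1.5, 3.5, called_strike=False), # correct ball, too low
]
print(len(missed_calls(pitches)), "missed calls")
```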

The Jamesian revolution asked us to take our eyes off the ball and look at the numbers. The optical revolution will use technological eyes to follow the ball in unimagined ways.

Wednesday, August 15, 2007

Fire Joe Morgan on Fire

Here is a column I wish I'd written on A-Rod. Hat tip: David Archer.

Tuesday, August 14, 2007

The knee of the Tiger

Update, July 2008: I suspect that this post does not describe Tiger's now-famous knee injury, but I want nonetheless to acknowledge that yes, I realize how dumb it sounds now. So noted.

During Tiger Woods's victorious final round at the PGA championship on Sunday, Woods stumbled awkwardly as he pumped his fist after holing a birdie putt on the eighth hole. I was watching the round with two friends, and none of us saw anything of concern in the stumble, but the CBS announcers immediately speculated that Tiger had incurred a knee injury. (Come to think of it, they may have said ankle first, but they soon settled on knee.) For many holes afterwards, they relentlessly attributed every bump in Tiger's road to victory to his allegedly hurt knee, even though no limps or grimaces bore out the theory. When Woods clinched the victory with spectacularly huge swings, the knee narrative disappeared. I found this to be an unusually clear demonstration of selection bias, and one that illustrated more general problems with sports journalism: the announcers had more incentive to set up a dramatic narrative than to evaluate the evidence in front of them. With the narrative established, they supported it at every opportunity and then wordlessly abandoned it once it proved to be nonsense.

Tuesday, July 24, 2007

The Sports Guy on the Donaghy scandal

I've been meaning to say more about what makes Bill Simmons a terrific sportswriter, one I'm almost always eager to read, even though his analytical instincts drive me nuts. This column on the NBA refereeing scandal is Simmons at his best. The Donaghy story is all about fan psychology, about the way everybody will talk about games next year, and nobody covers that angle better than Simmons.

Monday, July 16, 2007

How about the Sports Guy Race Theorists?

Back in March, Bill Simmons got himself in a little hot water by saying that this was an "astounding realit[y]" of the 2005-06 college basketball season: "Two white guys (Adam Morrison and J.J. Redick) were indisputably the two best college basketball players alive."

Perhaps Simmons thought that Boston sports fans have such a longstanding record of interracial harmony and good cheer that nobody would notice that you can't explain the logic underneath that statement without cringing. Hey, I've got a minute. Go ahead and explain to yourself why the joke is funny. I guarantee at least two cringes, or a cringe and a wince.

When readers called Simmons on the comment, he thoughtfully offered this olive branch: "For anyone who was offended, I'm sorry … not for the joke, but for the bug up your ass." Yes, it takes a lot of courage and integrity to go for the old "bug up your ass" line. Not to mention writing skill.

Bill Simmons is a skilled writer and often a skilled thinker, too, but his head seems to shrink when he tries to joke about race. Here is his reason number 929 why he loves sports:

The Utah Jazz

I will never get used to this: One of our most white-bread American cities roots for an NBA franchise named for a musical movement created by African-Americans. It's genuinely insane. You can brainstorm with your buddies all weekend to come up with a name for a sports franchise that makes less sense -- there's no way you're topping Utah Jazz. Not even with Dallas Indians.

Let's leave aside the lack of originality here--seriously, has anybody not heard this before?--and go to the hysteria of Simmons's resistance to the idea of jazz in Utah. Obviously, Utah Jazz is an odd name, with the oddity stemming from the team's move from New Orleans to Salt Lake City. Probably nobody would have considered giving the name to a new franchise. But "genuinely insane"? There's no topping it, even with hypothetical names?

I find Simmons's adolescent excitement about an old joke revealing. Utah Jazz is an oxymoron only in the dull-witted logic of bad jokes, in which all Utahans are Mormons, all Mormons are white, and no white people play jazz. The fact that lots of people have played and do play jazz in Utah is a side point, though, compared to the revelation that in Simmons's imagination, a loose association between a broad style of music and a racial group has more force than anything else he can imagine. So I take up the challenge to think of team names nuttier than Utah Jazz. Simmons offers us a weekend, but I'll take five minutes:

Laramie Surf
Minneapolis Camels
Havana Barons
Miami Frost
New York Humility
Hartford Rebels
Cedar Rapids Mountaineers

It's not so hard--if you think about it.

Friday, June 1, 2007

Inauspicious: Pascarelli on Polanco

Last year and this, I have listened regularly to ESPN's Baseball Today podcast, hosted until this week by Alan Schwarz. I have enjoyed the podcast in part because Schwarz and his guests (especially Rob Neyer and Steve Phillips) did a good job of combining responsible statistical analysis with current news and anecdotes.

Schwarz has now departed to work for the New York Times, and Peter Pascarelli has taken over the hosting duties. Yesterday, a listener's email asked him to discuss the best second basemen in the American League. After entertaining a couple of other possibilities, Pascarelli brought up his own choice. He began,

My favorite second baseman, though, in the American League is Placido Polanco, who I think is one of the most underrated players in baseball.

This surprised me: I've long thought of Polanco as an overrated player, a useful guy talked about as a star because he's versatile and makes contact well. But I haven't followed the story for a while, so I waited for more details. Pascarelli continued,

And those of you who like stats might be interested to know ...

At this point I literally stopped in my tracks and waited, suspecting that I was about to hear a customized statistic designed to carve out a little slice of Polanco's performance that makes him look especially good. Sure enough:

... that since 2005, Placido Polanco has the best average with runners in scoring position of any player in baseball, and that is something which I bet a lot of you didn't know.

Well, I certainly hope most people didn't know that. It's a misleading fact in three ways: 1) it's about batting average, a stat whose limitations favor Polanco; 2) it relies on one measure of clutch hitting--clutch hitting is an idea enormously susceptible to distortions and small-sample variations, and people who pick one measure are almost always doing so to slant their evidence; and 3) it arbitrarily chooses 2005 to the present as its time frame.

In fact, if you look at different time periods and a different measure of clutch hitting, Polanco will look much worse; he has stunk with men on and two out, for example.
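Points 2 and 3 are easy to demonstrate with a simulation. This sketch gives every hypothetical hitter exactly the same true .300 average and the same number of at-bats with runners in scoring position (350, an assumed figure), then reports who "leads" the split. Whoever tops the list gets there by chance alone, which is exactly what a cherry-picked split can hide. The player count, at-bat total, and random seed are all arbitrary choices for illustration.

```python
import random


def simulate_split_leader(n_players=200, at_bats=350, true_avg=0.300, seed=2007):
    """Simulate identical hitters; return the best observed average and all averages."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    avgs = []
    for _ in range(n_players):
        # Each at-bat is an independent hit with probability true_avg.
        hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
        avgs.append(hits / at_bats)
    return max(avgs), avgs


leader, avgs = simulate_split_leader()
print("true average for every player: 0.300")
print(f"split 'leader':                {leader:.3f}")
print(f"mean across players:           {sum(avgs) / len(avgs):.3f}")
```

Rerunning with a different seed produces a different "leader" with a similarly inflated number, which is the point: the split rewards luck as readily as skill.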

My point is not to knock Polanco, who is a very good player. My point is to knock Pascarelli, who (like Bill Simmons, to relate this to my blog title) seems to confuse statistical expertise with the recitation of isolated and slanted numbers--which is pretty nearly the opposite of statistical expertise. Pascarelli closed his comment thus:

All of you who think of yourselves as experts, well, take a back seat to me, pal, I knew that number and you didn't.

He sounded like he was kidding. Sort of. I fear we've lost an excellent podcast.

Wednesday, May 30, 2007

Graphickry: Team Salary, Team Performance

Ben Fry has concocted an ingenious interactive graph showing the relationship between each Major League Baseball team's salary and won-lost record for any day of the season.

To criticize such a graph--certainly the most interesting and useful presentation of this data that I've seen--would smack of churlishness and ingratitude. Hey, that's my cue!

The problem with the graph is that it distorts the relative positions of the teams' salaries and performance by presenting them in evenly spaced lists on the two sides of the graph. Fry seems to recognize the problem with this approach on the salary side and therefore represents each team's salary by its line thickness as well as its position on the right side of the graph. The result is two visual representations of team salary that contradict each other: the Yankees' position on the right side of the graph inaccurately presents the team as only one evenly spaced slot above the Red Sox, whereas the thickness of the line accurately but unintuitively reveals the huge gap between the top two teams. Combined with the deceptively even spacing of the team records on the left side, this flaw creates line slopes that can get seriously out of whack: at some points, one win or loss creates a deceptively large change in the slope of a team line, and the Yankees and Red Sox should be more obviously in their own leagues at the top of the salary side.
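The spacing problem can be shown in a few lines. This sketch compares rank-based positions (evenly spaced slots, as on the salary side of Fry's graph) with positions proportional to the actual values. The payroll figures are illustrative round numbers, not real 2007 payrolls, and the function names are hypothetical.

```python
def rank_positions(values, height=1.0):
    """Evenly spaced slots by rank: smallest value at 0, largest at `height`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    step = height / (len(values) - 1)
    positions = [0.0] * len(values)
    for rank, i in enumerate(order):
        positions[i] = rank * step
    return positions


def proportional_positions(values, height=1.0):
    """Positions scaled linearly to the values themselves."""
    lo, hi = min(values), max(values)
    return [height * (v - lo) / (hi - lo) for v in values]


# Illustrative payrolls in $ millions (hypothetical, not real figures), top team first.
payrolls = [190, 140, 120, 100, 80, 60]
ranked = rank_positions(payrolls)
scaled = proportional_positions(payrolls)
print("gap between top two, rank-based:  ", round(ranked[0] - ranked[1], 3))
print("gap between top two, proportional:", round(scaled[0] - scaled[1], 3))
```

With rank spacing, the top team sits the same one slot above the second team no matter how large the real gap is; proportional spacing lets the gap show up in the line's position, so slope changes reflect real differences.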

I've done enough web programming to know that my ideal graph would be vastly harder to program than Fry's already-complex one, so I understand his decisions; I just hope someone will surmount the technical hurdles to make an even better version of this.

I would also add a point of substance: it seems to me that team salaries in this context should include the costs and benefits of the luxury tax system. Counting the Yankees' luxury-tax payments would make their season look still worse.

In other words, Fry's graph makes the Yankees' season (to date--let's be clear) look disastrous, but it's actually much, much worse than it looks.

Sunday, May 20, 2007

The Preakness: Odds and History

I was struck by the headline in The New York Times on the day of the Preakness Stakes: referring to Kentucky Derby winner Street Sense, it read, "Favored in Preakness by Odds, Less So by History."

The article, by Joe Drape, shows the opposite.

It opens with a reference to Street Sense's status as the 7-to-5 favorite in the Preakness. That means that the odds gave Street Sense just under a 42% chance of winning. Like other sports betting markets, that of horse races has a fantastic track record, so we can reasonably expect horses with 7-5 odds to win about 42% of the time.
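Converting odds to a probability is one line of arithmetic: odds of A-to-B against a horse imply a win probability of B/(A+B), so 7-to-5 works out to 5/12. Here is a small sketch using Python's `fractions` module. It ignores the track's takeout, which makes the implied probabilities across a whole field add up to more than 100%, so treat the result as approximate.

```python
from fractions import Fraction


def implied_probability(against: int, in_favor: int) -> Fraction:
    """Win probability implied by odds of `against`-to-`in_favor` (e.g. 7-to-5)."""
    return Fraction(in_favor, against + in_favor)


p = implied_probability(7, 5)
print(p, f"{float(p):.1%}")  # 5/12 41.7%
```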

The article then moves to this paragraph, quoting Street Sense trainer Carl Nafzger:

“What I know is that I have a 9-to-1 chance to win,” he said, referring to the number of starters in Saturday’s race. “And that’s a lot better than it was at Churchill Downs for the Derby when we were 19-1.”

Because I am a generous man and have a good dessert in my belly, I will assume that Nafzger likes to toy with reporters rather than that he is as silly as this comment. (By this logic, horse owners might as well toss any old entry into the Preakness; heck, put a bunny in the gate--it's a lot cheaper to feed than a horse, and it will still have the same chance of winning!) Even so, this is an inspired bit of nonsense. It combines obviously false egalitarianism with a classic confusion of probability and odds: he says "9-to-1 chance" when he means one chance in nine, and odds of 9-to-1 would actually mean one chance in ten.

Drape makes more sense in the paragraph that follows:

History suggests that the odds are much better than that for Street Sense: 52 percent of Preakness winners were sent off as the post-time favorite, as Street Sense certainly will be. In the past 10 years, six Derby winners have won the mile-and-three-sixteenths race and headed to Belmont Park with a chance to sweep the Triple Crown.

And here the more interesting point arises: if pre-race favorites have won 52% of Preakness runnings, and Derby winners have done even better in recent years, then Street Sense's 7-5 odds reveal a weaker favorite than one would expect, a horse given less of a chance to win by bettors than the performance of similar horses in the past would indicate.
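The comparison in that paragraph can be sketched the same way. The 52% and six-of-ten figures come from Drape's article; `fair_odds_against` is a hypothetical helper that turns a probability back into odds against. It shows that a horse winning 52% of the time would go off at about 12-to-13, slightly odds-on, rather than 7-to-5.

```python
from fractions import Fraction


def fair_odds_against(p: Fraction) -> Fraction:
    """Odds against (losses per win) implied by a win probability p."""
    return (1 - p) / p


implied_by_odds = Fraction(5, 12)  # 7-to-5 odds
historical = {
    "Preakness favorites": Fraction(52, 100),
    "Derby winners in the Preakness, past 10 years": Fraction(6, 10),
}

for label, rate in historical.items():
    gap = float(rate - implied_by_odds)
    odds = fair_odds_against(rate)
    print(f"{label}: {float(rate):.0%} vs. {float(implied_by_odds):.1%} from the odds "
          f"(gap {gap:+.1%}); fair odds about {odds.numerator}-to-{odds.denominator}")
```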

"Favored in Preakness by Odds, Less So by History"? Nope--favored by history, a little less so by the odds.

Thursday, May 10, 2007

A blog is born

I started this blog after composing this little rant on May 10th, 2007, so I'll date this post to that day, call it a blog, and then try to explain myself.

I just realized something: Bill Simmons, the Sports Guy, is the anti-Bill James. Like James, Simmons is an excellent writer with a great knack for popularizing his ideas. I often enjoy Simmons's columns immensely; I am always reminding myself to cut blog subscriptions, and Simmons always makes the cuts. But Simmons's reasoning is TERRIBLE. He's just a dreadful analyst in many ways, foremost among them his constant use of misleading statistics combined with reflexively anti-intellectual bashing of stat geeks (like me, it should be noted). Perhaps the next blog in my life needs to be