Showing posts with label statistics. Show all posts

Thursday, July 17, 2008

Why have the Rays gotten so good?

When I see that a baseball team has improved dramatically, I enjoy watching the popular narratives of the success develop. Usually, some combination of these five reasons comes up:

1. Charismatic leadership
2. General team chemistry (perhaps resulting from the departure of one or more bad apples)
3. Better players
4. Chance
5. Avoidance of major injuries

My sense is that dominant media narratives generally emphasize the first two factors, whereas statheads emphasize numbers three and four. Both sides are just beginning to catch up to the importance of injuries, an area that seems to capture everyone's interest these days. (Tom Verducci's recent article on Tim Lincecum captures the drama of biomechanical analysis beautifully.)

Incidentally, I see one of the most interesting questions in contemporary sports to be whether statheaded skepticism about charisma and chemistry in baseball should apply equally to sports such as basketball and football, where sustained concentration and teamwork matter so much more.

Anyway, to the Rays: listening to Chicago sports radio on a recent travel day, I heard four analysts in two forums describe the success of this year's Tampa Bay Rays almost entirely in terms of charisma and chemistry, with only an occasional mention of whether the team simply got better at playing baseball this year.

Given the fact that Nate Silver predicted in February that the Rays would experience precisely this kind of improvement (to the tune of 22 more wins, said Silver) and correctly pegged the key factor to be team defense, I have wondered whether any mainstream outlet would pick up the fact that Silver's model, which cares not a whit for chemistry, got this prediction so right. Granted, Silver wrote his article for an obscure regional rag called Sports Illustrated, but still, you'd think someone would pick up this story and run with it, right?

I lost all hope last week, when Silver's own organ, Baseball Prospectus, put out a podcast in which host Brad Wochomurka referred to the Rays' season that nobody saw coming. If the success of Silver's model hasn't convinced his own colleagues, I'll wager we have a long way to go before other outlets engage it.

Sunday, July 13, 2008

Baseball team stats and individual stats

In the course of making its funnies, Fire Joe Morgan applies statistics to baseball as well as just about anyone, but I think Ken Tremendous makes an interesting mistake in this takedown of the site's eponymous punching bag. Joe Morgan writes that the Red Sox are "the best team in the game," and KT replies, "The Cubs have a better team ERA and a better team OPS. For the record."

Leaving aside the question of whether a snapshot of team performance is a good way to call one team better than another, I want to focus on the use of ERA and OPS as measures of team performance. They are generally excellent measures of individual performance. The best quick justification of OPS, however, is that it basically explains where runs come from--team OPS correlates better with team runs than, say, batting average. And OPS scales well to the individual: one player's OPS gives you a pretty good sense of how much that player contributes per plate appearance to his team's scoring.

If we want to look at team performance, however, we can eliminate the middleman: it's all about the runs. Forget stats that correlate well with runs--use runs! And on the pitching side, we can drop the "earned" component of ERA, since the whole point of that is (however roughly) to separate individual from team performance. Again, use runs!
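For concreteness, here is a minimal sketch of the stats in question, using the standard definitions of OPS and ERA. The function names and the sample season totals are my own, made up purely for illustration; they are not any real team's numbers.

```python
def ops(hits, walks, hit_by_pitch, at_bats, sac_flies, total_bases):
    """On-base percentage plus slugging percentage."""
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    slg = total_bases / at_bats
    return obp + slg


def era(earned_runs, innings):
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


def runs_allowed_per_nine(runs, innings):
    """All runs allowed per nine innings, earned or not."""
    return 9 * runs / innings


# Hypothetical totals, invented for this example.
print(ops(hits=150, walks=50, hit_by_pitch=5, at_bats=500, sac_flies=5, total_bases=250))
print(era(earned_runs=70, innings=180))
print(runs_allowed_per_nine(runs=80, innings=180))
```

The last two functions differ only in which runs they count, which is the point: at the team level the unearned runs still went on the scoreboard, so the second number is the one that describes what the team actually gave up.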

A recent Rob Neyer column about the Tampa Bay Rays touched on another case where statistics work fundamentally differently at the team and individual level. Neyer points out that the Rays have taken a huge step up in defensive efficiency this year:

[T]here's an incredibly simple statistic that tells us almost everything. Defensive Efficiency -- invented by Bill James in 1975 -- never really has caught on, which is sort of bizarre because it essentially answers a most basic question: "When a batted ball is put into play against a team, what percentage of the time does that team succeed in turning that ball into an out?"

In 2007, the Tampa Bay defense turned 66.2 percent of balls in play into outs. That figure was 30th best in the majors.

In 2008, the Tampa Bay defense has turned 72 percent of balls in play into outs. That figure is second best in the majors.


Defensive efficiency is like measures of individual fielding that attempt to discover a player's ability to convert balls hit near him into outs; at the individual level, such statistics are always beset by the difficulty of establishing the player's zone accurately. At the team level, aside from relatively minor park effects, the problem disappears: every team is responsible for the whole field. And for a single team playing in the same park, year-to-year comparisons become sublimely simple.
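Here is a rough sketch of the calculation as Neyer describes it: outs recorded on balls in play, divided by balls in play. Published versions estimate balls in play from box-score totals and differ in details such as how errors are handled, so I sidestep that by taking the two counts directly. The function name and the sample of 1,000 balls in play are my own assumptions; only the two percentages come from Neyer.

```python
def defensive_efficiency(outs_on_balls_in_play, balls_in_play):
    """Share of balls in play that the defense turned into outs."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return outs_on_balls_in_play / balls_in_play


# Hypothetical sample: 1,000 balls in play at each of the two rates Neyer cites.
sample = 1000
outs_2007 = 662  # 66.2 percent
outs_2008 = 720  # 72 percent

print(defensive_efficiency(outs_2007, sample))
print(defensive_efficiency(outs_2008, sample))
print(outs_2008 - outs_2007)  # extra outs per 1,000 balls in play
```

At those rates, the difference works out to 58 extra outs for every thousand balls in play, with no zone definitions required.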

The moral of my story: sometimes the nuances of measuring individual performance cause us to overlook simple, powerful team statistics.

Wednesday, November 7, 2007

Component ERA is on the traditional side for this guy?

I'm going to wind my way to a comment on the new GM of the Pirates, Neal Huntington.

I inherited a rooting interest in two baseball teams, the Giants and the Pirates, which are the hometown teams of my father and mother, respectively. I've always pulled for those teams aside from my one rebellious stage, the Great Yankee Apostasy of the late 70s, about which the less said the better.

The GYA ended during Game Four of the 1979 World Series, a game six-year-old I watched with my dad from the very last row of the upper deck in Three Rivers Stadium. The Pirates lost the game, but the crowd swept me into a durable affection for the team, even after the days of Willie Stargell (my favorite player for the rest of childhood) had long faded away.

The Pirates began and sustained their ongoing, record-shattering streak of losing seasons by combining a small-payroll strategy with traditionalist management. That is, they spent limited resources on the commodities that the market of the time most overvalued. Hence the signings of Pat Meares, twice, Derek Bell, Charlie Hayes, and others of their ilk. During the 1990s, the main online discussion board for Pirates fans turned into a place where statheaded fans gathered to savage then-GM Cam Bonifay and to imagine what a more enlightened GM would do with a player like Aramis Ramirez. Here is a typical conversation from 1999 in which I participated.

The Pirates were extremely slow to give Ramirez a job, and they never appreciated him. In 2003, they ended the Cubs' famous string of terrible third basemen by handing them a 25-year-old Ramirez for Jose Hernandez and a couple of minor-league nobodies. Oh, and the Pirates threw in Kenny Lofton. Ramirez has been a well above-average hitter ever since. Though he did play oddly well for the Dodgers the following year, Hernandez would never again play 100 games in a season.

In those days, the Pirates drove statheads crazy by failing to appreciate basic statheaded principles: the value of plate discipline, the value of minor league performance, the importance of a player's age, and so forth. Things have never gotten much better.

But now, lookee here. The new GM says,

We are going to utilize several objective measures of player performance to evaluate and develop players. We'll rely on the more traditional objective evaluations: OPS (on base percentage plus slugging percentage), WHIP (walks and hits per inning pitched), Runs Created, ERC (Component ERA), GB/FB (ground ball to fly ball ratio), K/9 (strikeouts per nine innings), K/BB (strikeouts to walks ratio), BB%, etc., but we'll also look to rely on some of the more recent variations: VORP (value over replacement player), Relative Performance, EqAve (equivalent average), EqOBP (equivalent on base percentage), EqSLG (equivalent slugging percentage), BIP% (balls put into play percentage), wOBA (weighted on base average), Range Factor, PMR (probabilistic model of range) and Zone Rating.

Zoikes! For the first time this century, I'm very interested in what the Pirates will do next.

Thursday, October 18, 2007

A guy who ought to know

ESPN the Magazine leads off an article in its issue of October 22 thusly:

Recent history says the team popping the bubbly this season won't be the best record. So is it all luck? The Mag's Buster Olney asks a guy who ought to know: Braves pitcher John Smoltz.

What amazes me about this and many similar formulations is that writers seem to go out of their way to say, in essence, "This is a question that falls squarely in the province of statistical analysis rather than observation"--in this case, to frame the question as one of the relationship between probability and uncertainty--and then tumble directly into personal anecdote.

In next month's magazine, I hope to see the same logic in the other direction:

What does it feel like to take the mound with your team's season on the line and tens of thousands of hostile fans burying you in boos? The Mag asks a guy who ought to know: Louisiana Tech Professor James J. Cochran.

Saturday, October 13, 2007

Sports Guy gets his stathead on

In response to a reader question about teams moving to a short rotation in the baseball playoffs, Bill Simmons writes,

SG: Graham, that's a fantastic question. I don't have an answer for you. The three-day rest thing only seems to work when you don't have another choice (like the Red Sox in 2004, for example). If it's a conscious decision, the results always seem to be brutal. But I have another question: Why is everyone always so confident that sinkerballers are better on three days rest? People just spout this out like it's a foregone conclusion -- oh, yeah, it's fine when Wang pitches on three days rest, he's a sinkerballer. It is? Who said? Do we have scientific proof that it's better for any pitcher (even someone with a specialty pitch like the sinkerball) to be more tired than less tired? I'm dying for them to tackle this on "MythBusters."

That's a great response from Simmons; I want only to add that this case offers an excellent demonstration of the power of the optical revolution in baseball analysis. We suddenly have the power to know exactly what happens to sinkerballers after three or four days of rest, to see whether any effect correlates with the amount of sink on pitches, and to do all of this with a direct measurement of the pitches' break instead of inferring that measurement from fly ball/ground ball ratios. Amazing.

Tuesday, September 4, 2007

Generality, meet example

Today's guest on the Baseball Today podcast made this sensible and important remark:

"Teams are never as good as they appear to be on hot streaks and never as bad as they appear to be when things are going badly."

Right on, brother!

One minute later, he made the case that the Padres are the best team going in the National League because "over their last dozen games or so," they're averaging about six runs per game. And "if they're going to hit the ball, they're going to be good." "I think the Padres, right now, might be the best team in the National League."

I leave it to you, reader, to put those two moments together and see what happens.

Tuesday, August 21, 2007

The loss column (drive or fly?)

On the Baseball Today podcast, Peter Pascarelli frequently presents a team's position in the standings only in terms of loss column differential--as in, "The Dodgers are only five back in the loss column, so don't count them out."

The use of the loss column alone is always strange. Unlike the sensible games behind formula every newspaper uses, the loss column stat ignores what we usually think of as the most fundamental stat in baseball: team wins. At this point in the season, there's no reason at all to speak of the standings in terms of the loss column only, as the absurdity of saying a 60-20 team is tied with an 18-20 team easily demonstrates. It matters if you've already won a game.
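To put numbers on that absurd example, here is a small sketch of the usual games-behind calculation next to the loss column differential. The function names are mine; the games-behind formula is the standard one (half the sum of the gap in wins and the gap in losses).

```python
def games_behind(leader_wins, leader_losses, trailer_wins, trailer_losses):
    """Standard games-behind figure: average of the win gap and the loss gap."""
    return ((leader_wins - trailer_wins) + (trailer_losses - leader_losses)) / 2


def loss_column_deficit(leader_losses, trailer_losses):
    """How many more losses the trailing team has than the leader."""
    return trailer_losses - leader_losses


# The 60-20 team versus the 18-20 team from the paragraph above.
print(loss_column_deficit(20, 20))
print(games_behind(60, 20, 18, 20))
```

The loss column calls those two teams even; games behind puts the 18-20 team 21 games back, because it counts the 42 wins the other team has already banked.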

I assume that the whole loss column business got started as a way to describe pennant races at the very ends of seasons. If a team is 1.5 games behind but tied in the loss column, that team still controls its own destiny, as the saying goes, and we humans love that sense of control. (The famous fact that almost everybody fears flying more than the much more dangerous activity of driving derives from the same illusion of control.)

Therefore, it makes sense that measuring races by the loss column at the end of the year became routine back when teams had to be good to compete for the playoffs. The grain of truth in the loss column idea would lie in the fact that really good teams are likely to win any given game, so if the Yankees are three back of the Red Sox with five to play, the loss column standings could have a real effect on the probability of a comeback. The closer the two teams' winning percentages are to 1.000, the more the loss column matters. The less glorious divisional races of the twenty-first century, in which the Cardinals could realistically make the playoffs with a losing record, make the conventional standings a much better measure of a race.