When the Giants Come to Town...: Thoughts on Player Value Part I

Tuesday, September 9, 2014

Thoughts on Player Value Part I

A couple of days ago, I posted Gregor Blanco's stat lines for the past 3 seasons, including a value called Wins Above Replacement (WAR). If you use WAR as a measure of a player's overall value, which is what the inventors and proponents of WAR purport it to tell you, Blanco is approximately the 25th most valuable outfielder in all of MLB over the past 3 seasons. Not everybody agrees with that assessment. There are plenty of Giants fans out there who would be very happy if Gregor Blanco were not on the Giants' roster next season. I like Gregor Blanco as a player and believe he is generally very underrated, but I am also not sure he is the 25th best OF in baseball over the past 3 seasons. Despite the assets he brings to the table, somehow the "smell test" just does not agree that Gregor Blanco is the 25th best OF in baseball!

Coincidentally, the past few days have seen a flurry of articles at the national level featuring the same dilemma regarding players who rank high enough in WAR to possibly be considered for an MVP award. This year's poster children for WAR are Alex Gordon (OF, KC Royals), who has the second-highest WAR in MLB, and Jonathan Lucroy (C, Brewers), who has the highest in the NL. Now, it's fairly easy to believe that last year Mike Trout was more valuable than Miguel Cabrera, but Alex Gordon? I mean, Gordon is a good player and everything, but the second most valuable in baseball? Again, the "smell test" is giving off a funny odor here!

To try to sort all this out, let's go back to some basics of how statistical data is handled. First of all, let's separate data from calculations. Statistics are essentially the compilation and analysis of raw data. Raw data consists of observable events. Those events happened. Hundreds or even millions of observers can count the number of times Hunter Pence touched home plate and scored a run in 2014, and they will end up with the same number each and every time. That number is an observable fact! The same goes for Home Runs, RBIs, Stolen Bases, Hits and Walks, which are all primary data points. While these primary data points are virtually 100% reliable, what they tell you about the player is limited. You can count the number of times Hunter Pence got a hit and you can count how many Plate Appearances he had, but if you want to know the RATE at which he got hits, you have to make a mathematical calculation. It gets even more complicated if you want to know how much the runs Hunter Pence scored contributed to the success of his team, because unless a run scored on his own home run, Hunter Pence did not score that run all by himself. One or more teammates had to help him.
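To make the split between data and calculation concrete, here is a minimal Python sketch. The class and method names are hypothetical, and the counts in the usage line are made-up round numbers, not Hunter Pence's actual totals. The counts are stored exactly as observed; a rate only exists once we divide one count by another, and the same hit total yields a different rate depending on which denominator we choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingCounts:
    """Primary data: events an observer can simply count (hypothetical class)."""
    plate_appearances: int
    at_bats: int
    hits: int

    def hits_per_pa(self) -> float:
        """Derived data: hits divided by Plate Appearances."""
        return self.hits / self.plate_appearances

    def batting_average(self) -> float:
        """Derived data: hits divided by At-Bats (the conventional BA)."""
        return self.hits / self.at_bats


# Made-up counts for illustration only.
season = BattingCounts(plate_appearances=650, at_bats=600, hits=180)
print(f"{season.hits} hits, {season.hits_per_pa():.3f} per PA, {season.batting_average():.3f} BA")
```

With these made-up counts, the same 180 hits come out as .277 per PA but .300 per AB.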

As soon as you make a calculation, no matter how simple, you introduce built-in error into your data. If a player gets more than 100 hits (and therefore has more than 100 ABs), a Batting Average calculated by dividing hits by at-bats is highly reliable out to 3 digits. If a player gets 3 hits in 7 ABs, your dividend (the 3 hits) has just one significant digit, so you cannot accurately say his batting average is .4286 or even .429. You can only say it is .4.
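The rounding rule described above can be sketched in Python as follows. This is a minimal illustration of the post's rule of thumb (report the ratio to no more significant digits than the smaller count has), not a standard sabermetric or statistical method. The function names are hypothetical, the 152-for-500 line is made up, and the sketch assumes every digit of a count, including trailing zeros, is significant.

```python
import math


def significant_digits(count: int) -> int:
    """Treat every digit of a whole-number count as significant (an assumption)."""
    return len(str(abs(count)))


def round_to_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    decimals = sig - int(math.floor(math.log10(abs(x)))) - 1
    return round(x, decimals)


def batting_average_with_sig(hits: int, at_bats: int) -> tuple[float, float]:
    """Return (full-precision BA, BA rounded per the post's significant-digit rule)."""
    raw = hits / at_bats
    sig = min(significant_digits(hits), significant_digits(at_bats))
    return raw, round_to_sig(raw, sig)


print(batting_average_with_sig(3, 7))      # 3 hits in 7 ABs: one significant digit
print(batting_average_with_sig(152, 500))  # hypothetical full-season line: three digits
```

Returning the full-precision ratio alongside the rounded one makes it easy to see how much the rule throws away for a 7-AB sample versus a full season.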

Another way uncertainty or error is introduced into data analysis is through the choices we make about which measurements to use in our calculations. For instance, why do we use ABs as the denominator for BA but (essentially) PAs for OBP? Yes, it makes intuitive sense, but it is still a choice we make, and it makes a big difference in the resulting number. Slugging Percentage is even more problematic, because it introduces assigned valuation. The player got X number of singles, doubles, triples and home runs, but each of those events is then arbitrarily assigned a value (1, 2, 3 and 4 total bases, respectively). Do we really know that a Home Run is 4 times as valuable as a single? Well, that does make intuitive sense. It passes the "smell test," if you will. Do we know that a double is twice as valuable as a single? This one is a bit less obvious. Is a triple 1.5 times as valuable as a double? Now I'm starting to get a headache! How about a home run being 1.33 times as valuable as a triple? Think about how many times you've seen a batter hit a leadoff triple and get stranded at 3B, versus the 100% certainty that a HR will produce a run, and the idea that a HR is only 1.33 times as valuable as a triple starts to make less sense. Yet SLG% has become a major cornerstone of player valuation for most baseball statistical analysts.
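Here is a minimal Python sketch of the two choices discussed above. SLG weights each hit type by total bases (1, 2, 3, 4) and divides by ABs; the `implied_ratio` helper (a hypothetical name) prints the value ratios those weights lock in. The OBP function uses the official definition, whose denominator (AB + BB + HBP + SF) is close to, but not exactly, total Plate Appearances. The season line in the usage lines is made up.

```python
from fractions import Fraction

# Total bases that Slugging Percentage assigns to each type of hit.
TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def slugging(singles: int, doubles: int, triples: int, home_runs: int, at_bats: int) -> float:
    """SLG = total bases / at-bats."""
    total_bases = (
        singles * TOTAL_BASES["1B"]
        + doubles * TOTAL_BASES["2B"]
        + triples * TOTAL_BASES["3B"]
        + home_runs * TOTAL_BASES["HR"]
    )
    return total_bases / at_bats


def implied_ratio(hit: str, other: str) -> Fraction:
    """How many times as valuable SLG treats `hit` compared with `other`."""
    return Fraction(TOTAL_BASES[hit], TOTAL_BASES[other])


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int, at_bats: int, sac_flies: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


for hit, other in [("HR", "1B"), ("2B", "1B"), ("3B", "2B"), ("HR", "3B")]:
    ratio = implied_ratio(hit, other)
    print(f"{hit} vs {other}: {ratio} = {float(ratio):.2f}x")

# Made-up season line: 155 hits (100 1B, 30 2B, 5 3B, 20 HR) in 550 ABs, 50 BB, 5 HBP, 5 SF.
print(f"SLG {slugging(100, 30, 5, 20, 550):.3f}")
print(f"OBP {on_base_percentage(155, 50, 5, 550, 5):.3f}")
```

Nothing in the arithmetic forces 1-2-3-4: swapping in other weights (for example, run values estimated from historical data) would change only the `TOTAL_BASES` table, which is exactly the kind of choice being questioned here.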

Next up: Defensive Statistics and Player Value.

17 comments:

Anonymous, September 9, 2014 at 9:53 AM
I am 100% sure Gregor is not a top 25 OF
DocGroo, September 9, 2014 at 12:49 PM
Dr B and fans--

I'm all for a rating system to rank the top X number of outfielders and such. Perhaps the Sabermetric community has loosely embraced WAR for this purpose, but what other statistics jump out at you as the top 5 or 10 in your mind? Perhaps the smell test is a global assessment of all those statistics when they are put together?

Anonymous, September 9, 2014 at 12:53 PM
Hunter Pence must be on another planet of a higher order than Gregor.
Not to say he is not valuable, as he is for sure.
Pence to me is off the scales.

Richard in Winnipeg
Roger, September 9, 2014 at 12:58 PM
DrB, I happen to like Blanco, but I am not too thrilled about what we are paying him. This year he made a little over $2.5MM, with a raise coming in arbitration next year. That could put him pretty close to $4MM. I decided to do a little research on what bench players earn. According to FanGraphs, the Giants' bench this year has cost about $5.415MM. With an arbitration raise for Blanco alone, that could push our bench close to $7MM, plus raises to other bench players. To me that sounds like a lot of money to allocate to the bench. Do you think we should keep paying him or try to get value out of Brown, Parker, or some free agent or trade? Does he do enough to warrant his salary?

Here is the link I used:
http://www.fangraphs.com/blogs/2014-payroll-allocation-by-position/
Monterey Shark, September 9, 2014 at 1:28 PM
"Torture numbers, and they'll confess to anything." ~Gregg Easterbrook

"Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital." ~Aaron Levenstein

"Then there is the man who drowned crossing a stream with an average depth of six inches." ~W.I.E. Gates

"We play sport for passion, not statistics" ~ M. Shark
campanari, September 9, 2014 at 3:06 PM
I prefer, in sport as in most of the rest of life, to use my brain as well as my libido, and have to confess to some respectful bemusement when I come across those who don't. The response to any disconnect between Blanco WAR and Blanco aroma might be to look at more stats, for instance his rWAR this year of 0.7 rather than his fWAR of 1.8, and to see if splitting the difference appeals to one's nose. One can find a historically accurate weighting of different baseball events in *The Book,* so that one can estimate how good an approximation SLG might be. Or one can read with real and respectful interest various inquiries into evaluating players, such as DrB is starting to offer here. Nobody should scorn, I think, those unintellectual fans who prefer to hunker down and watch games, and maybe graze the beats; but I, at least, am surprised that they would want to haunt a thoughtfully analytic site like this one. That said, there's a good bit of debate among baseball aficionados about which stats are good for what, and to take part in those debates is to learn a great deal about the game we love, and the components of that game to which one might want to pay attention.
Anonymous, September 10, 2014 at 10:05 AM
You have it backwards, Doc. Humans are error-prone, so it is highly likely that there will be a spread of estimates of Pence's home plate touches centered around the true number of times Pence touched home plate. The more difficult the observation, and the more ill-defined the thing being observed, the greater the spread in observations. Remember this when it comes to UZR and how ball flights and landing spots are characterized.

"While these primary data points are virtually 100% reliable, the information they tell you about the player is limited."

As we have previously noted, the data isn't 100% reliable, but the more important point is that the data may not be capturing the salient attributes that fully describe the value of a player.

"As soon as you make a calculation, no matter how simple, you introduce built in error into your data. If a player gets more than 100 hits and has more than 100 AB's, a Batting Average calculated by dividing one into another is highly reliable out to 3 digits. If a player gets 3 hits in 7 AB's, your dividend has just one significant digit, so you cannot accurately say his batting average is .4285 or even .429. You can only say it is .4"

False. Within the limits of the error in the observation (imagine an official scorer giving a generous ruling for a hit, for instance), we're dealing with rational numbers: the ratios are exact and embody an infinite number of significant digits, or, IOW, as many as you'd like to carry out the calculation.

It is correct to say that a 100+ observation sample is more reliable, but that has nothing to do with the act of calculation and everything to do with the errors introduced at the observation stage, presumably due to the official scorer's judgement in the case of determining what is, or isn't, a hit.

Peter