FanPost

Statistics and Football: Do They Even Mix, and If So, How Can We Use Them?

Hello fellow Vikings fans! I hope everyone is having a good summer so far. It's been a while since I've written anything, as I've been pretty busy these last few months. With the draft in the rearview mirror, I thought now would be a good time to try and explain what I'm sure some of you feel is a pretty crazy way of evaluating and explaining what we see on the field. It will also help explain why I really, really like certain players, and why it seems I'm irrationally harsh on others.


So how do I (and many others, just in other sports for the most part) evaluate players? Well, anyone who read the title will understand that I predominantly use statistical analysis to evaluate football players. In fact, it's almost a rule of mine not to watch film of a player before attempting an evaluation. I'll have already formed a general picture of a player and determined how I feel about him before I ever watch his film. The reason for this is that I'm (relatively speaking) good at analysing players using statistics, while I'm pretty terrible at evaluating players by watching film. However, just because I put a much greater emphasis on statistics and measurable traits than I do on what I actually see on the field, it doesn't mean that I don't value film watching. Quite the opposite. The thing is, though, I feel that not everyone who professes to watch film actually knows what they are looking for and can accurately judge a player. Don't get me wrong, people can and do accurately evaluate players using film study. As Vikings fans, we are beyond lucky to have had the privilege of reading articles written by Arif. He very clearly understands what he is talking about and knows how to properly evaluate players using film.

The thing about watching film is that it has the potential to be incredibly subjective. We see this year after year leading up to the draft as we get random quotes by scouts who wildly disagree about a prospect. Both scouts can't be right, and unless one scout put in much, much more work than the other scout, chances are they have watched identical tape on a player. Therein lies the problem with scouting players using film: two scouts can watch the exact same film and come up with completely opposite results. However, when using a statistical model to help guide your player evaluations, the process is different. Don't get me wrong, statistical analysis is definitely not perfect, and no one will get a 100% success rate. Well, except this guy. But even he isn't 100% right all the time using statistics. That being said, if you have a good model (that means you've put the work in to test it and ensure it has a reasonable success rate), you shouldn't be bothered if you end up being wrong about a player evaluation.

The reason is that if you attempt to solve a problem with the correct approach and arrive at an incorrect solution, you still gain something from the attempt. In our example, we can use the failure to better our model and help us become more accurate. I actually stole that approach from my engineering professors. In engineering (especially data-heavy classes), if you make sure you have the right approach, i.e. you collect all of your known data, identify which equations you need to solve the problem, and identify which unknown(s) in those equations need to be solved, you will more than likely end up with a correct solution. Sure, you could always make a math error, or perhaps fail to account for a condition in the problem, but if you take a methodical approach, more often than not you will arrive at a correct solution.

Another way of saying this is that when I apply a statistical model to identify whether I think a player will be good or not, I am betting on the odds. If a player's historical comparables are all terrible, the odds are very much stacked against that player. Sure, it is very possible that the player could still succeed, but the odds are against him. By merely playing the odds (if I have a good model) I will have more and better successes than failures. And if my model identifies a player as a potential hit, and he ends up missing, I would use that data to try and identify why my model was incorrect, thus improving my model over time.

I guess that is a roundabout way of saying that the process can oftentimes be, especially in statistics, more important than the result. If you have a very flawed process, your results likely won't matter because they won't be better than a shot in the dark. Sure, you'll inevitably have your successes, but your results will likely be impossible to replicate consistently.

Enough Already! Some Examples Please

I'm sure most of you would like something a bit more actionable than what I just wrote, so let's get to it. Let's start developing our own model to evaluate players. And because this position has more relevance now than ever before in the current NFL landscape, let's evaluate WRs.

So Where Do We Begin?

The most important first step is to identify what is important and what isn't. With more variables than I'd care to list going into WR evaluation, we need a way to (relatively) quickly identify which variables matter. Sadly, unless we want to start running regressions left and right, we will just need to sort of dive right in. And to be clear, there is nothing wrong with this method. It may be a little hard to get a handle on at first, but as we start to gain a feel for what is significant and what isn't, we will be able to sort the two more quickly.

For WRs, we need a beginning. If this were a classroom, I'd ask for some suggestions, but seeing as it isn't, we'll just have to go for it. To start, let's look at college WRs who, during their college career, had more than 1000 yards receiving at least once. For the sake of brevity, I'm only going to go back to 2007, otherwise things would start to get out of hand. Why 2007? It gives us a solid pool of players who have defined themselves as players, without being too large and daunting to cull down. I should mention that I didn't include any players from the 2012 or 2013 seasons, as those players haven't really had a chance (especially the rookie class, obviously) to show what kind of players they will be. The list is now culled down to "only" 139 players (I eliminated doubles). A pretty good sample size for now. I'm not going to start posting raw data until we cull the list down, so I'll just give you the results for now. If anyone does want to go through the data, please feel free to ask for it and I will post it. Anyway, how did we do? We set an initial cutoff of 1000 yds receiving. With that cutoff, we ended up with a success rate of 19.8%. Yeesh. Pretty terrible to be honest. We've got our work cut out for us.

What if we are looking at a player's offensive stats wrong? Let's try and cut down some of the noise by controlling for a team's offense. I've used this stat around here before, but just in case, here is a refresher on Market Share and Dominator Rating, or DR for short:

Market Share works for both yards and TDs. Essentially, you take a player's year-end receiving statistics and divide them by the team's overall year-end passing statistics. So if a player has 1000 yards receiving, and the team passed for 4000 total yards that year, his market share would be 1000/4000, 25% or .25. Similarly, if he caught 10 TDs, and the team passed for 40 total TDs, his market share of TDs would be 10/40, 25% or .25. It's a very easy way to use 'raw' stats while controlling for a player's offense.
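For anyone who wants to play with the numbers themselves, here is a minimal Python sketch of the market share calculation, using the example figures above. The function name `market_share` is just my own label for illustration and doesn't come from any stats package.

```python
def market_share(player_total: float, team_total: float) -> float:
    """Share of the team's passing production that went to one receiver."""
    if team_total <= 0:
        raise ValueError("team_total must be positive")
    return player_total / team_total


# Example figures from the paragraph above
yards_share = market_share(1000, 4000)  # receiving yards / team passing yards
td_share = market_share(10, 40)         # receiving TDs / team passing TDs

print(f"Yards share: {yards_share:.0%}, TD share: {td_share:.0%}")
# prints "Yards share: 25%, TD share: 25%"
```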

Dominator rating is simply the average of the two market shares. If we take the player in the above example, his DR would be (.25+.25)/2, or .25 again. If he had a 35% market share of yards and a 25% market share of TDs, his DR would be (.35+.25)/2, or .3. Simple enough, right? Let's move on!
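The same arithmetic as a short Python sketch, in case it helps to see it written out. The function name `dominator_rating` is my own, chosen for illustration; the two calls reuse the examples from the paragraph above.

```python
def dominator_rating(yards_share: float, td_share: float) -> float:
    """Average of a receiver's yardage and touchdown market shares."""
    return (yards_share + td_share) / 2


# The two examples from the paragraph above
print(round(dominator_rating(0.25, 0.25), 3))  # prints 0.25
print(round(dominator_rating(0.35, 0.25), 3))  # prints 0.3
```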

Using our newfound metric to control for a team's offensive skill, let's re-rate players on their DR (I use DR because it's simply easier to do one test than two, and it is just as effective as the two separate stats).

If we take the list of draft picks from 2007-2011 and now filter by DR instead of just 1000 yards receiving, we definitely get different results. For our purposes, anyone with a DR of over .30 passed the test. So how did we do this time? Well, out of the 139 players from our original 1000 yard receiver list, 87 WRs passed the test, and of those 87, 22 were NFL hits, giving us a hit percentage of just over 25%. Better, for sure, but 1/4 is still not all that good. It's ok though, we have improved the hit percentage, so we definitely made progress. Let's try our next variable.
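Here is a rough Python sketch of that filter-then-count step. The four prospects in the list are made up purely to show the mechanics (they are not from my data set), and the names `passes_dr_filter` and `hit_rate` are my own. The last two lines simply redo the 22-out-of-87 arithmetic from above.

```python
DR_THRESHOLD = 0.30  # cutoff used in this post


def passes_dr_filter(best_dr: float) -> bool:
    """True if the receiver's best college DR is over the cutoff."""
    return best_dr > DR_THRESHOLD


def hit_rate(hits: int, total: int) -> float:
    """Fraction of a group that went on to NFL success."""
    return hits / total if total else 0.0


# Hypothetical prospects, invented only to illustrate the steps
prospects = [
    {"name": "WR A", "best_dr": 0.41, "nfl_hit": True},
    {"name": "WR B", "best_dr": 0.33, "nfl_hit": False},
    {"name": "WR C", "best_dr": 0.27, "nfl_hit": False},
    {"name": "WR D", "best_dr": 0.36, "nfl_hit": False},
]

passed = [p for p in prospects if passes_dr_filter(p["best_dr"])]
toy_rate = hit_rate(sum(p["nfl_hit"] for p in passed), len(passed))
print(f"Toy example: {len(passed)} passed, hit rate {toy_rate:.1%}")

# Counts reported in the post: 22 hits among the 87 WRs who passed
post_rate = hit_rate(22, 87)
print(f"Post sample: {post_rate:.1%}")  # prints "Post sample: 25.3%"
```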

Is the Age at Which a WR Starts Seeing Success Relevant to His Long-Term Success?

The question is pretty much self-explanatory: is a younger player who breaks out better than an older player who breaks out? Let's test it! Our third variable will be breakout age (BOA), or the age at which a WR first bested the .30 DR threshold. The results for this variable speak for themselves. For WRs who were 20 or younger when they broke out, 64% achieved NFL success. That is a huge jump for our model! 25 players met the criteria for a BOA of 20 or less, and 16 ended up having success in the NFL. And the rest of the players, the ones who broke out at 21+? An NFL success rate of only 7%. Only 6/84 WRs who broke out at age 21+ had NFL success, and of those 6, 4 were 21. The not-so-subtle message being that if you are 22 years old or older when you break out as a college WR, your odds of becoming successful in the NFL are essentially 0%. Anyway, we've used only 3 variables, and 64% of the WRs our model flags went on to NFL success. We haven't even added in physical factors such as height, weight, speed, etc. I didn't add them in because I didn't feel I needed to, but they would definitely help cull our list further.
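To make the breakout age idea concrete, here is a small Python sketch. The three season-by-season DR histories are invented for illustration only (they are not real players), and `breakout_age` is my own name for the helper. The final lines just redo the 16/25 and 6/84 arithmetic from above.

```python
from typing import Dict, Optional

DR_THRESHOLD = 0.30  # same cutoff as before


def breakout_age(dr_by_age: Dict[int, float]) -> Optional[int]:
    """Youngest age at which the receiver's DR topped the cutoff, else None."""
    ages = [age for age, dr in dr_by_age.items() if dr > DR_THRESHOLD]
    return min(ages) if ages else None


# Hypothetical DR histories keyed by the player's age that season
early = {18: 0.22, 19: 0.34, 20: 0.38}
late = {19: 0.12, 20: 0.21, 21: 0.25, 22: 0.33}
never = {20: 0.18, 21: 0.27}

print(breakout_age(early), breakout_age(late), breakout_age(never))
# prints "19 22 None"

# Success rates by breakout-age group, using the counts from the post
young_rate = 16 / 25  # broke out at 20 or younger
older_rate = 6 / 84   # broke out at 21 or older
print(f"BOA <= 20: {young_rate:.0%}, BOA 21+: {older_rate:.0%}")
# prints "BOA <= 20: 64%, BOA 21+: 7%"
```

A player who never tops the cutoff comes back as `None`, which is one simple way to represent a receiver who never broke out in college.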

To break things down even further, 100% of 18 yr old breakouts in our sample had NFL success. Other 18 yr old WRs who have had success but aren't in our sample include:

And that's it. He's the only other 18 yr old breakout in the NFL who has had success. The only other 18 yr old breakout (that I could find, older data is notoriously hard to come by) who hasn't had success yet is Aaron Dobson. Dobson was only a rookie last year, though, and could very well break out in a big way. Also, in case you were wondering, there are THREE rookie WRs who broke out at 18 yrs old. You can probably guess one of them fairly easily, maybe even the 2nd one, but I bet the 3rd one might stump many of you. Here are the 3:

  • Sammy Watkins
  • Jordan Matthews
  • Donte Moncrief

There was only 1 rookie WR who had a 19 yr old breakout, and that is Allen Robinson, the "other" WR for the Jaguars.

There are a few highly drafted rookies of interest who broke out at age 20:

  • Marqise Lee
  • Brandin Cooks
  • Davante Adams
  • Mike Evans



Here are some notable rookies who broke out at an older age:

  • Odell Beckham Jr., 21
  • Chris Boyd, 21
  • Paul Richardson, 21
  • Jared Abbrederis, 22

And all by himself is poor Kelvin Benjamin, who never actually broke out in college. He graduated school and was drafted in the first round without ever breaking the .30 DR threshold. Does that doom him to be a bust? Not necessarily, but you can be sure I am thankful he didn't end up in Minnesota. Carolina is definitely not playing the odds.

Anyway, this is what I am referring to when talking about using statistics to sway the odds in your favor. Should you root for the Vikings to draft a player who dominated college as a 20 year old or as a 22 year old? I think the model we came up with clearly shows where the favorable play is. Again, I need to stress that this model is far from perfect. It is pretty good, though, having already beaten the NFL hit rate for WRs without even taking draft position into account. Draft position is one of the most important indicators of NFL success, because if nothing else, high draft picks will get plenty of chances.

That about wraps this post up, as there isn't much more to add to the subject for now. Hopefully everyone was able to stay awake reading this post, as statistics can be somewhat dry. I hope that you enjoyed this and got something out of it. Feel free to leave comments or questions about specific players, or the article in general. Thanks for reading, everyone. Skol!

This FanPost was created by a registered user of The Daily Norseman, and does not necessarily reflect the views of the staff of the site. However, since this is a community, that view is no less important.