From Boom to Bust: Building a Predictive Quarterback Model

The Daily Norseman offers up another installment of From Boom to Bust, this time with the help of a statistician.

Bridgewater in purple...doesn't that look good?
Bruce Kluckhohn-USA TODAY Sports

This past off-season I took it upon myself to develop a metric for evaluating quarterback prospects for the NFL draft.  My goal was to create a metric that could ultimately help predict which draft-eligible quarterbacks would be most likely to succeed in the NFL by identifying which traits quarterback prospects had in common with successful NFL quarterbacks when they were coming out of college.  If you missed the series of articles the first time, you can get caught up by looking at my first work in identifying successful quarterbacks, then bust quarterbacks and finally a preliminary follow-up checking the accuracy of the model.  The short of it is, I researched pre-draft scouting reports of dozens of quarterback prospects from 1999-2010 and catalogued every quarterback trait listed on those reports to identify which quarterback traits were common among quarterbacks who would eventually succeed or bust in the NFL.  After identifying the most common traits, the metric began to take shape and I used it to evaluate the 2014 quarterback class.  In what should be viewed as an encouraging sign for the Minnesota Vikings, not only did my metric show that Teddy Bridgewater was the most likely prospect to succeed in the NFL, so did the study the Cleveland Browns spent over $100,000 to commission.  But then again, a whole slew of NFL draftniks and media scouts have been saying essentially the same thing about Bridgewater for nearly a year, bad pro day be damned.

In any case, I am not a statistician and therefore my ability to create and test this metric is limited.  However, this community we've created here at the Daily Norseman is nothing short of incredible.  It just so happens that one of our very own members, Brad Davis, is a cancer researcher with a PhD in Population Genetics and has a lot of experience in developing statistical methods for analyzing biological data to identify new treatment options.  As a Vikings fanatic to boot, he took it upon himself to use his highly advanced brain to take my quarterback traits data and run some of his statistical analysis on it to augment and improve the work I had done.  We've been working together behind the scenes over the past few months to re-analyze the scouting data I collected and develop some new models.

So, one of the first things Brad did was create a Heat Map that shows how each of the players we researched relates to each other based on the scoring metric characteristics.  Here is Brad's explanation (in italics for the remainder of the article) for the Heat Map graphic that follows:

On the left side of the plot you can see what we call a ‘dendrogram', which looks a bit like a family tree.  If you sum up the lengths of the horizontal lines that connect two players, the total indicates their ‘distance' in terms of the measured variables.

[Figure: heat map of the quarterbacks' scouting traits, with a dendrogram on the left side]

Using the scoring metric provided by CCNorseman, we can see that Teddy Bridgewater and Alex Smith are pretty similar to each other, at least with respect to the qualities measured in this database.  In fact, they're more similar to each other than anyone else. You can see that by examining the lengths of the horizontal lines connecting the two players together.  The sum of the lengths of these lines describes how similar any two players are to each other. And AJ McCarron and Matt Leinart are more similar to each other than anyone else too.  And those four individuals are closer to each other than the remaining individuals.

So looking at that dendrogram we can see three distinct groupings or ‘clusters' of players.  The top group comprises Teddy B, Alex Smith, AJ McCarron, Aaron Rodgers, and Philip Rivers.  A second group contains Brett Smith, Big Ben, Eli, Mark ‘Butt Fumble' Sanchez, Brady Quinn, Josh Freeman, Matt Stafford, Jason Campbell, Vince Young, and Cutler.  Now the last category (which we'll call the ‘bust' category because of who is in it) contains Blake Bortles, Tim Tebow, Johnny Manziel, JP Losman, Derek Carr, Jimmy G, David Fales, and Zach Mettenberger.  Maybe it's unfair to call this the ‘bust' group, because really that's only a ‘bust' group because of Tim Tebow, JaMarcus Russell, and JP Losman.  But that's certainly not the company I would want to be in, if I was a QB.

The middle cluster looks like the maybe good, but conditional QBs.  Joe Flacco, Matt Ryan, Big Ben, Eli Manning, Stafford and Cutler I think fall into this category.  They're good quarterbacks, clearly franchise QBs (three Super Bowl winners in it), but I don't think many people would classify any of those guys as ‘elite' QBs.  Or maybe that's just my bias.

And then in the 3rd category we've got Aaron Rodgers, and Matt Leinart.  Maybe Leinart is better than we think he is?  Could this be an example of a QB not being properly groomed, or is this a reflection of the limitation of the data we have to work with?  I'll leave that for you to decide.
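For readers curious how a dendrogram like this gets built, here is a minimal sketch of agglomerative (bottom-up) clustering. It is not Brad's code: the article doesn't say which distance measure or linkage rule he used, so Euclidean distance and average linkage are assumptions, and the five quarterbacks and their trait scores are made up purely for illustration.

```python
from itertools import combinations
from math import dist

# HYPOTHETICAL trait scores (-1 = negative trait noted, 0 = not mentioned,
# 1 = positive trait noted). Made-up players and values, not the article's data.
traits = {
    "QB_A": [1, 1, 0, 1],
    "QB_B": [1, 1, 0, 0],
    "QB_C": [-1, 0, 1, -1],
    "QB_D": [-1, -1, 1, -1],
    "QB_E": [0, 1, -1, 1],
}

def average_linkage(c1, c2):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist(traits[a], traits[b]) for a, b in pairs) / len(pairs)

def cluster(names):
    """Repeatedly merge the two closest clusters and record each merge."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        merges.append((c1, c2, average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = cluster(sorted(traits))
for left, right, height in merges:
    print(f"{' + '.join(left)} joins {' + '.join(right)} at distance {height:.2f}")
```

Each recorded merge corresponds to one junction in the tree. Players who merge early, at a small distance, are the ones the plot shows as most similar.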

The next bit of analysis Brad did was to develop a predictive model utilizing the scouting traits.  My metric produces a trait score, which shows how much the quarterbacks have in common with successful and bust quarterbacks (the heat map is essentially just another way to show those same connections), but it doesn't relate that information to actual on-field performance in the NFL.  This is where the "Adjusted Net Yards per Attempt" statistic comes into play.   As many know by now, ANY/A is the QB efficiency statistic that most closely correlates to winning in the NFL, so it would make sense to use that in a predictive model of quarterback success and failure in the NFL.  As Brad explains:

I split the scouting and statistical data into predictors and responses.  All of the scouting data (things like Good Pocket Awareness, Smart and makes good decisions, etc) are qualified as ‘predictors'.  I altered the predictors a little bit so that the values for any given QB for any given score could be -1, 0, or 1.  This allows the model to determine the relative importance of each of these qualities on ANY/A without bias.  The only response variable I used was the ‘ANY/A' statistic.  I then used this data to perform a type of multiple regression analysis called ‘Elastic Net regression'. This allows me to build up a set of statistical relationships that connect the predictors to the response.  Next I used that model to test out how good our model is at predicting the very data we stuck into it.  Now in general, this is a very easy test.  It *should* do a very good job, because we're building and testing on the same data, and in this case it does.  The correlation between the predicted ANY/A and the observed ANY/A is nearly 0.9 (0.895), which is excellent.  Anyway, that is just a quick sanity check to make sure we can get reasonable numbers out of the model.
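To make the Elastic Net step a little more concrete, here is a small pure-Python sketch of the technique: a regression with a combined L1 and L2 penalty, fitted by coordinate descent, followed by the same "predict the data you trained on" sanity check. The trait matrix and ANY/A values are hypothetical, and the penalty settings (`alpha`, `l1_ratio`) are assumptions, since the article doesn't state the software or settings Brad used.

```python
from math import sqrt

# HYPOTHETICAL data: rows are quarterbacks, columns are three scouting traits
# scored -1, 0 or 1. Neither the rows nor the ANY/A values come from the article.
X = [[1, 1, 0], [1, 0, 0], [0, 1, -1], [0, 0, -1],
     [1, 1, -1], [-1, 0, -1], [0, -1, 0], [-1, -1, -1]]
y = [7.0, 6.4, 5.9, 5.2, 6.3, 4.6, 5.5, 4.1]

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def predict(rows, w, b):
    """Linear prediction: intercept plus weighted sum of trait scores."""
    return [b + sum(x * wj for x, wj in zip(row, w)) for row in rows]

def fit_elastic_net(rows, target, alpha=0.1, l1_ratio=0.5, sweeps=200):
    """Coordinate descent on
    (1/2n)*||y - b - Xw||^2 + alpha*l1_ratio*|w|_1 + 0.5*alpha*(1-l1_ratio)*|w|_2^2.
    alpha and l1_ratio are assumed values, not the article's."""
    n, p = len(rows), len(rows[0])
    w, b = [0.0] * p, sum(target) / n
    for _ in range(sweeps):
        for j in range(p):
            fitted = predict(rows, w, b)
            # Residuals with trait j's own contribution added back in.
            partial = [target[i] - fitted[i] + rows[i][j] * w[j] for i in range(n)]
            rho = sum(rows[i][j] * partial[i] for i in range(n)) / n
            z = sum(rows[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
        # The intercept is not penalized: shift it by the mean remaining residual.
        fitted = predict(rows, w, b)
        b += sum(target[i] - fitted[i] for i in range(n)) / n
    return w, b

def pearson(a, c):
    """Pearson correlation between two equal-length lists."""
    ma, mc = sum(a) / len(a), sum(c) / len(c)
    cov = sum((u - ma) * (v - mc) for u, v in zip(a, c))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mc) ** 2 for v in c))

weights, intercept = fit_elastic_net(X, y)
in_sample_r = pearson(predict(X, weights, intercept), y)
print("trait weights:", [round(wj, 3) for wj in weights])
print("in-sample correlation:", round(in_sample_r, 3))
```

As Brad notes, a high in-sample correlation is the easy test; the same `pearson` check on quarterbacks the model never saw is the harder one.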

The next thing I did was to take another independent set of QBs and examine how well our model is able to predict their ANY/As.  The Elastic Net regression model already does this when determining the best model to fit to the data, using a process called cross validation.  However, the people making the measurements differ between these two sets of QBs, so even though they're observing and measuring the same qualities, different scouts would grade the same players differently, and some scouts are much better at scouting accurately than others.  This variability in scouts between players adds a level of variability that the Elastic Net regression model can't account for.  Anyway, the end result was a correlation of about 0.3, which isn't great, but (to me) it highlights the variability of different scouting reports.

Next, I fed the observed predictors for the 2014 rookie QBs into the model, and that gave us the following predictions for career ANY/A.

Player               Predicted ANY/A
Teddy Bridgewater    7.331918
Jimmy Garoppolo      6.762551
David Fales          6.761379
Aaron Murray         6.666708
Zach Mettenberger    6.340886
Brett Smith          6.161756
Blake Bortles        5.57622
Derek Carr           5.41606
Johnny Manziel       5.219901
AJ McCarron          5.212748

As you can see, Teddy Bridgewater has by far the highest predicted ANY/A of any of the incoming QBs.  Interestingly, Johnny Manziel has the 2nd worst predicted ANY/A.

There are a bunch of caveats about these numbers.  These values don't necessarily reflect what we can expect out of any of these QBs in their first year; they're probably a better reflection of what we could expect from them in an average year.  Moreover, they don't necessarily even reflect the correct ranking of the QBs in their first year.  There are so many unmeasured externalities: quality of surrounding players, quality of coaching, the situation in which those players are expected to play, etc.  And as mentioned above, the variability in the quality of the scouting reports is an important factor too.

Still, the difference between Teddy Bridgewater and Jimmy G is approximately 0.6, which is also the difference between Jimmy G and the 6th ranked player.  That is to say, according to the above metric, Teddy Bridgewater is about as much better than Jimmy Garoppolo as Garoppolo is than Brett Smith.  Second, Johnny Manziel isn't anywhere CLOSE to being as good as Teddy Bridgewater is.
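The gaps Brad describes can be checked directly against the Elastic Net predictions listed above. This short sketch only re-does that arithmetic; the numbers are copied from the table and nothing else is assumed.

```python
# Elastic Net predictions, copied from the table in this article.
predicted = {
    "Teddy Bridgewater": 7.331918,
    "Jimmy Garoppolo": 6.762551,
    "David Fales": 6.761379,
    "Aaron Murray": 6.666708,
    "Zach Mettenberger": 6.340886,
    "Brett Smith": 6.161756,
    "Blake Bortles": 5.57622,
    "Derek Carr": 5.41606,
    "Johnny Manziel": 5.219901,
    "AJ McCarron": 5.212748,
}

# Rank from highest to lowest predicted ANY/A.
ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)

gap_first_to_second = ranked[0][1] - ranked[1][1]   # Bridgewater minus Garoppolo
gap_second_to_sixth = ranked[1][1] - ranked[5][1]   # Garoppolo minus the 6th ranked QB
gap_to_manziel = predicted["Teddy Bridgewater"] - predicted["Johnny Manziel"]

print(f"1st vs 2nd: {gap_first_to_second:.2f}")                      # 0.57
print(f"2nd vs 6th ({ranked[5][0]}): {gap_second_to_sixth:.2f}")     # 0.60
print(f"Bridgewater vs Manziel: {gap_to_manziel:.2f}")               # 2.11
```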

Brad ran some more analysis, and this time he tried to determine the relative weighting of each of the scouting traits I used in the metric as it relates to the predictive ANY/A model above.  In his own words:

The figure below shows the relative importance (weighting) of each of the measures in the QB Metrics data.  You will note that there are multiple entries for each measurement, but that's because there were two different scouting sources for each quarterback.  I wanted to maintain that variability in the data, because some scouting reports are going to be more valuable than others (and as it turns out, keeping that variability in there does a much better job of fitting the ANY/A numbers).  What I had been doing was taking each of the scores and just marking them as a 1 or a 0 to eliminate any perception of the importance of each trait.  That lets the statistical model determine the importance of each factor.  If I combine the different scouting reports into a single report by simply summing up all the 1s, my R^2 (or reliability) goes to about 85%, whereas if I keep all the scouting reports separate then it's about 99.9%.  I think that speaks to the fact that some scouts are good at measuring some things, but not others.
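The difference between keeping the scouting sources separate and combining them is easiest to see in how the predictor columns are built. The sketch below is only an illustration of those two layouts: the trait names are taken from the article, but the source labels "A" and "B" and the 0/1 marks are hypothetical.

```python
TRAITS = ["Pocket Awareness", "Can read defenses", "Forces passes"]

# HYPOTHETICAL 0/1 marks from two scouting sources for a single quarterback.
source_a = {"Pocket Awareness": 1, "Can read defenses": 0, "Forces passes": 1}
source_b = {"Pocket Awareness": 1, "Can read defenses": 1, "Forces passes": 0}

def separate_features(a, b):
    """One column per (trait, source), so the model can weight each source on its own."""
    return {f"{trait} [{label}]": marks[trait]
            for label, marks in (("A", a), ("B", b))
            for trait in TRAITS}

def combined_features(a, b):
    """One column per trait: the two sources' marks are simply summed."""
    return {trait: a[trait] + b[trait] for trait in TRAITS}

separate = separate_features(source_a, source_b)
combined = combined_features(source_a, source_b)
print(separate)
print(combined)
```

The separate layout has twice as many columns for the same set of quarterbacks. Extra columns tend to raise the fit on the training data on their own, so some of the jump from 85% to 99.9% may come from that rather than from scout quality alone.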

[Figure: relative importance (weighting) of each scouting trait in the ANY/A model, by scouting source]

Not surprisingly, things like ‘Forces passes' are very, very bad for your ANY/A, while things like ‘Pocket Awareness' and ‘can read defenses' are very, very good for your ANY/A.  But there are some very surprising things too.  For example: having poor mechanics seems to be a good thing.  That makes me wonder if the presence of poor mechanics means that the coaches are getting out of the way of the QB's mechanics and just letting them play.  Correcting ‘mechanics' may be important in young players, but by the time they reach the end of college they've figured out how they want to throw.  It's also interesting that ‘can throw on the run' is a negative trait.  Maybe that's b/c if you run, you don't need to pass to move the sticks.  Or maybe it's just cause QBs who run don't do a good job of seeing the field or reading defenses.  Or they're just too prone to running and not taking what's there.  Lacking accuracy doesn't seem to be a huge plus or minus, depending on who scores it.  And that makes me wonder if it's situational.

Next, Brad goes back to the original predictive ANY/A model described above, but instead of using the Elastic Net regression to build the predictor model, he tries a different approach.

I also decided to analyze the data with a completely different approach called ‘Random Forests'.  I won't get into the details here, but suffice it to say that Random Forests are a subset of statistical models called ‘machine learning', or what at one point would've been called ‘artificial intelligence'.  I repeated the same basic steps as with the Elastic Net regression, using exactly the same input data.  Similar to the Elastic Net regression, the correlation between the observed ANY/A and the ANY/A predicted by the Random Forest model was 0.95, which is fantastic, but the correlation between the predictions for the second set of QBs and their observed ANY/A was only 0.30 again.  I was hoping that the Random Forests approach would be a bit more forgiving, but it wasn't.  It's a bit disappointing, but not necessarily hugely surprising either.  As I hinted at above, I don't think that this is a failure of the model as much as it reflects the high degree of variability between scouts and scouting reports.  In a sense, scout A's observation of ‘Forces Passes' is measuring something different from scout B's observations because these are subjective observations rather than objective measurements.  If all the QBs were rated by the same scouts, then I have no doubt that the correlation from the second set of QBs would have been much higher.  In any event, here are the predicted ANY/As from the Random Forest approach:

Player               Predicted ANY/A
Teddy Bridgewater    5.878178
Jimmy Garoppolo      5.523566
David Fales          5.793405
Aaron Murray         5.731617
Zach Mettenberger    5.154550
Brett Smith          5.536347
Blake Bortles        5.575658
Derek Carr           5.738203
Johnny Manziel       5.172498
AJ McCarron          5.830073

These scores are quite a bit different than with the Elastic Net regression, although Teddy Bridgewater is still on top, and Johnny Manziel is still second from the bottom.  The other thing to note is that the range of predicted ANY/As is much more compressed than you would normally expect to see from a sample of 12 QBs.  I decided to plot the predicted ANY/As against the observed ANY/As for the original data set used to build the model, and it clearly showed that the Random Forest model tends to reduce higher ANY/As and increase lower ANY/As.  So I accounted for this and then re-generated my predictions:

Player               Predicted ANY/A
Teddy Bridgewater    6.217656
Jimmy Garoppolo      5.540454
David Fales          6.055765
Aaron Murray         5.937770
Zach Mettenberger    4.835744
Brett Smith          5.564863
Blake Bortles        5.639933
Derek Carr           5.950347
Johnny Manziel       4.870020
AJ McCarron          6.125791
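Brad doesn't spell out how he accounted for the compression, so the following is an inference from the published numbers rather than his stated method. The two Random Forest tables turn out to be consistent with a simple straight-line rescaling of the raw predictions, which the sketch below recovers with an ordinary least-squares fit of the adjusted values on the raw ones.

```python
# Raw and adjusted Random Forest predictions, copied from the two tables above.
raw = {
    "Teddy Bridgewater": 5.878178, "Jimmy Garoppolo": 5.523566,
    "David Fales": 5.793405, "Aaron Murray": 5.731617,
    "Zach Mettenberger": 5.154550, "Brett Smith": 5.536347,
    "Blake Bortles": 5.575658, "Derek Carr": 5.738203,
    "Johnny Manziel": 5.172498, "AJ McCarron": 5.830073,
}
adjusted = {
    "Teddy Bridgewater": 6.217656, "Jimmy Garoppolo": 5.540454,
    "David Fales": 6.055765, "Aaron Murray": 5.937770,
    "Zach Mettenberger": 4.835744, "Brett Smith": 5.564863,
    "Blake Bortles": 5.639933, "Derek Carr": 5.950347,
    "Johnny Manziel": 4.870020, "AJ McCarron": 6.125791,
}

names = list(raw)
xs = [raw[name] for name in names]
ys = [adjusted[name] for name in names]
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)

# Least-squares line: adjusted = intercept + slope * raw.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Largest gap between the fitted line and the published adjusted values.
max_error = max(abs(intercept + slope * raw[name] - adjusted[name]) for name in names)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"largest miss against the published table = {max_error:.6f}")
```

The fitted slope comes out near 1.91, meaning the adjustment stretches the raw predictions to almost twice their original spread while leaving the ranking unchanged.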

As Brad has already mentioned, we added a set of QBs to the analysis this time around: the average ones.   For this article, I collected another group of scouting data and career ANY/A statistics on nine more quarterbacks that wouldn't qualify as "successful" or "bust" based on my definitions from the earlier articles.  These quarterbacks weren't part of developing the original metric at all in the first place, and should be a good control group to test the accuracy and predictability of the models.  We took their data and ran it through both the metric and the new predictive ANY/A models (as described above) to test for accuracy, and here were some more results.

Name               Bust Score   Success Score   Metric Total   Metric Prediction
Marc Bulger        -13          26.5            13.5           Successful
Michael Vick       -4           3               -1             Average
Alex Smith         -5           25              20             Successful
Jason Campbell     -14          7               -7             Bust
Kyle Orton         -8.5         6               -2.5           Average
Derek Anderson     -18          2               -16            Bust
Ryan Fitzpatrick   -14.5        11              -3.5           Average
Chad Henne         -10          12.5            2.5            Average
Josh Freeman       -20          3.5             -16.5          Bust

In the table above, the "metric prediction" label was pulled from the previous Boom or Bust article linked above that outlines the cut-off ranges for each of the three types.  As you can see, the metric prediction is pretty darn close in terms of figuring out who would be successful and who would bust.  Take Marc Bulger and Alex Smith, both predicted for NFL success, for example.  Bulger had some very good years with the Rams, but then got injured and hurt his overall career numbers by playing at less than 100% during his final few years as a starter.  Technically Bulger doesn't qualify as a "successful" quarterback based on my black and white definition, but I do think there is room for argument there, considering Bulger made a couple of Pro Bowls and had four years with a QB rating of 92 or above.  And Alex Smith is a quarterback that many had written off as a bust after his first few years in the league, but he has had a career renaissance recently and has turned into a pretty efficient quarterback.  But his career numbers are low because of those poor early years, so again, there could be room for argument there too.  In the last four years, Alex Smith has averaged a 91.2 quarterback rating, compared to averaging a paltry 57.6 QB rating during his first three years in the league.  Then consider Jason Campbell, Derek Anderson and Josh Freeman.  Based on my generous definitions of a bust, none of them qualify due to their high number of starts and just barely good enough career ANY/A numbers.  But all three are arguably in the conversation for being busts anyway, as they've all had disappointing careers.  The rest of the names on the list are all who I would consider to be the quintessential average NFL quarterbacks: Michael Vick, Kyle Orton, Ryan Fitzpatrick and Chad Henne.  I don't know about you, but these metric numbers seem to be very close.
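For anyone who wants to see how the table hangs together, the metric total is simply the bust score plus the success score, and the prediction label comes from where that total falls. The actual cut-off ranges live in the earlier article and are not repeated here, so the two cut-offs in this sketch are placeholders, chosen only because they reproduce the labels in the table. Any success cut-off above 2.5 and up to 13.5, paired with any bust cut-off from -7 up to just below -3.5, would do the same.

```python
# Bust score, success score and metric prediction, copied from the table above.
scores = {
    "Marc Bulger": (-13, 26.5, "Successful"),
    "Michael Vick": (-4, 3, "Average"),
    "Alex Smith": (-5, 25, "Successful"),
    "Jason Campbell": (-14, 7, "Bust"),
    "Kyle Orton": (-8.5, 6, "Average"),
    "Derek Anderson": (-18, 2, "Bust"),
    "Ryan Fitzpatrick": (-14.5, 11, "Average"),
    "Chad Henne": (-10, 12.5, "Average"),
    "Josh Freeman": (-20, 3.5, "Bust"),
}

# HYPOTHETICAL cut-offs: placeholders, not the ranges from the earlier article.
SUCCESS_CUTOFF = 8.0
BUST_CUTOFF = -5.0

def classify(total):
    """Label a metric total using the placeholder cut-offs above."""
    if total >= SUCCESS_CUTOFF:
        return "Successful"
    if total <= BUST_CUTOFF:
        return "Bust"
    return "Average"

totals = {name: bust + success for name, (bust, success, _) in scores.items()}
labels = {name: classify(total) for name, total in totals.items()}

for name, total in totals.items():
    print(f"{name:18s} total {total:6.1f}  ->  {labels[name]}")
```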

As Brad has already mentioned, he also took this new set of data and ran it through the ANY/A predictor models he built and explained above, but he explains a bit further:

When we tested our models on their original data, we got an excellent correlation of at least 0.9, but when we tested them on an independent data set, the correlation dropped to 0.3.  So at best we can expect our predicted values to have a correlation of about 0.9 with the observed ANY/As, and at worst about 0.3; my gut says that we can probably expect a correlation around 0.6, which isn't bad.  We can say that both of the predictor models (Elastic Net regression and Random Forests) show that Teddy Bridgewater is expected to be the best quarterback in the draft, but one model predicts his ANY/A to be in the elite category, and the other puts him in the ‘very good', but not quite elite grouping.  I don't know anything about scouting myself, but looking at these numbers here, I feel like the Elastic Net regression is a bit overly optimistic and the Random Forest model is a bit overly pessimistic.  If I had to guess at what I think Teddy Bridgewater's ANY/A is going to be, I would say around the 6.8 to 6.9 mark.  The positive things I read about Teddy Bridgewater I find more convincing than the negatives.  And similarly, the positive things I've read about Johnny Manziel I find less convincing than the negatives.  I don't put a lot of stock into things like ‘Johnny Manziel knows how to win'.  What does that even mean?

As Brad mentioned above, the accuracy of the model when using the same quarterbacks used to build the model is extremely high, but when used on an independent set of NFL QBs, it loses some accuracy.  Unfortunately, this study is limited by the amount of freely available pre-draft scouting reports online.  For this study to improve in accuracy, it would need to be run with a more consistent approach with regards to scouting reports, and even then it would essentially be testing for accuracy of specific scouts.  But I'm encouraged that the model Brad and I have created here would appear to produce predictions that correlate with career ANY/A numbers at roughly 0.6 (with a correlation of about 0.9 at the high end, and about 0.3 at the low end).  At the end of the day, it is for you all to decide if this metric has any value, and of course, only time will tell!