Written by David Bressler.
Last week, there was a dip in productivity from college football fans everywhere. It was the early signing day period, in which fans took to social media and message boards to view updates, boast or complain about the incoming high school recruits that committed to their college football team.
Like Groundhog Day, I see the same commentary every year…
“Don’t get hung up on players’ star ratings. Remember players X, Y and Z? They were two-star recruits and now they’re in the NFL!”
“Don’t get hung up on players’ star ratings. Remember players X, Y and Z? They were five-star recruits and they never even sniffed the field when they played for us!”
How important are stars when predicting collegiate success? What about predicting whether or not someone reaches the NFL?
It’s difficult to assess collegiate success, primarily because the level of competition varies: five-star recruits tend to play against other five-star recruits, and one-star recruits against other one-star recruits. But it was relatively straightforward to assess how star rankings correlate with reaching the NFL.
Every year during the NFL draft, I like to keep an eye out for players from low to mid-tier schools that get drafted. It always amazes me to see underrated players like Khalil Mack, a 2009 two-star signee who committed to Buffalo (his only offer!), who has been to the Pro Bowl five times and was the Defensive Player of the Year in 2016.
It seems like every year, the same mid-tier colleges are developing NFL talent and getting players to the NFL. As a UCF alumnus, I know that there are currently 16 active former UCF players in the NFL, which ranks 29th of all colleges, even though UCF’s recruiting classes have typically ranked around 60th over the past ten years.
I was curious to see how well college football teams do in terms of getting players to the NFL in relation to what’s expected from them. Without even looking at the data, you already know that colleges like Alabama and Ohio State get more players in the NFL than colleges like Akron and Middle Tennessee State.
But what’s expected from Alabama? What’s expected from Middle Tennessee State? Are they doing more with less, less with more, or meeting expectations? That’s what I wanted to know.
So my #hypepothesis for this month – What colleges do more with less and less with more in terms of getting players to the NFL?
Note – if you don’t care about methodology and you just want to see the answer, feel free to skip this next section.
The first step was to create a database of all high school football recruits. From 2007 – 2015, there were 20.6K high school recruits that had at least one star and were “good” enough to have a profile on recruiting sites like Rivals.com and 247sports.com. (Note – I purposely did not include recruits after the 2015 commitment year, since most of those players, even the high caliber ones, are still in their junior or senior year). All player information was gathered from 247sports.com, which included the following:
The next step was to create a database of all NFL players. This was harder than it seems. I found websites that had archives of all NFL players, but they didn’t show the colleges the players attended. We need the college attended to truly tie together the two databases – the high school recruit database and the NFL player database. The player’s name alone isn’t enough. I mentioned there were 20.6K players in the high school recruiting database. In that database, there are five Kevin Johnsons. One of them made it to the NFL. The college acts as an additional key to tie together the players between the two databases.
As an alternative, I was able to create an NFL player database based on Wikipedia’s historical NFL Draft pages for 2007 – 2019, since they list each drafted player’s college. This obviously does not encompass all NFL players, so I had to tailor my #hypepothesis to – What colleges do more with less and less with more in terms of getting players drafted in the NFL?
By keying off the player’s name and college, we were able to merge the two databases together to contain both the recruiting information and NFL player information. If there was a match between the two databases (the person was drafted), the player received a ‘1’ under the ‘Drafted’ column, and a ‘0’ if there was no match (not drafted). The merged database (sample below) was used for our analysis. It allowed us to predict the likelihood of each player getting drafted, which we can later aggregate by team to compare a team’s actual draft rate with its expected draft rate.
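For readers who want to see the mechanics, the merge step described above could be sketched roughly as follows. This is not the code used for the analysis: the rows, the placeholder college names, and the helper names `make_key` and `flag_drafted` are all invented for illustration. The point is only that the name plus the college forms the join key, so two recruits who share a name are not confused.

```python
# Hypothetical miniature versions of the two databases (all rows invented).
recruits = [
    {"name": "Kevin Johnson", "college": "College A", "rating": 0.86},
    {"name": "Kevin Johnson", "college": "College B", "rating": 0.79},
    {"name": "Sam Example", "college": "College A", "rating": 0.91},
]
draft_picks = [
    {"name": "Kevin Johnson", "college": "College A", "draft_year": 2015},
]


def make_key(name, college):
    """Build a composite (name, college) key, ignoring case and stray spaces."""
    return (name.strip().lower(), college.strip().lower())


def flag_drafted(recruits, draft_picks):
    """Return recruit rows with a 'drafted' column: 1 if matched, else 0."""
    drafted_keys = {make_key(p["name"], p["college"]) for p in draft_picks}
    return [
        {**r, "drafted": 1 if make_key(r["name"], r["college"]) in drafted_keys else 0}
        for r in recruits
    ]


merged = flag_drafted(recruits, draft_picks)
for row in merged:
    print(row["name"], row["college"], row["drafted"])
```

Only the Kevin Johnson at "College A" is flagged, because the other recruit with the same name has a different college in the key.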
Important notes about this dataset:
There are many different types of models out there in the world, and the right one depends on the question you’re trying to answer. We’re trying to predict the probability of someone getting drafted to the NFL, so there are only two outcomes – a player was drafted or he wasn’t. (Remember that in the database, ‘1’ is ‘Yes’ and ‘0’ is ‘No’.)
We used a logistic regression model, which estimates the probability of a yes/no outcome from one or more independent variables – in this case, the player’s evaluation rating, position, and state/region. The star count was not used in the model, since it’s synonymous with the evaluation rating.
A couple other popular examples of using a logistic regression are when banks determine whether or not a transaction was fraudulent and when email providers deem an email to be spam or not.
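To make the idea concrete, here is a minimal sketch of a logistic regression fitted by gradient descent. It is a simplification, not the model from this analysis: it uses only one variable (the evaluation rating) rather than rating, position, and state/region, the ten data points are invented, and the names `fit_logistic` and `draft_probability` are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Invented toy data: evaluation ratings and whether each player was drafted.
ratings = [0.70, 0.75, 0.80, 0.82, 0.85, 0.88, 0.90, 0.93, 0.96, 0.99]
drafted = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

# Standardize the rating so gradient descent converges quickly.
mean = sum(ratings) / len(ratings)
std = math.sqrt(sum((r - mean) ** 2 for r in ratings) / len(ratings))
xs = [(r - mean) / std for r in ratings]
b0, b1 = fit_logistic(xs, drafted)


def draft_probability(rating):
    """Predicted probability of being drafted for a given rating."""
    return sigmoid(b0 + b1 * (rating - mean) / std)


for r in (0.72, 0.85, 0.97):
    print(f"rating {r:.2f} -> draft probability {draft_probability(r):.2f}")
```

The output of the model is a probability per player rather than a hard yes/no, which is what lets the probabilities be averaged by college later on.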
After we ran the model, we added the probability of every player getting drafted. Sample below:
It’s important to explore the data before we answer the #hypepothesis.
The summary table below shows that there were 20.6K players in the dataset.
After the model runs, we’re able to see how important each variable is when determining the probability of a player getting drafted. The model showed that the evaluation rating is by far the most important variable, followed by the player’s state/region then their position. This suggests that the evaluators that rate each player are very good at their job.
The graphic below shows the relationship between the evaluation rating (x-axis) and the draft probability (y-axis) for every player in the database. The color represents whether or not the player was drafted (orange = drafted, blue = not drafted). As with the star rating and draft rate relationship from earlier, the evaluation rating and draft probability have an exponential relationship, since the stars and ratings are synonymous. The top of the curve has the highest concentration of orange dots (players drafted), which makes sense since those players’ ratings and draft probabilities are highest. Interactive chart also here.
I called out a few players that didn’t follow the exponential trend – outliers!
The dots above the curve are players whose draft probability exceeds the typical trend for players with a similar evaluation rating. These players’ draft probabilities over-index because of the other two factors in the model: the player’s position and state/region.
Three of the players highlighted “above the curve”, Will Dissly, Brock Osweiler and Alex Green, are from Montana, which has the highest draft rate of all states and regions at 23% (on a base of 13 players). Christian Covington and Adam Gotsis are both from British Columbia, which has the 2nd highest draft rate of all states and regions (20%, on a base of 10 players).
The model does a great job at predicting these potential NFL draftees, despite their low evaluation rating.
The points below the curve are players whose draft probability is lower than the typical trend for players with a similar evaluation rating. All of these players are from either Ontario or Idaho; of the 60 total players from those two regions, none has been drafted. The model accordingly gave these players a nearly 0% chance of getting drafted, even the four-star recruits.
Taken at face value, this suggests that a player from Ontario or Idaho is less likely to be drafted than a player elsewhere with a similar evaluation rating.
Although these outliers tell a directional story, they are not statistically significant, given the small number of players that come from these regions.
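One rough way to see why such small bases deserve caution is to put a confidence interval around each regional draft rate. The sketch below uses the Wilson score interval, which is my choice for illustration and not something computed in the original analysis; the function name `wilson_interval` is hypothetical. The counts come from the figures above: 23% on a base of 13 players is 3 draftees, and Ontario and Idaho combine for 0 draftees out of 60.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


montana = wilson_interval(3, 13)         # 3 drafted out of 13 recruits
ontario_idaho = wilson_interval(0, 60)   # 0 drafted out of 60 recruits

print("Montana: %.1f%% to %.1f%%" % (100 * montana[0], 100 * montana[1]))
print("Ontario + Idaho: %.1f%% to %.1f%%" % (100 * ontario_idaho[0], 100 * ontario_idaho[1]))
```

The Montana interval is very wide, so its true draft rate could plausibly sit far from 23% in either direction, which is the sense in which these regional effects are directional rather than conclusive.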
Few additional callouts that weren’t highlighted in the graphic:
Now that we understand how the model works and the relationships between the variables, it’s time for us to finally answer the question – What colleges do more with less and less with more in terms of getting players drafted in the NFL?
The chart below shows there’s a linear relationship (r^2 of 0.82, correlation of 0.90) between each college’s average player draft probability and that college’s actual draft rate. Teams well above the line have a higher actual draft rate than what’s expected from them based on the caliber of players they recruit. Alabama’s actual draft rate is 30%, while their average player’s probability of getting drafted is 21%. Conversely, teams well below the line have a lower actual draft rate than what’s expected from them. Texas’ actual draft rate is 9%, while their average player’s probability of getting drafted is 15%. This squares with how these two teams have performed over the last 10 years: Alabama has won several National Championships, while Texas has been underwhelming and has performed well below expectations.
A better way to visually show which colleges do more with less and less with more is to show the percentage difference between the actual draft rate and the draft probability (below).
Teams above the red dotted line (0%, no difference) are exceeding expectations: their actual draft rate is higher than their expected draft rate. Teams below the line are underperforming and not getting players into the NFL at the rate they should, based on the caliber of players they recruit.
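The aggregation behind this comparison can be sketched as follows: average the model probabilities per college to get an expected draft rate, average the 0/1 drafted flag to get the actual draft rate, then correlate the two across colleges. The rows, college names, and function names here are invented for illustration, and this is not the code used for the chart. As a sanity check on the reported figures, for a straight-line fit r^2 is the square of the correlation, and 0.90 squared is about 0.81, which matches the reported 0.82 once rounding is allowed for.

```python
import math
from collections import defaultdict

# Invented rows: (college, model draft probability, drafted flag).
players = [
    ("College A", 0.30, 1), ("College A", 0.10, 0), ("College A", 0.20, 1),
    ("College B", 0.15, 0), ("College B", 0.25, 0), ("College B", 0.05, 1),
    ("College C", 0.05, 0), ("College C", 0.05, 0),
    ("College C", 0.02, 0), ("College C", 0.08, 0),
]


def college_rates(players):
    """Expected rate = mean probability; actual rate = share of players drafted."""
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of probs, drafted, count]
    for college, prob, was_drafted in players:
        t = totals[college]
        t[0] += prob
        t[1] += was_drafted
        t[2] += 1
    return {
        c: {"expected_rate": s / n, "actual_rate": d / n, "players": n}
        for c, (s, d, n) in totals.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rates = college_rates(players)
expected = [r["expected_rate"] for r in rates.values()]
actual = [r["actual_rate"] for r in rates.values()]
r = pearson(expected, actual)
print(rates)
print("correlation:", round(r, 3), "r^2:", round(r * r, 3))
```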
To put the data to scale, instead of using percentages, below you can see the actual number of players drafted (x-axis) and the volume difference between the actual and expected number of players drafted (y-axis). This visual helps show the actual volume impact of how well each team is performing, primarily for the bigger colleges.
Thank you for taking the time to read this write-up! I really enjoyed writing about two things I’m passionate about – sports and analytics.
We plan on conducting an out-of-the-box type of analysis like this every couple months and sharing it on social media and in our newsletter. If you have a question that you’d like answered, let us know!
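As a worked example of these two views, the sketch below turns a college’s actual and expected draft rates into a gap in percentage points, a relative gap, and a volume gap in players. The Alabama and Texas rates are the ones quoted earlier; the recruit count of 200 is a made-up round number, since the real class sizes are not given here, and the function name `draft_gaps` is hypothetical. I also don’t know which definition of “percentage difference” the chart uses, so both are shown.

```python
def draft_gaps(actual_rate, expected_rate, recruits):
    """Return (percentage-point gap, relative gap, gap in number of players)."""
    point_gap = actual_rate - expected_rate      # e.g. 0.30 - 0.21 = 9 points
    relative_gap = point_gap / expected_rate     # gap as a share of expectation
    volume_gap = point_gap * recruits            # actual minus expected draftees
    return point_gap, relative_gap, volume_gap


HYPOTHETICAL_RECRUITS = 200  # assumption for illustration only

alabama = draft_gaps(0.30, 0.21, HYPOTHETICAL_RECRUITS)
texas = draft_gaps(0.09, 0.15, HYPOTHETICAL_RECRUITS)

for name, (points, relative, volume) in (("Alabama", alabama), ("Texas", texas)):
    print(f"{name}: {points:+.0%} points, {relative:+.0%} relative, {volume:+.0f} players")
```

The volume view matters most for large programs: the same percentage-point gap translates into more players the more recruits a college signs.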