Written by David Bressler.
Last week, there was a dip in productivity from college football fans everywhere. It was the early signing day period, in which fans took to social media and message boards to view updates, boast or complain about the incoming high school recruits that committed to their college football team.
Like Groundhog Day, I see the same commentary every year…
“Don’t get hung up on players’ star ratings. Remember players X, Y and Z? They were two star recruits and now they’re in the NFL!”
“Don’t get hung up on players’ star ratings. Remember players X, Y and Z? They were five star recruits and they never even sniffed the field when they played for us!”
How important are stars when predicting collegiate success? What about predicting whether or not someone reaches the NFL?
It’s difficult to assess collegiate success, primarily due to the level of competition. Five stars are playing against five stars, one stars are playing against one stars. But it was relatively straightforward to assess how star rankings correlate with the success of reaching the NFL.
Every year during the NFL draft, I like to keep an eye out for players from low to mid-tier schools that get drafted. It always amazes me to see underrated players like Khalil Mack, a 2009 two star signee that committed to Buffalo (his only offer!), who has been to the Pro Bowl five times and was the Defensive Player of the Year in 2016.
It seems like every year, the same mid-tier colleges are developing NFL talent and getting players to the NFL. As a UCF alumnus, I know that there are currently 16 active former UCF players in the NFL, which ranks 29th among all colleges, even though UCF has typically ranked 60th in recruiting over the past ten years.
I was curious to see how well college football teams do in terms of getting players to the NFL in relation to what’s expected from them. Without even looking at the data, you already know that colleges like Alabama and Ohio State get more players in the NFL than colleges like Akron and Middle Tennessee State.
But what’s expected from Alabama? What’s expected from Middle Tennessee State? Are they doing more with less, less with more, or meeting expectations? That’s what I wanted to know.
So my #hypepothesis for this month – What colleges do more with less and less with more in terms of getting players to the NFL?
Note – if you don’t care about methodology and you just want to see the answer, feel free to skip this next section.
The first step was to create a database of all high school football recruits. From 2007 – 2015, there were 20.6K high school recruits that had at least one star and were “good” enough to have a profile on recruiting sites like Rivals.com and 247sports.com. (Note – I purposely did not include recruits after the 2015 commitment year, since most of those players, even the high caliber ones, are still in their junior or senior year). All player information was gathered from 247sports.com, which included the following:
- Player name
- City & state (or region)
- Evaluation score/rating
- Star count
- Committed college
The next step was to create a database of all NFL players. This was harder than it seems. I found websites that had archives of all NFL players, but they didn’t show the colleges the players attended. We need the college attended to truly tie together the two databases – the high school recruit database and the NFL player database. The player’s name alone isn’t enough. I mentioned there were 20.6K players in the high school recruiting database. In that database, there are five Kevin Johnsons. One of them made it to the NFL. The college acts as an additional key to tie together the players between the two databases.
As an alternative, I was able to create an NFL player database based on Wikipedia’s historical NFL Draft pages, since it contained each player’s college from 2007 – 2019. This obviously does not encompass all NFL players, so I had to tailor my #hypepothesis to – What colleges do more with less and less with more in terms of getting players drafted in the NFL?
By keying off the player’s name and college, we were able to merge the two databases so that each record contains both the recruiting information and the NFL draft information. If there was a match between the two databases (the person was drafted), the player received a ‘1’ under the ‘Drafted’ column; if there was no match (not drafted), the player received a ‘0’. A sample of the database used for our analysis is below. It allowed us to predict the likelihood of each player getting drafted, which we can later aggregate by team to compare a team’s actual draft rate with its expected draft rate.
Important notes about this dataset:
- The dataset was transformed to determine the success of getting drafted, not reaching the NFL. In the sample above, Terrelle Pryor was not drafted, but did eventually reach the NFL.
- The dataset was merged together based on the player’s name and college. There were instances in which a player transferred to another college, which causes a mismatch between the two datasets. The high school recruiting profile contains the player’s original committed college, whereas the NFL draft data contains the player’s most recent college. In the sample above, Bryce Brown originally committed to Tennessee but later transferred to Kansas State. Even though he was drafted, Bryce Brown was deemed undrafted using this methodology.
- Because I have a wife, son, and hobbies (yes, hobbies outside of data), I did not research every player to see if they transferred in order to adjust the data. That said, there are few occurrences of players transferring and then getting drafted, and there’s no one college where it happens more frequently.
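For readers who want to see the name-plus-college merge in action, here is a minimal sketch in plain Python. The field names (`name`, `college`, `drafted`) and the `normalize` helper are my own assumptions for illustration, not the actual schema or code used for this analysis, and "College A" / "College B" are placeholders. The Bryce Brown rows reproduce the transfer mismatch described in the notes above.

```python
# Illustrative rows only. Field names are assumptions, not the real schema.
recruits = [
    {"name": "Khalil Mack", "college": "Buffalo"},
    {"name": "Bryce Brown", "college": "Tennessee"},     # original commitment
    {"name": "Kevin Johnson", "college": "College A"},   # hypothetical placeholder
    {"name": "Kevin Johnson", "college": "College B"},   # hypothetical placeholder
]
draft_picks = [
    {"name": "Khalil Mack", "college": "Buffalo"},
    {"name": "Bryce Brown", "college": "Kansas State"},  # most recent college
    {"name": "Kevin Johnson", "college": "College B"},   # hypothetical placeholder
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def flag_drafted(recruits, draft_picks):
    """Return a copy of each recruit row with drafted = 1 on a (name, college) match, else 0."""
    drafted_keys = {
        (normalize(p["name"]), normalize(p["college"])) for p in draft_picks
    }
    flagged = []
    for r in recruits:
        key = (normalize(r["name"]), normalize(r["college"]))
        flagged.append({**r, "drafted": int(key in drafted_keys)})
    return flagged


merged = flag_drafted(recruits, draft_picks)
for row in merged:
    print(row["name"], "|", row["college"], "|", row["drafted"])
```

Matching on the name alone would have flagged both Kevin Johnsons as drafted. Matching on name and college keeps them apart, at the cost of missing transfers like Bryce Brown.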
There are many different types of models out there, depending on the question you’re trying to answer. We’re trying to predict the probability of someone getting drafted to the NFL, so there are only two outcomes: a player was drafted or was not. (Remember, in the database ‘1’ is ‘Yes’ and ‘0’ is ‘No’.)
We used a logistic regression model, which is designed for a yes/no outcome and can take one or more independent variables. In this case, we used the player’s evaluation rating, position, and state/region. The star count was not used in the model, since it’s synonymous with the evaluation rating.
A couple other popular examples of using a logistic regression are when banks determine whether or not a transaction was fraudulent and when email providers deem an email to be spam or not.
After we ran the model, we added the probability of every player getting drafted. Sample below:
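To give a feel for how a fitted logistic regression turns a player's rating, position and region into a probability, here is a rough sketch. Every coefficient below is invented purely for illustration; none of these numbers come from the actual model, and the position and region codes are assumptions as well.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, made up for this example (NOT the fitted model).
INTERCEPT = -20.0
RATING_COEF = 20.0
POSITION_COEF = {"ATH": 0.3, "QB": -0.2}   # categories not listed get 0.0
REGION_COEF = {"FL": 0.1, "TX": 0.0}       # categories not listed get 0.0


def draft_probability(rating, position, region):
    """Combine the three inputs into a linear score, then squash it into a probability."""
    z = (
        INTERCEPT
        + RATING_COEF * rating
        + POSITION_COEF.get(position, 0.0)
        + REGION_COEF.get(region, 0.0)
    )
    return sigmoid(z)


high = draft_probability(0.99, "ATH", "FL")
low = draft_probability(0.70, "QB", "TX")
print(f"high-rated recruit: {high:.3f}, low-rated recruit: {low:.3f}")
```

The takeaway is the shape rather than the numbers: the model adds up a weighted score and the sigmoid converts it to a value between 0 and 1, which is what gets stored as each player's draft probability.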
It’s important to explore the data before we answer the #hypepothesis.
The summary table below shows that there were 20.6K players in the dataset.
- Of the 20.6K, 52% were three stars, 34% were two stars, 12% were four stars and one percent were one and five stars.
- The lowest rating a player received was 0.6983 and the highest was 1.0. The median and mean ratings are both 0.823.
- Five states make up 10,531 of the 20.6K players (51%): Texas (14%), Florida (14%), California (10%), Georgia (7%) and Ohio (6%).
- 1,337 of the 20.6K players (6%) got drafted.
- The rate of players getting drafted exponentially increases as their star rating increases.
- 1% of one stars get drafted, 2% of two stars, 6% of three stars, 19% of four stars and 49% of five stars.
- The rate at which players get drafted increases as stars increase across all positions (except one star safeties, on a base of only 15 players, i.e., not statistically significant).
- Interestingly, players labeled as “Athletes” have either the highest or second highest rate of getting drafted across all star counts.
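The draft rates in the bullets above are simple group-by calculations: count the drafted players in each group and divide by the group size. A small sketch of that step, using made-up rows and assumed field names (`stars`, `position`, `drafted`):

```python
from collections import defaultdict


def draft_rate_by(rows, key):
    """Return {group value: share of players in that group who were drafted}."""
    totals = defaultdict(int)
    drafted = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        drafted[row[key]] += row["drafted"]
    return {group: drafted[group] / totals[group] for group in totals}


# Made-up rows for illustration only.
rows = [
    {"stars": 5, "position": "DT", "drafted": 1},
    {"stars": 5, "position": "DT", "drafted": 0},
    {"stars": 3, "position": "ATH", "drafted": 0},
    {"stars": 3, "position": "LB", "drafted": 0},
    {"stars": 3, "position": "ATH", "drafted": 1},
    {"stars": 3, "position": "LB", "drafted": 0},
]

rates_by_stars = draft_rate_by(rows, "stars")
rates_by_position = draft_rate_by(rows, "position")
print(rates_by_stars)
print(rates_by_position)
```

The same function works for any column, which is how the by-state and by-position cuts can be produced from one table.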
After the model runs, we’re able to see how important each variable is when determining the probability of a player getting drafted. The model showed that the evaluation rating is by far the most important variable, followed by the player’s state/region then their position. This suggests that the evaluators that rate each player are very good at their job.
The graphic below shows the relationship between the evaluation rating (x-axis) and the draft probability (y-axis) for every player in the database. The color represents whether or not the player was drafted (orange = drafted, blue = not drafted). Similar to the star rating and draft rate relationship from earlier, the evaluation rating and draft probability also have an exponential relationship, since the stars and ratings are synonymous. The top of the curve has the highest concentration of orange points (players drafted), which makes sense since their rating and draft probability are highest. Interactive chart also here.
I called out a few players that didn’t follow the exponential trend – outliers!
The dots above the curve are players whose draft probability exceeds the typical trend for other players with a similar evaluation rating. These players’ draft probabilities over-index because of the other two factors in the model: the player’s position and state/region.
Three of the players highlighted “above the curve”, Will Dissly, Brock Osweiler and Alex Green, are from Montana, which has the highest draft rate of all states and regions at 23% (on a base of 13 players). Christian Covington and Adam Gotsis are both from British Columbia, which has the 2nd highest draft rate of all states and regions (20%, on a base of 10 players).
The model does a great job at predicting these potential NFL draftees, despite their low evaluation rating.
The points below the curve are players whose draft probability is lower than the typical trend for other players with a similar evaluation rating. All of these players are from either Ontario or Idaho. Of the 60 total players from those two regions, none has ever been drafted. The model successfully predicted that these players had a nearly 0% chance of getting drafted, even for the four star recruits.
Taken at face value, this suggests that a player from Ontario or Idaho is not as good as a player elsewhere with a similar evaluation rating.
These outliers tell a directional story, but they are not statistically significant given the small number of players that come from these regions.
A few additional callouts that weren’t highlighted in the graphic:
- Kenny Bigelow, a five star defensive tackle with a 0.99 rating from Delaware that committed to USC (Southern California) had the highest draft probability (63%) but did not get drafted.
- Christian Wilkins, a five star defensive tackle with a 0.99 rating from Connecticut that committed to Clemson, had the second highest draft probability (60%) and got drafted 13th overall (1st round) in the 2019 NFL draft.
- Khalil Mack, a two star linebacker with a 0.70 rating from Florida that committed to Buffalo, had the lowest draft probability (5%) of all NFL draftees.
Now that we understand how the model works and the relationships between the variables, it’s time for us to finally answer the question – What colleges do more with less and less with more in terms of getting players drafted in the NFL?
The chart below shows there’s a strong linear relationship (r^2 of 0.82, correlation of 0.90) between the average player’s draft probability for each college and that college’s actual draft rate. Teams well above the line have a higher actual draft rate compared to what’s expected from them based on the caliber of players they recruit. Alabama’s actual draft rate is 30% and their average player probability of getting drafted is 21%. Inversely, teams well below the line have a lower actual draft rate compared to what’s expected from them. Texas’ actual draft rate is 9% and their average player probability of getting drafted is 15%. This lines up with how these two teams have performed over the last 10 years: Alabama has won several National Championships, while Texas has been underwhelming and has performed well below what’s expected from them.
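The team-level comparison above boils down to two steps: average each college's player probabilities to get an expected draft rate, then correlate that with the actual draft rate. Here is a minimal sketch of both steps with made-up players and assumed field names (`college`, `draft_prob`, `drafted`). For a straight-line fit with one predictor, r^2 is simply the correlation squared, which is why a correlation of about 0.90 goes with an r^2 of about 0.82.

```python
import math
from collections import defaultdict


def college_summary(players):
    """Return {college: (expected draft rate, actual draft rate)}."""
    prob_sum = defaultdict(float)
    drafted = defaultdict(int)
    count = defaultdict(int)
    for p in players:
        prob_sum[p["college"]] += p["draft_prob"]
        drafted[p["college"]] += p["drafted"]
        count[p["college"]] += 1
    return {c: (prob_sum[c] / count[c], drafted[c] / count[c]) for c in count}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up players at placeholder colleges, for illustration only.
players = [
    {"college": "College A", "draft_prob": 0.30, "drafted": 1},
    {"college": "College A", "draft_prob": 0.10, "drafted": 0},
    {"college": "College B", "draft_prob": 0.05, "drafted": 0},
    {"college": "College B", "draft_prob": 0.05, "drafted": 0},
    {"college": "College C", "draft_prob": 0.20, "drafted": 1},
    {"college": "College C", "draft_prob": 0.20, "drafted": 1},
]

summary = college_summary(players)
expected = [exp for exp, _ in summary.values()]
actual = [act for _, act in summary.values()]
r = pearson_r(expected, actual)
print(summary)
print(f"correlation = {r:.2f}, r^2 = {r * r:.2f}")
```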
A better way to visually show which colleges do more with less and less with more is to show the percentage difference between the actual draft rate and the draft probability (below).
Teams above the red dotted line (0%, no difference) are exceeding expectations: their actual draft rate is higher than their expected draft rate. Teams below the line are underperforming and not getting players into the NFL at the rate that they should be based on the caliber of players they recruit.
- Based on the caliber of players Stanford recruits, their predicted draft rate is 11%, but they’ve exceeded expectations by having an actual draft rate of 21%.
- UCF’s actual draft rate is 6.2% and their predicted draft rate is 4.5%. UCF ranks 19th in terms of getting players drafted versus what’s expected from them based on the players they recruit.
- The University of South Florida (USF), UCF’s rival, has the third worst ranking of all teams, with an actual draft rate of 1.4% against a predicted draft rate of 6.1% (a percentage difference of -77%).
- Despite having the highest draft probability, USC (Southern California) ranks 11th in getting players drafted in the NFL.
- 30% of Alabama players get drafted, which is 42% higher than expected (21%)
- Memphis ranks 1st when it comes to “doing more with less.” Their predicted draft rate is 2.6%, but they’ve had 6% of their players get drafted.
- Miami, despite their on-field struggles over the years, still maintained a high draft rate – higher than expected.
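The percentage difference used in the bullets above is (actual − expected) / expected. Here is a quick sketch using the rounded rates quoted in this write-up. The function name is my own, and because the inputs are rounded, results can land a point or so away from the figures quoted above (Alabama, for example, comes out closer to 43% than 42% from the rounded 30% and 21%).

```python
def pct_difference(actual_rate, expected_rate):
    """Percentage by which the actual draft rate is above (+) or below (-) the expected rate."""
    return (actual_rate - expected_rate) / expected_rate * 100


# (actual %, predicted %) pairs as quoted in the bullets above, already rounded.
quoted_rates = {
    "Stanford": (21.0, 11.0),
    "UCF": (6.2, 4.5),
    "USF": (1.4, 6.1),
    "Alabama": (30.0, 21.0),
    "Memphis": (6.0, 2.6),
}

for college, (actual, expected) in quoted_rates.items():
    print(f"{college}: {pct_difference(actual, expected):+.0f}%")
```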
To put the data to scale, instead of using percentages, below you can see the actual number of players drafted (X-axis) and the volume difference between actual and expected number of players drafted (Y-axis). This visual helps understand the actual volume impact of how well each team is performing, primarily for the bigger colleges.
- This visualization shows that Tennessee, Texas and Nebraska, three teams that have underperformed on the field in recent years, are also underperforming in terms of getting high caliber players into the NFL.
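For the volume view, one reasonable way to get an "expected number of players drafted" is to add up every recruit's draft probability for a college, then subtract that from the actual count. That is my assumption about how the expected count can be computed. The sketch below uses made-up players and assumed field names.

```python
def volume_difference(players):
    """Return (actual draftees, expected draftees, actual minus expected) for one college."""
    actual = sum(p["drafted"] for p in players)
    expected = sum(p["draft_prob"] for p in players)  # assumption: sum of probabilities
    return actual, expected, actual - expected


# Made-up recruiting class for a placeholder college.
college_players = [
    {"draft_prob": 0.50, "drafted": 1},
    {"draft_prob": 0.25, "drafted": 1},
    {"draft_prob": 0.25, "drafted": 0},
]

actual, expected, diff = volume_difference(college_players)
print(f"actual={actual}, expected={expected}, difference={diff}")
```

Big programs recruit many players, so a modest percentage gap can still mean several more (or fewer) draftees than expected, which is what this view is meant to surface.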
Thank you for taking the time to read this write-up! I really enjoyed writing about two things I’m passionate about – sports and analytics.
We plan on conducting an out-of-the-box type of analysis like this every couple months and sharing it on social media and in our newsletter. If you have a question that you’d like answered, let us know!