Winning NCAA Basketball Tournament Defense Analysis

Is the Best Offense Really a Good Defense:

Do Winning Teams in the NCAA Tournament Have Strong Defenses?

This #hypepothesis is dedicated to Karthik Sakthikanesan. Spite is a powerful motivator #fyk – Rachel

Introduction:

The adage, “the best offense is a good defense” has guided generations of application in the fields of molecular biology (Cramer, Lindeberg, Taylor 1999), public policy (Goodman 2006), criminal justice (Miller 2006), and military combat (Zajac 2003). In the realm of sports, an effective defense is essential to shut down the opponent’s scoring potential. Simply, if a team can prevent its opponent from scoring, the opponent won’t have enough points to win. In the fast-paced game of basketball, this can be crucial to success.

I’ve always favored typically “defensive players” (see: light of my life Jonathan Isaac) for the grit and determination required to beat out the other team on the boards. But sometimes I feel these players don’t get enough credit. The purpose of this #hypepothesis is to determine if a team’s defensive prowess influences its March Madness performance. Is the best offense truly a good defense?

Review of Literature:

Early literature on the analysis of team performance focused on psychological factors, including mood and depression on performance (Newby and Simpson 1994), stress on individual players’ free-throw shooting performance (Whitehead et al. 1996), and the effect of travel and rest on team performance (Deddens and Steenland 1997). The proliferation of data collection in years following allowed fans, coaches, and researchers to better analyze performance with more advanced statistical measures. As important as defense may be in basketball, it is often overlooked by fans for flashier factors such as three-pointers or dunks. In the relevant literature, many authors have explored the nature of offensive factors on team success.

In his 2010 paper, Raymond Witkos sought to determine if there were factors that were common between a team’s regular season and postseason success, and if those factors were predictive of a team’s ability to win the NCAA tournament. Witkos creates a stepwise regression model for each of the tournaments between the 2003-2004 and 2008-2009 seasons. He finds that of the thirteen independent variables used to evaluate success, five variables (Points per Game, Three Point Field Goal Percentage, Opponents’ Field Goal Percentage, Opponents’ Points per Game and Campus Support) were significant. Opponents’ Field Goal Percentage and Opponents’ Points per Game could be interpreted as measures of defense, because they reflect the scoring that a team allowed, but they are not strong enough on their own to capture a team’s defensive efforts; the opposing team might have had a hot night and made every shot it took, irrespective of the defense it faced.

Yuanhao Yang explores the connection between the statistics of individual players and the performance of their respective NBA teams in the regular season in his 2015 paper. Unlike Witkos, Yang averages the results of the past twenty seasons for his data set, which is the method adopted for this paper’s data set period of the past six years. Additionally, due to the percentages of statistical measures involved in his model, he uses a logarithmic regression. He uses both the Player Efficiency Rating (PER), a measure of player performance built from field goals attempted and made, personal fouls, assists, value of possession, rebounds, turnovers, and steals, all weighted by minutes played, and the Team Efficiency Rating (TER) to determine a team’s win ratio (wins/total games played * 100). He finds that teams with higher TERs have higher win ratios than teams with lower TERs, which is to be expected, as better performing teams should win more often. He finds less of a correlation between PER and win ratio, explaining that although individual players contribute to the overall team, they are not always optimally exploited by the team, or are not consistent throughout the season (Yang 2015, p. 19). While Yang is able to create a model to predict a team’s regular season success based on efficiency ratings, he does not isolate which factors are important. It is to be expected that teams with more field goals made, or fewer turnovers, win more often. The question now is whether certain defensive factors, more specifically, determine a team’s success.

M. Utku Özmen analyzed the Euroleague under a probit regression model to assess “the marginal contribution of producing one more unit of each game statistic than the opponent to the probability of winning” in different levels of competition: regular season, top-16, and playoff (Özmen 2016, p. 1). Özmen’s findings are important to assigning relevance to each statistic. The ten variables he studies to evaluate winning probability are free throw percentage, two and three point percentage, assists, steals, turnovers, offensive and defensive rebounds, blocked shots, and fouls committed; for each, he estimates that a team collecting one more unit of that statistic than the opponent will increase its probability of winning by X%. Higher two and three point percentages, and more assists, steals, and rebounds than the opponent, all increased the team’s probability of winning, while more turnovers and fouls reduced that probability. The most interesting results for this study were Özmen’s findings on defensive rebounding: the marginal contribution of an additional defensive rebound compared to the opponent was 6% in the regular season and top-16, but 28% in playoffs (p. 104). Özmen concludes, “solid concentration on defensive rebounding pays off much more under tougher competition” (p. 104). This #hypepothesis aims to discover whether this relationship holds true in the American college game.

Anthony J. Onwuegbuzie analyzed the factors that would determine an NBA team’s winning percentage in the article “Factors associated with success among NBA teams.” The model included winning percentage as the dependent variable and twenty skill statistics as the independent variables: three-point, field goal, and free-throw conversion percentage, average numbers of both offensive and defensive rebounds per game, number of total rebounds, average numbers of assists, steals, and blocks per game, points per game, and the opposing teams’ average three-point, field goal, and free-throw conversions, average numbers of both offensive and defensive rebounds per game, average number of total rebounds per game, average numbers of assists, steals, and blocks per game, and average number of points per game. His results of a correlation between offensive factors and the winning percentage are expected; “the finding that field goal percentage rate explained a very large proportion of the variance in success (i.e., 61.4%) highlights the importance of offensive efficiency” (2000, p. 5). However, his results on defensive factors are promising for this study. He finds that a team’s winning percentage decreased as the opposing team’s defensive rebound percentage increased. Defensive rebounding percentage is measured in his study and our #hypepothesis as the number of defensive rebounds attained divided by the sum of defensive rebounds and opponent’s offensive rebounds. This number allows researchers to answer the question, “out of all possible defensive rebounds each team could have gotten, how many did they actually get?” Onwuegbuzie’s results are the most promising for us because they suggest the importance of defensive rebounds to a team’s success; in games when the opposing team outperforms the home team in defensive rebounds, the home team won less frequently.
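As a quick illustration of the defensive rebounding percentage defined above, here is a minimal Python sketch. The function name and the sample box-score numbers are made up for illustration and do not come from the data set.

```python
def defensive_rebound_pct(def_rebounds: int, opp_off_rebounds: int) -> float:
    """Share of the available defensive rebounds that a team actually secured."""
    available = def_rebounds + opp_off_rebounds
    if available == 0:
        raise ValueError("no defensive rebound opportunities")
    return def_rebounds / available


# Hypothetical game: 27 defensive rebounds, opponent grabs 9 offensive rebounds.
print(defensive_rebound_pct(27, 9))  # 0.75
```

In this made-up game the team collected three out of every four rebounds available at its defensive end.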

Data:

My #hypepothesis question is “is the Number of Wins (for the Men’s college teams with at least three appearances in the NCAA Tournament) influenced by the Defensive Rebound Percentage, Number of Steals, and Number of Blocks?”

Independent Variables

Below is the summary of the Main Variables

These variables were chosen because one can reasonably expect that in the fast-paced and competitive atmosphere of the NCAA tournament, teams without a good defense are less able to succeed than those with stronger defensive statistics. Rebounds, steals, and blocks are all necessary aspects of basketball to prevent the opponent from gaining leverage in the game. Steals and blocks are raw counts, but defensive rebound percentage measures the defensive rebounds a team secured relative to all the rebounds available at its defensive end (its own defensive rebounds plus the opponent’s offensive rebounds). I wanted to conduct a study that highlights the overlooked aspects of defense and see if the aforementioned variables are related to the success of these teams. The Number of Wins has been selected as the dependent variable to measure the importance of defense because wins are the ultimate measure of success.

*data from 2011 to 2016 for a total of 57 observations.

Methodology:

I examined defensive rebounds, blocks, and steals to determine if they influence teams’ wins in the NCAA March Madness tournament. The question is not to determine if these factors predict the winner of the tournament, but rather if these factors are characteristics of winning teams. The environment of the NCAA tournament was chosen for this study in order to avoid the dilemma of comparing each team’s toughness of schedule in the regular season. The teams invited to the tournament are frequently the same teams each year, and even invitations to new teams imply that those teams are of the same caliber.

I used a Lin-Log Model because I wanted to see whether a relative (percentage) change in the independent variables produces an absolute change in the number of wins of Men’s teams in the NCAA tournament. As a result, Y is left alone and the X’s are transformed with the natural log.

Y_i = β₁ + β₂lnX₁ + β₃lnX₂ + β₄lnX₃ + u_i
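To make the lin-log form above concrete, the sketch below fits a one-regressor version, Y = b1 + b2·ln(X), by ordinary least squares in plain Python. The data points are invented so that the true values are b1 = 2 and b2 = 3; they are not the tournament data, and the actual analysis was run in Stata with three regressors.

```python
import math


def fit_lin_log(xs, ys):
    """OLS fit of y = b1 + b2 * ln(x) for a single regressor."""
    ln_xs = [math.log(x) for x in xs]  # only X is logged; Y stays in levels
    n = len(xs)
    mean_lx = sum(ln_xs) / n
    mean_y = sum(ys) / n
    sxy = sum((lx - mean_lx) * (y - mean_y) for lx, y in zip(ln_xs, ys))
    sxx = sum((lx - mean_lx) ** 2 for lx in ln_xs)
    b2 = sxy / sxx                 # slope on ln(x)
    b1 = mean_y - b2 * mean_lx     # intercept
    return b1, b2


# Invented data generated from y = 2 + 3 * ln(x), purely for illustration.
toy_x = [1.0, 2.0, 4.0, 8.0, 16.0]
toy_y = [2 + 3 * math.log(x) for x in toy_x]

b1, b2 = fit_lin_log(toy_x, toy_y)
print(round(b1, 6), round(b2, 6))
```

Because the toy data lie exactly on the curve, the fit recovers the intercept and slope used to generate them.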

The logarithmic transformation is used in order to reduce heteroscedasticity and skewness (Gujarati & Porter 166). Heteroscedasticity means that the variance of the error term is not constant across observations. This could stem from outliers or skewness.

In this model I ran the Linear Model first, then I took the Natural Log of the independent variables and regressed the dependent variable on these logged independent variables. This model will provide us with a better evaluation than the Log-Log or Log-Lin model as we’re comparing the relative change in X to the absolute change in Y. I used a Standard T-Test to check the significance of each coefficient, using the following equation for each independent variable:

T = βᵢ / SE(βᵢ)

A significance test will be run to test the following:

Ho: B2 = B3 = B4 = 0
Ha: at least one of the slope coefficients does not equal zero

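It is also necessary to test whether the slope coefficients are equal to one another, and to check for multicollinearity so that no two independent variables are so highly correlated that the individual coefficient estimates become unreliable. The equality test uses the following hypotheses: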

Ho: B2=B3=B4
Ha: at least one slope coefficient is not equal to another

The following section shows the results of these tests.

Results:

As previously stated, I chose to run this regression using the linear-log model to see if a percentage change in the independent X variables (defensive rebound percentage, number of steals, and number of blocks) would result in an absolute change in the dependent Y variable (number of wins). Upon generating the natural log of each independent variable and running the lin-log regression model, I found the following:

The first thing I noticed upon creating the variables was the 11 missing values generated by Stata. Inevitably, there were some teams in previous years who appeared in the NCAA tournament, but were eliminated in the first round of sixty-four teams. Because the natural log of zero is undefined, taking the natural log of the number of wins resulted in missing (undefined) values for those teams who did not achieve a single win. The lin-log model itself leaves the number of wins untransformed, so it does not rely on this logged variable.

The regression results were similar to what would naturally be expected: overall, an increase in each of the measured points of defense is associated with more wins for the team.

B2 is equal to 0.628. In a lin-log model the effect of a 1% change in X is the coefficient divided by 100, so a 1% increase in a team’s defensive rebound percentage is associated with an increase in that team’s number of historical wins of about 0.006 (0.628/100).

B3 is equal to 3.486, meaning that a 1% increase in the number of steals that a team executes is associated with an increase in that team’s number of historical wins of about 0.035 (3.486/100).

B4 is equal to 3.399, meaning that a 1% increase in the number of blocks completed by a team is associated with an increase in that team’s number of historical wins of about 0.034 (3.399/100).
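In a lin-log model the change in Y equals the coefficient times the change in ln(X), so a 1% rise in X moves Y by roughly the coefficient divided by 100. The short Python sketch below applies this to the three reported coefficients; the function name is my own and not part of the original Stata work.

```python
import math


def wins_change(beta: float, pct_increase: float) -> float:
    """Absolute change in Y for a given percentage increase in X (lin-log model)."""
    return beta * math.log(1 + pct_increase / 100)


# Coefficients as reported in the regression results.
coefficients = {"defensive rebound %": 0.628, "steals": 3.486, "blocks": 3.399}

for name, beta in coefficients.items():
    print(f"{name}: {wins_change(beta, 1):.4f} wins per 1% increase")
```

The same function handles larger changes: a 10% increase in steals corresponds to about 3.486 × ln(1.10) ≈ 0.33 additional wins.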

The intercept (B1) equals -11.987, which would normally be interpreted as the number of wins when the natural log of each independent variable equals zero, that is, when defensive rebound percentage, steals, and blocks each equal one. However, this value for B1 is not realistic because a team cannot actually have a negative number of wins (even though it feels like it sometimes); the fewest it can have is zero.

R2 is equal to 0.6965, meaning that roughly 70% of the variation in the number of wins is explained by the defensive rebound percentage, steals, and blocks executed by a team as defensive tactics. The adjusted R2 is equal to 0.6793 and could be used to compare this model to other models of the number of wins with different independent variables (for example, the linear model).
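The adjusted R2 can be checked by hand from the R2, the 57 observations, and the four estimated coefficients (the intercept plus three slopes). A minimal Python sketch, with a function name of my own choosing:

```python
def adjusted_r2(r2: float, n_obs: int, n_params: int) -> float:
    """Adjusted R-squared; n_params counts the intercept and all slopes."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_params)


print(round(adjusted_r2(0.6965, 57, 4), 4))  # 0.6793
```

The result matches the adjusted R2 reported by Stata.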

Next, I compared the standard deviation of the number of wins to the mean number of wins. The standard deviation rounds up to 5, and the mean number of wins rounds down to 5. Because the spread in wins is about as large as the average itself, we can conclude that this model likely won’t be predicting how successful a team will be in the NCAA tournament (but I was never much of a gambler anyway). This conclusion does not negate or discredit the entire model, however. It is still useful to see which variables matter: rebounds, blocks, or steals.

After running the regression, we now know how a percentage change in each measured point of defense affects the number of wins a team garners in the NCAA tournament; but are these coefficients truly statistically significant? I used a standard T-test of [T = B Coefficient/Standard Error] to measure the significance of each coefficient. I used the following to reach the critical T value of 2.0057:

Number of observations = 57

Degrees of freedom = 57 - 4 estimated coefficients = 53

Significance = 0.05 → critical T = 2.0057

T = B2/SE = 0.6287579/5.745451 = 0.10943578
- 0.11 is less than 2.0057 → NOT significant 🙁
T = B3/SE = 3.486733/1.019591 = 3.41973693
- 3.42 is greater than 2.0057 → significant
T = B4/SE = 3.39978/0.7757936 = 4.3823254
- 4.38 is greater than 2.0057 → significant
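The three T ratios above can be reproduced with a few lines of Python. This is only a sketch: the names are mine, and the critical value is passed in as a rough 2.0, which is close to the 5% two-tailed cutoff for a sample of this size (the exact figure depends on the degrees of freedom).

```python
def t_ratio(coef: float, std_err: float) -> float:
    """T statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


# Coefficients and standard errors as reported in the Stata output.
estimates = {
    "B2 (defensive rebound %)": (0.6287579, 5.745451),
    "B3 (steals)": (3.486733, 1.019591),
    "B4 (blocks)": (3.39978, 0.7757936),
}
CRITICAL_T = 2.0  # approximate 5% two-tailed cutoff (assumption)

for name, (coef, se) in estimates.items():
    t = t_ratio(coef, se)
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

None of the three ratios is anywhere near the cutoff, so the verdicts do not hinge on the exact critical value.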

Defensive rebound percentage is not a statistically significant predictor of the number of wins. However, I have chosen to keep this variable in the model because it is an important variable to consider in the research question of defense in basketball. Most basketball stakeholders would agree that rebounds are extremely important to a strong defense, so I did not feel it appropriate to exclude it from this model.

I then proceeded by running the following overall significance test:

Ho: B2 = B3 = B4 = 0
Ha: at least one of the slope coefficients does not equal zero

Stata produced the following results for the overall significance test:

Based on these results, we reject the null hypothesis that the slope coefficients are all equal to zero. This decision leads us to conclude that there is indeed a statistically significant relationship between the number of wins and the three points of defense I’ve chosen to measure, taken together.
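For readers who want to check the overall test by hand, the F statistic can be recovered from the reported R2 of 0.6965, the 57 observations, and the four estimated coefficients. The sketch below is my own calculation from those reported figures, not Stata's printed output, and the function name is hypothetical.

```python
def overall_f(r2: float, n_obs: int, n_params: int) -> float:
    """F statistic for H0: all slope coefficients equal zero."""
    explained = r2 / (n_params - 1)              # per slope coefficient
    unexplained = (1 - r2) / (n_obs - n_params)  # per residual degree of freedom
    return explained / unexplained


print(round(overall_f(0.6965, 57, 4), 1))  # 40.5
```

An F value of about 40.5 on 3 and 53 degrees of freedom is far above the 5% critical value of roughly 2.8, which agrees with the decision to reject the null hypothesis.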

Next, I assessed the equality of the regression coefficients with the following hypotheses:

Ho: B2=B3=B4
Ha: at least one slope coefficient is not equal to another

Stata produced the following results for the equality of coefficients test:

Based on the above results, we fail to reject the null hypothesis that the slope coefficients are all equal. At first glance, it may seem illogical that we cannot reject the hypothesis that the coefficients are all the same, since the estimated values of the slope coefficients are clearly very different numbers. However, the regression output is consistent with the results of this test. Because the 95% confidence intervals for the three slope coefficients overlap, it would indeed be possible for the slope coefficients to be equal to each other. The range common to all three intervals is 1.843 to 4.955.
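The overlap can be reconstructed from the reported coefficients and standard errors. The Python sketch below assumes a critical T value of 2.0057, the two-tailed 5% value for 53 degrees of freedom (57 observations minus four estimated coefficients); the names are my own.

```python
def conf_interval(coef: float, std_err: float, critical_t: float):
    """Two-sided confidence interval for a regression coefficient."""
    margin = critical_t * std_err
    return coef - margin, coef + margin


CRITICAL_T = 2.0057  # assumed: 5% two-tailed value for 53 degrees of freedom

# Coefficients and standard errors as reported in the Stata output.
estimates = {
    "B2 (defensive rebound %)": (0.6287579, 5.745451),
    "B3 (steals)": (3.486733, 1.019591),
    "B4 (blocks)": (3.39978, 0.7757936),
}
intervals = {
    name: conf_interval(coef, se, CRITICAL_T)
    for name, (coef, se) in estimates.items()
}

# The region shared by all intervals runs from the largest lower bound
# to the smallest upper bound.
overlap_low = max(low for low, _ in intervals.values())
overlap_high = min(high for _, high in intervals.values())
print(f"common range: {overlap_low:.2f} to {overlap_high:.2f}")
```

The common range comes out at roughly 1.84 to 4.96, which is simply the interval for blocks, the narrowest of the three, and agrees with the range quoted above up to rounding.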

Next, I tested the correlation of variables:

I then proceeded to conduct the following test to check for multicollinearity:

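Solve for Variance Inflation Factor (VIF) → 1/(1-R²) = 1/(1-0.6965) = 1/0.3035

VIF = 3.3

Strictly speaking, the VIF for each independent variable is computed from the R² of a regression of that variable on the other independent variables; the figure above uses the overall model R² and should be read as a rough gauge only. Because it falls between 1 and 5, we can say that the variables are moderately correlated. However, because it is less than 5, multicollinearity between the variables does not appear to be a severe issue.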

Finally, I tested for the problem of heteroscedasticity in the model by running the following test in Stata:

Ho: no heteroscedasticity
Ha: the model has heteroscedasticity

Because the p-value is greater than 0.05, we fail to reject the null hypothesis. Therefore, we find no evidence that this model suffers from heteroscedasticity. Nice.

Conclusion:

Based on the results from the regression and the various tests conducted afterwards, this model does a fairly decent job of explaining the effect of the defensive rebound percentage, steals, and blocks on the number of historical wins. About 70% of the variation in wins was explained by the variables, and the overall regression was confirmed through testing to be significant. The insignificance of defensive rebound percentage does not make the entire model insignificant, but serves to help better explain specifically which of the various defensive tactics may help a team garner more wins in the NCAA tournament.

References

Cramer, William A., Magdalen Lindeberg, and Ross Taylor. The best offense is a good defense. Nature Structural and Molecular Biology 6, 295 – 297 (1999) doi:10.1038/7520
Deddens, J.A., and K. Steenland. “Effect of Travel and Rest on Performance of Professional Basketball Players.” National Institute for Occupational Safety and Health 20.5 (1997): 366-369. Web. 28 Apr 2009.
Goodman, Travis. The Best Offense Is a Good Defense. Indiana Epidemiology Newsletter July 2006. ISDH Food Protection Program Project. Targets Food Safety and Security
Gujarati, Damodar N., and Dawn C. Porter. Basic Econometrics, 5th ed., p. 166, Chapter 11 (2009). ISBN: 978-007-127625-2
Miller, Colin, The Best Offense is a Good Defense: Why Criminal Defendants’ Nolo Contendere Pleas Should be Inadmissible Against Them When They Become Civil Plaintiffs. University of Cincinnati Law Review, Vol. 75, p. 725, 2006.
Newby, R.W., & Simpson, S. (1994). Basketball performance as a function of scores on profile of mood states. Perceptual and Motor Skills, 78, 1142.
Onwuegbuzie, Anthony J. “Factors Associated with Success Among NBA Teams.” Sport Journal 3.2 (2000). Web. 12 Oct 2009.
Utku Özmen, M. (2016). Marginal contribution of game statistics to probability of winning at different levels of competition in basketball: Evidence from the Euroleague. International Journal Of Sports Science & Coaching, 11(1), 98-107.
Whitehead, R., Butz, J. W., Kozar, B., & Vaughn, R. E. (1996). Stress and performance: An application of Gray’s three-factor arousal theory to basketball free-throw shooting. Journal of Sports Sciences, 14(5), 393-401. doi:10.1080/02640419608727726
Witkos, Raymond. (2010). Determining the Success of NCAA Basketball Teams through Team Characteristics.
Yang, Stanley. (2015). Predicting Regular Season Results of NBA Teams Based on Regression Analysis of Common Basketball Statistics.
Zajac, Col. Daniel L. (2003). The Best Offense is a Good Defense: Preemption, Its Ramifications for the Department of Justice. In Williamson Murray (Ed.), National Security Challenges for the 21st Century [E-reader version] (pp. 59-99).