Statistics via Sports
- Recommended prerequisites: calculus, probability, R coding; matrix algebra helps
- Taught as part of the Wharton Sports Analytics Summer Research Lab
Lectures:
Intro
- Linear algebra primer
- Probability primer
- Example of the research process
- Rethinking WAR for starting pitchers
- planned lecture
- live lecture
- Statistical models vs. mathematical models
- Rethinking WAR for starting pitchers
- planned lecture
- live lecture
- code
- data
- XGBoost pre-trained hyperparameters
Regression
- Simple linear regression
- predict batting average across seasons, Pythagorean win percentage
- planned lecture
- live lecture
- code
- Multivariable linear regression
- NCAA basketball power ratings, NFL expected points
- planned lecture
- live lecture
- code
- NCAA men's basketball (mbb) schedule data, NCAA mbb team data, and NFL expected points data
- HW: Value of a draft position
- Logistic regression
- putt success probability, Bradley-Terry power ratings
- planned lecture
- live lecture
- code
- NCAA men's basketball (mbb) schedule data, NCAA mbb team data
- HW: Power score comparison
Shrinkage & Bayesianism
- Regularization and the bias-variance tradeoff
- The power of fake data (priors)
- predict end-of-season win percentage from mid-season win percentage
- planned lecture
- live lecture
- HW: Priors for in-season prediction of win percentage
- Empirical Bayes
- predict end-of-season batting average from mid-season batting average
- planned lecture
- live lecture
- code
- 2019 batting average data
- HW: Empirical Bayes player quality
- paper: In-season prediction of batting averages: a field test of empirical Bayes and Bayes methodologies
- Examples: Bayesian modeling in sports
- A high-level overview of Bayesian statistics
- Bayesball: A Bayesian hierarchical model for evaluating fielding in Major League Baseball
- How often does the best team win? A unified approach to understanding randomness in North American sport
- More Bayesian sports papers:
Tree machine learning
- Our example: in-game NFL win probabilities
- Decision trees
- Random forests
- planned lecture
- live lecture
- paper: Arcing Classifiers (explores the bias-variance tradeoff for classifiers)
- paper: Making sense of random forest probabilities
- XGBoost
- planned lecture
- live lecture
- paper: XGBoost
- XGBoost win probability & fourth down code (instructions in README.md)
- Nonparametric bootstrapped uncertainty quantification on simulated data
- planned lecture
- live lecture
- code files 1, 2, 3, 4
- Uncertainty quantification in fourth down decision making
- planned lecture
- live lecture
- stay tuned for our paper coming soon!
- XGBoost win probability & fourth down code (instructions in README.md)
Future Lesson Ideas:
Clustering
- K-means clustering
- NBA player clustering
- Eigenvalues, diagonalization, SVD
- PCA, factor analysis
Other Fun Stuff
- Kelly betting
- NFL Draft chart
- Spatio-temporal modeling (Bornn, Cervone, etc.)
- Selection bias
- Data visualization tutorial