Statistics via Sports

Recommended prerequisites: calculus, probability, R coding; matrix algebra helps.
Taught as part of the Wharton Sports Analytics Summer Research Lab.

Lectures:

Intro
    Linear algebra primer
        Materials: planned lecture, live lecture
    Probability primer
        Materials: planned lecture, live lecture
    Example of the research process
        Topic: Rethinking WAR for starting pitchers
        Materials: planned lecture, live lecture
    Statistical models vs. mathematical models
        Topic: Rethinking WAR for starting pitchers
        Materials: planned lecture, live lecture, code, data, XGBoost pre-trained hyperparameters

Regression
    Simple linear regression
        Topics: predict batting average across seasons, Pythagorean win percentage
        Materials: planned lecture, live lecture, code
    Multivariable linear regression
        Topics: NCAA basketball power ratings, NFL expected points
        Materials: planned lecture, live lecture, code
        Data: NCAA mbb schedule data, NCAA mbb team data, and NFL expected points data
        HW: Value of a draft position
    Logistic regression
        Topics: putt success probability, Bradley-Terry power ratings
        Materials: planned lecture, live lecture, code
        Data: NCAA mbb schedule data, NCAA mbb team data
        HW: Power score comparison

Shrinkage & Bayesianism
    Regularization and the bias-variance tradeoff
        Topic: MLB park effects
        Materials: planned lecture, live lecture, code
        Data: MLB half-inning data
        HW: Adjusted plus-minus and RAPM
    The power of fake data (priors)
        Topic: predict end-of-season win percentage from mid-season win percentage
        Materials: planned lecture, live lecture
        HW: Priors for in-season prediction of win percentage
    Empirical Bayes
        Topic: predict end-of-season batting average from mid-season batting average
        Materials: planned lecture, live lecture, code
        Data: 2019 batting average data
        HW: Empirical Bayes player quality
        Paper: In-season prediction of batting averages: a field test of empirical Bayes and Bayes methodologies

Examples: Bayesian modeling in sports
    A high-level overview of Bayesian statistics
    Bayesball: A Bayesian hierarchical model for evaluating fielding in Major League Baseball
        Materials: Shane’s slides, paper
    How often does the best team win?

Statistics via Sports: Summer Lab 2024

Daily format
    1 to 2 hour lecture
    Hands-on active learning lab where you will analyze a real-world sports dataset

Regression modeling
    Simple linear regression
        Materials: planned lecture
        Lab, data: MLB Pythagorean win percentage, MLB team payroll
    Multivariable linear regression
        Materials: planned lecture
        Note: estimating the coefficients
        Lab, data: NBA team-seasons for the four factors, punts
    Example of the research process
        Materials: planned lecture
        Lab:
            get into groups and start thinking about a research project
            plan to read relevant literature and start with a replication of existing analysis
            finish up the previous labs
    Logistic regression
        Materials: planned lecture
        Note: logistic regression & gradient descent
        Lab, data: field goals, 2023-2024 NCAA men’s basketball game results and team info from Kaggle
    Confounding
        Materials: planned lecture
        Lab, data: MLB half-innings data for park effects
    Models do what they’re told
        Materials: planned lecture
        Lab, data: NFL expected points

Frequentist statistical inference and uncertainty quantification
    Significance and p-values
        Materials: planned lecture
        Lab, data: diving, TTO (time through the order)
    Normal approximation (CLT) and binomial proportion confidence interval
        Materials: planned lecture
        Lab, data: NBA Players 2023-2024
    The bootstrap
        Materials: planned lecture, lab

Shrinkage & Bayesian statistics
    Priors & the power of fake data
        Materials: planned lecture, lab
    Empirical Bayes
        Materials: planned lecture, live lecture
        Lab, data: NBA player-game box scores, field goals
    Shrinkage estimation
        Materials: planned lecture (need to re-write this), live lecture
        Lab, data: first putt success percentage training data and held-out test data
    Fully Bayesian models
        Materials: planned lecture, live lecture
        Lab, data: NFL game-by-game data for Bayesian power rating and home field advantage model
    Regularization & ridge regression
        Materials: planned lecture
        Lab, data: NBA lineup data; this is in a .