March Madness Math
AUTHOR’S NOTE: My second child was born less than 24 hours ago but I felt like I had to get this out by tip-off. Please excuse any errors.
I love college basketball. I’m a graduate of UNC-Chapel Hill (home of Michael Jordan) and grew up a fan living thirty minutes away. Needless to say, I spend a little too much time filling out brackets and watching hoops during business hours this time of year. And in the spirit of all things Alpha Theory, I have a systematic approach to filling out my NCAA brackets. But my system needs a little fine tuning. I’ll give a little background to set up the problem and hopefully someone will have an answer.
GENERATION ONE. Creating a systematic approach to filling out the brackets requires good input. From 2008 to 2011, I used the Vegas odds for each team to win the national championship as a proxy for team quality and the strength of the path each team would have to travel. For an example of the calculation, see the chart below. Kentucky is the favorite at 8/5 odds. If I bet $5 on Kentucky and they win, I receive $8. That assumes that 8 times out of 13 (8+5) Kentucky will win, or 61.5% (8/13; see footnote 1). The next step was to calculate the percentage for every team in the tourney, sum all the percentages, and divide each team’s percentage by that sum to get a true probability of winning the tourney (see footnote 2). After that, I used those probabilities to forecast the probability of one team beating another. For example, if Kentucky (29% chance of winning it all) plays Missouri (4.6% chance), then the adjusted probability of Kentucky winning is 86% (29% / (29% + 4.6%)).
At this point I could have filled out my brackets using random generation (i.e., use a random number generator to pick a number between 0 and 100; if it falls above 86, Kentucky loses, and if it falls below, they win). Or I could have just used the Vegas probabilities to pick the winner, which pretty much means picking the Vegas favorite in each round. But here is the problem: Vegas odds don’t give the granularity necessary to pick the early-round games, because they make very little differentiation between teams (see the clusters of odds below).
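The Generation One steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are mine, and the "Rest of field" entry is a placeholder that lumps every other team together so the raw percentages add up to the 215% total quoted in footnote 2.

```python
import random

def normalize(raw):
    """Divide each raw implied percentage by the sum so the results total 1."""
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}

def head_to_head(p_a, p_b):
    """Chance that A beats B, using title probabilities as a strength proxy."""
    return p_a / (p_a + p_b)

def pick_winner(team_a, team_b, p_a_wins, rng=random):
    """Random pick: team_a advances with probability p_a_wins."""
    return team_a if rng.random() < p_a_wins else team_b

# Raw figures from the post: Kentucky 61.5% out of a 215% total.
# "Rest of field" is a placeholder for all other teams combined (assumption).
raw = {"Kentucky": 0.615, "Rest of field": 2.15 - 0.615}
print(f"{normalize(raw)['Kentucky']:.0%}")   # 29%

# Kentucky (29%) versus Missouri (4.6%), as in the example above.
p_kentucky = head_to_head(0.29, 0.046)
print(f"{p_kentucky:.0%}")                   # 86%
print(pick_winner("Kentucky", "Missouri", p_kentucky))
```

Because the last line draws a random number, it will pick Missouri in roughly one run out of seven.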
GENERATION TWO. While looking for a solution, I came across KenPom.com. This is a sophisticated site steeped in the teachings of Bill James, the pioneer whose statistics led to the Moneyball movement. KenPom creates an adjusted winning percentage that controls for a multitude of factors. This winning percentage allows for comparison of teams across the spectrum. Additionally, the KenPom blog has a reference to the Log 5 method, which takes two teams’ winning percentages and determines the probability of one team beating the other. This is the breakthrough I was looking for.
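For readers who have not seen it, the Log 5 formula is usually written as P(A beats B) = pA(1 - pB) / (pA(1 - pB) + pB(1 - pA)), where pA and pB are the two teams’ winning percentages. A minimal sketch follows; the function name is my own, and the inputs are made-up round numbers rather than actual KenPom ratings.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B under the Log 5 method.

    p_a and p_b are each team's winning percentage, expressed as 0..1.
    """
    a_edge = p_a * (1 - p_b)
    b_edge = p_b * (1 - p_a)
    return a_edge / (a_edge + b_edge)

# A .600 team against an average (.500) team should win 60% of the time.
print(round(log5(0.60, 0.50), 3))   # 0.6

# Two equally rated teams are a coin flip.
print(round(log5(0.70, 0.70), 3))   # 0.5
```

The two printed cases are handy sanity checks: facing a .500 opponent returns a team’s own winning percentage, and evenly matched teams come out at 50%.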
This allowed my NCAA bracket generator to have much more precise winning probabilities and thus a much more accurate forecasting engine. Here are the brackets based off KenPom adjusted winning percentages and Log 5-based probability of head-to-head success:
THE PROBLEM. Now here is the rub. Let’s say that I’m playing in an NCAA pool that has Cinderella points (1 point added for each point of difference in the seed). I can calculate an expected return using the probability of winning. Take Wichita St. (#5 seed) versus VCU (#12 seed), whose KenPom winning percentages are 93.5% and 80.4% respectively. Using Log 5 we calculate a 77.1% chance that Wichita St. wins the game. The expected return (ER) for Wichita St. is 77.1% * 1 pt = 0.8 pts, and the ER for VCU is 22.9% * (1 pt for the win + 7 pts for the seed difference) = 1.8 pts. In this case you would pick VCU because they give you an expected value of 1.8 versus 0.8 for Wichita St.
If the tournament were one round, this method would maximize expected return. But if I pick VCU to beat Wichita St. in the first round, then I can’t use Wichita St. in future rounds. If I would not have picked either Wichita St. or VCU to advance beyond the second round, then I should definitely pick VCU in the first round, because I give up no points in later rounds by doing so. But if I would have had Wichita St. win a future round, which I would have in this case, then I’ve done myself a disservice by eliminating them in the first round to maximize first-round expected return.
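The single-game expected return can be sketched as follows. This only covers the one-round calculation, not the multi-round problem. The function name and the `base` parameter are my own, and I am assuming the Cinderella bonus goes only to the worse-seeded team, as in the Wichita St. and VCU example.

```python
def expected_points(p_win, own_seed, opp_seed, base=1):
    """Expected return for picking a team in a single game.

    Assumption: the Cinderella bonus is the seed difference and is paid
    only when the picked team is the worse (higher-numbered) seed.
    """
    bonus = max(own_seed - opp_seed, 0)
    return p_win * (base + bonus)

p_wichita = 0.771  # Log 5 probability quoted in the post

er_wichita = expected_points(p_wichita, own_seed=5, opp_seed=12)
er_vcu = expected_points(1 - p_wichita, own_seed=12, opp_seed=5)

print(f"Wichita St.: {er_wichita:.1f} pts")   # Wichita St.: 0.8 pts
print(f"VCU: {er_vcu:.1f} pts")               # VCU: 1.8 pts
```

Comparing the two numbers picks the best first-round choice in isolation, which is exactly the shortcut that breaks down once later rounds are taken into account.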
I’m looking for a good way to account for this dynamic. I’m assuming someone with a good grasp of Bayesian or stochastic tools may have a solution. Does anyone have a suggestion? I would be glad to send my sheet along to anyone who would like to take a stab at solving the problem or would just like to use the sheet. Please help me solve this March Madness Math.
1 Clearly 61.5% is too high a chance of Kentucky winning the tournament. But if Vegas used realistic probabilities, then they wouldn’t make money from the spread. But how unrealistic are the probabilities that they use? Let’s assume that Kentucky is the best team in the country by a wide margin and figure out the odds of winning the six games necessary to win the national championship:
First game – 100%, no chance they lose. If they play 100 times, they don’t lose once.
Second game – 95%, they’ll be playing an 8/9 seed, play 100 times, they lose 5.
Third game – 80%, playing a 4/5 seed
Fourth game – 70%, playing a 2/3 seed
Fifth game – 65%, playing a 1/2 seed
Sixth game – 60%, playing a 1/2 seed
Multiply them all together and the probability of winning is 21% for the best team in the country. Even if I raise the winning percentage to 80% for every round from the third on, it is still only 39%. So 62% odds are off the chart and tell you how expensive it is to make prop bets in Vegas. What this also tells us is that fans really shouldn’t be upset when their team doesn’t win the national championship. The odds of the best team in the country winning it all are only slightly better than the odds of rolling a die and landing on 1. Take solace in the fact that single-elimination tournaments are subject to all kinds of luck, and be happy that your team is dancing.
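The multiplication is easy to check. The round-by-round probabilities below are the ones listed in this footnote; the variable names are mine.

```python
from math import prod

# Per-round win probabilities for the hypothetical best team in the country.
realistic = [1.00, 0.95, 0.80, 0.70, 0.65, 0.60]
print(f"{prod(realistic):.0%}")    # 21%

# Same first two rounds, but 80% in every round from the third on.
optimistic = [1.00, 0.95] + [0.80] * 4
print(f"{prod(optimistic):.0%}")   # 39%

# One face of a fair six-sided die, for comparison.
print(f"{1 / 6:.0%}")              # 17%
```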
2 The sum is greater than 100% because sportsbooks make their money from the spread. If the odds reflected the real probabilities, then Vegas would just break even…and we can’t have that, can we? For example, Kentucky is forecasted by Vegas to have a 62% chance of winning it all, but adjusting by the sum total of all teams’ odds of winning, which is 215%, the true Vegas probability of Kentucky winning it all is 29% (61.5%/215%). So instead of paying $1.60 (8/5) for each dollar bet, Vegas should actually be paying $3.50 for each dollar bet. Needless to say, the futures bet on the NCAA champion is a real sucker’s bet.