This article appeared in the DailyO, on Thursday, 14th June, 2018
With the 2018 World Cup set to kick off in Russia this summer, everyone and their mother is scrambling to bet on the national team that will take home the title. While it’s easy to make predictions that are more accurate than an octopus (remember Paul the Octopus?), even the best predictions are still hurt by the very noisy process that is a football match. Consider that for the 2014 World Cup, the gurus at FiveThirtyEight (one of the best data analytics companies in the world), who went to a lot more trouble than someone pontificating at a party, could really only narrow their choices down to four teams. Even then, they put their money (so to speak) on Brazil, who didn’t even make it to the final.
Rather than come up with our own predictions, as bigwigs like SAP and IBM have traditionally done, we at Infinite Analytics thought it might be more interesting to contemplate why football (or soccer, in the USA) is so difficult to predict, even in the era of fancy algorithms. Or, why expecting an upset is the best prediction you can make.
At its core, a football competition is an experiment designed to rank groups of players from best to worst, as measured by the number of goals each team scores against its opponents. This is relatively easy to do over a large number of games: better teams should win more often, and worse teams should lose more often. But because the game is so low-scoring, and the difference in quality between teams is relatively small (especially among the best teams, the ones contending for a World Cup title), predicting the outcome of a single match between two well-matched opponents is very hard.
As anyone who’s watched a football match knows, most games have fairly low scores. In the latest English Premier League season, for example, each team scored only 1.34 goals per match on average.
It’s also very hard to score a goal. Over the latest season, the best team, Manchester City, made 664 goal attempts but scored only 106 goals. That’s a success rate of only 16%. Moreover, the difference between the best team and an average one isn’t that large: over the same season, the average success rate for a goal attempt was about 11%. (Source: http://www.footstats.co.uk/index.cfm?task=league_shots)
This might seem like a big difference, but through an intuitive argument we can see why it’s actually very hard to observe, given the structure of a football match.
Instead of a match between two teams, suppose we’re playing a game where you win $10 if you can guess which of two weighted coins is more likely to land heads up, in the same way you’re effectively trying to predict which football team will score more goals over the course of a match. How many times would you want to see each coin flipped before picking one? Likely more times than a team gets to shoot in a typical football match.
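To put a rough number on "how many flips", here is a minimal Python sketch that computes, exactly, the chance that the better coin shows strictly more heads than the worse one after n flips each. It assumes every flip is independent, with heads probabilities of 16% and 11% (the Manchester City and league-average rates above); the function names are hypothetical, chosen for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n flips of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def outcome_probs(n: int, p: float, q: float) -> tuple[float, float, float]:
    """Exact (P(coin A shows more heads), P(tie), P(coin B shows more heads))
    after n independent flips of each coin; A lands heads with p, B with q."""
    a = [binom_pmf(k, n, p) for k in range(n + 1)]
    b = [binom_pmf(k, n, q) for k in range(n + 1)]
    a_ahead = sum(a[i] * b[j] for i in range(n + 1) for j in range(i))
    tie = sum(a[i] * b[i] for i in range(n + 1))
    b_ahead = sum(a[i] * b[j] for j in range(n + 1) for i in range(j))
    return a_ahead, tie, b_ahead


# Coin A stands in for a 16% team, coin B for an 11% team.
for n in (12, 50, 100, 300):
    ahead, tie, behind = outcome_probs(n, 0.16, 0.11)
    print(f"{n:3d} flips each: better coin ahead {ahead:.1%}, tie {tie:.1%}, behind {behind:.1%}")
```

The probability that the better coin is ahead climbs only slowly with n, which is the point of the coin analogy: a handful of flips says little about which coin is weighted more heavily.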
Let us represent a match between two teams with a game where you flip two coins, one per team. Each flip is a goal attempt, and every time a team’s coin lands heads up, that team scores a goal.
Let’s turn to some statistics from the latest season of the very competitive English Premier League. In the latest batch of matches, a team made only 12 goal attempts per game on average.
So assume you have your coin ready for your favorite team and get to flip it 12 times.
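As a quick sanity check, the season figures quoted so far hang together. Dividing goals by attempts gives City's conversion rate. Assuming goals per match are simply attempts per match multiplied by the conversion rate, dividing the 1.34 goals an average team scores per match by the 11% average conversion rate implies roughly a dozen attempts per match. A short Python sketch of that arithmetic:

```python
# Season figures quoted in the article (latest English Premier League season).
CITY_GOALS = 106
CITY_ATTEMPTS = 664
AVG_CONVERSION = 0.11        # league-average share of attempts that were scored
AVG_GOALS_PER_MATCH = 1.34   # average goals per team per match

city_conversion = CITY_GOALS / CITY_ATTEMPTS
# Assumption: goals = attempts x conversion rate, so attempts = goals / conversion rate.
implied_attempts = AVG_GOALS_PER_MATCH / AVG_CONVERSION

print(f"Manchester City conversion rate: {city_conversion:.1%}")            # 16.0%
print(f"Implied attempts per match, average team: {implied_attempts:.1f}")  # 12.2
```

The implied figure lands close to the 12 attempts per game used in the coin-flip analogy.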
What would happen if the best team in the latest season, Manchester City, went up against a hypothetical team with an average scoring success rate?
We ran a few simulations with the scoring success rate of Manchester City (about 16%) and a hypothetical average team (11% success). Dark circles represent goal attempts that were scored; light circles represent attempts that were missed or blocked. Each group of 12 represents a match.
This analogy makes a huge number of simplifying assumptions, to be sure. Teams don’t always make exactly 12 goal attempts. A team’s goal-scoring success depends on the quality of the opposing team, on external factors like player fatigue or injury, and on random events like a star player losing his temper and getting a yellow card.
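Here is a minimal Python sketch of this kind of simulation. As in the analogy, it assumes each of the 12 attempts is an independent coin flip that succeeds 16% of the time for City and 11% of the time for the average team. It prints a few matches as text, with X standing in for a dark circle (goal) and - for a light one (missed or blocked), then estimates win, draw, and loss rates over many simulated matches. The seeds and function names are arbitrary and hypothetical; this is not the code behind the original figures.

```python
import random


def simulate_attempts(rate: float, attempts: int, rng: random.Random) -> list[bool]:
    """One biased coin flip per goal attempt; True means the attempt was scored."""
    return [rng.random() < rate for _ in range(attempts)]


def render(shots: list[bool]) -> str:
    """Text stand-in for the circles: X = scored, - = missed or blocked."""
    return "".join("X" if shot else "-" for shot in shots)


def estimate_outcomes(rate_a: float, rate_b: float, attempts: int = 12,
                      n_matches: int = 50_000, seed: int = 2018) -> tuple[float, float, float]:
    """Monte Carlo estimate of (P(A wins), P(draw), P(B wins))."""
    rng = random.Random(seed)
    wins = draws = 0
    for _ in range(n_matches):
        goals_a = sum(simulate_attempts(rate_a, attempts, rng))
        goals_b = sum(simulate_attempts(rate_b, attempts, rng))
        if goals_a > goals_b:
            wins += 1
        elif goals_a == goals_b:
            draws += 1
    losses = n_matches - wins - draws
    return wins / n_matches, draws / n_matches, losses / n_matches


rng = random.Random(14)
for match in range(1, 4):
    city = simulate_attempts(0.16, 12, rng)
    average = simulate_attempts(0.11, 12, rng)
    print(f"Match {match}  City    {render(city)}  goals: {sum(city)}")
    print(f"         Average {render(average)}  goals: {sum(average)}")

win, draw, loss = estimate_outcomes(0.16, 0.11)
print(f"Over 50,000 simulated matches: City wins {win:.0%}, draws {draw:.0%}, loses {loss:.0%}")
```

Fixing the seed makes the printed matches reproducible. Changing it shows how much individual 12-attempt matches vary even though the underlying rates never change.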
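One of these assumptions is easy to relax. Suppose, hypothetically, that each team's number of attempts follows a Poisson distribution averaging 12 per match, and each attempt still succeeds independently at that team's rate. Then each team's goal count is itself Poisson with mean 12 × rate, a standard property of Poisson thinning. The Python sketch below still ignores opponent quality, fatigue, and the rest. It computes the resulting win, draw, and loss probabilities exactly, up to a truncation at 25 goals per team.

```python
from math import exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def match_probs_poisson(rate_a: float, rate_b: float, mean_attempts: float = 12.0,
                        max_goals: int = 25) -> tuple[float, float, float]:
    """(P(A wins), P(draw), P(B wins)) when attempts are Poisson(mean_attempts) and each
    attempt scores independently at the team's rate, so goals are Poisson(mean_attempts * rate).
    Goal counts above max_goals are ignored (negligible at these means)."""
    a = [poisson_pmf(k, mean_attempts * rate_a) for k in range(max_goals + 1)]
    b = [poisson_pmf(k, mean_attempts * rate_b) for k in range(max_goals + 1)]
    win = sum(a[i] * b[j] for i in range(max_goals + 1) for j in range(i))
    draw = sum(a[i] * b[i] for i in range(max_goals + 1))
    loss = sum(a[i] * b[j] for j in range(max_goals + 1) for i in range(j))
    return win, draw, loss


win, draw, loss = match_probs_poisson(0.16, 0.11)
print(f"City wins {win:.1%}, draws {draw:.1%}, loses {loss:.1%}")
```

Letting the number of attempts vary adds noise rather than removing it, so the single-match picture stays murky.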
Fortunately, this doesn’t obscure the core argument. On average, these effects should scale with the difference in team quality. That is, we would expect a very uneven match to make a bad team worse (they allow more goal attempts and let more goals through) and a good team better (they make more goal attempts, and more of them get through). With two evenly matched teams, such as those in the knockout stages of the World Cup, these effects should largely balance out.
So at the World Cup, it’s relatively tricky to predict the actual champion, but relatively easy to make good predictions about the best teams. It’s rare that a bad or mediocre team makes it past the group stage, and then survives past the initial knockout rounds. It’s not at all uncommon for “upsets” to happen in the knockout rounds between the handful of excellent teams that make it that far. Brazil were considered the solid favorites in 2014, but were roundly defeated by Germany in the semi-final.
Going back to our simplified coin-flip analogy, let’s consider the two best teams in the latest English Premier League season. The second-best team, Manchester United, had a success rate closer to 13%, versus 16% for Manchester City.
When the two rival Manchester clubs played one another this season, it was United, not City, that came out on top by a margin of one goal. Once again, we ran a quick simulation, assuming 12 goal attempts per game; once again, dark circles represent successful goal attempts, and light circles represent missed or blocked attempts.
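Here is a back-of-the-envelope way to see why the derby is even harder to call than City against an average team. Assume a normal approximation that ignores ties. After n attempts each, the goal difference has mean n(p - q) and variance n[p(1 - p) + q(1 - q)]. The better team therefore leads with a target probability c once n reaches roughly (z * sqrt(p(1 - p) + q(1 - q)) / (p - q))^2, where z is the standard normal quantile for c. The Python sketch below uses a hypothetical 90% target and compares City against the average team and against United:

```python
from math import ceil, sqrt
from statistics import NormalDist


def attempts_needed(p: float, q: float, confidence: float = 0.9) -> int:
    """Rough number of attempts per team before the better team (rate p) is ahead of
    the worse team (rate q) with the given probability. Normal approximation only:
    ties and the continuity correction are ignored."""
    z = NormalDist().inv_cdf(confidence)           # standard normal quantile
    spread = sqrt(p * (1 - p) + q * (1 - q))       # per-attempt std. dev. of the goal difference
    return ceil((z * spread / (p - q)) ** 2)


for label, q in (("average team (11%)", 0.11), ("Manchester United (13%)", 0.13)):
    n = attempts_needed(0.16, q)
    print(f"City vs {label}: ~{n} attempts each, about {n / 12:.0f} matches at 12 per match")
```

Because the required n grows with the inverse square of the gap p - q, narrowing the gap from five percentage points to three multiplies the required number of attempts far more than the small change in rates might suggest.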
It’s certainly possible to do a better job of predicting match outcomes than simply looking at a team’s win-loss record, or at the fraction of its goal attempts that succeed. There’s even research applying graph analysis techniques to the problem that produces good results. And indeed, these techniques would probably do a great job of predicting outcomes in situations where teams get to play lots of games.
When it comes down to it, though, the World Cup is decided by a series of winner-takes-all matches in the knockout rounds, and the noisy goal-scoring process we explored earlier still dictates their outcomes. By considering player fatigue or the strength of a given team’s opposition, we may be able to state more confidently that the Brazilian team is better than the English team on the merits, or in some broader sense. But that doesn’t change the fact that both teams are extraordinarily good, and the mechanism for final arbitration, their potential matchup in the knockout stages, produces a very noisy, hard-to-predict signal.
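To illustrate how these winner-takes-all rounds compound, here is a Python sketch that chains four knockout ties (round of 16, quarter-final, semi-final, final) in the coin-flip model. It makes several simplifying assumptions. A hypothetical 16% team faces a 13% opponent in every round, and each side gets 12 attempts. Any draw goes to a shootout that either side wins half the time. In reality, draws go to extra time first, and brackets are not this uniform. The function names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k goals from n attempts that each succeed with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def advance_prob(rate_a: float, rate_b: float, attempts: int = 12) -> float:
    """Chance team A goes through a single knockout tie in the coin-flip model.
    Assumption: a draw is settled by a shootout each side wins half the time."""
    a = [binom_pmf(k, attempts, rate_a) for k in range(attempts + 1)]
    b = [binom_pmf(k, attempts, rate_b) for k in range(attempts + 1)]
    win = sum(a[i] * b[j] for i in range(attempts + 1) for j in range(i))
    draw = sum(a[i] * b[i] for i in range(attempts + 1))
    return win + 0.5 * draw


# Hypothetical run: the same 16% vs 13% matchup in each of the four knockout rounds.
per_round = advance_prob(0.16, 0.13)
title = per_round ** 4
print(f"Chance of winning one knockout tie: {per_round:.0%}")
print(f"Chance of winning all four: {title:.0%}")
```

Because the per-round probability is raised to the fourth power, even a clear per-match edge shrinks considerably over a full knockout run.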
It’s a relatively safe bet that the French will edge out the Peruvians in the group stage. But counting out the perennially disappointed English team is probably riskier than the pessimistic commentariat might suggest.
For the best odds, we predict an upset.
(For anyone playing along at home, in the first set of simulations, the red team represents Manchester City, and the blue team is a hypothetical team with the Premier League’s average scoring rate. In the second set, the purple team represents Manchester City, and the green team is Manchester United.)