Mathematical model calculates the probabilities of all the World Cup matches!
Trying to predict the result of sporting events is a very common yet extremely complex practice. The number of variables involved and the weight of each are hard to measure. Furthermore, the model must be tested against scenarios whose outcome is already known, to guarantee that it is in sync with reality. With this in mind, the variables chosen were those that affect the result of soccer matches, ranging from recent game history to each team’s tradition (quantified, for example, by the number of matches each team has played in World Cups), and a model was built to predict the path taken by each team in Russia.
This insight presents the results of this study and explains a bit more about how this model was built.
After all, what influences the result of a match? – The variables used for model calculation
Team roster, decision power, tradition, star players, offensive power, defensive strength, a creative midfield, wingers who defend and offer support, goalkeepers who inspire confidence, coaches capable of reading the game and making the best use of the players available to them… There are countless factors that influence whether a team wins a soccer match. The following variables were used to define the probability of victory for each team per match:
- Number of participations in World Cups
- Number of matches played at World Cups
- Number of world titles
- Number of continental titles
- Results of the last matches for each team
- Number of goals scored in the most recent matches
- Number of goals conceded in the most recent matches
- Head-to-head history between the two teams
- Each team’s continent
- Home field factor (team’s performance when it plays on its home field, the opponent’s field or a neutral field)
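The article does not say how these variables are combined, so the following is only a minimal sketch of one possible approach: a weighted difference of team features passed through an ordered-logit style split into win, draw, and loss. The feature names, the weights in `WEIGHTS`, and the `draw_width` parameter are all hypothetical and are not taken from the study.

```python
import math

# Hypothetical weights: the article does not publish its coefficients.
WEIGHTS = {
    "world_cup_matches": 0.01,
    "world_titles": 0.15,
    "recent_goal_diff": 0.10,
    "home_advantage": 0.30,
}

def strength_gap(team_a, team_b):
    """Weighted sum of feature differences (team A minus team B)."""
    return sum(w * (team_a[k] - team_b[k]) for k, w in WEIGHTS.items())

def match_probabilities(team_a, team_b, draw_width=0.6):
    """Return (P(A wins), P(draw), P(B wins)).

    Assumed form: an ordered-logit split, where draw_width > 0 controls
    how much probability mass is reserved for a draw.
    """
    gap = strength_gap(team_a, team_b)
    p_a = 1 / (1 + math.exp(-(gap - draw_width)))
    p_b = 1 / (1 + math.exp(gap + draw_width))
    p_draw = 1 - p_a - p_b
    return p_a, p_draw, p_b
```

With this form, two identical teams get equal win probabilities, and a larger feature gap shifts probability from the draw and the weaker side towards the stronger one.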
These variables were used to create thousands of scenarios, simulating the result of each match and tracing each team’s path through the tournament. To calibrate the model, the same variables were tested and adjusted against the 2014 World Cup. The adjusted model was then used to generate 10,000 simulations of the 2014 World Cup, producing results that were highly consistent with the actual outcome.
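The scenario generation described here is a Monte Carlo simulation. As a rough sketch of the idea (not the study's actual code), the snippet below plays a single-elimination bracket many times and counts how often each team wins. The function names, the `win_prob` callback, and the bracket layout (teams paired in list order, team count a power of two) are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_knockout(teams, win_prob, rng):
    """Play one single-elimination bracket; teams are paired in list order."""
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            # win_prob(a, b) is the assumed probability that a beats b.
            next_round.append(a if rng.random() < win_prob(a, b) else b)
        alive = next_round
    return alive[0]

def title_frequencies(teams, win_prob, n_sims=10_000, seed=0):
    """Share of simulated tournaments won by each team."""
    n = len(teams)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of teams")
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(teams, win_prob, rng) for _ in range(n_sims))
    return {t: wins[t] / n_sims for t in teams}
```

The resulting frequencies play the role of the title probabilities quoted in the article; calibration would then amount to adjusting the inputs behind `win_prob` until the simulated frequencies look reasonable against a tournament whose outcome is known.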
Trial by fire – testing the model during the 2014 World Cup
The 2014 simulations pointed to Germany as the favourite to win the title, with a 25.1% chance of becoming champion, and to Argentina as the team with the second-highest probability of winning it. Even more strikingly, the model indicated Argentina as the “favourite” to finish second, with a 16.5% chance of ending as runner-up. Below are the results from the simulations for title, second place, reaching the final, and reaching the semi-finals.
One interesting fact about the results from the 2014 simulation is the pattern of results in Group B, which consisted of Spain, Holland, Chile, and Australia. Spain came into the tournament with two European titles and the 2010 world title, won precisely against Holland in a balanced final decided only in extra time. For many, it was one of the main favourites to win the world title. However, luck did not smile on the Spaniards, who fell into the “Group of Death” together with the strong Dutch and Chilean teams. According to the model, Holland was the 6th favourite for the title, with Chile 7th and Spain 8th. However, since only two of these teams could reach the round of sixteen, Spain had a more than 40% chance of being eliminated in the group stage. For comparison, Ecuador, drawn into a group with France, Switzerland, and Honduras, had only a 30% chance of being eliminated in the group stage. To summarise, the model rated Holland and Chile as favourites over the Spanish team. In practice, we saw a Dutch team that earned third place, a Chilean team that made Brazil work hard in the round of sixteen, with Brazil going through only on penalties, and a Spanish team that said goodbye in the group stage.
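A group-stage elimination probability of this kind can be estimated by simulating the round-robin group repeatedly. The sketch below assumes a hypothetical `match_probs(a, b)` callback returning win, draw, and loss probabilities, uses the standard 3/1/0 points system, and breaks ties on points at random, which is a simplification (real tournaments use goal difference and other criteria first).

```python
import random
from itertools import combinations

def simulate_group(teams, match_probs, rng):
    """Play one round-robin group; return the two teams that advance."""
    points = {t: 0 for t in teams}
    for a, b in combinations(teams, 2):
        p_a, p_draw, _ = match_probs(a, b)
        u = rng.random()
        if u < p_a:
            points[a] += 3
        elif u < p_a + p_draw:
            points[a] += 1
            points[b] += 1
        else:
            points[b] += 3
    # Assumption: ties on points are broken at random.
    ranked = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
    return ranked[:2]

def elimination_probability(teams, match_probs, n_sims=10_000, seed=0):
    """Share of simulated groups in which each team fails to advance."""
    rng = random.Random(seed)
    eliminated = {t: 0 for t in teams}
    for _ in range(n_sims):
        advancing = simulate_group(teams, match_probs, rng)
        for t in teams:
            if t not in advancing:
                eliminated[t] += 1
    return {t: eliminated[t] / n_sims for t in teams}
```

Because exactly two of four teams go out in every run, the four elimination probabilities always sum to 2. That is why a strong team in a group with two other strong teams can carry a high elimination risk even when its title odds are respectable.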
A sixth title within reach – Applying the model to simulate the results of the 2018 World Cup
More than 50,000 simulations were generated for the 2018 World Cup, indicating Brazil as the favourite, winning the title in 20.2% of the scenarios. Germany appears as the second favourite, with a 16.2% chance of winning. Spain with 14.1%, Argentina with 11.9%, and France with 10.2% close out the Top 5. Below are the results from the simulations for title, second place, and appearances in the final and semi-finals.
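With this many runs, the sampling noise in a simulated percentage is small. As a quick check, the standard error of a proportion estimated from independent runs is sqrt(p(1 - p) / n). The snippet below applies it to the 20.2% figure, taking 50,000 as a lower bound on the number of simulations. This measures only the noise from simulating a finite number of times, not any error in the model itself.

```python
import math

def monte_carlo_standard_error(p_hat, n_sims):
    """Standard error of a proportion estimated from n_sims independent runs."""
    return math.sqrt(p_hat * (1 - p_hat) / n_sims)

# Brazil's simulated title share, with 50,000 runs as a lower bound.
se = monte_carlo_standard_error(0.202, 50_000)
print(f"standard error: {se:.4f}")  # prints "standard error: 0.0018"
```

A standard error of roughly 0.2 percentage points is far smaller than the 4-point gap between the top two teams, so the ordering of the favourites is not an artefact of the number of simulations.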
Another interesting analysis looks at the teams with the greatest chance of getting through the group stage. The five teams most likely to advance are Brazil, Uruguay, England, Argentina, and Spain. When we examine the chances at the title, however, England and Uruguay are not among the main favourites. The explanation lies in the groups they were drawn into: although neither team is among the favourites of the public or the specialists, each landed in a group where it is a clear favourite to advance. Uruguay faces Russia, Saudi Arabia, and Egypt, and it is hard to imagine the Uruguayan team being eliminated in the first phase. In England’s group, the match against Belgium may be complicated, but it is also highly unlikely that the English team will be eliminated in a group with Tunisia and Panama.
Still on the topic of Uruguay and England, the path becomes much more complex starting with the round of sixteen. If it advances, Uruguay faces a team coming out of Group B, a group that includes two Iberian powers: Portugal and Spain. Against either of these teams, Uruguay is considered the underdog. The model indicates that Uruguay has only a 41.2% chance of reaching the quarterfinals.
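The reasoning in this paragraph can be written as a simple decomposition: the chance of reaching the quarterfinals is the chance of advancing from the group, multiplied by the chance of beating whichever opponent turns up, averaged over the possible opponents. The function below is a hypothetical helper that illustrates this. The article's figure comes from counting simulated scenarios, and the inputs this helper would need (group advancement probability, opponent distribution, head-to-head win probabilities) are not published, so none are filled in here.

```python
def prob_reach_quarterfinal(p_advance, opponent_dist, p_beat):
    """P(reach QF) = P(advance) * sum over o of P(opponent = o) * P(beat o).

    opponent_dist: probability of meeting each round-of-sixteen opponent,
                   given that the team advances (must sum to 1).
    p_beat:        probability of beating each of those opponents.
    """
    total = sum(opponent_dist.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("opponent probabilities must sum to 1")
    return p_advance * sum(p * p_beat[o] for o, p in opponent_dist.items())
```

Even a team that is almost certain to advance ends up below 50% overall if it is the underdog against every likely opponent, which is the situation the paragraph describes for Uruguay.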
In England’s case, the two teams that advance from Group G (Belgium and England, if the favourites are confirmed) will face the two that advance from Group H, which according to the simulation are Colombia and Poland. The team that faces Poland is favoured to reach the quarterfinals, where, in the most probable scenario, it would face Brazil. The team that faces Colombia is also the favourite, but in a more balanced match. The winner of that game would meet Germany, the clear favourite in Group F and in its round-of-sixteen match, in the quarterfinals. In summary, England and Belgium are well placed to reach the round of sixteen. Nevertheless, given the difficulty of the path that follows, the model indicates that neither should be a semi-finalist.
Besides those mentioned in this insight, a series of other factors can interfere with a soccer match. Weather conditions, referee errors, injuries, internal crises within the teams, logistics-related errors, concentration issues resulting from external reasons, and countless other factors can affect the way that each team performs and the final result of the matches. Despite this, the results obtained through the model in the 2014 simulation corroborate the chosen model and variables. Now, all we can do is watch and cheer!
About the Authors
Felipe Pena is a consultant at Visagio, specialising in projects focused on budgetary management, process engineering, and analytics in the acquisition and banking sectors.
Marcus Sousa is a consultant at Visagio specialising in projects focused on management model, supply, and analytics in the retail, financial market, metallurgical industry, and energy sectors. Marcus also serves as Leader of the Visagio Research & Intelligence area, focused on surveys and market analyses.