The World Cup kicks off this Thursday in Russia and will probably be one of the most-watched events in history, drawing millions of viewers, perhaps even more than the Olympics. The tradition and importance of soccer at the international level are such that no one can really ignore this sporting event.
Events like this always come with forecasts, which bookmakers also use to set their odds. So, if a strong team plays one that is weak on paper, the bookmakers pay out more or less depending on which team you put your money on. A number of major companies specialize in sports statistics, and on that basis Brazil is considered the clear favorite to win the 2018 World Cup, with a probability of 16.6%, followed by Germany (12.8%) and Spain (12.5%).
However, machine learning techniques developed in recent years can, in some cases, outperform conventional statistical approaches at prediction. What do these new AI techniques say about the 2018 FIFA World Cup?
One answer comes from the work of Andreas Groll of the Technical University of Dortmund, Germany, who, together with his colleagues, used a combination of machine learning and conventional statistics known as the “random forest approach” to identify the likely winner of this sporting event.
The random forest technique has emerged in recent years as a way of analyzing very large data sets while avoiding known pitfalls of other data mining methods. It is based on the idea that a future event can be predicted by a decision tree in which the outcome at each branch is calculated by reference to a set of training data.
However, decision trees have their problems. In the later stages of branching, decisions can be severely distorted by training data that is sparse and prone to large variation at that level of resolution. This problem is known as overfitting.
In the random forest approach, instead of calculating the outcome at every branch, the process calculates the outcome of randomly selected branches. It does this many times, each time with a different randomly selected set of branches. The final result is the average of these randomly built decision trees. This approach has two advantages. First, it does not suffer from the same overfitting that plagues ordinary decision trees. Second, it reveals which factors matter most in determining the outcome.
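The averaging idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the model used by Groll and colleagues: each "tree" is a single-split stump fitted on a bootstrap sample with one randomly chosen feature, whereas real random forests grow deeper trees. The data set `TOY_MATCHES` and its two features (a rating difference and a GDP difference) are invented for the example.

```python
import random
from statistics import mean

# Hypothetical toy data: ((rating_diff, gdp_diff), goal_difference).
TOY_MATCHES = [
    ((-3.0, 0.5), -2), ((-2.0, -1.0), -1), ((-1.0, 1.0), -1), ((0.0, 0.0), 0),
    ((1.0, -0.5), 1), ((2.0, 1.5), 1), ((3.0, -1.0), 2),
]

def fit_stump(rows, feature):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        if not right:
            continue  # largest value: nothing would fall on the right side
        left_mean, right_mean = mean(left), mean(right)
        # Squared error when each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # feature is constant in this sample
        overall = mean(y for _, y in rows)
        return (feature, float("inf"), overall, overall)
    return (feature,) + best[1:]

def predict_stump(stump, x):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if x[feature] <= threshold else right_mean

def fit_forest(rows, n_trees=200, seed=0):
    """Build many stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest

def predict_forest(forest, x):
    """The forest's prediction is the average over all of its trees."""
    return mean(predict_stump(stump, x) for stump in forest)

if __name__ == "__main__":
    forest = fit_forest(TOY_MATCHES)
    print(predict_forest(forest, (3.0, 0.0)), predict_forest(forest, (-3.0, 0.0)))
```

Because every stump sees a slightly different sample, no single quirk of the training data dominates the averaged prediction, which is the intuition behind the method's resistance to overfitting.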
So if a particular decision tree includes many parameters, it is easy to see which ones have the biggest impact on the result and which do not. The less important factors can then be ignored in the future. Groll and colleagues used exactly this approach to find out who could win the 2018 World Cup. They modeled the outcome of every match that will be played and every match that could be played, and then used those results to build the most likely course of the tournament.
The researchers began with a broad range of factors that could determine the outcome. These included economic factors such as GDP and population, FIFA's ranking of the national teams, and properties of each team, such as average age, the number of Champions League players, whether it has home advantage, and so on. Groll and colleagues also included the team ratings that some bookmakers produce, which, according to the researchers themselves, speaks well of the approach.
All these data were fed into the model, which yielded some interesting results. For example, the most influential factors turned out to be the team rankings produced by other methods: those of the bookmakers, of FIFA itself, and of third parties. Other important factors were GDP and the number of Champions League players on the team. Unimportant factors included, for example, the nationality of the coach.
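One common way to rank factors like these is permutation importance: shuffle one input column and measure how much the model's error grows. The sketch below illustrates the idea only. The function `toy_model` is a hand-written stand-in for a trained forest, and the three features (ranking difference, GDP difference, and a coach-nationality flag) and their weights are assumptions made up for the example, not values from the study.

```python
import random

def toy_model(x):
    """Hypothetical stand-in for a trained model; it ignores the coach flag."""
    ranking_diff, gdp_diff, _coach_is_national = x
    return 0.8 * ranking_diff + 0.1 * gdp_diff

def make_toy_data(n=200, seed=1):
    """Generate invented matches whose outcome follows toy_model plus noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = (rng.uniform(-3, 3), rng.uniform(-1, 1), float(rng.choice([0, 1])))
        X.append(x)
        y.append(toy_model(x) + rng.gauss(0, 0.3))
    return X, y

def mse(model, X, y):
    """Mean squared error of the model on the given data."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in error after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [x[:j] + (v,) + x[j + 1:] for x, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

if __name__ == "__main__":
    X, y = make_toy_data()
    names = ["ranking_diff", "gdp_diff", "coach_is_national"]
    for name, value in zip(names, permutation_importance(toy_model, X, y)):
        print(f"{name}: {value:.4f}")
```

A feature the model never uses, like the coach flag here, gets an importance of zero, which mirrors the article's point that such factors can be dropped.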
The resulting predictions differ from the others in several ways. To begin with, the random forest method indicates that Spain is the most likely winner, with a probability of 17.8%. Interestingly, the president of the Royal Spanish Football Federation (RFEF) today announced the dismissal of Julen Lopetegui as coach of the Spanish national team, a day before the start of the 2018 World Cup in Russia. His replacement will be the RFEF's current sporting director, former footballer Fernando Hierro. So perhaps the researchers would have to take this last-minute factor into account.
Incidentally, in the Technical University of Dortmund model, the Mexican national team does not even advance from the group stage: it is eliminated from Group F along with South Korea.
On the other hand, a critical factor in the prediction is the structure of the tournament itself. If Germany gets through the group stage without problems, it is more likely to face strong opposition in the round of 16. For this reason, according to the random forest method, Germany's chances of reaching the quarterfinals are just 58%. In contrast, Spain is unlikely to face strong opposition in the round of 16 and has a 73% chance of reaching the quarterfinals.
If both teams reach the quarterfinals, they have roughly the same chance of winning. “Spain is slightly favored over Germany, because the Germans have a comparatively high chance of dropping out in the round of 16,” the paper says.
And if all of this were not enough for the more punctilious analysts, the random forest method also made it possible to simulate the entire tournament, which produces a different result. Groll and colleagues simulated the tournament 100,000 times. “According to the most likely course of the tournament, the winner would be Germany instead of Spain,” they noted. So at the start of the tournament Spain may be the team with the best chance of winning, but if Germany reaches the quarterfinals, it may become the favorite.
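Repeated simulation of this kind is a Monte Carlo method: play the bracket many times with random match outcomes and count how often each team wins. The following is a minimal sketch under invented assumptions. The eight placeholder teams, their `STRENGTH` values, the fixed `BRACKET`, and the simple strength-ratio rule in `play` are all hypothetical; the study itself derived match outcomes from its fitted model.

```python
import random
from collections import Counter

# Hypothetical team strengths; placeholders rather than real teams.
STRENGTH = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0,
            "E": 2.0, "F": 1.5, "G": 1.0, "H": 1.0}

# Fixed knockout bracket: adjacent pairs meet in the first round.
BRACKET = ["A", "H", "D", "E", "B", "G", "C", "F"]

def play(a, b, rng):
    """Pick a winner; a team's chance is its share of the combined strength."""
    p_a = STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])
    return a if rng.random() < p_a else b

def simulate_knockout(bracket, rng):
    """Play one full knockout tournament and return the champion."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]

def title_probabilities(n_runs=100_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(BRACKET, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in STRENGTH}

if __name__ == "__main__":
    for team, p in sorted(title_probabilities().items(), key=lambda kv: -kv[1]):
        print(f"{team}: {p:.3f}")
```

Because a team's path through the bracket matters, a strong side on a hard path can end up with a lower title probability than its raw strength suggests, which is the effect the article describes for Germany and Spain.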
The tournament starts this Thursday with the traditional opening match, in which the host, Russia, faces Saudi Arabia. According to Groll's model, neither team will even reach the quarterfinals.