Models


All of the models used in this site are my own Python implementations, trained specifically for use on this site.

Shot xG

Summary:

The shot xG model predicts the likelihood that a shot results in a goal from the characteristics of the shot: where it was taken from, which body part was used (foot, head, other) and what the play context was (open play, direct free kick, etc.). These per-shot likelihoods are used to estimate how many goals each team should have scored and conceded, on average, given the shots taken in a match.
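Because each shot's xG is a probability, the expected number of goals a team scores is, by linearity of expectation, the sum of its shots' xG, and the goals it is expected to concede is the opponent's total. A minimal Python sketch, assuming shots arrive as hypothetical (team, xg) pairs with made-up illustrative values:

```python
from collections import defaultdict

def team_xg_totals(shots):
    """Sum shot xG per team (hypothetical input: a list of (team, xg) pairs).

    Each xg is the model's goal probability for one shot, so the sum is the
    expected number of goals that team scores; the opponent's sum is what
    it is expected to concede.
    """
    totals = defaultdict(float)
    for team, xg in shots:
        totals[team] += xg
    return dict(totals)

# Illustrative shots only, not real match data.
shots = [("Home", 0.76), ("Home", 0.05), ("Away", 0.12), ("Home", 0.31), ("Away", 0.09)]
totals = team_xg_totals(shots)
print({team: round(xg, 2) for team, xg in totals.items()})  # {'Home': 1.12, 'Away': 0.21}
```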

Technical details:

The shot xG model is an asymmetric sigmoid function (a generalised logistic function) whose parameters are fitted to reproduce the observed goal rates from all shots taken, not just shots on target. The variables that drive the model are the shot location, the body part used and the play context.

Other characteristics, such as shot speed, the positions of defenders and the goalkeeper, shot placement, or the identity and quality of the player taking the shot, are not considered by the current xG model.
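A minimal sketch of a generalised logistic (Richards) curve with its asymptotes fixed at 0 and 1 so that the output is a probability, together with one plausible fitting objective: the Bernoulli negative log-likelihood of goal/no-goal outcomes over all shots. The scalar predictor z (standing in for a transformed shot location within one shot type), the parameter names B, Q and nu, and the use of likelihood as the fitting criterion are assumptions for illustration, not the site's actual parameterisation.

```python
import math

def generalised_logistic(z, B=1.0, Q=1.0, nu=1.0):
    """Asymmetric sigmoid (generalised logistic / Richards curve) mapped to (0, 1).

    Lower asymptote fixed at 0 and upper asymptote at 1; Q > 0 and nu > 0.
    nu != 1 makes the curve asymmetric about its midpoint.
    """
    return 1.0 / (1.0 + Q * math.exp(-B * z)) ** (1.0 / nu)

def bernoulli_nll(params, zs, goals):
    """Negative log-likelihood of shot outcomes (1 = goal, 0 = no goal)."""
    B, Q, nu = params
    eps = 1e-12  # keep log() away from 0 and 1
    total = 0.0
    for z, g in zip(zs, goals):
        p = min(max(generalised_logistic(z, B, Q, nu), eps), 1.0 - eps)
        total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
    return total

# Hypothetical predictor values and outcomes for three shots.
zs = [-2.0, 0.5, 1.5]
goals = [0, 0, 1]
print(round(bernoulli_nll((1.0, 1.0, 1.0), zs, goals), 3))
```

With nu = 1 and Q = 1 the curve reduces to the ordinary logistic function; nu != 1 skews it so that the probability rises more steeply on one side of the midpoint than the other. The parameters would then be found by minimising bernoulli_nll with a numerical optimiser while keeping Q and nu positive.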

Below are a few examples of the expected goalscoring likelihoods at different parts of the pitch for three shot types. In total, the model includes 12 distinct shot types (combinations of body part and play context).

Match result

Summary:

The match predictions on this site are based on a Poisson regression model (a GLM) trained on historic match xG.
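Given each team's expected goals for a match, scoreline and result probabilities follow from the Poisson distribution. The sketch below assumes the two teams' goal counts are independent Poisson variables (an assumption; the site does not state how it combines them) and truncates the score grid at a hypothetical max_goals of 10:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def result_probabilities(home_xg, away_xg, max_goals=10):
    """Home win / draw / away win probabilities from two independent Poisson scores."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

home_win, draw, away_win = result_probabilities(1.6, 1.1)  # illustrative rates
print(f"H {home_win:.3f}  D {draw:.3f}  A {away_win:.3f}")
```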

Team strengths, and therefore predictions, are estimated from the number and quality of goalscoring opportunities each team has created and conceded in the matches played, with more weight placed on recent matches and with allowances for opposition strength and home advantage.
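A sketch of the kind of log-linear structure a Poisson GLM of team strengths uses: each team's expected goals depend on its attack, the opponent's defence and a home-advantage term, and each match's contribution to the likelihood is down-weighted with age. The exponential time decay, the 180-day half-life, the sign convention for defence and the function names are all hypothetical choices, not the site's published settings:

```python
import math

def expected_goals(attack, defence, mu, home_adv, is_home):
    """Log-linear rate: log(rate) = mu + home_adv * [is_home] + attack - defence."""
    return math.exp(mu + (home_adv if is_home else 0.0) + attack - defence)

def time_weight(age_days, half_life_days=180.0):
    """Exponential decay: a match half_life_days old counts half as much.

    The 180-day default is a hypothetical placeholder.
    """
    return 0.5 ** (age_days / half_life_days)

def weighted_poisson_loglik(rows, mu, home_adv, attack, defence, half_life_days=180.0):
    """Time-weighted Poisson log-likelihood, dropping the constant log(y!) term.

    rows: (attacker, defender, is_home, target, age_days). The target can be
    a fractional xG value, since y * log(rate) - rate is defined for any y >= 0.
    attack / defence: dicts mapping team name -> strength parameter.
    """
    total = 0.0
    for attacker, defender, is_home, target, age_days in rows:
        rate = expected_goals(attack[attacker], defence[defender], mu, home_adv, is_home)
        total += time_weight(age_days, half_life_days) * (target * math.log(rate) - rate)
    return total

# Hypothetical strengths and two hypothetical matches (one row per team per match).
attack = {"A": 0.2, "B": -0.1}
defence = {"A": 0.1, "B": 0.0}
rows = [
    ("A", "B", True, 1.8, 7), ("B", "A", False, 0.9, 7),
    ("B", "A", True, 1.3, 200), ("A", "B", False, 1.1, 200),
]
print(weighted_poisson_loglik(rows, mu=0.1, home_adv=0.25, attack=attack, defence=defence))
```

Fitting would maximise this quantity over mu, home_adv and the team strengths, typically with a constraint such as the strengths summing to zero so that the parameters are identifiable.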

Match squads, player injuries, manager changes and player transfers aren't explicitly taken into account by the model, but the model does eventually implicitly pick up many of these effects as they manifest in increased or decreased goal production rates in matches.

Technical details:

This site's approach is heavily based on Ben Torvaney's idea of fitting team strengths to expected goals rather than directly to goals scored. Expected goals are a more representative measure of how well a team played than observed goals, which are subject to considerable random variation. Models trained on xG have been found to predict future match scores and results better than equivalent models trained on observed goals.
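One way to turn a match's shot xG into weighted outcomes is to treat each shot as an independent Bernoulli trial, which gives a Poisson-binomial distribution over goals for each team; each possible scoreline can then enter the regression as a training row weighted by its probability. This sketch computes those probabilities exactly by dynamic programming rather than by Monte Carlo simulation. The shot-independence assumption and the function names are illustrative, not a description of Torvaney's or this site's code:

```python
def goal_distribution(shot_xgs):
    """P(exactly k goals) for k = 0..n, treating shots as independent Bernoulli trials."""
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # this shot misses
            new[k + 1] += prob * p      # this shot scores
        dist = new
    return dist

def weighted_outcomes(home_xgs, away_xgs):
    """All (home_goals, away_goals, weight) scorelines; weights sum to 1."""
    home = goal_distribution(home_xgs)
    away = goal_distribution(away_xgs)
    return [(h, a, ph * pa) for h, ph in enumerate(home) for a, pa in enumerate(away)]

# Hypothetical shot xG values for one match.
for h, a, w in weighted_outcomes([0.3, 0.1], [0.2]):
    print(h, a, round(w, 4))
```

Independence is a simplification: a rebound, for instance, only exists because the first shot was saved.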

This site's approach has adapted the Torvaney approach in 2 ways:

For historic matches where shot data is not available, the model was trained on the observed final results instead of the weighted simulated outcomes from shot xG. This only affects the older team strengths in the team strength time series plots.

In-Play

Summary:

The in-play model estimates how teams' goalscoring and conceding rates change as the match situation changes. It takes into account that goals aren't spread evenly through a match (more goals occur towards the end), that teams' attack and defence strengths change when they are ahead or behind on goal difference (GD), and that they change again when a team loses a player to a red card.

Technical details:

The expected downstream goal rates (xG) part way through a match are the pre-match xG forecasts, attenuated by the empirically observed fraction of all goals that occur after that point in the match. The effects of being ahead or behind, and of differences in the number of players on the pitch, are captured by scaling coefficients on team attack and defence strengths. These coefficients, which depend on the in-match GD and the difference in players on the pitch, are fitted so that the scaled in-play xG best predicts the observed downstream goal rates.
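A minimal sketch of the in-play adjustment described above, under stated assumptions: the time attenuation is the empirical fraction of a (hypothetical) list of historical goal minutes that fall after the current minute, and the match-state effects are multiplicative coefficients looked up by (GD, player difference) in hypothetical dictionaries, defaulting to 1.0 where no coefficient is given. Fitting those coefficients is not shown:

```python
import bisect

def fraction_remaining(minute, goal_minutes):
    """Empirical fraction of historical goals scored after `minute`."""
    ordered = sorted(goal_minutes)
    scored_so_far = bisect.bisect_right(ordered, minute)  # goals at or before `minute`
    return 1.0 - scored_so_far / len(ordered)

def remaining_xg(pre_match_xg, minute, goal_minutes, goal_diff, player_diff,
                 attack_coef, defence_coef):
    """Expected remaining goals for one team (hypothetical interface).

    goal_diff and player_diff are from this team's point of view, so the
    opponent's defence coefficient is looked up with both signs flipped.
    Coefficient dicts map (goal_diff, player_diff) -> multiplier; missing
    states default to 1.0, i.e. no adjustment.
    """
    frac = fraction_remaining(minute, goal_minutes)
    attack = attack_coef.get((goal_diff, player_diff), 1.0)
    opp_defence = defence_coef.get((-goal_diff, -player_diff), 1.0)
    return pre_match_xg * frac * attack * opp_defence

# Toy inputs for illustration only, not real data or fitted coefficients.
toy_goal_minutes = [10, 30, 50, 70, 85, 90]
print(remaining_xg(1.5, 45, toy_goal_minutes, goal_diff=-1, player_diff=0,
                   attack_coef={(-1, 0): 1.2}, defence_coef={}))
```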

© 2020 - Dinesh Vatvani | FootballGeek