Nowadays, sports clubs employ entire teams of data analysts. Their analyses help to uncover bias in the performance evaluation of decision-makers and are therefore crucial for decisions about game strategy, player transfers and the dismissal of coaches.
A football match in the top European leagues generates over three million data points. Optical tracking systems in the stadiums automatically record the position of each player and the ball 25 times per second, while trained operators manually log approximately 1,700 on-ball events such as passes, shots, dribbles and tackles. Team performance evaluation is one important area in which this wealth of data can add value for decision-making, because key outcomes in football, such as goals and match results, are notoriously unreliable performance measures.
The league table never lies? That’s a lie!
Goals are exceptionally rare events in football. On average, approximately 2.7 goals are scored per match, and matches are regularly decided by a single goal. As a consequence, match results contain a substantial random component. Yet only match results count towards the league table. How likely is it, then, that the table accurately reflects the strength of all teams when luck plays such a considerable role? Quite unlikely. To be reasonably confident that the table does not «lie» in this regard, each team would have to play many more than the traditional 38 games per season.
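To make the role of luck concrete, here is a minimal sketch that computes win, draw and loss probabilities for a single match. It assumes, as a common simplification that is not taken from this article, that each team's goals follow independent Poisson distributions. The scoring rates 1.6 and 1.1 are hypothetical values for a clearly stronger and a weaker team, chosen so that together they add up to roughly the 2.7 goals per match mentioned above.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals follow a Poisson distribution."""
    return rate ** k * exp(-rate) / factorial(k)


def match_outcome_probs(rate_a: float, rate_b: float, max_goals: int = 15) -> tuple[float, float, float]:
    """Return (P(A wins), P(draw), P(B wins)), assuming independent Poisson goal counts.

    Scorelines above max_goals are ignored; their probability is negligible for realistic rates.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, rate_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, rate_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical scoring rates: stronger team A (1.6 goals per match) vs. weaker team B (1.1).
win, draw, loss = match_outcome_probs(1.6, 1.1)
print(f"A wins: {win:.2f}, draw: {draw:.2f}, B wins: {loss:.2f}")
```

Under these assumptions, the clearly stronger team wins only about half of its matches against the weaker one, which illustrates why a 38-match season leaves plenty of room for luck to shape the table.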
Using detailed data as a «lie detector»
The right choice of detailed football data can help to evaluate a team's performance more accurately. While many performance metrics have proven largely useless (e.g., the distance covered by a team or the number of crosses played), focusing on shots has been promising. Shots are positively correlated with goals and occur much more often. However, because some shots are better than others, each shot must be assigned a scoring probability. This is the intuition behind the «expected goals» (xG) metric, which has gained prominence in recent years.
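The arithmetic behind xG is simple: a team's xG in a match is the sum of the scoring probabilities of its shots. The sketch below uses made-up probabilities (not real data) to show why counting shots alone can mislead: a team taking many speculative shots can accumulate fewer expected goals than a team taking a few high-quality ones.

```python
def expected_goals(shot_probabilities: list[float]) -> float:
    """xG of a team in a match: the sum of its shots' scoring probabilities."""
    return sum(shot_probabilities)


# Hypothetical scoring probabilities for one match (illustrative values, not real data).
shots = {
    "Team A": [0.03, 0.05, 0.04, 0.06, 0.02, 0.05, 0.03, 0.04, 0.07, 0.05, 0.03, 0.04],  # many long-range shots
    "Team B": [0.35, 0.12, 0.76],  # few shots, but including clear-cut chances
}

for team, probs in shots.items():
    print(f"{team}: {len(probs)} shots, xG = {expected_goals(probs):.2f}")
# Team A: 12 shots, xG = 0.51
# Team B: 3 shots, xG = 1.23
```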
The scoring probability of each shot is typically estimated from its position on the pitch (distance and angle to goal), the game situation (open play, free kick, penalty kick), the body part used, defensive pressure and the goalkeeper's position. These estimates can then be summed into xG for each team and match, yielding a performance metric that is less prone to randomness. While this approach is not without limitations, it is very useful for detecting «lies», i.e., situations in which good or bad luck played a considerable role in determining match results. Moreover, xG is merely the starting point for a more process-oriented performance evaluation in football. Instead of evaluating only shots, one could evaluate all on-ball events using «expected possession value» models, for which tracking data become highly relevant.
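A shot-level model of this kind can be sketched as a logistic regression on distance and angle to goal, with a header flag standing in for body part. Logistic regression is one common choice here; the article does not specify a model. The coefficients below are hypothetical, picked only so that the outputs look plausible; a real model would be fitted to historical shot data and would also include game situation, defensive pressure and goalkeeper position, which this sketch omits. Coordinates are assumed to be in metres, with `x` measured outwards from the goal line and `y` as the lateral offset from the centre of the goal.

```python
import math
from collections import defaultdict

GOAL_HALF_WIDTH = 7.32 / 2  # metres; the goalposts are 7.32 m apart

# Hypothetical coefficients for illustration only (not fitted to any data).
INTERCEPT, B_DISTANCE, B_ANGLE, B_HEADER = -1.0, -0.1, 1.5, -0.8


def shot_features(x: float, y: float) -> tuple[float, float]:
    """Distance (m) to the goal centre and angle (rad) between the two goalposts, seen from the shot."""
    distance = math.hypot(x, y)
    # Angle between the vectors from the shot to each post: atan2(cross product, dot product).
    cross = 2 * GOAL_HALF_WIDTH * x
    dot = x * x + y * y - GOAL_HALF_WIDTH ** 2
    angle = math.atan2(cross, dot)
    return distance, angle


def scoring_probability(x: float, y: float, header: bool = False) -> float:
    """Logistic model: closer shots with a wider view of the goal score more often; headers less often."""
    distance, angle = shot_features(x, y)
    logit = INTERCEPT + B_DISTANCE * distance + B_ANGLE * angle + B_HEADER * (1.0 if header else 0.0)
    return 1.0 / (1.0 + math.exp(-logit))


def team_xg(shots: list[tuple[str, float, float, bool]]) -> dict[str, float]:
    """Sum the shot probabilities per team to obtain match-level xG."""
    totals: dict[str, float] = defaultdict(float)
    for team, x, y, header in shots:
        totals[team] += scoring_probability(x, y, header)
    return dict(totals)


# Hypothetical shots: (team, x, y, is_header)
shots = [
    ("Home", 6.0, 2.0, False),
    ("Home", 11.0, -4.0, True),
    ("Home", 22.0, 8.0, False),
    ("Away", 18.0, 0.0, False),
    ("Away", 28.0, -10.0, False),
]
print({team: round(xg, 2) for team, xg in team_xg(shots).items()})
```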
Why is a «lie detector» useful?
Football clubs and fans are primarily interested in match results and league table ranks. This likely leads to outcome bias in the performance evaluation of decision-makers, in which the role of good and bad luck in match results is underestimated. Being informed about «lies» in match results therefore helps to avoid many mistakes, such as dismissing the coach after a string of bad luck or changing the playing strategy when it is not warranted. Outcome bias is a far-reaching phenomenon, and luck in key outcomes is regularly mistaken for skill. The implications for performance evaluation thus extend beyond football: not only in sports but also in business and industry, individuals and companies can improve their decisions by incorporating data-driven «lie detectors» into their evaluation of past decisions and performance.
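One possible way to turn xG into such a «lie detector» is to compare the points a team actually earned with the points it could have expected given the quality of its chances. The sketch below treats each shot as an independent Bernoulli trial (a simplification: rebounds, for example, are not independent), derives each team's goal distribution from its shot probabilities, and converts the result into expected points under the usual rule of three points for a win and one for a draw. The match data are hypothetical.

```python
def goal_distribution(shot_probs: list[float]) -> list[float]:
    """P(k goals) for k = 0..n, treating shots as independent Bernoulli trials."""
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in shot_probs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1.0 - p)  # shot missed
            new[goals + 1] += prob * p      # shot scored
        dist = new
    return dist


def expected_points(own_shots: list[float], opp_shots: list[float]) -> float:
    """Expected league points (3 for a win, 1 for a draw) given both teams' shot probabilities."""
    own, opp = goal_distribution(own_shots), goal_distribution(opp_shots)
    win = draw = 0.0
    for own_goals, p_own in enumerate(own):
        for opp_goals, p_opp in enumerate(opp):
            if own_goals > opp_goals:
                win += p_own * p_opp
            elif own_goals == opp_goals:
                draw += p_own * p_opp
    return 3.0 * win + draw


def actual_points(goals_for: int, goals_against: int) -> int:
    """Points actually awarded for a result."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


# Hypothetical run of four matches:
# (own shot probabilities, opponent shot probabilities, goals for, goals against)
matches = [
    ([0.40, 0.30, 0.20, 0.10], [0.05, 0.08], 0, 1),
    ([0.50, 0.25, 0.15], [0.10, 0.06], 1, 2),
    ([0.30, 0.30, 0.20, 0.12], [0.20], 0, 0),
    ([0.60, 0.20], [0.07, 0.04, 0.03], 0, 1),
]

actual = sum(actual_points(gf, ga) for _, _, gf, ga in matches)
expected = sum(expected_points(own, opp) for own, opp, _, _ in matches)
print(f"actual points: {actual}, expected points: {expected:.1f}")
```

A large gap between the two totals, as in this made-up run, suggests that the results were driven more by bad luck than by poor performance, which is precisely the situation in which dismissing the coach would be premature.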
Author: Dr. Raphael Flepp