Over the past few seasons, there has been a lot of dispute over the weight given to "advanced statistics" and their use on this very blog. I see more and more people citing studies and statistics such as Corsi as near-indisputable evidence for whatever analytical point they are making. I believe these statistics, and the ways hockey's online analytical community thinks they should be used, deserve to carry weight in the opinions fans develop. What I take issue with is the notion that statistics of any kind can be cited as fact, or as near-indisputable evidence of a player's or team's characteristics, without an analytical sense of exactly how they may be influenced by what actually happens on the ice. The purpose of this post is to philosophically evaluate and explore the purpose of our sport's data and how accurately we can attribute it to true results.
Let's start with Corsi, the statistic that measures the difference between shot attempts (shots on goal, missed shots, and blocked shots) for and against while a given player is on the ice. Corsi is often interpreted to mean both that a player is defensively responsible and that they are good at "driving the play" offensively: a player who prevents shot attempts should in theory be a better defensive player, and a player who generates many shot attempts should be good at carrying the puck into the zone, passing to open spots, and shooting. A high Corsi rating may in fact indicate that a player has both of these characteristics. The problem comes when a high Corsi is treated as definitive, or even very confident, evidence of them. The statistic's flaws can be found simply by reasoning through in-game situations in which it is inaccurate.
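For anyone who hasn't seen the arithmetic spelled out, here is a minimal Python sketch, assuming the common definition in which a shot attempt is a shot on goal, a missed shot, or a blocked shot. The class and function names are made up for illustration, and the counts are hypothetical. Public trackers usually restrict Corsi to even-strength play, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class OnIceAttempts:
    """Shot attempts while one player is on the ice (illustrative counts only)."""
    shots_for: int        # our shots on goal
    missed_for: int       # our shots that missed the net
    blocked_for: int      # our attempts blocked by the opponent
    shots_against: int    # opponent shots on goal
    missed_against: int   # opponent shots that missed the net
    blocked_against: int  # opponent attempts blocked by our team


def corsi_for(a: OnIceAttempts) -> int:
    return a.shots_for + a.missed_for + a.blocked_for


def corsi_against(a: OnIceAttempts) -> int:
    return a.shots_against + a.missed_against + a.blocked_against


def corsi_differential(a: OnIceAttempts) -> int:
    # Positive means more attempts for than against while the player was on the ice.
    return corsi_for(a) - corsi_against(a)


def corsi_for_pct(a: OnIceAttempts) -> float:
    # Share of all on-ice attempts that were ours, as a percentage.
    total = corsi_for(a) + corsi_against(a)
    return 100.0 * corsi_for(a) / total if total else 0.0


# Hypothetical numbers for one player, not real data.
example = OnIceAttempts(10, 5, 3, 12, 4, 6)
print(corsi_for(example), corsi_against(example),
      corsi_differential(example), corsi_for_pct(example))
```

Here the player is on the ice for 18 attempts for and 22 against, a differential of -4 and a Corsi For percentage of 45.0. Nothing in those numbers says *why* the attempts happened, which is the point of the rest of this post.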
In other words, I'm going to play devil's advocate (lololol get it, like the Devils??).
For example, say a player is started primarily in the defensive zone. Logic says that because the player begins close to the spots from which shots on their own net are typically taken, they are more likely to be on the ice for a shot against. This would not necessarily mean they are a poor defensive player who cannot prevent shots as well as a similarly skilled defender who gets more offensive zone starts; it means the situation itself makes them more prone to shots against. Their Corsi (and Corsi Rel or Corsi Rel Qual) would therefore drop without reflecting their defensive skill. The same applies in reverse. A player who is started in the offensive zone more often will be more likely to be on the ice for a shot attempt for, inflating their Corsi compared to a player with similar play-driving skills who starts farther from where the team usually shoots.
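The zone-start argument can be made concrete with a toy model. The sketch below assumes, purely hypothetically, that every shift has a baseline shot-attempt differential set by the zone it starts in, plus a per-shift contribution from the player's own skill. Both players get exactly the same skill value; only their deployment differs. The `ZONE_EFFECT` and `SKILL` values are invented for illustration, not measured.

```python
# Invented baseline shot-attempt differential per shift, by starting zone.
# These are NOT measured NHL rates; they only illustrate the argument.
ZONE_EFFECT = {"offensive": 0.5, "neutral": 0.0, "defensive": -0.5}

# Identical per-shift "play-driving" skill for both players (also invented).
SKILL = 0.25


def expected_corsi_diff(zone_starts: dict[str, int], skill: float) -> float:
    """Expected on-ice Corsi differential over a set of shifts.

    zone_starts maps a starting zone to the number of shifts started there.
    """
    return sum(n * (ZONE_EFFECT[zone] + skill) for zone, n in zone_starts.items())


# Same 100 shifts and same skill; only the deployment is flipped.
player_a = {"offensive": 20, "neutral": 30, "defensive": 50}
player_b = {"offensive": 50, "neutral": 30, "defensive": 20}

print(expected_corsi_diff(player_a, SKILL))  # 10.0
print(expected_corsi_diff(player_b, SKILL))  # 40.0
```

In this model Player B finishes 30 attempts better than Player A with identical skill, so a raw on-ice total mixes deployment with ability unless something adjusts for where shifts begin.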
One may ask: where is your "hard evidence" that this negatively affects Corsi's ability to represent both characteristics? My answer is that Corsi is a statistic built on a logical idea, and what I have just described is also built on a logical idea. The two stand on equal footing; the only difference is that one has been explored by an amateur community.
Another flaw is that a given player's "game," or skill set, may not be accurately represented by Corsi. A defender who holds position in front of the net or covers well to prevent quality chances wouldn't necessarily prevent shots, but might raise the goalie's save percentage, limiting goals and demonstrating defensive responsibility. A good example of this is Bryce Salvador.
Bryce has always posted a very low Corsi rating, but his on-ice save percentage hasn't been lower than 92% in any of the past four seasons, consistently higher than the team's overall save percentage. This shows that there is a consistent defensive quality about Bryce that Corsi is not capturing. One could point out that his on-ice goals against per 60 minutes is fairly high compared to his team's, but the only way to use that measurement to evaluate his defensive competence would be to compare it against an accurate metric of his competition's quality, which we probably do not have.
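Both on-ice numbers in this argument are simple ratios. Here is a minimal sketch with hypothetical season totals (not Salvador's actual figures), assuming save percentage is computed from shots on goal only, since missed and blocked attempts never reach the goalie. The function names are made up for illustration, and neither function adjusts for quality of competition, which is exactly the missing piece noted above.

```python
def on_ice_save_pct(goals_against: int, shots_on_goal_against: int) -> float:
    """Fraction of on-ice shots on goal against that were saved."""
    return (shots_on_goal_against - goals_against) / shots_on_goal_against


def goals_against_per_60(goals_against: int, toi_minutes: float) -> float:
    """On-ice goals against scaled to a 60-minute rate."""
    return goals_against * 60 / toi_minutes


# Hypothetical season totals, not Salvador's real numbers.
player_sv = on_ice_save_pct(goals_against=28, shots_on_goal_against=400)
team_sv = on_ice_save_pct(goals_against=192, shots_on_goal_against=2400)

print(f"player on-ice Sv%: {player_sv:.3f}, team Sv%: {team_sv:.3f}")
print(f"player GA/60: {goals_against_per_60(28, 800):.2f}")
```

With these made-up totals the player's on-ice save percentage (.930) beats the team's (.920), yet the GA/60 figure on its own says nothing about who he was facing.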
One could also hypothesize that a player who makes good passes and brilliant plays would be on the ice for fewer unproductive shots, an offensive quality that the Corsi differential wouldn't capture.
I'd also like to touch on a study that was mentioned in the comments section of the Jagr post by Mike Stromberg. The study essentially concludes that scoring production peaks at age 25. The problem is that people use research like this in ways that don't fit the question they are trying to answer. For example, although the study says scoring peaks at 25, it does not account for injuries, which typically become more frequent with age in both everyday life and hockey. I've seen the same study cited as evidence that even a perfectly healthy player's scoring will decline, even though the study isn't very relevant to that conclusion. The study itself even notes that when a higher games-played minimum is used, the peak age tends to rise (by how much is not specified).
Another problem with the study itself is that a large percentage of its players turned 25 while league scoring was high, from the 1980s to the mid-1990s (all of the players were born between 1962 and 1979). As they aged, the dead puck era set in and scoring fell to a low in 2004, when most of them were well past 25. A decrease that large in overall scoring would have dragged down the results for, say, the average age-29 season.
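One common way to correct for this is to normalize each season's points by that season's league-wide scoring rate before comparing ages. The sketch below assumes a simple proportional adjustment; the function name and all goals-per-game figures are hypothetical, and this is not what the study itself did.

```python
def era_adjusted_points(points: float, season_gpg: float, reference_gpg: float) -> float:
    """Rescale a season's points to a reference scoring environment.

    season_gpg and reference_gpg are league-average goals per game; the
    adjustment simply assumes points scale in proportion to league scoring.
    """
    return points * reference_gpg / season_gpg


# Hypothetical league averages, not historical figures.
REFERENCE_GPG = 6.0
print(era_adjusted_points(72, season_gpg=8.0, reference_gpg=REFERENCE_GPG))  # 54.0
print(era_adjusted_points(50, season_gpg=5.0, reference_gpg=REFERENCE_GPG))  # 60.0
```

Unadjusted, the 72-point season from a high-scoring year beats the 50-point season from a low-scoring year; after adjustment the order flips, which is the kind of shift that could move an "average peak age."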
The study also does not evaluate whether players are typically put up against more difficult competition as they age.
My point is not that these statistics and studies should be ignored, or that they should not influence your opinions. My point is that a piece of data with many flaws or potential inaccuracies relative to its intended purpose should not be cited as fact, with arrogance, or even with particular confidence when exploring a subject that falls within that purpose. That said, I hope advanced statistics and studies will work to account for these flaws, creating a better understanding of the game and better statistics.
It is probably true that Corsi is a good indicator of driving play and defensive capabilities, and that the average offensive peak of an NHL player is around 25.
But it is also up to you to remember that it is still only a "probably," and in my opinion, it is your duty to express your opinions as such. Sorry if this was boring. I was bored.