
Statistical methods for football

Discussion in 'Miami Dolphins Forum' started by cbrad, Feb 4, 2020.

  1. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    A good suggestion was made by Irishman to create a dedicated thread for the discussion of statistical methodology in football. The idea is to transfer as much of the discussion about statistical methodology to one thread so that it doesn’t clutter up other threads. Results of statistical analysis can still be presented in other threads, but discussion about statistical methodology is probably better off in its own thread.

    I’ll start off this thread by giving a simple intro to 4 commonly discussed methods:
    1) z-scores
    2) correlations
    3) confidence intervals
    4) hypothesis tests


    z-scores

    The purpose of z-scores is to take different sets of measurements and put them on the same scale. A common application of z-scores in football is to compare stats across eras.

    I’m not sure how best to explain z-scores (for me it would be best to just show the math.. one line and we’re done lol), but one way to think of z-scores is to ask how you would convert between two different scales like Celsius and Fahrenheit. First you need to know what the origin of the scale is (where zero lies): zero degrees Celsius corresponds to 32 degrees Fahrenheit. And then you need to know how the units of measurement compare: every 5-degree increase in Celsius corresponds to a 9-degree increase in Fahrenheit.

    Same thing with z-scores. First you need to know what the origin corresponds to: a z-score of zero always corresponds to league average. Then you need to know the “unit of measurement”, which in this case is the standard deviation (a measure of how far the values go from league average). For those interested in how standard deviation is calculated:
    https://en.wikipedia.org/wiki/Standard_deviation

    Here’s an example of how to convert a stat like passer rating to z-scores. In 1970, the league average for team passer ratings was 66, so you automatically know a 66 rating in 1970 equals a z-score of 0. The standard deviation was 14.34, so one z-score unit in 1970 equals 14.34 passer rating points. If a QB had a rating of 85 in 1970, that’s (85 – 66)/14.34 = 1.325 z-scores above the mean. If it’s below the mean you just have a negative z-score.
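    As a one-line sketch of the math (Python, using the 1970 numbers quoted above):

```python
# Compute a z-score from a raw value, a mean, and a standard deviation.
# Numbers below are the 1970 passer rating example: league mean 66, SD 14.34.
def z_score(value, mean, sd):
    """How many standard deviations `value` sits above league average."""
    return (value - mean) / sd

print(round(z_score(85, 66, 14.34), 3))  # the 85-rating QB from the example
```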

    How to interpret z-scores? They tell you how “impressive” something was relative to league average, regardless of era or even type of measurement (you actually could compare z-scores of passing stats to z-scores of rushing stats). However, z-scores do not tell you whether an offense with a z-score of 1.325 in 1970 would perform with the same z-score in 2019. There’s no implication about transplanting someone from the past to the present or vice versa, just a measure of how “impressive” something is regardless of era.

    One final thing: you could use z-scores to report the "adjusted" rating in some target year, like a 1970 rating in 2019 numbers. Just calculate the z-score, then translate to 2019 numbers, no different than going from Celsius to Fahrenheit.
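    A sketch of that cross-era translation. The 1970 mean and SD come from the example above; the 2019 mean and SD used here are placeholder values for illustration only, not actual league figures:

```python
# Translate a rating from one year's scale to another via z-scores, exactly
# like a Celsius-to-Fahrenheit conversion: normalize, then re-scale.
def convert_rating(rating, src_mean, src_sd, dst_mean, dst_sd):
    z = (rating - src_mean) / src_sd  # rating -> z-score in the source year
    return dst_mean + z * dst_sd      # z-score -> rating in the target year

# An 85 rating in 1970 (mean 66, SD 14.34), expressed on a hypothetical
# 2019-style scale with mean 90 and SD 10 (illustrative values):
print(round(convert_rating(85, 66, 14.34, 90, 10), 1))
```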


    Correlation

    A correlation is a measure of how two sets of stats are related to each other. Without getting too technical I’d just interpret it as a measure of how much you can predict the value of one stat from the other stat. If you have a correlation of zero that means you have no ability to predict beyond random guessing. If however you have the maximum possible correlation of 1 or the minimum possible correlation of -1 that means that by knowing one stat you know exactly what happens to the other stat. The only difference between 1 and -1 is that a correlation of 1 means an increase in one stat implies an increase in the other stat, while a correlation of -1 means an increase in one stat implies a decrease in the other.

    Thus, correlations range from -1 to 1 and the closer the number is to either -1 or 1 the better you can predict one stat from the other. Two things to remember with correlations: 1) there is no implication of causality, and 2) there is no implied order in the two stats being compared, meaning that the predictive relationship is the same regardless of which stat you use to try to make the prediction.

    One more thing about correlations: the degree to which you can predict one stat from the other is actually not the correlation itself but the square of the correlation, which is called r-squared or r^2. So if someone reports a correlation of 0.5 it’s really the square of that number that is meaningful: 0.5^2 = 0.25 because that tells you the percent of variation in one stat you can explain by looking at the other stat (in this case 25% of the variation in one stat is known by looking at the other stat).
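    A minimal sketch of computing a correlation and its r^2 by hand; the two stat lists are made-up numbers, not real data:

```python
# Pearson correlation from scratch, plus r^2 (share of variance explained).
# Note the symmetry mentioned above: pearson_r(x, y) == pearson_r(y, x).
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (stdev(x) * stdev(y))

x = [10, 20, 30, 40, 50]             # e.g., a made-up passing stat
y = [0.30, 0.45, 0.50, 0.70, 0.80]   # e.g., made-up win%
r = pearson_r(x, y)
print(round(r, 3), round(r ** 2, 3))  # correlation, then variance explained
```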


    Confidence intervals

    With almost everything in statistics there is something called a confidence interval, or CI, associated with it, and usually it’s a 95% CI. A 95% CI just specifies the range within which the true value of that statistic lies with 95% probability. The important thing about CI is that it is dependent on sample size, and it’s how you see the effect of sample size on a stat.

    To be clear, almost never do you see 95% CI reported in commonly available football stats. That’s because the standards are low. They really should report 95% CI with every stat so you can see how uncertain the estimates are.

    To give you an idea of what 95% CI looks like for Tom Brady (only QB I’ve calculated it for), after 1 game played in a season the 95% CI for Brady’s passer rating spans almost 140 passer rating points (70 above and 70 below whatever rating he got in that first game!), after 2 games it goes down to about 40 above and 40 below his passer rating after 2 games, and that 95% CI keeps decreasing as more games are played. In other words, as sample size increases the range within which the “true” passer rating lies keeps shrinking and you can have more confidence that the stat reflects "true" ability.
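    The shrinking-CI effect can be sketched with the textbook mean ± 1.96·SD/√n approximation. The per-game SD of 35 here is a made-up illustrative value, not Brady's actual per-game SD:

```python
# How the half-width of a 95% CI shrinks as sample size grows, treating
# per-game ratings as independent draws (a simplifying assumption).
import math

def ci_half_width(per_game_sd, n_games, z=1.96):
    """Half-width of the 95% CI around a mean after n_games samples."""
    return z * per_game_sd / math.sqrt(n_games)

for n in (1, 2, 4, 16):
    print(n, "games: +/-", round(ci_half_width(35, n), 1))
```

    With these illustrative numbers, one game gives roughly ±69 rating points, the same ballpark as the ±70 quoted above, and the interval keeps tightening as games accumulate.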

    CI is how sample size affects every single statistic, and I put it here not only because it’s important, but also because it makes explaining the next topic easy.


    Hypothesis testing

    You often hear that something is “statistically significant”. All that means is that something is too unlikely to have occurred by chance alone. Once you know the 95% CI (see previous section), all you have to do is ask whether the statistic you observed lies within the 95% CI or not. If it does, the result could plausibly have occurred through random variation alone. If it doesn't, then it's "statistically significant" and considered too unlikely to have occurred by chance alone.

    The choice of using a 95% CI rather than a 99% CI is arbitrary, but it’s the standard in almost every area of science. 95% CI corresponds to a 1 in 20 chance of the event occurring by random variation alone and that's generally unlikely enough in most contexts to say it's "statistically significant". I’ll note however that there are other contexts where the threshold is way higher. Best example is particle physics where the threshold might be at 1 in 3.5 million (5 standard deviations) before it’s statistically significant lol.
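    A minimal sketch of such a test: a two-sided z-test using only the standard library's normal CDF (via math.erf). The numbers reuse the illustrative 1970 example from earlier in this post (rating 85, league mean 66, SD 14.34):

```python
# Two-sided z-test: how unlikely is an observation this far from the mean
# under random variation alone?
import math

def two_sided_p(value, mean, sd):
    """P(an observation at least this far from the mean, by chance alone)."""
    z = abs(value - mean) / sd
    normal_cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * (1 - normal_cdf)

p = two_sided_p(85, 66, 14.34)  # the 85-rating example: z is about 1.32
print(round(p, 3), "significant" if p < 0.05 else "not significant")
```

    Here p is around 0.19, well above 0.05, matching the intuition that a value inside the 95% CI is not statistically significant.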


    OK.. maybe that will get things started.
     
  2. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    Great stuff cbrad, thank you.

    I'll just add that with regard to the topic of z-scores, a z-score is also a measure of how rare (or common) something is.

    Take the following distribution for example -- this is what's known as the normal distribution:

    [image: the normal distribution (bell curve) of IQ scores]

    In this particular example of a normal distribution, the x-axis (horizontally along the bottom) consists of IQ scores. The y-axis (along the left side in the vertical plane) consists of the frequency of those IQ scores in the population. So the further you go either right or left along the x-axis, the rarer the IQ score. The closer to the middle of the x-axis, the more common the IQ score. So IQ scores of 145 or more (or 55 or less), which are z-scores of +3.0 or beyond (or −3.0 or below), are exceedingly rare, whereas IQ scores between let's say 95 and 105 are far more common.
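    Those rarity figures can be checked with the normal CDF, assuming the usual IQ scale (mean 100, SD 15), so that IQ 145 corresponds to a z-score of 3.0:

```python
# Share of a normal population beyond (or between) given z-scores.
import math

def normal_cdf(z):
    """P(a standard normal value falls at or below z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

tail = 1 - normal_cdf(3.0)                       # IQ >= 145, i.e. z >= 3
middle = normal_cdf(1 / 3) - normal_cdf(-1 / 3)  # IQ between 95 and 105
print(round(tail, 5), round(middle, 3))
```

    Roughly 0.1% of the population sits at z ≥ 3, while about a quarter falls in the narrow 95–105 band, which is exactly the "rare tails, common middle" shape described above.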

    How does this translate to football?

    Passer ratings for example are likely distributed normally as in the graph above, where very low and very high passer ratings are rare, and average passer ratings are much more common. If we use career passer ratings to represent quarterbacks' quality, then there have been far more average quarterbacks in the league than there have been extremely good or extremely bad ones.

    So with a z-score what we have is a measure of how much something deviates from the norm. An IQ of 145 deviates considerably from the norm, as does a season passer rating of 117.5, a la Ryan Tannehill in 2019. The average passer rating in the league in 2019 was 90.77 with a standard deviation of 10.3, and Tannehill's passer rating of 117.5 was 2.59 z-scores (or standard deviations) above league average. That means it deviated considerably from the league norm.

    The next time you go to your doctor and he orders labs for you, if you get a printout of your labs you'll see that there is a normal range for every value. Your white blood cell count for example has a normal range of 4,500 to 11,000 white blood cells per microliter of blood. If you find yourself above or below that normal range, the next question for you should be "how far," and we could answer that with a z-score, which tells us how much your white blood cell count deviates from the norm.
     
    Surfs Up 99 and cbrad like this.
  3. TheHighExhaulted

    TheHighExhaulted Well-Known Member

    2,182
    1,682
    113
    Jan 15, 2008
    New York
    [image]
     
  4. Irishman

    Irishman Well-Known Member

    346
    372
    63
    Oct 16, 2017
    High Point, NC
    Thanks for that introduction to statistics.

    I hope as the off-season doldrums roll on, some of our posters will be able to review this information without bringing in their perceptions (biases) and getting "wound around the axle" with word definitions.

    When this information is examined prior to trying to use it, I suspect it will reduce the number of posters feeling they are being attacked for an opinion, when all that was presented was a piece of information about a statistic and its use.

    It will be interesting to see what questions come up as a result of this primer on statistics.
     
    danmarino and cbrad like this.
  5. Pauly

    Pauly Season Ticket Holder

    3,173
    3,111
    113
    Nov 29, 2007
    Some points regarding passer rating.

    1) It was developed before the NFL started tracking “sacks” as a statistic. The NFL wants to compare QBs across all of its history so they will not incorporate sacks into their official method.

    2) Passer rating would be improved if sacks were included as negative pass plays. However, because sacks are distributed roughly normally, in most large-sample situations including them would not dramatically alter the outcome.

    3) As time has progressed the NFL has gotten better at passing the football. Average passer rating has been increasing. To adjust historical records to allow comparison with current ratings use the following formula:
    [target year rating] x [current year average/target year average]

    4) The NFL made major changes to the passing rules starting in the 1978 season. Data from 1978 onwards has lower standard deviation and is more consistent year to year. If you are comparing current data to historical data it is best to use 1978 onwards as your historical data set.
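    Point 3's adjustment can be sketched as follows. The current-year average of 90.77 (2019) appears elsewhere in the thread; the target-year average of 77 is a made-up value for illustration:

```python
# Pauly's adjustment: [target year rating] x [current year avg / target year avg].
def ratio_adjust(rating, target_year_avg, current_year_avg):
    return rating * (current_year_avg / target_year_avg)

# A 90 rating from a season averaging 77, restated for a season averaging 90.77:
print(round(ratio_adjust(90, 77, 90.77), 1))
```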
     
    danmarino and Irishman like this.
  6. cbrad

    cbrad . Club Member

    Let me put these statements into proper context.

    Adjusting ratings using z-scores is in almost every case the correct approach (I'll list exceptions at the end of this post). However, IF the standard deviations across years are similar, then adjusting by z-scores is accurately approximated by the formula in #3. You cannot use that simple method of adjustment when the standard deviations change, though, as they did drastically from pre-1978 to post-1978 for passer rating. So if you want to compare passer ratings from 1970 to 2010 or so you HAVE to use z-scores. If however you only care about passer ratings post-1978 then the formula in #3 is fine.

    Also, that 1978 boundary is ONLY for passer rating (and also a few other passing stats). It's not a boundary for all stats.

    So when should you not use z-scores? When the distribution of stats is highly skewed. The problem with z-scores is that they assume symmetry in the distribution of the stat which is usually correct for football stats with large sample size. But when you take ratios like TD:INT ratio that symmetry disappears and it's better to adjust stats differently. How to do that is on a case by case basis, but in some cases the formula in #3 works if you replace "average" with "median" which is insensitive to skew.
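    A sketch contrasting the two adjustment methods being discussed here. All means and SDs below are illustrative, not actual league values:

```python
# z-score adjustment vs the simple ratio adjustment from #3.
def z_adjust(rating, src_mean, src_sd, dst_mean, dst_sd):
    """z-score adjustment: valid even when the two years' SDs differ."""
    return dst_mean + (rating - src_mean) / src_sd * dst_sd

def ratio_adjust(rating, src_mean, dst_mean):
    """Simple ratio adjustment (formula in #3): assumes similar SDs."""
    return rating * dst_mean / src_mean

# Similar SDs: the two methods roughly agree.
print(z_adjust(80, 70, 10, 90, 10), round(ratio_adjust(80, 70, 90), 1))
# Different SDs (e.g., pre-1978 vs post-1978): they diverge.
print(round(z_adjust(80, 70, 14, 90, 10), 1), round(ratio_adjust(80, 70, 90), 1))
```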
     
    Last edited: Feb 4, 2020
  7. The Guy

    The Guy Well-Known Member

    If sacks were weighted by their effect on expected points added and/or win probability, that would likely dramatically alter the outcome. Simply calling a sack a failed pass attempt and subtracting the yards lost on sacks probably wouldn't, however, but we know sacks mean more than just that.
     
  8. The Guy

    The Guy Well-Known Member

    I'd like to say something else about correlation here, and apply it to something seen frequently in the forum.

    The following is a graph of the correlation between passer rating differential and win percentage in the NFL for the 480 team seasons from 2004 to 2018:

    [image: scatter plot of passer rating differential vs. win percentage, 2004–2018]
    The magnitude of that correlation is a whopping 0.81 (95% confidence interval is 0.773 to 0.836).

    One of the values of a correlation is that it tells us what to expect. With regard to the above correlation for example, if the Dolphins were to achieve a relatively large passer rating differential in any one season, we could expect them to also have a high win percentage. You'd certainly be foolish to bet money on the Dolphins' having a large passer rating differential and at the same time a low win percentage. You'd certainly lose that bet the vast majority of the time.

    Now, the correlation above isn't perfect, however. It isn't 1.0. So, there is some "fudge factor" with regard to predicting win percentage on the basis of passer rating differential.

    What we sometimes see in the forum is someone's taking the exception to the rule in the case of a strong correlation and stating that it indicates there is no relationship between the two variables involved. In the case of the above information that might sound something like, "well passer rating differential doesn't mean **** with regard to winning, because in 2010 the Raiders had a small passer rating differential and went 11-5 anyway [hypothetically]."

    Very few correlations are 1.0, meaning that such exceptions to the rule can always be found. But what we're talking about here is what's highly probable, and you should certainly believe that the Dolphins would be highly probable to have a high win percentage if they had a large passer rating differential.

    So, when you take just a single example and say it indicates something about the relationship between two variables, you must also know the larger correlation at hand between those two variables, based on a much larger sample size.

    This is the value of statistics.
     
    Pauly and cbrad like this.
  9. FinFaninBuffalo

    FinFaninBuffalo Well-Known Member

    481
    978
    93
    Dec 13, 2007
    I commend you on your ability to provide useful instruction without a hint of condescension. A rare skill on the Internet.
     
    Pauly, danmarino, cbrad and 1 other person like this.
  10. cbrad

    cbrad . Club Member

    Your graph isn't showing up on my browser for some reason. I've seen that graph before though and it's correct. In any case, let me see if I can post the graph for 1966-2019:

    [image: scatter plot of passer rating differential vs. win% with best-fitting line, 1966–2019]

    Only thing I'll add to your post is that most stats in the NFL have a linear relationship, like you see in the graph above. That is, the trend isn't some curve but is instead a line which is plotted in dark red. That trend line is what you use for prediction purposes. That is, the correlation only tells you the strength of the relationship, NOT the relationship itself. The nature of the relationship is specified by the best-fitting line or best-fitting curve.

    The equation for that best-fitting line between win% and passer rating differential (across NFL history in the SB era) is:

    Win% = 0.95*[passer rating differential] + 50

    So, for each one-point increase in passer rating differential you increase win% by about 0.95 percentage points. In a 16-game season, one extra win is 6.25 percentage points of win%, so you need to increase passer rating differential by about 6.25/0.95 ≈ 6.58 points on average to add an extra win.

    Oh, and for future reference, the process of finding the best-fitting line to data is called "linear regression".

    EDIT: I should be more clear about something. Correlations tell you the strength of the relationship between the variables themselves. They don't tell you how well you could predict Y from X in the most general sense because for prediction purposes you could use some fancy non-linear function of X to predict Y. In other words, correlations tell you the strength of a possible linear relationship between X and Y.
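    A minimal sketch of that linear regression: ordinary least squares on five made-up data points, chosen so the fit lands near the quoted slope of 0.95 and intercept of 50:

```python
# Ordinary least-squares fit of a line: win% as a function of passer rating
# differential.  The data points are illustrative, not real league data.
from statistics import mean

def fit_line(x, y):
    """Return (slope, intercept) of the best-fitting line through (x, y)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

diff = [-10, -5, 0, 5, 10]             # passer rating differential
winp = [40.0, 46.0, 50.0, 55.0, 59.0]  # win%
slope, intercept = fit_line(diff, winp)
print(round(slope, 2), round(intercept, 1))
```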
     
    Last edited: Feb 5, 2020
    Surfs Up 99, The Guy and danmarino like this.
  11. The Guy

    The Guy Well-Known Member

    So as cbrad suggested I'm going to move the discussion about QBR to this thread.

    This morning I took three hours and 30 minutes and combed through individual ESPN.com box scores for QBR data. I don't wish that on anyone.

    Anyway, the question for me was, what is the correlation between QBR differential and points differential in single games in the NFL, and how does that compare to the correlation between traditional passer rating differential and points differential?

    So, to gather the data I chose 100 games at random from the 2019 regular season. Randomness was achieved by ordering games on the basis of a variable that's unrelated to either of the ones in question: yards per punt.

    So I took the first 100 2019 games generated by Pro Football Reference, again ordered by yards per punt. Here are the results in terms of correlations and 95% confidence intervals (all of the correlations below are significant at the level p < 0.001):

    Traditional passer rating differential & points differential: 0.67; [0.547 to 0.766]
    QBR differential & points differential: 0.58; [0.406 to 0.680]

    Traditional passer rating (offense) & points scored: 0.59; [0.445 to 0.704]
    QBR (offense) & points scored: 0.56; [0.406 to 0.680]

    Traditional passer rating (defense) & points allowed: 0.71; [0.599 to 0.797]
    QBR (defense) & points allowed: 0.63; [0.489 to 0.732]

    QBR differential & traditional passer rating differential: 0.60; [0.463 to 0.716]

    QBR (offense) & traditional passer rating (offense): 0.66; [0.526 to 0.754]
    QBR (defense) & traditional passer rating (defense): 0.73; [0.627 to 0.813]
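    For reference, 95% CIs like the ones above are conventionally computed with the Fisher z-transform. A sketch assuming N = 100 games as in the post (small differences from the quoted intervals come from rounding and implementation details):

```python
# Approximate 95% CI for a Pearson correlation via the Fisher z-transform.
import math

def corr_ci(r, n, z_crit=1.96):
    """CI for correlation r from n samples; atanh/tanh are the transform."""
    z = math.atanh(r)               # Fisher transform of r
    se = 1 / math.sqrt(n - 3)       # standard error on the transformed scale
    lo = math.tanh(z - z_crit * se) # back-transform the endpoints
    hi = math.tanh(z + z_crit * se)
    return lo, hi

lo, hi = corr_ci(0.67, 100)  # passer rating differential & points differential
print(round(lo, 3), round(hi, 3))
```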

    So as we know, QBR is not transparent. We don't know how the measure is calculated. However, if the measure is calculated in a reliable manner (note the emphasis on the word "if"), then the results above support its intended purpose of teasing the play of quarterbacks apart from that of the rest of their teams, in that all of the QBR correlations are weaker than the traditional passer rating correlations.

    What this means of course is that -- again if the measure is calculated reliably -- it's quite possible we have a more valid measure than traditional passer rating of the individual performance of the quarterback, independent of his team.

    Of course nothing can tease the performance of the quarterback apart from his team completely, but it is possible to design statistical measures that move us further in that direction. QBR may in fact be one of those.
     
  12. cbrad

    cbrad . Club Member

    First of all, good job getting the data.

    However, your logic in this quote is wrong. Are you suggesting that if we create another black box method where the correlation to win% is lower than for QBR that it's an even better measure of QB ability? Maybe if we have a black box method where the correlation is the lowest possible correlation of zero you'd say it's a perfect measure of QB ability? Makes no sense obviously.

    Mathematically, this is what is going on. The more parameters you add in a model (the more measurable factors you add) the more noise you'll have from those measurements, and the more noise you have the worse your predictive power will be UNLESS you can overcome that noise with accurate enough mathematical relations in your model.

    In practice, for most physical models you reach a limit in your ability to improve predictive power with a single digit number of parameters in the model, or in some cases low double digits. Based on the description of ESPN's QBR and knowing they have 10,000 lines of code that most likely encode tons of conditional statements (e.g., how much credit to give to the QB in one particular situation, etc...) their model is probably swamped by the massive amount of extra noise.

    So if I had to guess, the reason for the lower correlations is due to bad modeling, which argues exactly against your conclusion.

    Of course, there's no way to tell whether the lower correlation is due to better modeling (less influence from the rest of the team) or due to bad modeling, because they don't publish their formula. But the most likely explanation is bad modeling: extracting QB ability from the rest of the team is such a difficult problem that the default assumption should be bad modeling.

    Also, the only way to really determine whether the correlation to win% is too high or too low is if you had a perfect measure of QB ability and found that measure's correlation to win%. We don't have that of course, so no way to use the correlations to win% to determine to what degree it's estimating individual ability vs. team ability. However, the lower correlations DO mean there's no reason to use QBR for prediction purposes, and that takes away arguably the only potential utility of a black box method.

    Summary: ESPN's QBR is best put in the intellectual trash can, not only because it's black box (which means unknown validity) but because it doesn't predict win% as well (meaning it's not even good for prediction purposes).
     
    Irishman, resnor and The Guy like this.
  13. The Guy

    The Guy Well-Known Member

    Perhaps we should back up and give an overview of what QBR actually is. Here's an explanation:
    Now of course people are free to disagree, but that to me appears to be conceptually quite an appealing formulation. In terms of what isn't included in traditional passer rating, what we've done is 1) add a variable (EPA) that assigns a relative value to yards (i.e., all yards aren't created equal), 2) determine whether the QB was pressured, 3) account for air yards versus yards after the catch, 4) discount performance that's accrued when defenses are essentially allowing the accumulation of yardage ("garbage time"), 5) incorporate the quality of the opposing defense, and 6) add QB runs.

    Now if that isn't a laundry list of people's typical complaints about the validity of passer rating on a single-game basis, I don't know what is. How many times have you heard some combination of the following? "But he was pressured all day...his offensive line sucked...his receivers ran for tons of yards after the catch...he was playing against a ****ty defense...he got all his stats during garbage time..." And on and on, ad nauseam.

    Now of course it's quite possible the model is overfitted, and unfortunately we can't know that because it's proprietary. But aside from that I wouldn't base the validity of QBR entirely on its predictive ability with regard to winning and toss it completely on that basis. Instead I would explore its validity in relation to 1) its correlation with people's and/or experts' perceptions of quarterbacks' individual ability, and 2) its ability to predict quarterbacks' accomplishments independent of their teams (e.g., being a league All-Pro on a relatively bad team). We can do both of those things without knowing how to calculate it.

    https://en.wikipedia.org/wiki/Total_quarterback_rating
     
    Last edited: Feb 5, 2020
  14. cbrad

    cbrad . Club Member

    It's easy to say you've added X, Y, Z. The difficulty lies in accurate modeling of X, Y, Z. So someone saying they've added X, Y, Z is just a marketing job until they show HOW they did it.

    Also.. you can't determine validity through correlations to win% when you can't see HOW the parameters are related in the model. It's only when you can see those relations and your question is whether those relations actually predict something you care about that correlations to win% (or point differential etc..) become useful.

    And everything after EPA needs transparency. Too many possible ways to incorporate those parameters, almost all of which will be wrong.
     
    Irishman and resnor like this.
  15. The Guy

    The Guy Well-Known Member

    What (if anything) do you make of the fact that the correlations for QBR/points scored and passer rating/points scored are so similar?
     
  16. cbrad

    cbrad . Club Member

    Nothing. It's a large confidence interval. And if it were a small confidence interval it would still be best described as a coincidence given that the other correlations aren't that similar.
     
    Irishman likes this.
  17. The Guy

    The Guy Well-Known Member

    Here's a finding that bears on the above:

    From 2017 to 2019 (N = 100 individual QB seasons) the correlation between QBR and total clutch-weighted EPA is 0.898 (95% CI = 0.849 to 0.931).

    So QBR essentially is total clutch-weighted EPA, in that nearly 81% of its variance is associated with it.

    Other interesting findings:

    Over the same time span the correlation between QBR and clutch-weighted EPA on plays with pass attempts is 0.85 (95% CI = 0.781 to 0.898).

    QBR and clutch-weighted EPA through rushes: 0.31 [0.118 to 0.486]

    QBR and clutch-weighted EPA (lost) on sacks: 0.15 [-0.059 to 0.340]

    QBR and clutch-weighted EPA on penalties: 0.38 [0.190 to 0.541]

    QBR and traditional passer rating: 0.78 [0.679 to 0.846]

    Traditional passer rating and total clutch-weighted EPA: 0.696 [0.573 to 0.788]

    So it appears that while QBR essentially is total clutch-weighted EPA, traditional passer rating is not just total clutch-weighted EPA. Almost 52% of the variance in traditional passer rating is unexplained by total clutch-weighted EPA, while only 19% of QBR is unexplained by it.
     
  18. cbrad

    cbrad . Club Member

    Not surprising given that one of the main proprietary components of QBR is how they define clutch!!

    So yeah QBR is highly correlated with one key component of QBR lol.
     
    danmarino likes this.
  19. The Guy

    The Guy Well-Known Member

    Right, but what this tells us is that the most straightforward "ingredient" in QBR (EPA) accounts for nearly all of its variance! That's highly significant. It means QBR isn't varying primarily as a function of its "black box" features.

    Also, we know at least the premise of the clutch-weighting based on the following:
    So while we still don't know the calculations involved of course, what we're finding out here is that QBR varies largely as a function of its objectively determined features and not as a function of its subjectively determined ones.

    https://www.espn.com/blog/statsinfo...-calculated-we-explain-our-quarterback-rating
     
  20. cbrad

    cbrad . Club Member

    ??? I just got through telling you that "clutch-weighted" EPA IS black box because they don't tell you how they define clutch. And how you define "clutch" is subjective. It doesn't matter if you're using win probability as a basis. The question is HOW are you using win probability?
     
    danmarino likes this.
  21. The Guy

    The Guy Well-Known Member

    Right, but is that really something that should prevent us from exploring its validity in the ways I mentioned above, i.e., determining whether it correlates with people's and/or experts' appraisals of QBs' individual ability, and whether it correlates with QBs' accomplishments independent of their teams?

    I mean they've obviously determined a non-transparent win probability associated with down-weighting EPA during garbage time situations, but I suspect 1) those situations aren't all that prevalent in the league, given the parity among teams, and 2) the win probability they're using was likely determined on objective grounds and not arbitrarily.

    In the end you have likely a minuscule amount of variation in QBR determined by this particular "black box" component, and even that amount of variation is likely objectively determined and not arbitrary.
     
  22. cbrad

    cbrad . Club Member

    Absolutely. I'll say this again. EVERY valid statistical methodology is fully transparent. There is NO exception to that. We know the assumptions and calculations of every single valid method for statistical analysis.

    So if it's black box you can immediately reject it for that reason and that reason only.
     
  23. The Guy

    The Guy Well-Known Member

    Right, but statistical validity and ecological validity are two different things. And in this case we don't even know QBR is statistically invalid, while we can make a case for its ecological validity regardless.
     
  24. cbrad

    cbrad . Club Member

    If it's black box you can't tell whether it's actually a valid measure of QB ability. That should be obvious. And the goal is to determine if it's a valid measure of QB ability, so "ecological" validity doesn't matter because there is no known "valid" measure of QB ability, neither statistical nor based on human eyes. Validity has to be established by showing you are actually removing the effect of the team, and you can't establish that when you don't know what the mathematical relations are in the model.
     
  25. The Guy

    The Guy Well-Known Member

    So if we had purely a measure of EPA for every quarterback, you'd feel comfortable proceeding with further exploration of its validity with regard to measuring quarterbacks' individual ability, but in the case of something (QBR) that down-weights EPA during garbage time and has 81% of its variance explained by those two variables alone, you aren't?
     
  26. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    You don't "explore" EPA's validity. You know precisely what it measures already so you're done. You know that it's still technically a team measure. There's no way through "exploration" with correlations to figure out how valid EPA is as a measure of QB ability.
     
    Irishman likes this.
  27. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    What I mean is exploring its construct validity, convergent validity, discriminant validity, and criterion validity. All of that is possible without knowing how it's calculated, and in fact you could know how it's calculated and it could have none of those kinds of validity.

    We know the thing is essentially EPA with a down-weighting for garbage time. The question is, do those two variables when applied to quarterbacks achieve construct validity, convergent validity, discriminant validity, and criterion validity?
     
  28. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    The only "validity" that matters is construct validity: the degree to which something measures what it claims to measure.

    And once you say something is a measure of QB ability you cannot determine construct validity with a black box method. How QBR relates to anything else is completely beside the point if the goal is to suggest it may be a valid measure of QB ability (what you suggested).
     
    Irishman likes this.
  29. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    You must then believe there is no current valid measure of quarterbacks' individual ability?
     
  30. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    That's correct. Every "QB" measure people have come up with is technically a team stat. However, at least with transparent stats like passer rating you can look at the formula and see where the confounds come from and what's left out of the formula.

    There are statistical techniques that allow you to estimate how much you can "average out" confounds with large sample size and also statistical techniques that allow you to estimate how much of an effect whatever is left out might have. So there's room for analysis of construct validity, but there's no such room with a black box method.
     
    resnor and Irishman like this.
  31. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    OK well that I can live with.

    I do think however that QBR is more appealing conceptually than traditional passer rating, and when we realize that 81% of its variance is accounted for by nothing more than EPA down-weighted by garbage time performance, I suspect it's more valid for measuring quarterbacks' individual ability as well.
     
  32. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    Some other correlations of note from the sample of season QBR ratings from 2017 to 2019:

    Adjusted (to 2019) traditional passer rating and win percentage: 0.648 [0.512 to 0.752]

    Adjusted traditional passer rating and QBR: 0.781 [0.687 to 0.850]

    QBR and win percentage: 0.634 [0.494 to 0.742]

    Clutch-weighted EPA and win percentage: 0.603 [0.455 to 0.718]
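    The confidence intervals quoted above are presumably the standard Fisher z-transform intervals for a Pearson correlation (the usual default). A minimal sketch of that calculation, where the sample size n = 100 is an illustrative assumption rather than the thread's actual QB-season count:

    ```python
    # 95% confidence interval for a Pearson correlation via the Fisher z
    # transform. The sample size used below is an assumption for illustration.
    import math

    def corr_ci(r, n, z_crit=1.96):
        z = math.atanh(r)              # transform r to an ~normal scale
        se = 1 / math.sqrt(n - 3)      # standard error on the z scale
        lo, hi = z - z_crit * se, z + z_crit * se
        return math.tanh(lo), math.tanh(hi)  # back-transform to r scale

    lo, hi = corr_ci(0.648, 100)
    print(round(lo, 3), round(hi, 3))  # → 0.517 0.749
    ```

    Note the interval is asymmetric around r on the correlation scale (wider toward zero), which matches the pattern in the intervals quoted above.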
     
  33. Finatik

    Finatik Season Ticket Holder Club Member

    1,603
    1,548
    113
    May 2, 2014
    SO Cal
    This is so weird; it's like you are arguing with yourself. Then I noticed that every other post was missing.
     
    resnor, Cashvillesent and Irishman like this.
  34. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    Yeah you have to log off and come back to the site without logging on to see those posts lol.
     
    resnor and Irishman like this.
  35. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    And have fun with that. :tongue2:
     
  36. djphinfan

    djphinfan Season Ticket Holder Club Member

    96,655
    49,541
    113
    Dec 20, 2007
    I tried Guys... too smart for my blood..holy ****
     
    resnor, Irishman, cbrad and 1 other person like this.
  37. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    Some other correlations and 95% confidence intervals of note for the sample of season QBs from 2017 to 2019:

    DVOA and win percentage: 0.63 [0.48 to 0.74]

    DVOA and adjusted (to 2019) passer rating: 0.90 [0.85 to 0.93]

    DVOA and QBR: 0.85 [0.79 to 0.90]

    DVOA and clutch-weighted EPA: 0.76 [0.66 to 0.83]

    DVOA and clutch-weighted EPA on plays involving pass attempts: 0.83 [0.76 to 0.89]

    (Here is a description of DVOA: https://www.footballoutsiders.com/info/methods)

    So to sum this up, the correlations between quarterbacks' win percentage and 1) traditional passer rating, 2) QBR, and 3) DVOA are essentially interchangeable. Total clutch-weighted EPA, which is one of the variables involved in QBR, appears to be substantially more strongly related to QBR (0.90) than it is to either traditional passer rating (0.69) or DVOA (0.76). This of course is intuitive, but it's interesting nonetheless that these metrics appear to be measuring different things about quarterback play.
     
  38. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    Expounding on the value of statistics a bit, I met a 71-year-old man yesterday who believes he is terminal for prostate cancer and has five years to live because his prostate-specific antigen (PSA) level had increased from 0.08 to 0.16 in the past year, following his prostatectomy. This man was already depressed about losing his wife and his sister in the past year and was hoping to find, in his words, "at least a companion to spend the rest of my life with, but now that I don't have long to live, I won't be subjecting anybody to that kind of hurt."

    The information on the following webpage -- all statistically derived -- would go a long way toward reassuring this man about his longevity, at least in relation to prostate cancer, and would hopefully encourage him to go out and find the companion he wants:

    https://www.hopkinsmedicine.org/bra...t-should-i-do-if-my-psa-returns-after-surgery
     
  39. The Guy

    The Guy Well-Known Member

    1,731
    861
    113
    Oct 1, 2018
    Here's an interesting article that applies statistics to football:
    https://www.espn.com/nfl/story/_/id/26888038/pass-blocking-matters-more-pass-rushing-prove-it
     
  40. cbrad

    cbrad . Club Member

    7,521
    8,912
    113
    Dec 21, 2014
    That one's not so bad. So this means ESPN's football analytics aren't ALWAYS bad lol. They don't tell you precisely how they used player tracking data to determine if a rusher beat a blocker within 2.5 seconds, but it's believable you could do that (I personally would look at the locations of the OL, draw line segments between them to create an "OL wavefront" or something like that, and if someone crosses that you say he beat the blocker). So I think I'm OK with people using Pass Block Win Rate (PBWR) or Pass Rush Win Rate (PRWR).

    The interpretations are VERY tricky though. They go through some issues in that article but in general while I think the stats are OK you really have to think carefully about what caused what with OL metrics like that.
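    The "OL wavefront" idea sketched above is easy to prototype: connect the linemen's positions into a polyline and flag a rusher who gets to the QB's side of it. Everything here is illustrative, coordinates are made up, and this is emphatically not ESPN's actual PBWR/PRWR method, just one plausible geometric check.

    ```python
    # Sketch of the "OL wavefront" idea: connect OL positions into a
    # polyline and flag a rusher who crosses to the QB's side of it.
    # Coordinates are hypothetical; y decreases toward the QB.

    def wavefront_depth(ol_positions, x):
        """Interpolate the wavefront's depth (y) at lateral position x.
        ol_positions: list of (x, y) points; clamped at the edges."""
        pts = sorted(ol_positions)
        if x <= pts[0][0]:
            return pts[0][1]
        if x >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation

    def rusher_beat_block(ol_positions, rusher):
        """True if the rusher is past (on the QB's side of) the wavefront."""
        rx, ry = rusher
        return ry < wavefront_depth(ol_positions, rx)

    line = [(-4, 5), (-2, 4.5), (0, 4), (2, 4.5), (4, 5)]  # OL snapshot
    print(rusher_beat_block(line, (1, 3.0)))   # crossed the front: True
    print(rusher_beat_block(line, (-3, 6.0)))  # still in front: False
    ```

    In a real system you'd evaluate this against tracking data at the 2.5-second mark the article mentions, and you'd need rules for edge rushers looping outside the line, which is exactly where interpretation gets tricky.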
     
    The Guy likes this.

Share This Page