
Statistical methods for football

Discussion in 'Miami Dolphins Forum' started by cbrad, Feb 4, 2020.

  1. Pauly

    Pauly Season Ticket Holder

    3,235
    3,200
    113
    Nov 29, 2007
    There is a very famous quote in mathematics from John von Neumann: “With 4 parameters (i.e. assumptions) I can fit an elephant. With 5 I can make him wiggle his trunk.” Demonstrated here:
     
    Irishman likes this.
  2. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Another interesting finding here:

    The correlation between clutch-weighted EPA per play and QBR: 0.98 (95% CI 0.968 to 0.986).

    QBR is clutch-weighted EPA per play.

    The correlation between clutch-weighted EPA per play and quarterbacks' win percentage from 2017 to 2019 is 0.654 [0.520 to 0.757].

    That's a bit stronger than the correlation between adjusted passer rating and win percentage (0.648; 0.512 to 0.752) and the correlation between DVOA and win percentage (0.626; 0.484 to 0.736).
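    The post does not say how these intervals were computed. A common approach is the Fisher z-transform, sketched below. The function name `pearson_ci` and the sample size `n = 93` are assumptions made for illustration: the thread never states n, and 93 is simply a value that gives an interval close to the one quoted for r = 0.654.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transform: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# ASSUMPTION: n = 93 quarterback-seasons; the thread does not give n.
low, high = pearson_ci(0.654, 93)
print(f"r = 0.654, 95% CI = [{low:.3f}, {high:.3f}]")
```

    Because the interval depends only on r and n, the three win-percentage correlations in this post have heavily overlapping intervals, so the ordering among them should be read cautiously.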

    So if you're comfortable with the clutch weighting of EPA, whose calculation isn't known, but is explained in this way:
    ...then you may find clutch-weighted EPA per play an appealing measure of quarterback play.

    Here is a description of EPA: https://www.advancedfootballanalyti...s-explained/expected-points-and-epa-explained

    https://www.espn.com/blog/statsinfo...-calculated-we-explain-our-quarterback-rating
     
  3. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    What???

    The first quote is fine (though I haven't double-checked it), the second I don't for a moment believe. First of all, the correlation on a per play basis cannot be higher than the correlation after you average across plays. How is that mathematically possible? When you average things you reduce variance! So the correlation could be higher with the average but I see no way it could be lower.

    It also makes no sense conceptually. If what you're saying is true (which I'm sure it's not), then those 10k lines of code are basically doing nothing. That is, apportioning credit among players (this is ESPN's main selling point) is essentially not happening, nor is adjustment for opponent strength, nor is adjustment for difficulty. You're saying all that has no influence on QBR. It's just "clutch-weighted EPA".

    And btw.. 0.98 correlation is NEVER seen with football stats. That's so high it's almost like there's no random variation.

    No I don't believe you. You need to show us the data. How did you get clutch-weighted EPA per play and QBR per play? ESPN gives you both for a season (https://www.espn.co.uk/nfl/qbr/_/seasontype/2) but where did you get that per play? Showing people data per play is dangerous from ESPN's point of view because it makes it easier for people to reconstruct the formula.
     
    resnor, Pauly, Irishman and 1 other person like this.
  4. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    I can understand the concern, but I don't think they averaged across plays. I think when they post EPA it's simply additive. Notice that Tannehill for example, who played in only 11 games, has an EPA of only 45.8 in 2019, whereas Patrick Mahomes, who played in 14 games, has an EPA of 97.3.

    Anyway the data are from this page:

    https://www.espn.com/nfl/qbr

    You can see at the top there where you can go back to past seasons. The data are from 2017 to 2019, regular season only, and EPA per play is simply the "EPA" column divided by the "PLAYS" column.
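    A minimal sketch of that calculation: divide the "EPA" column by the "PLAYS" column, then correlate the result with QBR. The rows below are hypothetical placeholders, not real ESPN figures, and `pearson_r` is a hand-written helper rather than anything ESPN or Jamovi provides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL rows of (EPA, PLAYS, QBR); replace with the table values.
rows = [
    (90.0, 600, 80.0),
    (45.0, 400, 65.0),
    (30.0, 550, 50.0),
    (10.0, 500, 40.0),
]

# EPA per play is the season EPA total divided by the number of plays.
epa_per_play = [epa / plays for epa, plays, _ in rows]
qbr = [q for _, _, q in rows]

r = pearson_r(epa_per_play, qbr)
print(f"correlation between EPA per play and QBR: {r:.3f}")
```

    Dividing by plays is what makes a quarterback with 11 games comparable to one with 14, since the raw EPA total grows with playing time.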

    Also, their lines of code and the adjustments you're talking about are even more meaningless, because the correlation between EPA per play and QBR "RAW" (the right-most column on the above page) is 0.99 (again from 2017 to 2019), and the fact that isn't 1.0 may have more to do with different numbers of decimal places than anything else.
     
    cbrad likes this.
  5. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Another interesting finding from those data:

    The correlation between quarterbacks' win percentage and "PAA" on the above webpages (which is defined by ESPN as "number of points contributed by the quarterback, accounting for QBR and how much he plays, above the level of an average quarterback") is 0.66, which is stronger than the correlation between win percentage and 1) adjusted (to 2019) passer rating, 2) DVOA, 3) QBR, and 4) EPA per play.

    The correlation between PAA and EPA per play? 0.97.

    I think we've cracked the code.
     
    cbrad likes this.
  6. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    Well something had to be wrong. Your calculations are correct. That means that ESPN's 10k lines of code actually are doing NOTHING!!

    In other words, as you say, ESPN's QBR is nothing more than EPA per play. All the code they have for apportioning credit among players is being swamped by the extra noise from the parameters – exactly that "bad modeling" that I was talking about.

    This also means that they don't have any better way of teasing apart individual QB ability than any other "QB" stat. Actually, EPA is MORE of a team stat than passer rating because passer rating is a team offensive passing stat while EPA includes everything on offense.

    You cracked the code, not me. Good job!
     
    resnor, Irishman, Pauly and 1 other person like this.
  7. Disgustipate

    Disgustipate Season Ticket Holder Club Member

    27,786
    39,513
    113
    Nov 25, 2007
    This is a really interesting thread. I took an education-based research/statistics class last year and was able to play around a bit with SPSS, and the entire time I was doing the different labs I kind of wished I could have spent the time trying it out for football stat stuff.
     
    Irishman, Pauly, cbrad and 1 other person like this.
  8. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Download the Jamovi program online and you can do the same thing on your own time.
     
    Disgustipate likes this.
  9. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Yeah it’s essentially Brian Burke’s influence at ESPN. On his Advanced Football Analytics site, his two main variables were EPA and WPA, and now QBR is simply EPA down-weighted by WPA in garbage time.
     
  10. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    Wiki says QBR was last modified in 2013. Maybe there have been minor modifications since then, but Burke joined ESPN in 2015 so I doubt he was the one who made the formula so reliant on EPA. This almost certainly predates him.
     
    The Guy likes this.
  11. Pauly

    Pauly Season Ticket Holder

    3,235
    3,200
    113
    Nov 29, 2007
    As someone who has built and used computer models it is comforting that 99% of ESPN’s model is essentially fluff and bubbles that are there purely as window dressing.

    The more complex a model is, the more it tells you about the assumptions of the user, and the more chance there is of a small initial error in the assumptions magnifying or minimizing the effect of one component.

    The good news is we can ignore ESPN’s QBR and just use EPA and WPA.
     
    Irishman and cbrad like this.
  12. Pauly

    Pauly Season Ticket Holder

    3,235
    3,200
    113
    Nov 29, 2007
    A more modern approach should be able to get you a better outcome than the NFL’s passer rating. The NFL’s passer rating was developed in the late 60s/early 70s, when its designers had to rely on slide rules and pen-and-paper calculations. It also had a much more basic set of stats to depend on (for example, sacks weren’t tracked separately and were treated as negative rushes on the stat sheet).

    What is surprising is that passer rating has proven to be so robust and so predictive given the changes in how the game is played over the last 50 years.

    The last time I calculated the correlation of passer rating made (adjusted to a common year) to win% was 2 years ago, and I had only coded the data for 10 years. I got a 0.67 correlation of passer rating made to win% for 2006 to 2016. (Here: https://www.thephins.com/threads/building-a-winning-team.91138/)
     
    Irishman and The Guy like this.
  13. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Again in terms of the variables on this page:

    https://www.espn.com/nfl/qbr

    ...the correlation between clutch-weighted EPA per play and QBR RAW is 0.99, which I suspect is actually 1.0 but is less because of differences in numbers of decimal places.

    The correlation between clutch-weighted EPA per play and QBR is 0.979 (95% CI = 0.968 to 0.986).

    So that means the transformation from "QBR RAW" to QBR -- which is where I suspect the adjustments beyond the clutch down-weighting for WPA are made -- is responsible for a change in the variance in QBR accounted for by only clutch-weighted EPA of a mere 2.2%.
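    The 2.2% figure follows from squaring the two correlations and taking the difference. A quick check, using only the numbers quoted in this post (the helper name `variance_explained` is just for illustration):

```python
def variance_explained(r):
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r * r


r_raw = 0.99    # clutch-weighted EPA per play vs. QBR RAW
r_qbr = 0.979   # clutch-weighted EPA per play vs. QBR

drop = variance_explained(r_raw) - variance_explained(r_qbr)
print(f"QBR RAW: {variance_explained(r_raw):.1%}")
print(f"QBR:     {variance_explained(r_qbr):.1%}")
print(f"change:  {drop:.1%}")
```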

    So whatever they're doing between "QBR RAW" and QBR is hardly changing things at all. You might as well just stick with clutch-weighted EPA per play.

    Now, the clutch-weighting itself (using WPA) may make a significant difference over and above EPA per play, and that's part of their model as well, but unfortunately we can't know that because of reasons we've gone over here.
     
    Last edited: Feb 9, 2020
  14. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Well keep in mind here that these statistics (ESPN's) consist of "plays on which the QB has a non-zero expected points contribution; includes most plays that are not handoffs." So that includes quarterback runs, sacks, and -- very importantly -- penalties, like pass interference for example.

    So the fact that we're getting a slightly stronger correlation with win percentage by including far more of the activity of quarterbacks on the field, while again down-weighting (using WPA) for garbage time statistics, I think makes QBR (or clutch-weighted EPA per play...) a more attractive statistic than traditional passer rating.

    I'm with you with regard to being surprised that we can't do any better with regard to predicting win percentage on the basis of QBR, but we may in fact be capturing more of what the quarterback is doing in relation to win percentage. And therefore it nonetheless may be a better measure than traditional passer rating of quarterbacks' individual performance/ability.

    EDIT: To support the above, consider that the correlations among clutch-weighted EPA per play, QBR RAW, QBR, and PAA are all 0.97 or more, whereas the correlations between those same variables and traditional (adjusted) passer rating are in the high 0.70s.

    So if what ESPN is doing is truly capturing more of what the quarterback is doing on the field, we're talking about somewhere in the neighborhood of 33% of variance accounted for by those methods. Again if that's validly measuring quarterback play, that's an awful lot of variance.
     
    Last edited: Feb 9, 2020
    Pauly likes this.
  15. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    For reference these are the correlations to win% from 1966-2019:

    Correlation between passer rating and win%: 0.634
    Correlation between passer rating allowed and win%: -0.595
    Correlation between passer rating differential and win%: 0.7972

    And while the passing game changed dramatically since 1978, what's amazing is that these correlations are almost exactly the same if you only look at 1978-2019. So passer rating is definitely robust which is good.

    Something else those correlations show through the square of the correlation r^2 which tells you how much variance in one variable is explained by the other (see first post in this thread for an explanation): 0.634^2 = 40.2% and (-0.595)^2 = 35.4%, so offensive passer rating explains about 5% more of win% than defensive passer rating. It's just one of many stats that show that offense is statistically speaking slightly more important than defense in the NFL (not every game, just on average).

    Regarding how passer rating could be improved, there are obviously all kinds of parameters you could add, from sacks to air yards (instead of passing yards) to QB rushing (if you're interested in a measure of QB ability rather than strictly passing efficiency), etc... And any improvement should also automatically adjust for era so that (let's say) 100 is defined to be league average.

    But there is one "mathematical" flaw in passer rating that needs to be corrected before worrying about adding other parameters: the artificial ceiling passer rating has on how much any of its components (COMP%, Y/A, TD%, INT%) could influence it.

    Let me give an example:

    If you have 15 completions in 20 passing attempts with 200 yards, 3 TD's and 3 INT's, your passer rating is 106.25. Maybe to the surprise of some people, if you had those same stats but had 4 TD's instead of 3 TD's, your passer rating is STILL 106.25. You could keep increasing the number of TD's to 5, 6, 7.. etc.. and no matter how much you increase it (as long as TD + INT is less than total completions) your passer rating is still 106.25 lol.

    That occurs because the formula puts in an artificial ceiling so that you can't go higher than a 158.3 rating (a "perfect" rating). Same thing occurs with INT's. You could have 15 completions in 20 passing attempts with 200 yards, 3 TD's and X INT's where X > 3 (X could be 10 INT's) and you STILL have a 106.25 rating, which is totally absurd.
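    The example can be checked with the standard NFL passer rating formula, in which each of the four components is clamped to the range 0 to 2.375 before summing. The sketch below uses that published formula; the function and argument names are my own. Since an interception is an incomplete pass, a 15-of-20 line allows at most 5 of them, so the loop varies INT's from 3 to 5.

```python
def clamp(x, lo=0.0, hi=2.375):
    """Restrict a component to the floor and ceiling used by passer rating."""
    return max(lo, min(hi, x))


def passer_rating(cmp, att, yds, td, ints):
    """NFL passer rating from completions, attempts, yards, TDs and INTs."""
    a = clamp((cmp / att - 0.3) * 5)        # completion percentage
    b = clamp((yds / att - 3) * 0.25)       # yards per attempt
    c = clamp((td / att) * 20)              # touchdown percentage
    d = clamp(2.375 - (ints / att) * 25)    # interception percentage
    return (a + b + c + d) / 6 * 100


# Extra TDs beyond 3 hit the ceiling on the TD component.
for td in (3, 4, 5):
    print("TD =", td, "->", round(passer_rating(15, 20, 200, td, 3), 2))

# Extra INTs beyond 3 hit the floor on the INT component.
for ints in (3, 4, 5):
    print("INT =", ints, "->", round(passer_rating(15, 20, 200, 3, ints), 2))
```

    With 20 attempts, the TD component reaches its ceiling at 3 TD's (3/20 × 20 = 3.0 > 2.375) and the INT component reaches its floor at 2 INT's (2.375 − 2/20 × 25 < 0), which is why the rating stops moving.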

    The problem is "mathematical" because a linear relationship is assumed for the entire scale of possible passer ratings, meaning that each extra TD counts exactly the same as the previous one. Generally it's much easier to increase TD's from 0 to 1 than from 4 to 5 in a game (if for no other reason than fixed amount of time in a game). So a linear relationship shouldn't be assumed. The relationship should be sigmoidal. In other words, a simple improvement on passer rating would be to remove the ceiling restrictions, take the result without a ceiling and use a sigmoidal function to arrive at an improved formula. Almost guaranteed to very slightly improve correlations to win%.
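    One way to read this proposal is sketched below: sum the four components without any floor or ceiling, then pass the sum through a logistic (sigmoidal) curve. This is only an illustration of the idea. The `midpoint`, `slope` and `top` values are hypothetical placeholders that would have to be fitted to data, and nothing in the thread specifies them.

```python
import math


def raw_sum(cmp, att, yds, td, ints):
    """Sum of the four passer rating components with NO floor or ceiling."""
    a = (cmp / att - 0.3) * 5
    b = (yds / att - 3) * 0.25
    c = (td / att) * 20
    d = 2.375 - (ints / att) * 25
    return a + b + c + d


def sigmoid_rating(cmp, att, yds, td, ints, midpoint=4.0, slope=0.5, top=158.3):
    """Logistic transform of the unclamped sum.

    ASSUMPTION: midpoint, slope and top are made-up illustration values.
    """
    raw = raw_sum(cmp, att, yds, td, ints)
    return top / (1.0 + math.exp(-slope * (raw - midpoint)))


# Unlike the clamped formula, each extra TD or INT still moves the rating,
# but by a shrinking amount as the curve flattens near its limits.
for td in (3, 4, 5):
    print("TD =", td, "->", round(sigmoid_rating(15, 20, 200, td, 3), 1))
```

    Whether this actually improves the correlation with win% would need to be tested on real game data.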
     
    Irishman, Pauly and The Guy like this.
  16. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    I think it's time we stopped talking about QBR and instead talk about EPA or clutch-weighted EPA. QBR can be dismissed if the entire purpose of it was to apportion credit among players and the code that does that (which is black box) actually does nothing lol.

    So the question should be how does clutch-weighted EPA compare to passer rating. EPA weights the importance of a play based on how it affects the probability of scoring points (most direct relation to win% for the offense) while passer rating doesn't. And as you point out EPA incorporates plays like running plays, sacks and penalties, which passer rating doesn't.

    The question however is: which is capturing "QB ability" more. Let me pose this question: if your goal is to measure the ability of a QB to complete passes in tight coverage, would you want to weight completion percentage by the probability of scoring points?

    I'd say no. Game condition, field position, and how many expected points you add shouldn't matter if you're interested in "ability". You should just look at completion percentage as a function of different levels of "coverage" (a measure of difficulty of the task).

    So I don't think weighting plays by the probability of scoring points is the best foundation for a better measure of QB ability. EPA is probably a good foundation for measuring how much the team relies on a QB to win, but that's a different question than QB ability.
     
    Irishman, Pauly and The Guy like this.
  17. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    I see what you mean, and it's tricky, because what EPA does as well is attribute more "success" to a seven-yard pass for a first down on 3rd and 6 than to a nine-yard pass that doesn't result in a first down on 3rd and 10, even though the latter QB passed for two more yards than the former.

    So there is some degree of "difficulty level of the task" built into that kind of measurement, in that opposing defenses are naturally going to play to stop those sorts of conversions, and others like them (e.g., red zone offense and stopping touchdowns).
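    To make the 3rd-down comparison concrete: EPA for a play is the expected points of the situation after the play minus the expected points before it. The expected-points numbers below are hypothetical, chosen only to show the sign of the result. Real values come from a fitted expected-points model.

```python
def epa(ep_before, ep_after):
    """Expected points added: change in expected points caused by the play."""
    return ep_after - ep_before


# HYPOTHETICAL expected-points values, for illustration only.
# 3rd-and-6, 7-yard completion: the drive continues with a new first down.
converted = epa(ep_before=1.0, ep_after=2.2)

# 3rd-and-10, 9-yard completion: more yards gained, but now 4th-and-1.
failed = epa(ep_before=0.8, ep_after=-0.3)

print(f"7-yard conversion:        EPA = {converted:+.2f}")
print(f"9-yard failed conversion: EPA = {failed:+.2f}")
```

    Under these assumed values, the shorter pass earns positive EPA and the longer one negative EPA, which is the asymmetry described above.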
     
  18. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
  19. Pauly

    Pauly Season Ticket Holder

    3,235
    3,200
    113
    Nov 29, 2007
    One point to note about the fivethirtyeight article is that they say that early round picks are overvalued according to Jimmy Johnson’s draft chart and later round picks undervalued.

    The counterpoint to that is that you can only have 11 players on the field at one time (and at many positions you can only field 1 or 2 players at any one time). Having good depth with 3 late round picks who can perform at NFL average level at a position sounds nice in theory, but to create favorable matchups a team would be better off having 1 Pro Bowler, 1 average-level backup and 1 scrub. Because of the limited resources you can put on the field, a player who produces 10% more than average demands more than 10% additional resources (salary or draft position) compared with an average player.
     
    Last edited: Feb 10, 2020
    The Guy and Irishman like this.
  20. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Now that we've discovered QBR is nothing more than clutch-weighted EPA (CWEPA), I thought it would be interesting to revisit the correlations between CWEPA and traditional passer rating (PR) on an individual game basis in 2019. The correlations recently discussed above involving CWEPA were on a season-long basis, with data from 2017 to 2019.

    So again these are for 100 individual NFL games from 2019, selected at random:

    CWEPA and PR (offense): 0.66
    CWEPA and PR (defense): 0.73
    CWEPA differential and PR differential: 0.60

    CWEPA (offense) and points scored: 0.56
    PR (offense) and points scored: 0.59

    CWEPA (defense) and points allowed: 0.63
    PR (defense) and points allowed: 0.71

    CWEPA differential and points differential: 0.58
    PR differential and points differential: 0.67

    So the first thing of note in my opinion is that CWEPA and passer rating are obviously measuring something different. CWEPA accounts for only 44% of the variance in passer rating offensively, and only 53% of the variance in passer rating defensively.

    I'd be interested to hear others' observations as well.
     
  21. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
  22. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
  23. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    Yeah so first of all that's a good research paper in the sense that they're using a reproducible approach for trying to define Wins Above Replacement in the NFL. And providing the R package nflscrapR for others to download play-by-play data for themselves is good.

    That research group by Horowitz is one of the few that actually takes statistical analysis in the NFL seriously.

    Having said that, the nflscrapR package was developed a long time ago, and to this day they haven't uploaded it to CRAN (Comprehensive R Archive Network), which is where you'd upload R packages if you want to do it professionally. CRAN doesn't just allow you to upload anything. It goes through a review process (just like publishing a paper), and you have to make sure everything is backwards compatible and works on all platforms (e.g., Windows, Mac, Linux, etc.) and also that there are no identifiable errors in it, etc.

    I published an R package on CRAN so I've been through that. They haven't yet with nflscrapR and it shows because when I tried installing it, it got hung up on another R package nflscrapR depends on called rlang. In other words there's some dependency in there that prevents me from installing the whole thing. Regardless, the 2009-2018 play-by-play database I have comes from that group (that is, they compiled the database and put it on Kaggle for others to download).

    So that's a bit disappointing.

    As far as the methodology, it's transparent and uses traditional approaches such as hierarchical linear models (multilevel models) so that's good, but in no way do I think it solves the problem of division of credit. For example, there is no way for them to estimate interaction effects and there are obviously interaction effects among players.

    But yes for a research paper it's good because it's the antithesis of proprietary approaches like ESPN's QBR (or clutch-weighted EPA) or FO's DVOA. So it's a good starting point, but not the solution.
     
    The Guy and Irishman like this.
  24. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    OK thanks. So what do you think about the validity of the WAR construct as outlined and calculated in that paper, if we were to use it to compare players? Not so great, I take it.
     
  25. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    I think it's fine to quote it as an example of where statistics can currently take you (and also its limitations), but I wouldn't assign it any greater credibility than a stat like passer rating or so. It would be interesting for discussion purposes but it's not gospel.
     
    The Guy likes this.
  26. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Check this out:

     
  27. cbrad

    cbrad . Club Member

    7,730
    9,193
    113
    Dec 21, 2014
    That was bound to happen: someone using machine learning on real-time data from football. The issue of course is that the same machine learning algorithm will likely give you very different changes in expected points or win probability for the exact same play after it is trained on more data. So those graphs don't remain the same (for the exact same play) over time.

    So on the face of it these things aren't reliable. However, if these guys can take the next step and show that machine learning makes more accurate predictions than any other method, then they have something.
     
    Surfs Up 99 and The Guy like this.
  28. The Guy

    The Guy Well-Known Member

    2,630
    941
    113
    Oct 1, 2018
    Here's the article for it:

    https://arxiv.org/pdf/1906.01760.pdf
     
