Statistical methods for football

Discussion in 'Miami Dolphins Forum' started by cbrad, Feb 4, 2020.

  1. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    Yeah.. if they're using EPA, then the first two games last season where we lost 10-59 and 0-43 are going to skew the expected wins quite a bit because EPA doesn't care that all the drives in those games (both on offense and defense) came in just 2 games lol. So EPA would predict a lower win% than we actually had because of that alone.

    Now whether that explains the full 2.5 wins above expected I don't know, but I wouldn't be surprised if it explained a good portion of it. Something to keep in mind when merging data.
     
    Pauly, Irishman and The Guy like this.
  2. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
    Would that be controlled for if they used EPA per play instead of just EPA?
     
  3. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    No, in fact it looks like they used EPA per dropback and per rush. If you have one or two absolutely horrid games, with tons of plays and bad average EPA per play, that obviously skews the average across a season.

    The way to "control" for that is to use a sampling distribution — a distribution of means — where in this case the mean is per game. In other words, use average EPA per game as a single statistic and look at the distribution of that. Problem there is you have so few data points with 16 games so there are drawbacks to that approach too, but it solves this one issue.
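
A minimal sketch of the difference between pooling every play and averaging per game. The numbers are made up purely for illustration (they are not real EPA data), and the variable names are hypothetical:

```python
from statistics import mean

# Hypothetical per-play EPA values, grouped by game (toy numbers, not real data).
games = [
    [-1.0, -1.0, -1.0, -1.0],  # one blowout with many bad plays
    [0.5, 0.5],
    [0.5, 0.5],
]

# Pooled average: every play counts equally, so the long bad game dominates.
all_plays = [epa for game in games for epa in game]
pooled_epa_per_play = mean(all_plays)

# Per-game approach: reduce each game to a single mean, then average those.
per_game_means = [mean(game) for game in games]
mean_of_game_means = mean(per_game_means)

print(pooled_epa_per_play)  # -0.25
print(mean_of_game_means)   # 0.0
```

With per-game means, the blowout counts as one data point out of three instead of four plays out of eight. The cost is the one already mentioned: a season only gives you 16 such data points.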
     
    Irishman and The Guy like this.
  4. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
    What do you think about the fact that in that article, when they use a random forest as opposed to regression, just about every variable pales in comparison in importance to offensive and defensive EPA per dropback?
     
  5. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    That first figure uses simple linear regression, so they ran a separate regression on each variable (how well you can predict wins from that variable alone). For the second graph they used a random forest model, which takes all the possible predictors into account at once, so those two graphs aren't comparable.

    The proper comparison would be multiple linear regression vs. random forest. In general, you don't want to trust machine learning techniques like random forests too readily because they tend to "overfit" the data, i.e., they work well on that one dataset but not on a slightly different one. Multiple linear regression is the safer choice; however, I suspect EPA per dropback would also win out in multiple linear regression because we know passing efficiency matters more than rushing efficiency.
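
To illustrate why per-variable (simple) regressions and a joint model aren't comparable, here is a toy sketch with two correlated predictors. The data and the names `pass_eff`, `rush_eff`, and `wins` are invented for illustration; `wins` is constructed to depend only on `pass_eff`:

```python
from statistics import mean

def centered(values):
    """Subtract the mean from every value."""
    m = mean(values)
    return [v - m for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simple_slope(x, y):
    """Least-squares slope of y regressed on x alone."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def multiple_slopes(x1, x2, y):
    """Least-squares slopes of y regressed on x1 and x2 jointly
    (2x2 normal equations solved with Cramer's rule)."""
    a, b, c = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Toy data: rush_eff is correlated with pass_eff, but wins = 2 * pass_eff + 1.
pass_eff = [1, 2, 3, 4, 5, 6]
rush_eff = [2, 1, 4, 3, 6, 5]
wins = [2 * p + 1 for p in pass_eff]

rush_alone = simple_slope(rush_eff, wins)
pass_joint, rush_joint = multiple_slopes(pass_eff, rush_eff, wins)

print(round(rush_alone, 3))                       # 1.657
print(round(pass_joint, 3), round(rush_joint, 3))
```

On its own, `rush_eff` looks like a decent predictor because it rides along with `pass_eff`. Once both are in the model, its coefficient drops to essentially zero. This only covers the regression side; it says nothing about how a random forest would rank the same variables.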
     
    The Guy likes this.
  6. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
     
    cbrad likes this.
  7. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    Yeah, whatever the effect of fans on home field advantage is, it's being swamped by the effect of no fans on offensive production for both teams. Points per game, yards per game, passer rating, plays per drive, scoring percentage per drive: all of these are at record levels this year with QBs having to whisper in the huddle lol.

    Two links worth checking every week:
    https://www.pro-football-reference.com/years/NFL/index.htm
    https://www.pro-football-reference.com/years/NFL/passing.htm

    We can also compare those stats at the end of the year to what they were a week ago (i.e., after 1 month of play), especially ppg at 25.6 and passer rating at 96.5, to see how much of the observed effect was due to lack of preseason. Right now, it's looking like most of the effect is stable and due to lack of crowd noise, but passer rating did decrease last week, taking the overall average to 95.6, possibly showing some effect of lack of preseason.
     
    The Guy likes this.
  8. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
    I have to wonder also if there is an effect of the home field crowd on inspiring defensive play. So you have perhaps this synergistic effect where offenses can function better and defenses are less inspired and invigorated.
     
  16. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
  17. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    Yeah nothing special there, basically fitting a curve to data on how well spreads predicted actual score differentials, and using that to improve on a model he has. The "weighting" is just adding this new information to the old model. Basically, he's adding more parameters to increase predictive power.

    Like I said, nothing special from a modeling perspective, but it's exactly what you'd do if you really wanted to bet on games.
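
For anyone curious what "adding the market information to the old model" could look like, here is one simple way to do it. This is a sketch under my own assumptions, not the method from the linked piece: blend the model's predicted margin with the market's implied margin using a single weight `w`, and pick `w` by least squares. All numbers are invented, and the margins are home minus away:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_market_weight(model_pred, market_pred, actual):
    """Find w minimizing the squared error of
    blended = model + w * (market - model)."""
    gap = [s - m for s, m in zip(market_pred, model_pred)]       # market minus model
    resid = [y - m for y, m in zip(actual, model_pred)]          # what the model missed
    return dot(resid, gap) / dot(gap, gap)

def blend(model_pred, market_pred, w):
    return [m + w * (s - m) for m, s in zip(model_pred, market_pred)]

# Hypothetical predicted and actual score differentials (toy numbers).
model_pred = [3.0, -2.0, 7.0, 0.0]
market_pred = [5.0, -6.0, 17.0, -6.0]
actual = [4.3, -4.6, 13.5, -3.9]

w = fit_market_weight(model_pred, market_pred, actual)
blended = blend(model_pred, market_pred, w)
print(round(w, 2))  # 0.65
```

The toy data were built so the weight comes out at 0.65. A weight like that says how far to move the model toward the market, which is a different thing from hitting 65% against the spread.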
     
    The Guy likes this.
  18. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
    Right but am I reading that correctly that he's getting 65% accuracy against the spread with that model?
     
  19. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    No that's just the market regression weight lol (look at the x-axis on that graph).
     
    The Guy likes this.
  20. danmarino

    danmarino Justice for Jacob Blake Club Member

    10,524
    11,582
    113
    Sep 4, 2014
    Do we know if there may be some bad teams vs good teams causing this? Maybe it just so happens that bad home teams are playing good away teams more often? Just a thought...
     
    Irishman likes this.
  21. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    No such effect. First, all teams have played either 4 or 5 home games. Second, if bad home teams were playing good away teams more often, the red line in the graph below would have a higher slope and lower intercept. The red line is the best-fitting line for home win% as a function of overall win%; its slope is 0.9461 and its intercept is 0.0459, nearly identical to the dotted line (slope 1, intercept 0), which represents no effect.

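    [Graph: home win% as a function of overall win%, with the best-fitting line in red and a dotted reference line of slope 1 and intercept 0]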
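
A minimal sketch of the line fit described above, using ordinary least squares. The win percentages here are invented for illustration (they are not the actual team data), so the fitted values differ from the 0.9461 and 0.0459 quoted in this post:

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical teams: overall win% and home win% (toy numbers).
overall_win_pct = [0.2, 0.4, 0.5, 0.6, 0.8]
home_win_pct = [0.23, 0.41, 0.50, 0.59, 0.77]

slope, intercept = fit_line(overall_win_pct, home_win_pct)
print(round(slope, 4), round(intercept, 4))  # 0.9 0.05
```

The thing to check is how far the fitted slope and intercept sit from the no-effect reference of slope 1 and intercept 0.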

    It's the lack of crowd noise. Offensive stats are at record levels. League average PPG is at 25.1 (the previous record was 23.3) and average passer rating is at 94.5 (the previous record was 92.9). Commentators have pointed out that QBs have to whisper in the huddle so that the defense can't hear them, etc.
     
    The Guy and danmarino like this.
  22. The Guy

    The Guy Season Ticket Holder Club Member

    3,893
    1,214
    113
    Oct 1, 2018
     
  23. cbrad

    cbrad . Club Member

    8,222
    9,723
    113
    Dec 21, 2014
    This Moo guy is "not a fan of passer rating" yet he doesn't understand that passer rating inflation is so huge over time that you have to equate distributions to compare passer rating across eras, i.e., look at z-scores.

    In any case, Mahomes has 1473 attempts right now, and I've generally put the threshold for comparing careers at 4000, so he's a long ways off from being included on any career list. But as of now his career z-score is 1.541, and that would put him 3rd all time behind Steve Young (1.8627) and Joe Montana (1.5602). Just goes to show how impressive Young was.

    Remember though, Young and Montana had 4000+ attempts. Let's see if Mahomes can keep up this torrid pace. For reference, Wilson (just passed 4k attempts) is 1.1953, Rodgers (6k+ attempts) is 1.3165, and Brees (10k+ attempts) is 1.1613.
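
For reference, a rough sketch of the z-score idea: measure each season's passer rating against that season's league distribution, then combine seasons. The league numbers below are invented, and weighting seasons by attempts is just one plausible way to build a career figure, not necessarily how the career numbers above were computed:

```python
from statistics import mean, pstdev

def season_z(player_rating, league_ratings):
    """How many league standard deviations a rating sits above the league mean."""
    return (player_rating - mean(league_ratings)) / pstdev(league_ratings)

def career_z(seasons):
    """Attempts-weighted average of season z-scores (an assumed weighting).
    seasons: list of (z_score, attempts) pairs."""
    total_attempts = sum(att for _, att in seasons)
    return sum(z * att for z, att in seasons) / total_attempts

# Hypothetical league-wide ratings in a low-rating era and a high-rating era.
old_era_league = [65.0, 85.0, 65.0, 85.0]    # mean 75, SD 10
new_era_league = [80.0, 100.0, 80.0, 100.0]  # mean 90, SD 10

# The same raw rating of 95 means very different things in each era.
z_old = season_z(95.0, old_era_league)
z_new = season_z(95.0, new_era_league)
print(z_old, z_new)  # 2.0 0.5

career = career_z([(z_old, 300), (z_new, 100)])
print(career)  # 1.625
```

That's the whole point about passer rating inflation: a 95 rating is two standard deviations above average in the first toy era and only half of one in the second.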
     
    Last edited: Nov 26, 2020 at 2:04 PM
    The Guy likes this.