19 August, 2015

Offensive Stabilization Points through Time

Using my maximum likelihood technique for estimating stabilization points, I performed a moving calculation of the stabilization point using data from 1900 - 2014 from fangraphs.com. Each stabilization point is a six-year calculation, including the current and five previous years (so for example, 2014 includes 2009 - 2014 data, 1965 includes 1960 - 1965 data, etc.). There's not a mathematical or baseball reason for this choice - through trial and error it just seemed to provide enough data for estimation that the overall trend was apparent, with a decent amount of smoothing. Data includes only batters from each year with at least 300 plate appearances, and splits years for the same player (each player-season is treated as a separate observation). Raw counts are used, not adjusted in any form. Pitchers are excluded. My data and code are posted on my github if you would like to run it for yourself.
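As a rough illustration of the moving window, here is a minimal Python sketch. It is not the code from my github; the function names and the row keys `year` and `pa` are hypothetical, chosen only to show how a year maps to the seasons pooled for it and how the 300 plate appearance cutoff is applied.

```python
def window_years(year, width=6):
    """Seasons pooled for `year`: the year itself plus the previous width - 1."""
    return list(range(year - width + 1, year + 1))


def pool_window(rows, year, min_pa=300, width=6):
    """Keep player-seasons inside the window with at least `min_pa` plate appearances.

    `rows` is an iterable of dicts with hypothetical keys 'year' and 'pa';
    each player-season is its own row, so the same player can appear several times.
    """
    years = set(window_years(year, width))
    return [r for r in rows if r["year"] in years and r["pa"] >= min_pa]


print(window_years(2014))
```

Running this for 2014 lists the seasons 2009 through 2014, matching the example above.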

I have defined the "stabilization point" as the point where split-half correlation is equal to 0.5, which is equivalently where the shrinkage amount is 50%. Both occur at a sample size equal to the variance parameter $M$ in the beta-binomial model I fit, where the distribution of events given a mean $\theta_i$ is binomial for player $i$ and the underlying distribution of the $\theta_i$ follows a beta distribution with mean $\mu$ and variance parameter $M$.
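To make the equivalence concrete, here is a short sketch of why both quantities equal one half at a sample size of $M$. It assumes the beta distribution is parameterized as $\alpha = \mu M$ and $\beta = (1 - \mu) M$, and that player $i$ has $x_i$ events in $n_i$ trials; these symbols are introduced here only for illustration.

\begin{align*}
E[\theta_i \mid x_i] &= \frac{x_i + \mu M}{n_i + M} = (1 - B_i)\frac{x_i}{n_i} + B_i \mu, \qquad B_i = \frac{M}{M + n_i} \\
\rho(n) &= \frac{\textrm{Var}(\theta_i)}{\textrm{Var}(\theta_i) + E[\theta_i(1 - \theta_i)]/n} = \frac{n}{n + M}
\end{align*}

Here $B_i$ is the shrinkage toward the league mean and $\rho(n)$ is the correlation between two independent halves of $n$ trials each. Setting $n_i = n = M$ gives $B_i = 1/2$ and $\rho = 1/2$.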

Historical Plots


Trends can be seen clearly in plots. For example, here is a plot of the stabilization point for home run rate from 1900 - 2014.



The effect of the dead ball era is clearly evident. A large stabilization point indicates a small variance - and during the dead ball era, there was a small variance, because most players weren't hitting home runs! More recently, the stabilization point has risen to the highest level it's been since that era.

Note that the stabilization point should not be confused with the mean. In fact, here's a plot of the estimated league mean home run rate over the same period.


While going through peaks and valleys, the home run rate has risen fairly continuously over time - and the recent rise in home run stabilization point actually corresponds to a decrease in the mean home run rate (though interestingly, the decrease in league mean home run rate since the end of the steroid era still puts the current mean home run rate above any other preceding era).

To give another example, the stabilization point for triple rate is the lowest it's been since the dead ball era - even though the league mean triple rate has decreased fairly continuously over time.





Interestingly, the stabilization points for walk rate and on-base percentage are the highest they've ever been, with walk rate showing a noticeably sharp increase in recent years. One theory is that this is due to a "moneyball" effect of teams focusing much more strongly on walk rate as opposed to other statistics. Indeed, the stabilization point for batting average (shown later in the article) has dropped during the same period - perhaps indicative of teams being more tolerant of variation in batting average but less tolerant of variation in on-base percentage. Of course, pitching has grown more dominant since the end of the steroid era, which is likely adding to the effect as well.



Meanwhile, the stabilization points for double rate and extra base hit rate (2B + 3B) have increased over time.



But the extra base hit rate stabilization point has decreased since the mid-2000s, while the double rate stabilization point has remained roughly the same.

The hit-by-pitch rate stabilization point follows the same pattern as the triple rate stabilization point - it increased after the dead ball era, peaking in the 1930s and 1950s - but has decreased since then, and despite a small recent increase, is at its lowest level since that era.


Meanwhile, the strikeout rate stabilization point decreased fairly consistently over time, before stabilizing approximately in the 1970s, with peaks in the 1980s and early 2000s.


What Drives the Stabilization Point?


As I've shown, the mean of the underlying distribution of talents does not seem to be strongly associated with the stabilization point - the variance of the underlying distribution of talent levels is the primary factor. There is an inverse relationship - a small stabilization point indicates that there is a large variance in talent levels for that particular statistic, and a large stabilization point indicates that there is a small variance in talent levels for that statistic.
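The inverse relationship can be checked with a few lines of Python. This is a minimal sketch assuming the beta distribution is parameterized as $\alpha = \mu M$, $\beta = (1 - \mu) M$, so that the variance of talent levels is $\mu(1 - \mu)/(M + 1)$. The league mean of 0.02 and the three values of $M$ are hypothetical, picked only for illustration.

```python
def talent_variance(mu, M):
    """Variance of a beta distribution with mean mu and alpha + beta = M."""
    return mu * (1 - mu) / (M + 1)


mu = 0.02  # hypothetical league mean rate, for illustration only
for M in (100, 250, 500):  # hypothetical stabilization points
    sd = talent_variance(mu, M) ** 0.5
    print(f"M = {M:>3}: talent standard deviation = {sd:.4f}")
```

Holding the mean fixed, the standard deviation of talent levels shrinks as $M$ grows, which is the pattern described above.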

To get a clearer view of the factors that are affecting the stabilization point, here's a plot of the stabilization point for batting average (using at-bats as the denominator) versus time.


Below is an animation showing the empirical distribution of batting average with the estimated underlying distribution of talent levels in dashed lines (since I'm estimating the distribution of true batting averages and not the distribution of observed batting averages, it's okay that the dashed line is narrower than the histogram). Notice that as time goes on, the distribution gets narrower (the variance is decreasing) - this is what's driving the increase in stabilization point over time.
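To see why the talent distribution should be narrower than the histogram, here is a small simulation sketch using only the Python standard library. It assumes the parameterization $\alpha = \mu M$, $\beta = (1 - \mu) M$; the league mean of 0.265, the 400 at-bats per player, and the 1000 players are hypothetical values, and $M = 466$ is only roughly the 2014 batting average stabilization point.

```python
import random
import statistics


def simulate_averages(mu, M, n_ab, n_players, seed=0):
    """Draw true talents from a beta distribution, then binomial batting averages."""
    rng = random.Random(seed)
    alpha, beta = mu * M, (1 - mu) * M  # assumed parameterization
    talents, observed = [], []
    for _ in range(n_players):
        theta = rng.betavariate(alpha, beta)
        # One Bernoulli draw per at-bat; the sum is a binomial count of hits.
        hits = sum(rng.random() < theta for _ in range(n_ab))
        talents.append(theta)
        observed.append(hits / n_ab)
    return talents, observed


# Hypothetical inputs, for illustration only.
talents, observed = simulate_averages(mu=0.265, M=466, n_ab=400, n_players=1000)
print("sd of true talents:     ", round(statistics.stdev(talents), 4))
print("sd of observed averages:", round(statistics.stdev(observed), 4))
```

The observed averages carry binomial noise on top of the talent spread, so their standard deviation comes out larger than that of the underlying talents.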


The opposite effect can be seen in the single rate stabilization point - it has decreased (with peaks and valleys) over time.


This is because the distribution of single rates has become more spread out.




Graphics of all the stabilization points, league mean talent levels, and animated estimated talent distributions can be found here.

Individual Years


I also selected a few years to compare individually - some for specific reasons, some just as a representative of a certain era.

  • 1910, in the middle of the dead ball era, and a year of particularly low offensive output. 
  • 1928, to represent the 1920s and the age of Babe Ruth.
  • 1937, to represent the 1930s. 
  • 1945, the end of the second world war. 
  • 1959, to represent the 1950s.
  • 1968, the year of the pitcher.
  • 1975, six years after they lowered the mound. 
  • 1987, before the steroid era and six years after the 1981 labor stoppage. 
  • 2001, the year Barry Bonds hit 73 home runs, in the middle of the steroid era. 
  • 2014, the modern era.

\begin{array}{| c | c | c | c | c | c | c | c | c | c | c |}\hline
\textrm{Year} & \textrm{1B} & \textrm{2B}& \textrm{3B}& \textrm{XBH}& \textrm{HR} &  \textrm{SO} & \textrm{BB} & \textrm{BA} & \textrm{OBP} & \textrm{HBP}  \\ \hline
1910 & 523.02  & 348.68 & 469.42  & 274.24 & 537.41 & 130.77 &   87.33 & 285.65 & 175.27 & 259.08 \\
1928 & 442.01 &  475.05 & 583.77 &  385.43 & 102.77  &  83.26 &  82.10 & 286.25 & 173.49 & 414.86 \\
1937 & 436.17 &  597.72 & 709.13 &  463.40 &  90.61 &  73.79 &  76.94 & 344.93 & 174.84 & 723.59\\
1945 &  456.71 &  710.05 & 543.09 &  474.47 &  98.81 &  67.99 &  79.83 & 424.54 & 180.14 & 699.53 \\
1959 & 351.40 & 1059.30 & 721.82 & 794.95 &  90.18 &  58.28 &  81.20 & 430.60 & 200.31 & 414.12\\
1968 & 333.52 &  867.61 & 691.18 &  700.24  & 94.46 &  55.18 &  93.24 & 476.94 & 265.04 & 448.36\\
1975 & 246.27 &  970.40 & 646.09 &  773.63 &  85.50 &  53.19 &  73.61 & 407.65 & 204.92 & 410.99 \\
1987 & 269.39 &  949.73 & 537.13 &  801.21 &  90.23 &  52.85 &  86.22 & 541.67 & 262.28 & 430.67 \\
2001 & 255.57 & 838.16 & 482.32  & 971.61 &  95.11 &  57.37 &  76.16  & 465.84 & 196.51 & 251.47\\
2014 & 222.16 & 1025.31 & 372.50 & 1006.30 & 124.52 &  49.73 & 105.59 & 465.92 & 295.79 & 297.41 \\  \hline
\end{array}


While generally following the fuller patterns shown in the plots, the effect of major baseball events such as the dead ball era, the second world war, the lowering of the mound, and the steroid era is evident.

Remember that a smaller stabilization point indicates a larger variance among talent levels - so looking at 1968 and 1975 to see the effect of lowering the mound, for example, the spread of single, triple, and home run rates increased while the spread of double and extra-base hit rates decreased (the extra base hit rate being largely driven by the double rate). Interestingly, the spread of strikeout rates remained roughly the same, but the spread of walk rates, hit by pitch rates, batting average, and on-base percentage all increased.
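The 1968 versus 1975 comparison can be reproduced directly from the table. The sketch below copies those two rows and labels each statistic by the direction of change in its stabilization point. The 5% cutoff for "roughly the same" is an arbitrary assumption made for this illustration, not part of the analysis.

```python
# Stabilization points copied from the 1968 and 1975 rows of the table above.
STATS = ["1B", "2B", "3B", "XBH", "HR", "SO", "BB", "BA", "OBP", "HBP"]
M_1968 = [333.52, 867.61, 691.18, 700.24, 94.46, 55.18, 93.24, 476.94, 265.04, 448.36]
M_1975 = [246.27, 970.40, 646.09, 773.63, 85.50, 53.19, 73.61, 407.65, 204.92, 410.99]


def spread_change(before, after, tolerance=0.05):
    """Label the change in talent spread implied by a change in stabilization point.

    A smaller stabilization point means a larger spread. `tolerance` is an
    assumed cutoff on the relative change, used only for this illustration.
    """
    relative = (after - before) / before
    if abs(relative) < tolerance:
        return "roughly the same"
    return "spread increased" if relative < 0 else "spread decreased"


changes = {s: spread_change(b, a) for s, b, a in zip(STATS, M_1968, M_1975)}
for stat, label in changes.items():
    print(f"{stat:>3}: {label}")
```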

Overall, a fun way to look at how offensive statistics have changed over time. Let me know what you think in the comments.
