Odds
Let's say that instead of a batting average $\theta$, I want the odds of getting a hit. To get the odds of a hit, apply the function
$$g(\theta) = \frac{\theta}{1-\theta}$$
So for example, a batter with a true $\theta = 0.250$ will have odds $g(0.250) = 0.250/0.750 = 1/3$, or one to three odds of getting a hit.
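As a quick illustration, the odds transformation can be written as a one-line function. This is a minimal Python sketch, not the code linked from this post; the function name `odds` and the range check are illustrative choices.

```python
def odds(theta: float) -> float:
    """Convert a hit probability theta into odds theta / (1 - theta)."""
    # Odds are undefined at theta = 1 and meaningless outside [0, 1).
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must satisfy 0 <= theta < 1")
    return theta / (1.0 - theta)


# A .250 hitter has odds of one to three, i.e. about 0.333.
quarter_odds = odds(0.250)
print(round(quarter_odds, 3))
```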
Delta Method
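Suppose we have some estimator $\hat{\theta}$ that converges to a normal distribution with variance $\sigma^2$. That is,
$$\hat{\theta} \rightarrow N(\theta, \sigma^2)$$
For example, assuming independent and identically distributed at-bats, the sample batting average converges to a normal distribution
$$\hat{\theta} \rightarrow N\left(\theta, \frac{\theta(1-\theta)}{n}\right)$$
Then statistical theory says that any function $g(\hat{\theta})$, assuming the first derivative $g'$ exists and is nonzero at $\theta$, has distribution
$$g(\hat{\theta}) \rightarrow N\left(g(\theta), [g'(\theta)]^2 \sigma^2\right)$$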
This gives us a handy way to calculate confidence intervals for functions of parameters, if we can calculate a confidence interval for the parameter itself.
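One way to see the approximation at work is to compare it with a simulation. The sketch below is not the code linked from this post; it is a minimal Python illustration using only the standard library. The helper names `delta_method_sd` and `simulated_sd`, the central-difference approximation to the derivative, the seed, and the number of replications are all illustrative assumptions.

```python
import random
import statistics


def odds(theta):
    """Odds transformation g(theta) = theta / (1 - theta)."""
    return theta / (1.0 - theta)


def delta_method_sd(g, theta, sigma2, h=1e-6):
    """Delta-method standard deviation of g(theta_hat): |g'(theta)| * sigma."""
    # Central-difference approximation to g'(theta); an assumption made here
    # so the helper works for any smooth g without a hand-derived derivative.
    g_prime = (g(theta + h) - g(theta - h)) / (2.0 * h)
    return abs(g_prime) * sigma2 ** 0.5


def simulated_sd(g, theta, n, reps, seed=0):
    """Standard deviation of g(sample average) over simulated seasons."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # n independent at-bats, each a hit with probability theta.
        hits = sum(rng.random() < theta for _ in range(n))
        values.append(g(hits / n))
    return statistics.stdev(values)


theta, n = 0.250, 400
delta_sd = delta_method_sd(odds, theta, theta * (1.0 - theta) / n)
sim_sd = simulated_sd(odds, theta, n, reps=2000)
print(f"delta method: {delta_sd:.4f}, simulation: {sim_sd:.4f}")
```

The two numbers should be close but not identical: the delta method is a first-order approximation, and the simulation carries its own sampling noise.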
Back to Odds of Getting a Hit
If we define the odds function as above, then the first derivative is given by
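$$g'(\theta) = \frac{1}{(1-\theta)^2}$$
and so the distribution of the sample batting odds $g(\hat{\theta})$ converges to a normal distribution with mean $g(\theta)$ and variance
$$[g'(\theta)]^2 \sigma^2 = \left[\frac{1}{(1-\theta)^2}\right]^2 \left[\frac{\theta(1-\theta)}{n}\right] = \frac{\theta}{n(1-\theta)^3}$$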
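For anyone who wants the intermediate steps, the derivative comes from the quotient rule, and the variance simplifies by cancelling one factor of $(1-\theta)$:

$$
\begin{aligned}
g'(\theta) &= \frac{d}{d\theta}\left[\frac{\theta}{1-\theta}\right] = \frac{(1)(1-\theta) - \theta(-1)}{(1-\theta)^2} = \frac{1}{(1-\theta)^2} \\
[g'(\theta)]^2 \sigma^2 &= \frac{1}{(1-\theta)^4} \cdot \frac{\theta(1-\theta)}{n} = \frac{\theta}{n(1-\theta)^3}
\end{aligned}
$$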
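And so a confidence interval for the odds of a hit, given the sample batting average $\hat{\theta}$, is given by
$$\left(\frac{\hat{\theta}}{1-\hat{\theta}}\right) \pm z^* \sqrt{\frac{\hat{\theta}}{n(1-\hat{\theta})^3}}$$
where $z^*$ is an appropriate quantile from the standard normal distribution (for example, $z^* \approx 1.96$ for a 95% interval).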
Let's take our batter above, and suppose a $\hat{\theta} = 0.250$ batting average in $n = 40$ at-bats. Then a 95% confidence interval for the odds of getting a hit is given by
$$\left(\frac{0.250}{1-0.250}\right) \pm 1.96 \sqrt{\frac{0.250}{40(1-0.250)^3}} = (0.095, 0.572)$$
A fairly wide interval, but then again, $n = 40$ at-bats isn't much information to work with. If it were instead $n = 400$ at-bats, the interval would be
$$\left(\frac{0.250}{1-0.250}\right) \pm 1.96 \sqrt{\frac{0.250}{400(1-0.250)^3}} = (0.258, 0.409)$$
which is much narrower, and much more usable.
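The two intervals can be reproduced in a few lines. This is a minimal Python sketch and not necessarily the code linked from this post; the function name `odds_confidence_interval` and its `level` parameter are hypothetical, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def odds_confidence_interval(theta_hat, n, level=0.95):
    """Delta-method confidence interval for the odds theta / (1 - theta).

    theta_hat is the sample batting average and n the number of at-bats.
    """
    # Two-sided normal quantile, about 1.96 when level = 0.95.
    z_star = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimate = theta_hat / (1.0 - theta_hat)
    # Standard error: square root of theta_hat / (n * (1 - theta_hat)^3).
    std_err = (theta_hat / (n * (1.0 - theta_hat) ** 3)) ** 0.5
    return estimate - z_star * std_err, estimate + z_star * std_err


for at_bats in (40, 400):
    lower, upper = odds_confidence_interval(0.250, at_bats)
    print(f"n = {at_bats}: ({lower:.3f}, {upper:.3f})")
```

Note that the interval is symmetric around the estimated odds, so for a small enough $n$ or a $\hat{\theta}$ near zero the lower endpoint can dip below zero even though odds cannot be negative.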
The code I used to generate these results may be found on my GitHub.