Issue Observed
std gives different results in MATLAB and NumPy for the same data
TL;DR: NumPy
ddof = 0 (NumPy default): divide by N, biased estimator
ddof = 1 (matches MATLAB's default): divide by N-1, unbiased estimator of the variance
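A minimal sketch of the two conventions, assuming NumPy is installed. The data values are made up for illustration; only np.std and its ddof parameter come from the notes above.

```python
import numpy as np

# Hypothetical sample; mean is 5 and the squared deviations sum to 32
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy default: ddof=0, so the sum of squares is divided by N
std_biased = np.std(data)

# ddof=1 divides by N-1 instead, the convention MATLAB's std(x) uses by default
std_unbiased = np.std(data, ddof=1)

print(std_biased)    # about 2.0    (sqrt(32 / 8))
print(std_unbiased)  # about 2.138  (sqrt(32 / 7))
```

To reproduce a MATLAB result in NumPy, passing ddof=1 is usually enough.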
Bessel’s Correction
Degrees of freedom
Number of values in the final calculation of a statistic that are free to vary
e.g. in the sample variance, the degrees of freedom are n-1
because once the sample mean is fixed, the last value is determined by the other n-1
The more degrees of freedom, the more reliable the sample variance
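The "last value is fixed by the rest" point can be checked with a few lines of plain Python. This is only an illustration with a made-up sample: deviations from the sample mean always sum to zero, so the last deviation can be recovered from the first n-1.

```python
# Hypothetical sample with n = 4
values = [3.0, 7.0, 8.0, 10.0]
n = len(values)
mean = sum(values) / n  # 7.0

# Deviations from the sample mean sum to zero by construction
deviations = [v - mean for v in values]

# So the last deviation is not free: it equals minus the sum of the others
last_from_rest = -sum(deviations[:-1])

print(deviations)                      # [-4.0, 0.0, 1.0, 3.0]
print(deviations[-1], last_from_rest)  # 3.0 3.0
```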
Notation
$s^2$: unbiased sample variance
$s_n^2$: biased sample variance
$\mu$: population mean
$\sigma^2$: population variance
$\bar{x}$: sample mean
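Reason for the Bias
Crux: $\bar{x}$ is not $\mu$. The sample mean is the point that minimizes the sum of squared deviations, so spread measured around $\bar{x}$ underestimates spread measured around $\mu$.
Easy to prove:
$$\sum_{i=1}^{n}(x_i-\bar{x})^2 \le \sum_{i=1}^{n}(x_i-\mu)^2$$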
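One way to see the inequality is to expand the right-hand side around the sample mean. The cross term vanishes because $\sum_{i=1}^{n}(x_i-\bar{x}) = 0$:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}\big((x_i-\bar{x})+(\bar{x}-\mu)\big)^2 \\
&= \sum_{i=1}^{n}(x_i-\bar{x})^2 + 2(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x}) + n(\bar{x}-\mu)^2 \\
&= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 \\
&\ge \sum_{i=1}^{n}(x_i-\bar{x})^2
\end{aligned}
$$

Equality holds only when $\bar{x} = \mu$, which is why dividing by n underestimates the variance on average.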
Unbiased Estimator Proof
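$$E(\bar{x}) = \mu$$
Assuming the $x_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$:
$$
\begin{aligned}
E(s_n^2) &= E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right) \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}\big((x_i-\mu)-(\bar{x}-\mu)\big)^2\right) \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}\big((x_i-\mu)^2-2(x_i-\mu)(\bar{x}-\mu)+(\bar{x}-\mu)^2\big)\right) \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2+\frac{1}{n}\sum_{i=1}^{n}(-2x_i+2\mu+\bar{x}-\mu)(\bar{x}-\mu)\right) \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2+\frac{\bar{x}-\mu}{n}\sum_{i=1}^{n}(-2x_i+\mu+\bar{x})\right) \\
&= E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2-(\bar{x}-\mu)^2\right) \\
&= \sigma^2 - E\big((\bar{x}-\mu)^2\big) \\
&= \sigma^2 - \mathrm{Var}(\bar{x}) \\
&= \sigma^2 - \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) \\
&= \sigma^2 - \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i) \\
&= \sigma^2 - \frac{1}{n^2}\,n\sigma^2 \\
&= \frac{n-1}{n}\sigma^2
\end{aligned}
$$
Since $s^2 = \frac{n}{n-1}s_n^2$:
$$E(s^2) = \frac{n}{n-1}E(s_n^2) = \sigma^2$$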
$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
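The expectations can also be checked numerically. Below is a rough Monte Carlo sketch, assuming NumPy and normally distributed data. The sample size, trial count, seed and true variance are arbitrary choices for illustration, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n, trials = 5, 200_000          # small n makes the bias easy to see
sigma2 = 4.0                    # true population variance (assumed for the demo)

# Each row is one sample of size n drawn from N(0, sigma2)
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))

# Average each estimator over many samples to approximate its expectation
mean_biased = samples.var(axis=1, ddof=0).mean()    # approximates E(s_n^2)
mean_unbiased = samples.var(axis=1, ddof=1).mean()  # approximates E(s^2)

print(mean_biased, (n - 1) / n * sigma2)  # first value should land near 3.2
print(mean_unbiased, sigma2)              # first value should land near 4.0
```

With n = 5 the biased estimator comes out around 20% low on average, matching the factor (n-1)/n. The gap shrinks as n grows.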