1. SciPy Overview
SciPy builds on NumPy and provides a richer set of higher-level functionality: it supplies a large number of functions for statistics, optimization, interpolation, numerical integration, time-frequency transforms, and more, covering essentially all of the common problems in basic scientific computing.
import numpy as np
import scipy.stats as stats
import scipy.optimize as opt
2. Statistics
2.1 Generating Random Numbers
rv_continuous.rvs(size=n): rv_continuous stands for a continuous distribution, such as the uniform (uniform), normal (norm), or beta (beta) distribution.
rv_discrete.rvs(size=n): rv_discrete stands for a discrete distribution, such as the Bernoulli (bernoulli), geometric (geom), or Poisson (poisson) distribution (a short sampling sketch follows the link below).
See: http://docs.scipy.org/doc/scipy/reference/stats.html
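Sampling from discrete distributions works the same way; a minimal sketch (the parameter values mu=3 and p=0.3 are chosen here purely for illustration):
# Draw 10 samples from a Poisson distribution with rate mu = 3 (illustrative value)
r_pois = stats.poisson.rvs(mu=3, size=10)
print(r_pois)
# Draw 10 Bernoulli trials with success probability p = 0.3 (illustrative value)
r_bern = stats.bernoulli.rvs(p=0.3, size=10)
print(r_bern)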
r_unif = stats.uniform.rvs(size=10)
print(r_unif)
[ 0.2924269 0.29822465 0.48178216 0.22004046 0.90243864 0.37951304
0.63613836 0.64975187 0.40374609 0.31864229]
rv_beta = stats.beta.rvs(size=10, a=4, b=2)
print(rv_beta)
[ 0.7330703 0.79941641 0.84296951 0.96649249 0.58666597 0.90552661
0.81672749 0.62606183 0.53123816 0.33406139]
beta = stats.beta(a=4, b=2)
beta.rvs(size=10)
array([ 0.54825343, 0.48858131, 0.78255051, 0.66667343, 0.59489605,
0.74490324, 0.72473494, 0.54107229, 0.84224194, 0.64791601])
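Freezing the distribution this way fixes the shape parameters once, so the same object can also be reused for other quantities; a quick sketch of a few of its methods:
# Theoretical mean, standard deviation and 95% quantile of Beta(a=4, b=2)
print(beta.mean())     # a / (a + b) = 0.6667
print(beta.std())
print(beta.ppf(0.95))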
2.2 Hypothesis Testing
norm_dist = stats.norm(loc=0.5, scale=2)
n = 200
dat = norm_dist.rvs(size=n)
print(np.mean(dat))
print(np.median(dat))
print(np.std(dat))
0.40692986915948354
0.36653201314410355
1.8750494924213752
Use the Kolmogorov-Smirnov (K-S) test to check the normality of the data, via kstest:
mu = np.mean(dat)
sigma = np.std(dat)
stat_val, p_val = stats.kstest(dat, 'norm', (mu, sigma))
print('KS-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))
KS-statistic D = 0.047 p-value = 0.7663
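Other normality checks are available in scipy.stats as well; for instance, normaltest (D'Agostino-Pearson) could be run on the same sample as a cross-check (shown only as a sketch, output not reproduced here):
stat_val, p_val = stats.normaltest(dat)
print('normaltest statistic = %6.3f p-value = %6.4f' % (stat_val, p_val))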
Use a one-sample t-test (ttest_1samp) to test whether the mean of the data is 0:
stat_val, p_val = stats.ttest_1samp(dat, 0)
print('One-sample t-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))
One-sample t-statistic D = 3.061 p-value = 0.0025
Reject the null hypothesis that the mean of the data is 0.
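Since the sample was actually drawn from N(0.5, 2), testing against the true mean 0.5 instead should typically fail to reject; a quick sketch:
stat_val, p_val = stats.ttest_1samp(dat, 0.5)
print('One-sample t-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))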
Two-sample t-test (ttest_ind):
norm_dist2 = stats.norm(loc=-0.2, scale=1.2)
dat2 = norm_dist2.rvs(size=n//2)  # integer size; n/2 is a float and triggers a deprecation warning
stat_val, p_val = stats.ttest_ind(dat, dat2, equal_var=False)
print('Two-sample t-statistic D = %6.3f, p-value = %6.4f' % (stat_val, p_val))
Two-sample t-statistic D = 3.151, p-value = 0.0018
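For paired samples of equal length, scipy.stats also provides ttest_rel; a minimal sketch with hypothetical before/after data (the effect size 0.3 is made up for illustration):
before = stats.norm.rvs(size=50, loc=0.0, scale=1.0)
after = before + stats.norm.rvs(size=50, loc=0.3, scale=0.5)  # hypothetical paired measurements
stat_val, p_val = stats.ttest_rel(before, after)
print('Paired t-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))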
2.3 Other Functions
g_dist = stats.gamma(a=2)
print(g_dist.cdf([2, 4, 5]))
[ 0.59399415 0.90842181 0.95957232]
print(g_dist.pdf([0.25, 0.5, 0.95]))
[ 0.1947002 0.30326533 0.36740397]
For a given distribution, the moment method is a convenient way to inspect its moments; for example, the sixth-order raw moment of N(0, 1):
stats.norm.moment(6, loc=0, scale=1)
15.000000000895332
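The related stats method returns the mean, variance, skewness, and kurtosis in a single call; a small sketch:
m, v, s, k = stats.norm.stats(loc=0, scale=1, moments='mvsk')
print(m, v, s, k)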
The describe function gives a statistical summary of a data set, including the sample size, extreme values (minimum and maximum), mean, variance, skewness, and kurtosis:
norm_dist = stats.norm(loc=0, scale=1.8)
dat = norm_dist.rvs(size=100)
info = stats.describe(dat)
print("Data size is: " + str(info[0 ]))
print("Minimum value is: " + str(info[1 ][0 ]))
print("Maximum value is: " + str(info[1 ][1 ]))
print("Arithmetic mean is: " + str(info[2 ]))
print("Unbiased variance is: " + str(info[3 ]))
print("Biased skewness is: " + str(info[4 ]))
print("Biased kurtosis is: " + str(info[5 ]))
Data size is: 100
Minimum value is: -4.28963191875
Maximum value is: 3.39969135101
Arithmetic mean is: -0.0583104535814
Unbiased variance is: 2.68846427562
Biased skewness is: 0.04732288942546752
Biased kurtosis is: -0.379124958678446
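describe actually returns a named tuple (DescribeResult), so the same fields can also be read by name instead of by index:
print("Arithmetic mean is: " + str(info.mean))
print("Unbiased variance is: " + str(info.variance))
print("Min/max are: " + str(info.minmax))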
Maximum likelihood estimation (MLE): the fit method estimates distribution parameters from data.
norm_dist = stats.norm(loc=0, scale=1.8)
dat = norm_dist.rvs(size=100)
mu, sigma = stats.norm.fit(dat)
print("MLE of data mean:" + str(mu))
print("MLE of data standard deviation:" + str(sigma))
MLE of data mean:-0.0461510505165
MLE of data standard deviation:1.72294901553
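fit also accepts keyword arguments such as floc and fscale to hold a parameter fixed during estimation; for example, fixing the location at 0 and estimating only the scale:
mu_fixed, sigma_hat = stats.norm.fit(dat, floc=0)  # mu_fixed is forced to 0
print("MLE of standard deviation with mean fixed at 0: " + str(sigma_hat))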
Correlation: the Pearson and Spearman's rank correlation coefficients are computed with pearsonr and spearmanr:
norm_dist = stats.norm()
dat1 = norm_dist.rvs(size=100)
exp_dist = stats.expon()
dat2 = exp_dist.rvs(size=100)
cor, pval = stats.pearsonr(dat1, dat2)
print("Pearson correlation coefficient: " + str(cor))
cor, pval = stats.spearmanr(dat1, dat2)
print("Spearman's rank correlation coefficient: " + str(cor))
Pearson correlation coefficient: -0.0299016363187
Spearman's rank correlation coefficient: -0.0299016363187
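Kendall's tau, another common rank correlation, is available as kendalltau and is called the same way:
cor, pval = stats.kendalltau(dat1, dat2)
print("Kendall's tau: " + str(cor))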
Linear regression (linregress):
x = stats.chi2.rvs(3, size=50)
y = 2.5 + 1.2 * x + stats.norm.rvs(size=50, loc=0, scale=1.5)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope of fitted model is:", slope)
print("Intercept of fitted model is:", intercept)
print("R-squared:", r_value**2)
Slope of fitted model is: 1.15543693896
Intercept of fitted model is: 2.75126872589
R-squared: 0.770651967105
LinregressResult(slope=1.1554369389622545, intercept=2.7512687258945392, rvalue=0.87786785287120206, pvalue=5.8204930077311082e-17, stderr=0.090979593726512847)
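The fitted slope and intercept can be used directly for prediction; a brief sketch that computes fitted values and residuals for the same x:
y_hat = intercept + slope * x   # fitted values from the regression
residuals = y - y_hat           # regression residuals
print("Residual standard deviation:", np.std(residuals))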
3. Optimization
3.1 Unconstrained Optimization Problems
An unconstrained optimization problem looks like: $$ \mathrm{min}\quad f(x) = x^2 - 4.8x + 1.2 $$
Its constrained counterpart is: $$ \mathrm{min}\quad f(x) = x^2 - 4.8x + 1.2 \quad \mathrm{s.t.}\quad x \geq 0 $$
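Before moving to the multidimensional case, the one-dimensional quadratic above can be minimized directly; a minimal sketch using opt.minimize_scalar (the analytic minimum is at x = 2.4 with f(x) = -4.56; the upper bound 10 in the bounded call is an arbitrary illustrative choice):
f = lambda x: x**2 - 4.8*x + 1.2
res = opt.minimize_scalar(f)            # unconstrained: converges near x = 2.4
print(res.x, res.fun)
res_b = opt.minimize_scalar(f, bounds=(0, 10), method='bounded')  # version with x >= 0
print(res_b.x)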
Consider the Rosenbrock function: $$ f(\mathbf{x}) = \sum_{i=1}^{N-1} 100(x_i - x_{i-1}^2)^2 + (1 - x_{i-1})^2 $$
def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
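A quick sanity check: the global minimum is at x_i = 1 for all i, where the function value is 0.
print(rosen(np.ones(5)))   # prints 0.0 at the global minimum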
3.1.1 Nelder-Mead Simplex Method
x_0 = np.array([0.5, 1.6, 1.1, 0.8, 1.2])
res = opt.minimize(rosen, x_0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True})
print(res)
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 436
Function evaluations: 706
final_simplex: (array([[ 1. , 1. , 1. , 1. , 1. ],
[ 1. , 1. , 1. , 1. , 1. ],
[ 1. , 1. , 1. , 1. , 0.99999999],
[ 1. , 1. , 1. , 1. , 1. ],
[ 1. , 1. , 1. , 1. , 1.00000001],
[ 1. , 1. , 1. , 1. , 1.00000001]]), array([ 1.66149699e-17, 6.32117429e-17, 7.44105349e-17,
8.24396866e-17, 9.53208876e-17, 1.07882961e-16]))
fun: 1.6614969876635003e-17
message: 'Optimization terminated successfully.'
nfev: 706
nit: 436
status: 0
success: True
x: array([ 1., 1., 1., 1., 1.])
The same problem can also be solved with Powell's method, another derivative-free optimizer:
res = opt.minimize(rosen, x_0, method='powell', options={'xtol': 1e-8, 'disp': True})
print(res)
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 24
Function evaluations: 1924
direc: array([[ 4.55396239e-04, 1.36370090e-03, 2.24582768e-03,
4.12213272e-03, 7.99477493e-03],
[ -1.91529419e-03, -3.00633609e-03, -6.77652553e-03,
-1.34919171e-02, -2.67196797e-02],
[ -3.76393598e-02, -2.30587398e-02, 1.05038412e-02,
3.43628871e-05, 7.36675942e-05],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, 0.00000000e+00],
[ 4.00791294e-06, 1.15417958e-05, 2.01270425e-05,
5.08729704e-05, 1.09096801e-04]])
fun: 2.041828615178145e-21
message: 'Optimization terminated successfully.'
nfev: 1924
nit: 24
status: 0
success: True
x: array([ 1., 1., 1., 1., 1.])