科学计算之 `scipy`

一、`Scipy`概述

SciPy基于NumPy提供了更为丰富和高级的功能扩展，在统计、优化、插值、数值积分、时频转换等方面提供了大量的可用函数，基本覆盖了基础科学计算相关的问题。

1
2
3

import numpy as np
import scipy.stats as stats
import scipy.optimize as opt

二、统计部分

2.1 生成随机数

rv_continuous.rvs(size=n): 其中rv_continuous表示连续型的随机分布，如均匀分布（uniform）、正态分布（norm）、贝塔分布（beta）等
rv_discrete.rvs(size=n): rv_discrete表示离散型的随机分布，如伯努利分布（bernoulli）、几何分布（geom）、泊松分布（poisson）等。

见:http://docs.scipy.org/doc/scipy/reference/stats.html

1 2	r_unif = stats.uniform.rvs( size = 10) print(r_unif)

[ 0.2924269   0.29822465  0.48178216  0.22004046  0.90243864  0.37951304
  0.63613836  0.64975187  0.40374609  0.31864229]

1 2	rv_beta = stats.beta.rvs(size = 10, a = 4, b =2) print(rv_beta)

[ 0.7330703   0.79941641  0.84296951  0.96649249  0.58666597  0.90552661
  0.81672749  0.62606183  0.53123816  0.33406139]

1 2	beta = stats.beta(a = 4, b = 2) beta.rvs(size = 10)

array([ 0.54825343,  0.48858131,  0.78255051,  0.66667343,  0.59489605,
        0.74490324,  0.72473494,  0.54107229,  0.84224194,  0.64791601])

2.2 假设检验

norm_dist = stats.norm(loc = 0.5, scale = 2) # loc mean， scale 
n = 200
dat = norm_dist.rvs(size = n)
np.mean(dat)

0.40692986915948354

1	np.median(dat)

0.36653201314410355

1	np.std(dat)

1.8750494924213752

使用K-S检验（ Kolmogorov-Smirnov test）检验正态分布性,使用kstest

1
2
3

mu = np.mean(dat)
sigma = np.std(dat)
stat_val, p_val = stats.kstest(dat, 'norm', (mu, sigma))

1	print('KS-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))

KS-statistic D =  0.047 p-value = 0.7663

t-test检验均值是否为0

1 2	stat_val, p_val = stats.ttest_1samp(dat, 0) print('One-sample t-statistic D = %6.3f p-value = %6.4f' % (stat_val, p_val))

One-sample t-statistic D =  3.061 p-value = 0.0025

拒绝原假设: 数据的均值为0

双样本t-test(ttest_ind)

norm_dist2 = stats.norm(loc=-0.2, scale=1.2)
dat2 = norm_dist2.rvs(size=n/2)
stat_val, p_val = stats.ttest_ind(dat, dat2, equal_var=False)
print('Two-sample t-statistic D = %6.3f, p-value = %6.4f' % (stat_val, p_val))

Two-sample t-statistic D =  3.151, p-value = 0.0018


/Users/shihchosen/anaconda/lib/python3.5/site-packages/scipy/stats/_continuous_distns.py:128: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return self._random_state.standard_normal(self._size)
/Users/shihchosen/anaconda/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return reshape(newshape, order=order)

2.3 其他函数

cdf

1 2	g_dist = stats.gamma(a = 2) print(g_dist.cdf([2,4,5]))

[ 0.59399415  0.90842181  0.95957232]

1	print(g_dist.pdf([0.25,.5,.95]))

[ 0.1947002   0.30326533  0.36740397]

对于一个给定的分布，可以用moment很方便的查看分布的矩信息，例如我们查看N(0,1)的六阶原点矩

1	stats.norm.moment(6, loc = 0, scale = 1)

15.000000000895332

describe函数提供对数据集的统计描述分析，包括数据样本大小，极值，均值，方差，偏度和峰度

1
2
3

norm_dist = stats.norm(loc=0, scale=1.8)
dat = norm_dist.rvs(size = 100)
info = stats.describe(dat)

print("Data size is: " + str(info[0]))
print("Minimum value is: " + str(info[1][0]))
print("Maximum value is: " + str(info[1][1]))
print("Arithmetic mean is: " + str(info[2]))
print("Unbiased variance is: " + str(info[3]))
print("Biased skewness is: " + str(info[4]))
print("Biased kurtosis is: " + str(info[5]))

Data size is: 100
Minimum value is: -4.28963191875
Maximum value is: 3.39969135101
Arithmetic mean is: -0.0583104535814
Unbiased variance is: 2.68846427562
Biased skewness is: 0.04732288942546752
Biased kurtosis is: -0.379124958678446

MLE

norm_dist = stats.norm(loc=0, scale=1.8)
dat = norm_dist.rvs(size=100)
mu, sigma = stats.norm.fit(dat)
print("MLE of data mean:" + str(mu))
print("MLE of data standard deviation:" + str(sigma))

MLE of data mean:-0.0461510505165
MLE of data standard deviation:1.72294901553

norm_dist = stats.norm()
dat1 = norm_dist.rvs(size=100)
exp_dist = stats.expon()
dat2 = exp_dist.rvs(size=100)
cor, pval = stats.pearsonr(dat1, dat2)
print("Pearson correlation coefficient: " + str(cor))
cor, pval = stats.pearsonr(dat1, dat2)
print("Spearman's rank correlation coefficient: " + str(cor))

Pearson correlation coefficient: -0.0299016363187
Spearman's rank correlation coefficient: -0.0299016363187

线性回归

x = stats.chi2.rvs(3, size=50)
y = 2.5 + 1.2 * x + stats.norm.rvs(size=50, loc=0, scale=1.5)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope of fitted model is:" , slope)
print ("Intercept of fitted model is:", intercept)
print("R-squared:", r_value**2)

Slope of fitted model is: 1.15543693896
Intercept of fitted model is: 2.75126872589
R-squared: 0.770651967105

1	stats.linregress(x, y)

LinregressResult(slope=1.1554369389622545, intercept=2.7512687258945392, rvalue=0.87786785287120206, pvalue=5.8204930077311082e-17, stderr=0.090979593726512847)

三、优化部分

3.1 无约束优化问题

下面的就是无约束优化，
$$
\mathrm{min}\quad f(x) = x^2 - 4.8x + 1.2
$$

下面的是约束优化：
$$
\mathrm{min}\quad f(x) = x^2 - 4.8x + 1.2\
s.t.\quad x \geq 0
$$

考虑Rosenbrock函数：
$$
f(\mathrm{x}) = \sum_{i =1}^{N-1} 100(x_i - xi^2)^2 + (1 - x{i-1})^2
$$

1
2
3

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

3.1.1 Nelder-Mead单纯形法

1	x_0 = np.array([0.5, 1.6, 1.1, 0.8, 1.2])

1 2	res = opt.minimize(rosen, x_0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True}) print(res)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 436
         Function evaluations: 706
 final_simplex: (array([[ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ],
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ],
       [ 1.        ,  1.        ,  1.        ,  1.        ,  0.99999999],
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ],
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.00000001],
       [ 1.        ,  1.        ,  1.        ,  1.        ,  1.00000001]]), array([  1.66149699e-17,   6.32117429e-17,   7.44105349e-17,
         8.24396866e-17,   9.53208876e-17,   1.07882961e-16]))
           fun: 1.6614969876635003e-17
       message: 'Optimization terminated successfully.'
          nfev: 706
           nit: 436
        status: 0
       success: True
             x: array([ 1.,  1.,  1.,  1.,  1.])

1 2	res = opt.minimize(rosen, x_0, method='powell', options={'xtol': 1e-8, 'disp': True}) print(res)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 24
         Function evaluations: 1924
   direc: array([[  4.55396239e-04,   1.36370090e-03,   2.24582768e-03,
          4.12213272e-03,   7.99477493e-03],
       [ -1.91529419e-03,  -3.00633609e-03,  -6.77652553e-03,
         -1.34919171e-02,  -2.67196797e-02],
       [ -3.76393598e-02,  -2.30587398e-02,   1.05038412e-02,
          3.43628871e-05,   7.36675942e-05],
       [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   0.00000000e+00],
       [  4.00791294e-06,   1.15417958e-05,   2.01270425e-05,
          5.08729704e-05,   1.09096801e-04]])
     fun: 2.041828615178145e-21
 message: 'Optimization terminated successfully.'
    nfev: 1924
     nit: 24
  status: 0
 success: True
       x: array([ 1.,  1.,  1.,  1.,  1.])

CHANGSHUN的BLOG

Changshun

Sep 27, 2016

一、`Scipy`概述

二、统计部分

2.1 生成随机数

2.2 假设检验

2.3 其他函数

三、优化部分

3.1 无约束优化问题

3.1.1 Nelder-Mead单纯形法

CHANGSHUN的BLOG

Changshun

科学计算之 `scipy`

Sep 27, 2016

一、Scipy概述

二、统计部分

2.1 生成随机数

2.2 假设检验

2.3 其他函数

三、优化部分

3.1 无约束优化问题

3.1.1 Nelder-Mead单纯形法

一、`Scipy`概述