您好,登錄后才能下訂單哦!
協(xié)方差:兩個(gè)變量總體誤差的期望。
簡單的說就是度量Y和X之間關(guān)系的方向和強(qiáng)度。
X :預(yù)測(cè)變量
Y :響應(yīng)變量
Y和X的協(xié)方差:[來度量各個(gè)維度偏離其均值的程度]
備注:[之所以除以n-1而不是除以n,是因?yàn)檫@樣能使我們以較小的樣本集更好的逼近總體的協(xié)方差,即統(tǒng)計(jì)上所謂的“無偏估計(jì)”。而方差則僅僅是標(biāo)準(zhǔn)差的平方]
如果結(jié)果為正值,則說明兩者是正相關(guān)的(從協(xié)方差可以引出“相關(guān)系數(shù)”的定義),
如果結(jié)果為負(fù)值就說明負(fù)相關(guān)的
如果為0,也是就是統(tǒng)計(jì)上說的“相互獨(dú)立”
為什么呢:
如果第1,3象限點(diǎn)位多,最終的和就是正,X增大Y增大
如果第2,4象限點(diǎn)位多,最終的和就是負(fù),X增大Y減小
Cov(Y,X)會(huì)受到度量單位的影響
引入相關(guān)系數(shù):
python使用以下公式進(jìn)行計(jì)算[上面的公式不便于編程,需要多次掃描數(shù)據(jù),但是微小的錯(cuò)誤會(huì)被放大哦]:
#coding:utf-8 ''' Y和X的相關(guān)系數(shù)就是標(biāo)準(zhǔn)化后變量的協(xié)方差 ''' ''' __author__ = 'similarface' QQ:841196883@qq.com ''' from math import sqrt from pandas import * import pandas as pd import os,sys import matplotlib.pyplot as plt #安裝不了 就github下載源碼安裝 from sklearn import datasets, linear_model ''' 根據(jù)文件加載數(shù)據(jù) ''' def loaddataInTab(filename): if os.path.isfile(filename): try: return pd.read_table(filename) except Exception,e: print(e.message) else: print("文件存在!") return None ''' 獲取Y,X的相關(guān)系數(shù),即為:皮爾遜相關(guān)系數(shù) ''' def pearson(rating1, rating2): ''' 皮爾遜相關(guān)參數(shù) 在統(tǒng)計(jì)學(xué)中,皮爾遜積矩相關(guān)系數(shù) (英語:Pearson product-moment correlation coefficient, 又稱作 PPMCC或PCCs[1], 文章中常用r或Pearson's r表示) 用于度量兩個(gè)變量X和Y之間的相關(guān)(線性相關(guān)),其值介于-1與1之間。 在自然科學(xué)領(lǐng)域中,該系數(shù)廣泛用于度量兩個(gè)變量之間的相關(guān)程度。 0.8-1.0 極強(qiáng)相關(guān) 0.6-0.8 強(qiáng)相關(guān) 0.4-0.6 中等程度相關(guān) 0.2-0.4 弱相關(guān) 0.0-0.2 極弱相關(guān)或無相關(guān) ''' sum_xy, sum_x, sum_y, sum_x2, sum_y2, n = 0, 0, 0, 0, 0, 0 for i in xrange(len(rating1)): n = n + 1 x = rating1[i] y = rating2[i] sum_xy += x * y sum_x += x sum_y += y sum_x2 += x ** 2 sum_y2 += y ** 2 if n == 0: return 0 fenmu = sqrt(sum_x2 - (sum_x ** 2) / n) * sqrt(sum_y2 - (sum_y ** 2) / n) if fenmu == 0: return 0 else: return (sum_xy - (sum_x * sum_y) / n) / fenmu data=loaddataInTab('./AnscombeQuartet') #x1,y1是線性相關(guān)的 diabetes_x1_test=data['X1'] diabetes_y1_test=data['Y1'] plt.scatter(diabetes_x1_test, diabetes_y1_test, color='black') print("黑色點(diǎn)的相關(guān)系數(shù)為:{}(皮爾遜相關(guān)參數(shù))".format(pearson(diabetes_x1_test,diabetes_y1_test))) regr1 = linear_model.LinearRegression() diabetes_x1_train=diabetes_x1_test.as_matrix()[:, np.newaxis] diabetes_y1_train=diabetes_y1_test.as_matrix()[:, np.newaxis] regr1.fit(diabetes_x1_train, diabetes_y1_train) plt.plot(diabetes_x1_test.as_matrix()[:, np.newaxis], regr1.predict(diabetes_x1_test.as_matrix()[:, np.newaxis]), color='black',linewidth=6) #x2,y2是非線性 二次函數(shù)擬合 diabetes_x2_test=data['X2'] diabetes_y2_test=data['Y2'] plt.scatter(diabetes_x2_test, diabetes_y2_test, color='red') print("紅色點(diǎn)的相關(guān)系數(shù)為:{}(皮爾遜相關(guān)參數(shù))".format(pearson(diabetes_x2_test,diabetes_y2_test))) regr2 = linear_model.LinearRegression() diabetes_x2_train=diabetes_x2_test.as_matrix()[:, np.newaxis] diabetes_y2_train=diabetes_y2_test.as_matrix()[:, np.newaxis] regr2.fit(diabetes_x2_train, diabetes_y2_train) plt.plot(diabetes_x2_test.as_matrix()[:, np.newaxis], regr2.predict(diabetes_x2_test.as_matrix()[:, np.newaxis]), color='red',linewidth=4) #x3,y3 數(shù)據(jù)對(duì)中出現(xiàn)了 孤立點(diǎn) diabetes_x3_test=data['X3'] diabetes_y3_test=data['Y3'] plt.scatter(diabetes_x3_test, diabetes_y3_test, color='blue') print("藍(lán)色點(diǎn)的相關(guān)系數(shù)為:{}(皮爾遜相關(guān)參數(shù))".format(pearson(diabetes_x3_test,diabetes_y3_test))) regr3 = linear_model.LinearRegression() diabetes_x3_train=diabetes_x3_test.as_matrix()[:, np.newaxis] diabetes_y3_train=diabetes_y3_test.as_matrix()[:, np.newaxis] regr3.fit(diabetes_x3_train, diabetes_y3_train) plt.plot(diabetes_x3_test.as_matrix()[:, np.newaxis], regr3.predict(diabetes_x3_test.as_matrix()[:, np.newaxis]), color='blue',linewidth=2) #x4,y4不適合線性擬合 極端值確立了直線 diabetes_x4_test=data['X4'] diabetes_y4_test=data['Y4'] plt.scatter(diabetes_x4_test, diabetes_y4_test, color='green') print("綠色點(diǎn)的相關(guān)系數(shù)為:{}(皮爾遜相關(guān)參數(shù))".format(pearson(diabetes_x4_test,diabetes_y4_test))) regr4 = linear_model.LinearRegression() diabetes_x4_train=diabetes_x4_test.as_matrix()[:, np.newaxis] diabetes_y4_train=diabetes_y4_test.as_matrix()[:, np.newaxis] regr4.fit(diabetes_x4_train, diabetes_y4_train) plt.plot(diabetes_x4_test.as_matrix()[:, np.newaxis], regr4.predict(diabetes_x4_test.as_matrix()[:, np.newaxis]), color='green',linewidth=1) plt.xticks(()) plt.yticks(()) plt.show() ''' 把上面的4組數(shù)據(jù)去做線性回歸: 有圖可知都做出了,斜率和截距相等的擬合線性 4種X,Y的相關(guān)系數(shù)都很接近 在解釋相關(guān)系數(shù)之前,圖像點(diǎn)位的散點(diǎn)分布是很重要的 如果完全基于相關(guān)系數(shù)分析數(shù)據(jù),將無法發(fā)現(xiàn)數(shù)據(jù)構(gòu)造模式之間的差別 '''
參考數(shù)據(jù):
1 2 3 4 5 6 7 8 9 10 11 12 | Y1 X1 Y2 X2 Y3 X3 Y4 X4 8.04 10 9.14 10 7.46 10 6.58 8 6.95 8 8.14 8 6.77 8 5.76 8 7.58 13 8.74 13 12.74 13 7.71 8 8.81 9 8.77 9 7.11 9 8.84 8 8.33 11 9.26 11 7.81 11 8.47 8 9.96 14 8.1 14 8.84 14 7.04 8 7.24 6 6.13 6 6.08 6 5.25 8 4.26 4 3.1 4 5.39 4 12.5 19 10.84 12 9.13 12 8.15 12 5.56 8 4.82 7 7.26 7 6.42 7 7.91 8 5.68 5 4.74 5 5.73 5 6.89 8 |
實(shí)例:計(jì)算機(jī)維修
#coding:utf-8 ''' 實(shí)例:計(jì)算機(jī)維修 維修時(shí)間 零件個(gè)數(shù) Minutes Units 1 ... ''' __author__ = 'similarface' from math import sqrt from pandas import * import pandas as pd import os import matplotlib.pyplot as plt #安裝不了 就github下載源碼安裝 from sklearn import datasets, linear_model ''' 根據(jù)文件加載數(shù)據(jù) ''' def loaddataInTab(filename): if os.path.isfile(filename): try: return pd.read_table(filename) except Exception,e: print(e.message) else: print("文件存在!") return None ''' 獲取Y,X的相關(guān)系數(shù),即為:皮爾遜相關(guān)系數(shù) ''' def pearson(rating1, rating2): ''' 皮爾遜相關(guān)參數(shù) ''' sum_xy, sum_x, sum_y, sum_x2, sum_y2, n = 0, 0, 0, 0, 0, 0 for i in xrange(len(rating1)): n = n + 1 x = rating1[i] y = rating2[i] sum_xy += x * y sum_x += x sum_y += y sum_x2 += x ** 2 sum_y2 += y ** 2 if n == 0: return 0 fenmu = sqrt(sum_x2 - (sum_x ** 2) / n) * sqrt(sum_y2 - (sum_y ** 2) / n) if fenmu == 0: return 0 else: return (sum_xy - (sum_x * sum_y) / n) / fenmu def getCovariance(XList,YList): ''' 獲取協(xié)方差 ''' averageX=sum(XList)/float(len(XList)) averageY=sum(YList)/float(len(YList)) sumall=0.0 for i in xrange(len(XList)): sumall=sumall+(XList[i]-averageX)*(YList[i]-averageY) cov=sumall/(len(XList)-1) return cov computerrepairdata=loaddataInTab('./data/computerrepair.data') Minutes=computerrepairdata['Minutes'] Units=computerrepairdata['Units'] print("該數(shù)據(jù)集的協(xié)方差:{}".format(getCovariance(Minutes,Units))) print("該數(shù)據(jù)集相關(guān)系數(shù):{}".format(pearson(Minutes,Units))) #維修時(shí)間 元件個(gè)數(shù)散點(diǎn)圖 plt.scatter(Minutes, Units, color='black') regr = linear_model.LinearRegression() Minutes_train=Minutes.as_matrix()[:, np.newaxis] Units_train=Units.as_matrix()[:, np.newaxis] regr.fit(Minutes_train, Units_train) print "相關(guān)信息" print("==================================================") print("方程斜率:{}".format(regr.coef_)) print("方程截距:{}".format(regr._residues)) print("回歸方程:y={}x+{}".format(regr.coef_[0][0],regr._residues[0])) print("==================================================") plt.plot(Minutes_train, regr.predict(Minutes_train), color='red',linewidth=1) plt.xticks(()) plt.yticks(()) plt.show()
result:
該數(shù)據(jù)集的協(xié)方差:136.0
該數(shù)據(jù)集相關(guān)系數(shù):0.993687243916
相關(guān)信息
==================================================
方程斜率:` 0`.`06366959`
方程截距:[ 1.43215942]
回歸方程:y=0.0636695930877x+1.43215942092
==================================================
===============參數(shù)估計(jì)=================
上面的斜率和截距如何算出來的?
簡單線性回歸模型:
Y=β0+β1X+ε
β0:常數(shù)項(xiàng)目
β1:回歸系數(shù)
ε:隨機(jī)擾動(dòng)或誤差[除X和Y的線性關(guān)系之外的隨機(jī)因素對(duì)Y的影響]
單個(gè)觀測(cè):
yi是響應(yīng)變量的第i個(gè)觀測(cè)值,xi是X的第i個(gè)預(yù)測(cè)值
誤差表達(dá)式:
鉛直距離平方和:[真實(shí)響應(yīng)變量-預(yù)測(cè)響應(yīng)變量]的平方和[最小二乘法]
使得上訴公式最小值的β[0,1]即為所求的
直接給出該式的參數(shù)解:
其中帶入化簡:
算最小2乘法的例子:
(x,y):(1,6),(2,5),(3,7),(4,10)
假設(shè):y=β1+β2x
β1+1β2=6
β1+2β2=5
β1+3β2=7
β1+4β2=10
得鉛直距離方:
化簡:
注:這個(gè)地方使用了編導(dǎo)數(shù),凹曲面上的極值點(diǎn)在切線斜率為0處
轉(zhuǎn)化成2維面
就解2元1次方程組 一個(gè)為3.5 一個(gè)為1.4
使用公式計(jì)算:
β2=(1-2.5)*(6-7)+(2-2.5)*(5-7)+(3-2.5)(7-7)+(4-2.5)(10-7)/[(1-2.5)(1-2.5)+(2-2.5)(2-2.5)
+(3-2.5)(3-2.5)+(4-2.5)(4-2.5)]=7/5=1.4
python代碼計(jì)算:
def lsqr(A, b, damp=0.0, atol=1e-8, btol=1e-8, conlim=1e8, iter_lim=None, show=False, calc_var=False): """Find the least-squares solution to a large, sparse, linear system of equations. The function solves ``Ax = b`` or ``min ||b - Ax||^2`` or ``min ||Ax - b||^2 + d^2 ||x||^2``. The matrix A may be square or rectangular (over-determined or under-determined), and may have any rank. :: 1. Unsymmetric equations -- solve A*x = b 2. Linear least squares -- solve A*x = b in the least-squares sense 3. Damped least squares -- solve ( A )*x = ( b ) ( damp*I ) ( 0 ) in the least-squares sense Parameters ---------- A : {sparse matrix, ndarray, LinearOperatorLinear} Representation of an m-by-n matrix. It is required that the linear operator can produce ``Ax`` and ``A^T x``. b : (m,) ndarray Right-hand side vector ``b``. damp : float Damping coefficient. atol, btol : float Stopping tolerances. If both are 1.0e-9 (say), the final residual norm should be accurate to about 9 digits. (The final x will usually have fewer correct digits, depending on cond(A) and the size of damp.) conlim : float Another stopping tolerance. lsqr terminates if an estimate of ``cond(A)`` exceeds `conlim`. For compatible systems ``Ax = b``, `conlim` could be as large as 1.0e+12 (say). For least-squares problems, conlim should be less than 1.0e+8. Maximum precision can be obtained by setting ``atol = btol = conlim = zero``, but the number of iterations may then be excessive. iter_lim : int Explicit limitation on number of iterations (for safety). show : bool Display an iteration log. calc_var : bool Whether to estimate diagonals of ``(A'A + damp^2*I)^{-1}``. Returns ------- x : ndarray of float The final solution. istop : int Gives the reason for termination. 1 means x is an approximate solution to Ax = b. 2 means x approximately solves the least-squares problem. itn : int Iteration number upon termination. r1norm : float ``norm(r)``, where ``r = b - Ax``. r2norm : float ``sqrt( norm(r)^2 + damp^2 * norm(x)^2 )``. Equal to `r1norm` if ``damp == 0``. anorm : float Estimate of Frobenius norm of ``Abar = [[A]; [damp*I]]``. acond : float Estimate of ``cond(Abar)``. arnorm : float Estimate of ``norm(A'*r - damp^2*x)``. xnorm : float ``norm(x)`` var : ndarray of float If ``calc_var`` is True, estimates all diagonals of ``(A'A)^{-1}`` (if ``damp == 0``) or more generally ``(A'A + damp^2*I)^{-1}``. This is well defined if A has full column rank or ``damp > 0``. (Not sure what var means if ``rank(A) < n`` and ``damp = 0.``) Notes ----- LSQR uses an iterative method to approximate the solution. The number of iterations required to reach a certain accuracy depends strongly on the scaling of the problem. Poor scaling of the rows or columns of A should therefore be avoided where possible. For example, in problem 1 the solution is unaltered by row-scaling. If a row of A is very small or large compared to the other rows of A, the corresponding row of ( A b ) should be scaled up or down. In problems 1 and 2, the solution x is easily recovered following column-scaling. Unless better information is known, the nonzero columns of A should be scaled so that they all have the same Euclidean norm (e.g., 1.0). In problem 3, there is no freedom to re-scale if damp is nonzero. However, the value of damp should be assigned only after attention has been paid to the scaling of A. The parameter damp is intended to help regularize ill-conditioned systems, by preventing the true solution from being very large. Another aid to regularization is provided by the parameter acond, which may be used to terminate iterations before the computed solution becomes very large. If some initial estimate ``x0`` is known and if ``damp == 0``, one could proceed as follows: 1. Compute a residual vector ``r0 = b - A*x0``. 2. Use LSQR to solve the system ``A*dx = r0``. 3. Add the correction dx to obtain a final solution ``x = x0 + dx``. This requires that ``x0`` be available before and after the call to LSQR. To judge the benefits, suppose LSQR takes k1 iterations to solve A*x = b and k2 iterations to solve A*dx = r0. If x0 is "good", norm(r0) will be smaller than norm(b). If the same stopping tolerances atol and btol are used for each system, k1 and k2 will be similar, but the final solution x0 + dx should be more accurate. The only way to reduce the total work is to use a larger stopping tolerance for the second system. If some value btol is suitable for A*x = b, the larger value btol*norm(b)/norm(r0) should be suitable for A*dx = r0. Preconditioning is another way to reduce the number of iterations. If it is possible to solve a related system ``M*x = b`` efficiently, where M approximates A in some helpful way (e.g. M - A has low rank or its elements are small relative to those of A), LSQR may converge more rapidly on the system ``A*M(inverse)*z = b``, after which x can be recovered by solving M*x = z. If A is symmetric, LSQR should not be used! Alternatives are the symmetric conjugate-gradient method (cg) and/or SYMMLQ. SYMMLQ is an implementation of symmetric cg that applies to any symmetric A and will converge more rapidly than LSQR. If A is positive definite, there are other implementations of symmetric cg that require slightly less work per iteration than SYMMLQ (but will take the same number of iterations). References ---------- .. [1] C. C. Paige and M. A. Saunders (1982a). "LSQR: An algorithm for sparse linear equations and sparse least squares", ACM TOMS 8(1), 43-71. .. [2] C. C. Paige and M. A. Saunders (1982b). "Algorithm 583. LSQR: Sparse linear equations and least squares problems", ACM TOMS 8(2), 195-209. .. [3] M. A. Saunders (1995). "Solution of sparse rectangular systems using LSQR and CRAIG", BIT 35, 588-604. """ A = aslinearoperator(A) if len(b.shape) > 1: b = b.squeeze() m, n = A.shape if iter_lim is None: iter_lim = 2 * n var = np.zeros(n) msg = ('The exact solution is x = 0 ', 'Ax - b is small enough, given atol, btol ', 'The least-squares solution is good enough, given atol ', 'The estimate of cond(Abar) has exceeded conlim ', 'Ax - b is small enough for this machine ', 'The least-squares solution is good enough for this machine', 'Cond(Abar) seems to be too large for this machine ', 'The iteration limit has been reached ') itn = 0 istop = 0 nstop = 0 ctol = 0 if conlim > 0: ctol = 1/conlim anorm = 0 acond = 0 dampsq = damp**2 ddnorm = 0 res2 = 0 xnorm = 0 xxnorm = 0 z = 0 cs2 = -1 sn2 = 0 """ Set up the first vectors u and v for the bidiagonalization. These satisfy beta*u = b, alfa*v = A'u. """ __xm = np.zeros(m) # a matrix for temporary holding __xn = np.zeros(n) # a matrix for temporary holding v = np.zeros(n) u = b x = np.zeros(n) alfa = 0 beta = np.linalg.norm(u) w = np.zeros(n) if beta > 0: u = (1/beta) * u v = A.rmatvec(u) alfa = np.linalg.norm(v) if alfa > 0: v = (1/alfa) * v w = v.copy() rhobar = alfa phibar = beta bnorm = beta rnorm = beta r1norm = rnorm r2norm = rnorm # Reverse the order here from the original matlab code because # there was an error on return when arnorm==0 arnorm = alfa * beta if arnorm == 0: print(msg[0]) return x, istop, itn, r1norm, r2norm, anorm, acond, arnorm, xnorm, var head1 = ' Itn x[0] r1norm r2norm ' head2 = ' Compatible LS Norm A Cond A' if show: print(' ') print(head1, head2) test1 = 1 test2 = alfa / beta str1 = '%6g %12.5e' % (itn, x[0]) str2 = ' %10.3e %10.3e' % (r1norm, r2norm) str3 = ' %8.1e %8.1e' % (test1, test2) print(str1, str2, str3) # Main iteration loop. while itn < iter_lim: itn = itn + 1 """ % Perform the next step of the bidiagonalization to obtain the % next beta, u, alfa, v. These satisfy the relations % beta*u = a*v - alfa*u, % alfa*v = A'*u - beta*v. """ u = A.matvec(v) - alfa * u beta = np.linalg.norm(u) if beta > 0: u = (1/beta) * u anorm = sqrt(anorm**2 + alfa**2 + beta**2 + damp**2) v = A.rmatvec(u) - beta * v alfa = np.linalg.norm(v) if alfa > 0: v = (1 / alfa) * v # Use a plane rotation to eliminate the damping parameter. # This alters the diagonal (rhobar) of the lower-bidiagonal matrix. rhobar1 = sqrt(rhobar**2 + damp**2) cs1 = rhobar / rhobar1 sn1 = damp / rhobar1 psi = sn1 * phibar phibar = cs1 * phibar # Use a plane rotation to eliminate the subdiagonal element (beta) # of the lower-bidiagonal matrix, giving an upper-bidiagonal matrix. cs, sn, rho = _sym_ortho(rhobar1, beta) theta = sn * alfa rhobar = -cs * alfa phi = cs * phibar phibar = sn * phibar tau = sn * phi # Update x and w. t1 = phi / rho t2 = -theta / rho dk = (1 / rho) * w x = x + t1 * w w = v + t2 * w ddnorm = ddnorm + np.linalg.norm(dk)**2 if calc_var: var = var + dk**2 # Use a plane rotation on the right to eliminate the # super-diagonal element (theta) of the upper-bidiagonal matrix. # Then use the result to estimate norm(x). delta = sn2 * rho gambar = -cs2 * rho rhs = phi - delta * z zbar = rhs / gambar xnorm = sqrt(xxnorm + zbar**2) gamma = sqrt(gambar**2 + theta**2) cs2 = gambar / gamma sn2 = theta / gamma z = rhs / gamma xxnorm = xxnorm + z**2 # Test for convergence. # First, estimate the condition of the matrix Abar, # and the norms of rbar and Abar'rbar. acond = anorm * sqrt(ddnorm) res1 = phibar**2 res2 = res2 + psi**2 rnorm = sqrt(res1 + res2) arnorm = alfa * abs(tau) # Distinguish between # r1norm = ||b - Ax|| and # r2norm = rnorm in current code # = sqrt(r1norm^2 + damp^2*||x||^2). # Estimate r1norm from # r1norm = sqrt(r2norm^2 - damp^2*||x||^2). # Although there is cancellation, it might be accurate enough. r1sq = rnorm**2 - dampsq * xxnorm r1norm = sqrt(abs(r1sq)) if r1sq < 0: r1norm = -r1norm r2norm = rnorm # Now use these norms to estimate certain other quantities, # some of which will be small near a solution. test1 = rnorm / bnorm test2 = arnorm / (anorm * rnorm) test3 = 1 / acond t1 = test1 / (1 + anorm * xnorm / bnorm) rtol = btol + atol * anorm * xnorm / bnorm # The following tests guard against extremely small values of # atol, btol or ctol. (The user may have set any or all of # the parameters atol, btol, conlim to 0.) # The effect is equivalent to the normal tests using # atol = eps, btol = eps, conlim = 1/eps. if itn >= iter_lim: istop = 7 if 1 + test3 <= 1: istop = 6 if 1 + test2 <= 1: istop = 5 if 1 + t1 <= 1: istop = 4 # Allow for tolerances set by the user. if test3 <= ctol: istop = 3 if test2 <= atol: istop = 2 if test1 <= rtol: istop = 1 # See if it is time to print something. prnt = False if n <= 40: prnt = True if itn <= 10: prnt = True if itn >= iter_lim-10: prnt = True # if itn%10 == 0: prnt = True if test3 <= 2*ctol: prnt = True if test2 <= 10*atol: prnt = True if test1 <= 10*rtol: prnt = True if istop != 0: prnt = True if prnt: if show: str1 = '%6g %12.5e' % (itn, x[0]) str2 = ' %10.3e %10.3e' % (r1norm, r2norm) str3 = ' %8.1e %8.1e' % (test1, test2) str4 = ' %8.1e %8.1e' % (anorm, acond) print(str1, str2, str3, str4) if istop != 0: break # End of iteration loop. return x, istop, itn, r1norm, r2norm, anorm, acond, arnorm, xnorm, var
================================
校驗(yàn)假設(shè):
雖然上面的實(shí)例算出了線性關(guān)系,但是我們?cè)趺磁袛辔覀冮_始預(yù)設(shè)立得假設(shè)呢,就是我們是否要推翻假設(shè),畢竟是假設(shè),
假設(shè)未來一周我有一個(gè)億,根據(jù)現(xiàn)實(shí)偏離太大,估計(jì)會(huì)推翻假設(shè)。
算了還是切入正題吧:
開始我假設(shè)了個(gè)模型:
我們需要校驗(yàn)這個(gè)假設(shè),我們還要附加假定:
對(duì)于X的每一個(gè)固定值,所以的ε都相互獨(dú)立,并且都服從均值為0,方程為σ方 的正態(tài)分布
得到:
SSE為殘差,n-2為自由度[自由度=觀測(cè)個(gè)數(shù)-待估回歸參數(shù)個(gè)數(shù)]
帶入化簡得倒,標(biāo)準(zhǔn)誤差分別是:
的標(biāo)準(zhǔn)誤差刻畫了斜率的估計(jì)精度,標(biāo)準(zhǔn)誤越小估計(jì)精度越高
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。