How to Implement Linear Regression with sklearn in Python

Published: 2021-04-30 16:19:25 | Source: 億速云 (Yisu Cloud) | Author: Leah | Category: Development Technology

This article shows how to implement linear regression in Python with sklearn. It walks through a plain linear model, a polynomial model, and a comparison of the normal equation with gradient descent.

Commonly used Python libraries

Commonly used Python libraries include: 1. requests; 2. scrapy; 3. pillow; 4. twisted; 5. numpy; 6. matplotlib; 7. pygame; 8. ipython.

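Predicting Boston house prices with a first-order linear equation

The dataset ships with sklearn and contains the features and prices of 506 houses in the Boston area, collected before 1993. load_boston() loads it. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older scikit-learn version.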

import time

# load_boston() requires scikit-learn < 1.2 (it was removed in 1.2).
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

boston = load_boston()

X = boston.data    # feature matrix, one row per house
y = boston.target  # house prices

print("X.shape:{}. y.shape:{}".format(X.shape, y.shape))
print('boston.feature_name:{}'.format(boston.feature_names))

# Hold out 20% of the samples for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

model = LinearRegression()

# time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
start = time.perf_counter()
model.fit(X_train, y_train)

# score() returns R^2, the coefficient of determination.
train_score = model.score(X_train, y_train)
cv_score = model.score(X_test, y_test)

print('time used:{0:.6f}; train_score:{1:.6f}, sv_score:{2:.6f}'.format(
    time.perf_counter() - start, train_score, cv_score))

The output is:

X.shape:(506, 13). y.shape:(506,)
boston.feature_name:['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
time used:0.012403; train_score:0.723941, sv_score:0.794958

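Neither score is high. The value returned by score() is R^2, the coefficient of determination, and it is only about 0.72 on the training set itself, which suggests the model is underfitting.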
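To make the meaning of the score concrete, here is a minimal sketch that computes R^2 by hand and compares it with LinearRegression.score(). The synthetic data and the variable names are assumptions made for illustration; this is not the Boston data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the Boston data):
# y depends linearly on two features plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The hand-computed value agrees with what score() returns.
print(r2_manual, model.score(X, y))
```

Because R^2 compares the model against always predicting the mean of y, it can drop below zero on data the model has not seen, when the predictions are worse than that constant baseline.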

Linear regression with polynomial features

The example above underfits: the model is too simple to capture the structure of the data. To increase the model's complexity, we introduce polynomial features.

For example, if the original features are [a, b], then with degree 2 the polynomial features become [1, a, b, a^2, ab, b^2]. Other degrees follow the same pattern.

Polynomial features add complexity to both the data and the model, which lets the model fit the data more closely.
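The expansion described above can be checked directly with PolynomialFeatures. This is a small sketch with a single made-up sample (a = 2, b = 3), chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3.
X = np.array([[2.0, 3.0]])

# include_bias=True (the default) keeps the constant column of ones.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Columns are ordered as 1, a, b, a^2, a*b, b^2.
print(X_poly)  # [[1. 2. 3. 4. 6. 9.]]
```

Passing include_bias=False drops the leading column of ones. That is a sensible choice when the features feed a LinearRegression, which fits its own intercept.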

The code below uses Pipeline to chain the polynomial feature step and the linear regression step, then reports the scores for degree 1, 2, and 3.

import time

# load_boston() requires scikit-learn < 1.2 (it was removed in 1.2).
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures


def polynomial_model(degree=1):
    """Build a pipeline: polynomial feature expansion, then linear regression."""
    # include_bias=False: LinearRegression already fits an intercept.
    polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)

    # normalize=True was deprecated in scikit-learn 1.0 and removed in 1.2;
    # it is kept here so that the results match the output shown below.
    linear_regression = LinearRegression(normalize=True)
    pipeline = Pipeline([('polynomial_features', polynomial_features),
                         ('linear_regression', linear_regression)])
    return pipeline


boston = load_boston()
X = boston.data
y = boston.target
print("X.shape:{}. y.shape:{}".format(X.shape, y.shape))
print('boston.feature_name:{}'.format(boston.feature_names))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

# Fit and score one model per polynomial degree.
for i in range(1, 4):
    print('degree:{}'.format(i))
    model = polynomial_model(degree=i)

    # time.clock() was removed in Python 3.8; time.perf_counter() replaces it.
    start = time.perf_counter()
    model.fit(X_train, y_train)

    train_score = model.score(X_train, y_train)
    cv_score = model.score(X_test, y_test)

    print('time used:{0:.6f}; train_score:{1:.6f}, sv_score:{2:.6f}'.format(
        time.perf_counter() - start, train_score, cv_score))

The output is:

X.shape:(506, 13). y.shape:(506,)
boston.feature_name:['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
degree:1
time used:0.003576; train_score:0.723941, sv_score:0.794958
degree:2
time used:0.030123; train_score:0.930547, sv_score:0.860465
degree:3
time used:0.137346; train_score:1.000000, sv_score:-104.429619

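With degree 1 the result is identical to the earlier model without polynomial features. With degree 3 the training score is 1 while the test score is negative, which is clear overfitting.

So the degree 2 model is the one to choose.

The second-order polynomial does much better than the first-order one, but a noticeable gap remains between the training score and the test score. This may be because there is not enough data; more data would be needed to improve the model's accuracy further.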

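Normal equation versus gradient descent

Besides approaching the optimal solution iteratively with gradient descent, the solution can also be computed directly with the normal equation.

According to Andrew Ng's course, the optimal solution for linear regression is: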

theta = (X^T * X)^-1 * X^T * y
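A minimal NumPy sketch of the formula above, on synthetic data. The data, the true_theta values, and the variable names are assumptions made for illustration. Rather than forming the inverse explicitly, the sketch solves the equivalent linear system (X^T X) theta = X^T y, which gives the same result and is numerically safer.

```python
import numpy as np

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
n_samples = 100
features = rng.normal(size=(n_samples, 2))
true_theta = np.array([1.0, 2.0, -3.0])  # intercept, then two weights

# Prepend a column of ones so that theta[0] acts as the intercept.
X = np.hstack([np.ones((n_samples, 1)), features])
y = X @ true_theta + rng.normal(scale=0.1, size=n_samples)

# Normal equation: solve (X^T X) theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)  # close to true_theta, up to the noise
```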

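Each method has its own strengths and weaknesses:

Gradient descent:

Disadvantages: a learning rate must be chosen, and many iterations are needed

Advantages: still runs at a reasonable speed when the number of features is large (more than 10,000)

Normal equation:

Advantages: no learning rate to set, and no iterations

Disadvantages: requires computing X^T X and its inverse, which costs about O(n^3) in the number of features n; becomes very slow when the number of features is large (more than 10,000)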

For non-linear models, such as those used in classification, the normal equation does not apply, so gradient descent has the wider range of use.
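The trade-off can be seen in a small sketch that fits the same synthetic problem both ways. The data, learning_rate, and n_iterations are hypothetical values chosen for illustration; on real data the learning rate has to be tuned and the features usually need scaling first.

```python
import numpy as np

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
n_samples = 200
features = rng.normal(size=(n_samples, 2))
X = np.hstack([np.ones((n_samples, 1)), features])  # column of ones = intercept
true_theta = np.array([0.5, -1.0, 2.0])
y = X @ true_theta + rng.normal(scale=0.1, size=n_samples)

learning_rate = 0.1   # hypothetical value; must be tuned per problem
n_iterations = 1000   # hypothetical value

# Batch gradient descent on the cost J(theta) = (1 / 2m) * ||X theta - y||^2.
theta_gd = np.zeros(X.shape[1])
for _ in range(n_iterations):
    gradient = X.T @ (X @ theta_gd - y) / n_samples
    theta_gd -= learning_rate * gradient

# Normal equation on the same data, for comparison.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(theta_gd)
print(theta_ne)
```

On a well-conditioned problem like this one, the two estimates agree to many decimal places. Gradient descent gets there through repeated cheap updates, while the normal equation solves one linear system whose cost grows quickly with the number of features.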

That covers how to implement linear regression with sklearn in Python. Some of these points are likely to come up in day-to-day work.
