How to Build a Movie Recommendation System with Python

Published: 2023-04-12 14:58:40 | Source: 億速云 | Views: 141 | Author: iii | Category: Programming Languages

This article walks through how to build a movie recommendation system with Python. The steps are short and easy to follow: load the data, derive a few features, visualize and clean them, and fit a nearest-neighbors model.

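Importing the Data

Import and merge the datasets and create Pandas DataFrames

The MovieLens 20M dataset contains more than 20 million movie ratings and tagging activities recorded since 1995. The shapes printed later in this walkthrough (9,742 movies and 100,836 ratings) show that the files loaded here come from a much smaller MovieLens release.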

import pandas as pd

# usecols selects which columns to load; dtype sets the type of each column
movies_df = pd.read_csv('movies.csv',
                        usecols=['movieId', 'title'],
                        dtype={'movieId': 'int32', 'title': 'str'})
movies_df.head()


ratings_df=pd.read_csv('ratings.csv',
 usecols=['userId', 'movieId', 'rating','timestamp'],
 dtype={'userId': 'int32', 'movieId': 'int32', 'rating': 'float32'})
ratings_df.head()
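
As a minimal sketch of what usecols and dtype do, the snippet below parses a small in-memory CSV instead of the real files. The three sample rows and the name sample_movies are made up for illustration; only the read_csv parameters mirror the calls above.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for movies.csv, with an extra 'genres' column
csv_text = """movieId,title,genres
1,Toy Story (1995),Animation
2,Jumanji (1995),Adventure
3,Heat (1995),Action
"""

sample_movies = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["movieId", "title"],                 # 'genres' is never loaded
    dtype={"movieId": "int32", "title": "str"},   # int32 instead of the default int64
)

print(sample_movies.dtypes)
print(sample_movies.memory_usage(deep=True))
```

Loading fewer columns with narrower types mainly pays off on the larger MovieLens releases, where the ratings file has millions of rows.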


Check both DataFrames for null values and for their number of entries.

# Check for missing values
movies_df.isnull().sum()

movieId    0
title      0
dtype: int64

ratings_df.isnull().sum()

userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

print("Movies:",movies_df.shape)
print("Ratings:",ratings_df.shape)

Movies: (9742, 2)
Ratings: (100836, 4)

Merge the two DataFrames on the 'movieId' column.

# movies_df.info()
# ratings_df.info()
movies_merged_df=movies_df.merge(ratings_df, on='movieId')
movies_merged_df.head()


The imported datasets are now merged.
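
DataFrame.merge performs an inner join by default, so movies without any rating drop out of the merged result. The sketch below shows this on toy data; the frames and the names merged, check and unrated are illustrative and not part of the original walkthrough.

```python
import pandas as pd

movies = pd.DataFrame({"movieId": [1, 2, 3],
                       "title": ["A (2000)", "B (2001)", "C (2002)"]})
ratings = pd.DataFrame({"userId": [10, 11, 10],
                        "movieId": [1, 1, 2],
                        "rating": [4.0, 5.0, 3.0]})

# how="inner" is the default: only movieIds present in both frames survive
merged = movies.merge(ratings, on="movieId")

# A left join with indicator=True reveals which movies had no matching rating
check = movies.merge(ratings, on="movieId", how="left", indicator=True)
unrated = check.loc[check["_merge"] == "left_only", "title"].tolist()

print(merged)
print("Movies without ratings:", unrated)
```

For a recommender built on ratings, losing unrated movies is usually the desired behavior, but it is worth knowing that it happens.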

Adding Derived Features

Add the features needed to analyze the data.

Create the 'Average Rating' and 'Rating Count' columns by grouping the user ratings by movie title.

# Parentheses allow the method chain to span several lines
movies_average_rating = (
    movies_merged_df.groupby('title')['rating']
    .mean().sort_values(ascending=False)
    .reset_index().rename(columns={'rating': 'Average Rating'})
)
movies_average_rating.head()


movies_rating_count = (
    movies_merged_df.groupby('title')['rating']
    .count().sort_values(ascending=True)  # use ascending=False to list the most-rated titles first
    .reset_index().rename(columns={'rating': 'Rating Count'})
)
# Combine both derived features into one DataFrame keyed by title
movies_rating_count_avg = movies_rating_count.merge(movies_average_rating, on='title')
movies_rating_count_avg.head()


Two new derived features have now been created.
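
Both features can also be computed in a single groupby pass. This is an alternative sketch on toy data, not the approach used above; the frame toy and the result stats are made-up names.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 3.0, 5.0, 5.0, 2.0, 4.0],
})

# Aggregate count and mean together, then rename to the column names used in this article
stats = (
    toy.groupby("title")["rating"]
    .agg(["count", "mean"])
    .rename(columns={"count": "Rating Count", "mean": "Average Rating"})
    .reset_index()
)
print(stats)
```

In this toy data, movie B has a perfect 5.0 average from a single rating, which is the kind of unreliable value that a rating-count threshold is meant to filter out.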

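Data Visualization

Visualize the data with Seaborn and Matplotlib so that it is easier to inspect and analyze.

A first look at the derived features shows that many movies in this dataset of roughly 100,000 user ratings have a perfect 5-star average rating. That points to outliers, which the plots should confirm.
Many movies have only a handful of ratings, so it is advisable to set a threshold on the rating count in order to produce useful recommendations.

Plot a histogram of each new feature and look at its distribution. The number of bins is set to 80 here; the right value depends on the data and should be chosen after inspecting it.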

# Import the visualization libraries
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(font_scale = 1)
plt.rcParams["axes.grid"] = False
plt.style.use('dark_background')
# Jupyter/IPython magic; remove the next line when running as a plain Python script
%matplotlib inline

# Plot the histogram of Rating Count
plt.figure(figsize=(12,4))
plt.hist(movies_rating_count_avg['Rating Count'],bins=80,color='tab:purple')
plt.ylabel('Ratings Count(Scaled)', fontsize=16)
plt.savefig('ratingcounthist.jpg')

# Plot the histogram of Average Rating
plt.figure(figsize=(12,4))
plt.hist(movies_rating_count_avg['Average Rating'],bins=80,color='tab:purple')
plt.ylabel('Average Rating',fontsize=16)
plt.savefig('avgratinghist.jpg')

Figure 1: Histogram of Rating Count

Figure 2: Histogram of Average Rating
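
The bin count of 80 is a manual choice. As one possible way to sanity-check it, NumPy can suggest bin edges from the data through rules such as Freedman-Diaconis. The sketch below uses synthetic long-tailed counts as a stand-in for the 'Rating Count' column, so the printed numbers say nothing about the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, long-tailed stand-in for the 'Rating Count' column (an assumption)
counts = rng.geometric(p=0.05, size=2000)

fixed_edges = np.histogram_bin_edges(counts, bins=80)      # fixed number of bins
fd_edges = np.histogram_bin_edges(counts, bins="fd")       # Freedman-Diaconis rule
auto_edges = np.histogram_bin_edges(counts, bins="auto")   # NumPy's default heuristic

print("fixed:", len(fixed_edges) - 1, "bins")
print("fd:   ", len(fd_edges) - 1, "bins")
print("auto: ", len(auto_edges) - 1, "bins")
```

The same string values can be passed as the bins argument of plt.hist, which forwards them to NumPy.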

Now create a two-dimensional jointplot to visualize the two features together.

plot=sns.jointplot(x='Average Rating',
 y='Rating Count',
 data=movies_rating_count_avg,
 alpha=0.5, 
 color='tab:pink')
plot.savefig('joinplot.jpg')


Joint plot of Average Rating and Rating Count

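Analysis

The Rating Count histogram confirms that most movies have received only a small number of ratings. Besides setting a fixed threshold, a higher quantile of the rating count could also serve as the cutoff in this use case.
The Average Rating histogram shows how the 'Average Rating' values are distributed.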
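
A quantile-based cutoff, as mentioned above, could look like the following sketch. The ten rating counts are invented, and the 0.75 quantile is an arbitrary illustrative choice, not a value taken from this article.

```python
import pandas as pd

# Invented rating counts for ten movies, skewed toward small values
rating_counts = pd.Series([1, 1, 2, 3, 5, 8, 40, 60, 120, 300], name="Rating Count")

# Keep only movies at or above the 75th percentile of rating count
quantile_threshold = rating_counts.quantile(0.75)
kept = rating_counts[rating_counts >= quantile_threshold]

print("threshold:", quantile_threshold)
print("movies kept:", len(kept), "of", len(rating_counts))
```

Unlike a fixed number, a quantile adapts automatically when the dataset grows or shrinks.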

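Data Cleaning

Use describe() to get descriptive statistics for the data, such as the quantiles and the standard deviation. The rating_with_RatingCount DataFrame is not defined in the steps above. Because the output below reports a count of 100836, one row per rating, it is reconstructed here, as an assumption, by merging the per-title 'Rating Count' back onto the merged ratings table.

pd.set_option('display.float_format', lambda x: '%.3f' % x)
# Assumed reconstruction: attach each title's rating count to every rating row
rating_with_RatingCount = movies_merged_df.merge(movies_rating_count, on='title', how='left')
print(rating_with_RatingCount['Rating Count'].describe())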

count   100836.000
mean        58.759
std         61.965
min          1.000
25%         13.000
50%         39.000
75%         84.000
max        329.000
Name: Rating Count, dtype: float64

Set a threshold and keep only the rows whose rating count is at or above it.

popularity_threshold = 50
popular_movies= rating_with_RatingCount[
rating_with_RatingCount['Rating Count']>=popularity_threshold]
popular_movies.head()
# popular_movies.shape


The data has now been cleaned by filtering out the movies whose rating count falls below the threshold.
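
The count-then-filter step can also be written without a separate merge by using groupby(...).transform. This is a sketch of an alternative on toy data; toy, toy_threshold and popular are illustrative names, and the threshold of 2 only makes sense for this tiny example.

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "userId": [1, 2, 3, 1, 2, 3],
    "rating": [4.0, 3.0, 5.0, 5.0, 2.0, 4.0],
})

# transform("count") returns one value per row, aligned with the original index
toy["Rating Count"] = toy.groupby("title")["rating"].transform("count")

toy_threshold = 2
popular = toy[toy["Rating Count"] >= toy_threshold]
print(popular)
```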

Creating the Pivot Table

Create a pivot table with movies as the index and users as the columns

To load the data into the model later, a pivot table is needed. Set 'title' as the index, 'userId' as the columns, and 'rating' as the values.

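# One row per movie, one column per user; unrated cells are filled with 0
movie_features_df = popular_movies.pivot_table(
    index='title', columns='userId', values='rating').fillna(0)
movie_features_df.head()
# Optional export; to_excel needs an Excel writer such as openpyxl to be installed
movie_features_df.to_excel('output.xlsx')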


Next, the pivot table is loaded into the model.
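
To make the shape of the pivot table concrete, here is a small sketch on invented ratings. It follows the same index, columns and values as above; the data and the name features are assumptions for illustration.

```python
import pandas as pd

ratings = pd.DataFrame({
    "title": ["A", "A", "B", "B", "C"],
    "userId": [1, 2, 1, 3, 2],
    "rating": [4.0, 5.0, 3.0, 2.0, 1.0],
})

# Rows are movies, columns are users; a missing (movie, user) pair becomes 0
features = ratings.pivot_table(
    index="title", columns="userId", values="rating").fillna(0)
print(features)
```

Filling with 0 treats "not rated" the same as a rating of zero. That is a simplification, but it keeps each movie's vector well defined for the cosine distance used later.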

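Building the kNN Model

Build a kNN model and output the 5 most similar recommendations for a movie

Use csr_matrix from the scipy.sparse module to convert the pivot table into a sparse matrix that the model can be fitted on.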

from scipy.sparse import csr_matrix
movie_features_df_matrix = csr_matrix(movie_features_df.values)
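
A movie-by-user matrix is mostly zeros, which is why a compressed sparse row (CSR) matrix is a reasonable container for it. The sketch below uses a made-up 3x4 array to show what csr_matrix actually stores.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up ratings: 3 movies x 4 users, 0 means "not rated"
dense = np.array([[4.0, 0.0, 0.0, 5.0],
                  [0.0, 0.0, 3.0, 0.0],
                  [0.0, 2.0, 0.0, 0.0]])
sparse = csr_matrix(dense)

# Only the non-zero entries are stored
density = sparse.nnz / (dense.shape[0] * dense.shape[1])
print("stored values:", sparse.nnz, "density:", round(density, 3))
print("data:   ", sparse.data)      # the non-zero values, row by row
print("indices:", sparse.indices)   # column index of each stored value
print("indptr: ", sparse.indptr)    # where each row starts in data/indices
```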

Finally, fit the NearestNeighbors algorithm from sklearn on the matrix generated above, with the parameters metric = 'cosine' and algorithm = 'brute'.

from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric = 'cosine',
 algorithm = 'brute')
model_knn.fit(movie_features_df_matrix)
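
With metric='cosine', the distance between two rating vectors a and b is 1 - (a·b) / (|a| |b|), so movies rated similarly by the same users end up close to each other regardless of how many ratings they have. The sketch below checks a hand-written version of that formula against NearestNeighbors on a tiny made-up matrix; cosine_distance and X are illustrative names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Three made-up movies described by the ratings of three users
X = np.array([[5.0, 0.0, 3.0],
              [4.0, 0.0, 4.0],
              [0.0, 5.0, 0.0]])

def cosine_distance(a, b):
    """1 minus the cosine of the angle between vectors a and b."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)
distances, indices = knn.kneighbors(X[0].reshape(1, -1), n_neighbors=2)

manual = cosine_distance(X[0], X[1])
print("sklearn:", distances[0][1], "manual:", manual)
```

The first neighbor returned is the query row itself, and the second is row 1, whose distance matches the manual calculation up to floating-point error.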

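Now pass a query to the model. The kneighbors method expects a 2-D array, so the selected row is reshaped into a single-row array. n_neighbors is set to 6 so that five recommendations remain once the queried movie itself is skipped.

import numpy as np

# Pick a random movie (row) as the query
query_index = np.random.choice(movie_features_df.shape[0])
distances, indices = model_knn.kneighbors(
    movie_features_df.iloc[query_index, :].values.reshape(1, -1),
    n_neighbors=6)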

Finally, print the movie recommendations for the movie at query_index.

for i in range(0, len(distances.flatten())):
    if i == 0:
        # The first neighbor is treated as the queried movie itself
        print('Recommendations for {0}:\n'
              .format(movie_features_df.index[query_index]))
    else:
        print('{0}: {1}, with distance of {2}:'
              .format(i, movie_features_df.index[indices.flatten()[i]],
                      distances.flatten()[i]))

Recommendations for Harry Potter and the Order of the Phoenix (2007):

1: Harry Potter and the Half-Blood Prince (2009), with distance of 0.2346513867378235:
2: Harry Potter and the Order of the Phoenix (2007), with distance of 0.3396233320236206:
3: Harry Potter and the Goblet of Fire (2005), with distance of 0.4170845150947571:
4: Harry Potter and the Prisoner of Azkaban (2004), with distance of 0.4499547481536865:
5: Harry Potter and the Chamber of Secrets (2002), with distance of 0.4506162405014038:
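
The query-and-print steps can be wrapped in a small helper that looks a movie up by title instead of by a random index. This is a sketch under assumptions: recommend, toy_features and toy_model are hypothetical names, and the 4x4 ratings table is invented so that the example runs on its own.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

def recommend(title, features_df, model, k=5):
    """Return up to k (title, distance) pairs nearest to `title`, excluding itself."""
    row = features_df.loc[[title]].values  # double brackets keep a 2-D shape
    n = min(k + 1, len(features_df))       # one extra slot for the query itself
    distances, indices = model.kneighbors(row, n_neighbors=n)
    results = []
    for dist, idx in zip(distances.flatten(), indices.flatten()):
        name = features_df.index[idx]
        if name != title:                  # filter by name rather than by position
            results.append((name, float(dist)))
    return results[:k]

# Invented movie-by-user ratings, 0 meaning "not rated"
toy_features = pd.DataFrame(
    [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [0, 1, 4, 5]],
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
    columns=[1, 2, 3, 4], dtype="float32")

toy_model = NearestNeighbors(metric="cosine", algorithm="brute")
toy_model.fit(csr_matrix(toy_features.values))

recs = recommend("Movie A", toy_features, toy_model, k=2)
for name, dist in recs:
    print(name, round(dist, 4))
```

Filtering the query out by name, rather than assuming it is always the first neighbor, keeps the result correct when another movie has an identical rating vector.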

That concludes this walkthrough of how to build a movie recommendation system with Python. How well the approach works on your own data is something you will need to verify in practice.
