How to Classify Time-Series Data with a CNN in Python

Published: 2023-02-22 16:17:09  Source: Yisu Cloud (億速云)  Author: iii  Category: Development

This article walks step by step through classifying time-series data with a CNN in Python, using ECG recordings as the example.

1. Dataset

The dataset used is CPSC2020.

The training data consist of 10 single-lead ECG recordings collected from patients with arrhythmia, each about 24 hours long.


After downloading, the TrainingSet folder contains two subfolders, data and ref, each holding 10 .mat files.

data: the signal files, one .mat file per record, each holding an $n \times 1$ array;
ref: the label files, one .mat file per record, each storing a struct with two $n \times 1$ arrays, S_ref and V_ref, which hold the positions (sample indices) of the S and V beats respectively.

The sampling rate is 400 Hz.

S: supraventricular premature beat (SPB);
V: premature ventricular contraction (PVC).

2. Data preprocessing

2.1 Loading the raw data

Plot the first 1000 samples of the ECG:

import numpy as np
import scipy.io as scio
from matplotlib import pyplot as plt

datafile = 'E:/Wendy/Desktop/TrainingSet/data/A04.mat'  # sampling rate 400 Hz
data = scio.loadmat(datafile)  # dict of MATLAB variables

sig = data['ecg']  # shape (n, 1)
sig = np.reshape(sig, (-1))  # flatten to a 1-D vector of shape (n,)
print(sig)
sigPlot = sig[:1000]  # the first 1000 samples (2.5 s at 400 Hz)
fig = plt.figure(figsize=(20, 10), dpi=400)
plt.plot(sigPlot)
plt.show()

Output:

[Figure: the first 1000 samples of record A04]
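The real record files are not reproduced here, so the sketch below writes a synthetic signal to a temporary directory and reads it back with scipy.io.loadmat. The file name fake_record.mat and the sine-wave signal stored under the key 'ecg' are assumptions made only for illustration. It shows that the variable comes back as an (n, 1) column, that np.reshape(..., -1) flattens it, and how a time axis in seconds can be built at 400 Hz.

```python
import os
import tempfile

import numpy as np
import scipy.io as scio

FS = 400  # CPSC2020 sampling rate (Hz)

# Hypothetical stand-in for a record such as A04.mat: a synthetic (n, 1) column
# vector stored under the key 'ecg', mimicking the layout described above.
n = 10 * FS
fake_ecg = np.sin(2 * np.pi * 1.2 * np.arange(n) / FS).reshape(-1, 1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake_record.mat')
    scio.savemat(path, {'ecg': fake_ecg})
    data = scio.loadmat(path)  # returns a dict; the matrix keeps its (n, 1) shape

sig = np.reshape(data['ecg'], -1)  # (n, 1) -> (n,)
first = sig[:1000]                 # the first 1000 samples, i.e. 2.5 s at 400 Hz
t = np.arange(first.size) / FS     # time of each sample in seconds
print(data['ecg'].shape, sig.shape, first.shape, t[-1])
```

With the real data, plt.plot(t, sig[:1000]) labels the x-axis in seconds rather than in sample indices.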

2.2 Loading the raw labels

Convert the label arrays to 1-D vectors:

import numpy as np
import scipy.io as scio

datafile = 'E:/Wendy/Desktop/TrainingSet/ref/R04.mat'  # label file; positions are sample indices at 400 Hz
data = scio.loadmat(datafile)
label = data['ref'][0][0]  # the struct holding S_ref and V_ref
S_ref = label[0]
S_ref = np.reshape(S_ref, (-1))  # flatten to a 1-D vector
V_ref = label[1]
V_ref = np.reshape(V_ref, (-1))  # flatten to a 1-D vector
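A similar sketch for the label file, again with made-up beat positions in a hypothetical fake_ref.mat. scipy.io.savemat stores a Python dict as a MATLAB struct, and loadmat returns that struct as a (1, 1) structured array, so data['ref'][0][0] is the single struct record. Accessing the fields by name (label['S_ref']) returns the same arrays as label[0] and label[1] when the fields are stored in that order, but does not depend on the field order.

```python
import os
import tempfile

import numpy as np
import scipy.io as scio

# Hypothetical stand-in for R04.mat: a struct 'ref' with two column vectors of beat positions
S_pos = np.array([[1200], [53000]], dtype=np.float64)
V_pos = np.array([[800], [4100], [9000]], dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake_ref.mat')
    scio.savemat(path, {'ref': {'S_ref': S_pos, 'V_ref': V_pos}})  # dict -> MATLAB struct
    data = scio.loadmat(path)

label = data['ref'][0][0]               # the single struct element (a numpy record)
S_ref = np.reshape(label['S_ref'], -1)  # field access by name instead of by position
V_ref = np.reshape(label['V_ref'], -1)
print(label.dtype.names, S_ref, V_ref)
```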

2.3 Segmentation

The signal is split into 5-second segments.

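Rationale: an atrial (supraventricular) or ventricular premature beat is related to the neighbouring beats on both sides, so each segment should cover about five beats. At an average heart rate of 72 bpm, one beat lasts $60/72$ s, so five beats last $\frac{60}{72} \times 5 \approx 4.1667$ s. Because 4.1667 s is awkward to work with, 5 s is used instead, which at a sampling rate of 400 Hz gives $5 \times 400 = 2000$ samples per segment.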

Labels: 0 = other; 1 = V_ref (PVC); 2 = S_ref (SPB).

a = len(sig)
Fs = 400  # sampling rate (Hz)
segLen = 5 * Fs  # 2000 samples per segment
num = a // segLen  # number of complete segments
print(num)

Output:

17650

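Here Fs is the sampling rate, segLen is the segment length in samples, and num is the number of complete segments; leftover samples at the end that do not fill a whole segment are discarded.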
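The loop in the next step slices sig one segment at a time. The same non-overlapping split can also be written with a single reshape; the function name segment and the test signal below are hypothetical, used only to illustrate the idea.

```python
import numpy as np

def segment(sig, seg_len):
    """Split a 1-D signal into non-overlapping segments of seg_len samples.

    Trailing samples that do not fill a whole segment are dropped,
    matching num = len(sig) // segLen above.
    """
    num = len(sig) // seg_len
    return np.asarray(sig)[: num * seg_len].reshape(num, seg_len)

Fs = 400
segLen = 5 * Fs                    # 2000 samples per 5 s segment
sig = np.arange(5 * segLen + 123)  # hypothetical signal: 5 full segments plus a remainder
segs = segment(sig, segLen)
print(segs.shape)                  # (5, 2000)
```

Row i of the result is exactly sig[i*segLen:(i+1)*segLen], so it can replace the list of slices built in the loop.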

2.4 Combining data and labels

Next, cut out each segment and assign it a label:

all_data = []
all_label = []
for i in range(num):
    start, end = i * segLen, (i + 1) * segLen
    all_data.append(np.array(sig[start:end]))
    # label: 2 if the segment contains an S beat, otherwise 1 if it contains a V beat, otherwise 0
    if np.any((S_ref >= start) & (S_ref < end)):
        all_label.append(2)
    elif np.any((V_ref >= start) & (V_ref < end)):
        all_label.append(1)
    else:
        all_label.append(0)
# all_data and all_label are Python lists
print(np.array(all_data).shape)  # (number of segments, samples per segment)
print(np.array(all_label).shape)

Output:

(17650, 2000)
(17650,)

17650 is the number of segments and 2000 is the number of samples in each segment.
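The labelling loop above checks every segment against every beat position. Assuming, as that loop does, that S_ref and V_ref hold 0-based sample indices into sig, a vectorized sketch can map each beat to its segment with integer division and write S after V so that S takes priority. The helper label_segments and the sample positions are hypothetical.

```python
import numpy as np

def label_segments(num, seg_len, S_ref, V_ref):
    """Label each segment 0 (other), 1 (contains a V beat) or 2 (contains an S beat).

    S takes priority over V, matching the if/elif order of the loop above.
    """
    labels = np.zeros(num, dtype=np.int64)
    V_seg = np.asarray(V_ref, dtype=np.int64) // seg_len  # segment index of each V beat
    S_seg = np.asarray(S_ref, dtype=np.int64) // seg_len  # segment index of each S beat
    labels[V_seg[V_seg < num]] = 1  # ignore beats in the discarded tail
    labels[S_seg[S_seg < num]] = 2  # written after V, so S overrides V
    return labels

# Hypothetical beat positions (sample indices) for 4 segments of 2000 samples
print(label_segments(4, 2000, S_ref=[2500, 9000], V_ref=[100, 2600, 7999]))  # [1 2 0 1]
```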

2.5 Saving

Save the data and labels as a dictionary with pickle:

import pickle
res = {'data': all_data, 'label': all_label}  # dict holding the segments and their labels
with open('./cpsc2020.pkl', 'wb') as fout:  # save the result as cpsc2020.pkl
    pickle.dump(res, fout)

3. Training

3.1 讀取數(shù)據(jù)并進行處理

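Normalize each segment (z-score), collect the integer labels, split the data into a training set (90%) and a test set (10%), shuffle the training set, and add a channel dimension so that each sample has shape (1, 2000):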

import pickle
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

def read_data_physionet():
    """
    Load the segmented data; labels: 0 = other, 1 = V (PVC), 2 = S (SPB).
    """
    # read the pickle saved in step 2.5
    with open('./cpsc2020.pkl', 'rb') as fin:
        res = pickle.load(fin)  # load the dataset
    ## per-segment z-score normalization
    all_data = []
    for tmp_data in res['data']:
        tmp_data = np.asarray(tmp_data, dtype=np.float64)
        tmp_std = np.std(tmp_data)  # standard deviation of the segment
        tmp_mean = np.mean(tmp_data)  # mean of the segment
        if tmp_std == 0:  # e.g. segments 1239-1271 are all zeros; avoid dividing by zero
            tmp_std = 1
        all_data.append((tmp_data - tmp_mean) / tmp_std)
    all_data = np.array(all_data)
    ## labels are already encoded as the integers 0, 1, 2
    all_label = np.array(res['label'])

    # split into training (90%) and test (10%) sets
    X_train, X_test, Y_train, Y_test = train_test_split(all_data, all_label, test_size=0.1, random_state=15)

    print('Counts of other (0), PVC (1) and SPB (2) in the training and test sets:')
    print(Counter(Y_train), Counter(Y_test))

    # shuffle the training set
    shuffle_pid = np.random.permutation(Y_train.shape[0])
    X_train = X_train[shuffle_pid]
    Y_train = Y_train[shuffle_pid]

    # add a channel dimension: (N, 2000) -> (N, 1, 2000)
    X_train = np.expand_dims(X_train, 1)
    X_test = np.expand_dims(X_test, 1)

    return X_train, X_test, Y_train, Y_test

X_train, X_test, Y_train, Y_test = read_data_physionet()

Output:

Counts of other (0), PVC (1) and SPB (2) in the training and test sets:
Counter({1: 8741, 0: 4605, 2: 2539}) Counter({1: 1012, 0: 478, 2: 275})
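The per-segment normalization loop in read_data_physionet can also be vectorized over all segments at once. The helper zscore_rows below is hypothetical, not part of the original code. Like the original, it uses the population standard deviation (np.std with the default ddof=0) and only mean-centres segments whose standard deviation is zero.

```python
import numpy as np

def zscore_rows(X):
    """Normalize each row (segment) to zero mean and unit standard deviation.

    Rows with zero standard deviation (e.g. flat, all-zero segments) are only
    mean-centred, mirroring the tmp_std = 1 guard in read_data_physionet.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (X - mean) / std

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 0.0, 0.0]])  # hypothetical: one normal segment, one flat segment
Z = zscore_rows(X)
print(Z.mean(axis=1), Z.std(axis=1))
```

Because the classes are imbalanced, train_test_split could also be given stratify=all_label to keep the 0/1/2 proportions the same in both splits; the original code does not do this.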

3.2 Building the dataset class

Define a custom PyTorch Dataset:

# Build the dataset class MyDataset
# each signal has shape 1*2000
import numpy as np
from collections import Counter
from tqdm import tqdm
from matplotlib import pyplot as plt
from sklearn.metrics import classification_report 

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data, label):
        self.data = data
        self.label = label
    # convert numpy arrays to tensors
    def __getitem__(self, index):
        return (torch.tensor(self.data[index], dtype=torch.float), torch.tensor(self.label[index], dtype=torch.long))

    def __len__(self):
        return len(self.data)
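A quick way to check MyDataset is to wrap it in a DataLoader and inspect one batch. The sketch repeats the class so it runs on its own and uses random arrays in place of X_train and Y_train; the batch size of 4 is arbitrary.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """Same as the class above: wraps numpy arrays and returns (signal, label) tensors."""
    def __init__(self, data, label):
        self.data = data
        self.label = label

    def __getitem__(self, index):
        return (torch.tensor(self.data[index], dtype=torch.float),
                torch.tensor(self.label[index], dtype=torch.long))

    def __len__(self):
        return len(self.data)

# Hypothetical stand-ins for X_train / Y_train from 3.1: 10 segments of shape (1, 2000)
X = np.random.randn(10, 1, 2000).astype(np.float32)
Y = np.random.randint(0, 3, size=10)

loader = DataLoader(MyDataset(X, Y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, xb.dtype, yb.shape, yb.dtype)
```

With the real data this would be DataLoader(MyDataset(X_train, Y_train), batch_size=..., shuffle=True); each batch then has the (batch, 1, 2000) shape that nn.Conv1d expects.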

3.3 Building the neural network

Define the CNN architecture:

# Build the neural network
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input shape (batch, 1, 2000)
            nn.Conv1d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),                              # output shape (batch, 16, 2000)
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=5),    # max over non-overlapping windows of 5: (batch, 16, 2000/5 = 400)
        )
        self.conv2 = nn.Sequential(         # input shape (batch, 16, 400)
            nn.Conv1d(16, 32, 5, 1, 2),     # output shape (batch, 32, 400)
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=5),    # output shape (batch, 32, 400/5 = 80)
        )
        self.out = nn.Linear(32 * 80, 3)    # fully connected layer, 3 output classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)           # flatten to (batch, 32 * 80)
        output = self.out(x)                # raw logits; nn.CrossEntropyLoss applies log-softmax itself
        return output, x

cnn = CNN()
print(cnn)

Output:

CNN(
  (conv1): Sequential(
    (0): Conv1d(1, 16, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): Dropout(p=0.2, inplace=False)
    (2): ReLU()
    (3): MaxPool1d(kernel_size=5, stride=5, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv1d(16, 32, kernel_size=(5,), stride=(1,), padding=(2,))
    (1): Dropout(p=0.2, inplace=False)
    (2): ReLU()
    (3): MaxPool1d(kernel_size=5, stride=5, padding=0, dilation=1, ceil_mode=False)
  )
  (out): Linear(in_features=2560, out_features=3, bias=True)
)
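The in_features=2560 of the final layer follows from the output-length formula for nn.Conv1d and nn.MaxPool1d given in the PyTorch documentation, $L_{out} = \lfloor (L_{in} + 2p - d(k-1) - 1)/s \rfloor + 1$. The helper conv1d_out_len below is hypothetical; it simply applies that formula layer by layer.

```python
def conv1d_out_len(L, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of nn.Conv1d / nn.MaxPool1d, using the formula from the PyTorch docs."""
    return (L + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

L = 2000
L = conv1d_out_len(L, kernel_size=5, stride=1, padding=2)  # conv1: 2000
L = conv1d_out_len(L, kernel_size=5, stride=5)             # pool1 (stride defaults to kernel_size): 400
L = conv1d_out_len(L, kernel_size=5, stride=1, padding=2)  # conv2: 400
L = conv1d_out_len(L, kernel_size=5, stride=5)             # pool2: 80
print(L, 32 * L)  # 80 2560 -> in_features of self.out
```

If the segment length or pooling size is changed, the same calculation gives the new in_features for nn.Linear.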

3.4 Training

The optimizer is Adam and the loss function is cross-entropy.

The training code is omitted.

Results over 50 epochs:

[Figure: training results over 50 epochs]
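Since the original training code is not shown, the sketch below is one minimal way to train with Adam and cross-entropy loss as described; it is not the author's code. The helpers train and accuracy are hypothetical, lr=1e-3 and the batch size are assumptions, and the tiny linear model Tiny only stands in for CNN so the snippet runs quickly. It relies on the model returning (logits, features), as CNN.forward does.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs=50, lr=1e-3):
    """Train with Adam and cross-entropy; return the mean training loss of each epoch.

    Assumes model(x) returns (logits, features), like CNN above.
    """
    optimizer = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the raw logits itself
    history = []
    for _ in range(epochs):
        model.train()                # enables Dropout
        total, count = 0.0, 0
        for x, y in loader:
            optimizer.zero_grad()
            logits, _ = model(x)
            loss = loss_fn(logits, y)
            loss.backward()
            optimizer.step()
            total += loss.item() * y.size(0)
            count += y.size(0)
        history.append(total / count)
    return history

@torch.no_grad()
def accuracy(model, loader):
    """Fraction of correctly classified samples."""
    model.eval()                     # disables Dropout
    correct, count = 0, 0
    for x, y in loader:
        logits, _ = model(x)
        correct += (logits.argmax(dim=1) == y).sum().item()
        count += y.size(0)
    return correct / count

class Tiny(nn.Module):
    """Hypothetical stand-in with the same (logits, features) interface as CNN."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2000, 3)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        return self.fc(x), x

torch.manual_seed(0)
X = torch.randn(32, 1, 2000)         # random stand-in data with the article's shape
Y = torch.randint(0, 3, (32,))
loader = DataLoader(TensorDataset(X, Y), batch_size=8, shuffle=True)
model = Tiny()
history = train(model, loader, epochs=3)
print(history, accuracy(model, loader))
```

With the article's objects this would be called roughly as train(cnn, DataLoader(MyDataset(X_train, Y_train), batch_size=64, shuffle=True), epochs=50), where batch_size=64 is again only an assumption; accuracy on a loader built from X_test and Y_test then gives the test accuracy.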
