How to Implement a Movie Review Star-Rating Classification Task with a Transformer

Published: 2023-05-05 15:13:06  Source: 億速云 (Yisu Cloud)  Views: 104  Author: iii  Column: Development Technology

This article explains how to implement a movie review star-rating classification task with a Transformer. The explanation is kept simple and easy to follow, so let's work through it step by step.

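Transformer Model Overview

The Transformer is a neural network architecture for sequence-to-sequence learning, designed to model the dependencies between input and output sequences. It is widely used in machine translation, audio transcription, language generation, and many other language-related tasks.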

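The Transformer relies on the attention mechanism for sequence-to-sequence learning. An RNN (recurrent neural network) must walk through the words in order and compute a hidden representation at every time step. In a long text, information therefore has to be carried step by step from the start of the sequence to the end, which makes long-range dependencies hard to capture. Attention instead assigns each word a weight for every other word in the input according to how relevant they are to each other, so any two positions are connected directly and the limitation of strictly sequential processing is removed.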
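To make the weighting idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions, not the code used by any library: the function names and the toy vectors are made up for this example, and the learned query/key/value projections of a real Transformer are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors (assumed values); in self-attention the queries,
# keys and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and says how much one token draws on every other token, regardless of how far apart they are in the sequence.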

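Concretely, a Transformer consists of two parts, an encoder and a decoder. The encoder reads the input sequence and the decoder generates the output sequence. Every encoder and decoder layer contains multi-head attention, a feed-forward network, and residual connections. BERT, the model used later in this article, keeps only the encoder stack, which is all a classification task needs.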

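In a typical Transformer, the input sequence is first turned into vectors by an embedding layer, and these vectors are the input to the first Transformer layer. The next component is the multi-head attention layer, in which each head computes its own set of attention weights. These weights should not be confused with the attention mask, which only marks the positions (such as padding) that attention should ignore. Finally, residual (skip) connections, together with layer normalization, make the Transformer easier to train.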
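The residual step can be sketched in a few lines of plain Python. This is a simplified illustration, assuming the post-norm arrangement `LayerNorm(x + sublayer(x))` of the original Transformer design; the learned gain and bias of layer normalization are omitted, and `toy_feed_forward` is a hypothetical stand-in for a real sublayer.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (almost) unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def residual_block(x, sublayer):
    """Add the sublayer output back onto its input, then normalize."""
    y = sublayer(x)
    return layer_norm([xi + yi for xi, yi in zip(x, y)])

def toy_feed_forward(x):
    # Hypothetical stand-in for a feed-forward sublayer: ReLU, then scale by 0.5.
    return [0.5 * max(0.0, xi) for xi in x]

hidden = [1.0, -2.0, 3.0, 0.5]
out = residual_block(hidden, toy_feed_forward)
```

Because the input `x` is added back unchanged, gradients have a direct path through every layer, which is the usual explanation for why deep stacks of such blocks remain trainable.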

Dataset Preparation

For this task we use the IMDB movie review dataset, which contains 50,000 labeled reviews, each marked as positive or negative in sentiment. 25,000 of them are used for training and the other 25,000 for testing.

Because a Transformer operates on tokens, the words of each text must first be encoded. A common choice is BertTokenizer. Pretrained models such as GPT-2 come with their own tokenizers, so the tokenizer should match the model being used. A recent version of the transformers package makes these steps quick:

!pip install transformers

Next, load the tokenizer:

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

This downloads and loads the tokenizer that matches BERT. The next step is to read the IMDB data. In this article we use an already processed CSV version of the data, available here: drive.google.com/file/d/1b_b…

import pandas as pd
train_df = pd.read_csv('imdb_train.csv')
test_df = pd.read_csv('imdb_test.csv')

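Because the sequences in a batch must all have the same length, we set max_length to 100, truncate longer reviews, and pad shorter ones:

# padding='max_length' pads every review to exactly max_length tokens;
# padding=True would only pad to the longest sequence in the call
train_inputs = tokenizer(list(train_df['review']), padding='max_length', truncation=True, max_length=100)
test_inputs = tokenizer(list(test_df['review']), padding='max_length', truncation=True, max_length=100)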
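What padding and truncation do to a single review can be sketched in plain Python. This is only an illustration of the idea, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, the token ids are made-up examples, and unlike the real tokenizer this sketch simply cuts the sequence without keeping the closing special token.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]            # truncate sequences that are too long
    mask = [1] * len(ids)                   # 1 marks a real token
    n_pad = max_length - len(ids)
    # Pad with pad_id; the mask gets 0 so attention can ignore these positions.
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

short_ids, short_mask = pad_or_truncate([101, 2023, 3185, 102], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_length=6)
```

The attention mask produced here plays the same role as the `attention_mask` returned by the tokenizer: it tells the model which positions hold real tokens and which are padding.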

Now we can convert the inputs and the labels to torch tensors:

import torch
# Map the text labels to integers: positive -> 1, negative -> 0
train_labels = torch.tensor(train_df['sentiment'].map({'pos': 1, 'neg': 0}).tolist())
test_labels = torch.tensor(test_df['sentiment'].map({'pos': 1, 'neg': 0}).tolist())

train_encoded_dict = {
    'input_ids': torch.tensor(train_inputs['input_ids']),
    'token_type_ids': torch.tensor(train_inputs['token_type_ids']),
    'attention_mask': torch.tensor(train_inputs['attention_mask']),
    'labels': train_labels
}

test_encoded_dict = {
    'input_ids': torch.tensor(test_inputs['input_ids']),
    'token_type_ids': torch.tensor(test_inputs['token_type_ids']),
    'attention_mask': torch.tensor(test_inputs['attention_mask']),
    'labels': test_labels
}

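Model Training

For this task we implement the Transformer model with PyTorch. PyTorch is a Python-based scientific computing package whose flexibility and ease of use have made it one of the most widely used libraries in deep learning.

A pretrained BERT model can be loaded through Hugging Face Transformers: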

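from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels = 2,                 # two classes: negative (0) and positive (1)
    output_attentions = False,
    output_hidden_states = False,
)

Next, we define the optimizer, the loss function, the batch size, and the other training hyperparameters:

# Use AdamW from torch.optim; the AdamW class in transformers is deprecated
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr = 2e-5, eps = 1e-8)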

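from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
batch_size = 32

# A plain dict cannot be indexed by example position, so it is not a valid dataset.
# Build one dict per example; the default collate function stacks them into batches.
def to_examples(encoded):
    return [{k: v[i] for k, v in encoded.items()} for i in range(len(encoded['labels']))]

train_dataset = to_examples(train_encoded_dict)
test_dataset = to_examples(test_encoded_dict)
train_dataloader = DataLoader(train_dataset, sampler = RandomSampler(train_dataset), batch_size = batch_size)
test_dataloader = DataLoader(test_dataset, sampler = SequentialSampler(test_dataset), batch_size = batch_size)

from transformers import get_linear_schedule_with_warmup
epochs = 4
total_steps = len(train_dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)
# Not used below: the model computes the cross-entropy loss itself when labels are passed
loss_fn = torch.nn.CrossEntropyLoss()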
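The effect of the linear schedule on the learning rate can be worked out by hand. The sketch below is a plain-Python approximation of a linear warmup-then-decay multiplier, written for illustration; `linear_schedule_factor` is a hypothetical name, and the numbers assume the 25,000 training reviews, batch size 32, 4 epochs, and base learning rate 2e-5 used in this article.

```python
import math

def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

steps_per_epoch = math.ceil(25_000 / 32)   # number of batches; the last one is smaller
total_steps = steps_per_epoch * 4          # optimizer steps over 4 epochs
base_lr = 2e-5

lr_start = base_lr * linear_schedule_factor(0, 0, total_steps)
lr_mid = base_lr * linear_schedule_factor(total_steps // 2, 0, total_steps)
lr_end = base_lr * linear_schedule_factor(total_steps, 0, total_steps)
```

With zero warmup steps the learning rate starts at its full value, is halved by the midpoint of training, and reaches zero on the last step.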

Finally, we define the training procedure and train the model:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

for epoch_i in range(epochs):
    print(f"{'':^5}Epoch:{epoch_i + 1:^3}")
    total_train_loss = 0  # reset the running loss at the start of every epoch
    for step, batch in enumerate(train_dataloader):
        b_input_ids = batch['input_ids'].to(device)
        b_token_type_ids = batch['token_type_ids'].to(device)
        b_attention_mask = batch['attention_mask'].to(device)
        b_labels = batch['labels'].to(device)

        model.zero_grad()

        outputs = model(b_input_ids,
                        token_type_ids=b_token_type_ids, 
                        attention_mask=b_attention_mask, 
                        labels=b_labels)
        
        loss = outputs.loss
        total_train_loss += loss.item()
    
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        optimizer.step()

        scheduler.step()

    avg_train_loss = total_train_loss / len(train_dataloader)
    print(f"   Average training loss: {avg_train_loss:.2f}")
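The training loop clips gradients to a maximum norm of 1.0 before each optimizer step. The following plain-Python sketch shows the idea behind clipping by global norm; it is an illustration with a hypothetical helper name and toy gradient values, not the PyTorch implementation.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a list of gradient vectors so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only shrink: gradients that are already small enough are left unchanged.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [[g * scale for g in vec] for vec in grads], total_norm

grads = [[3.0, 4.0], [0.0, 12.0]]   # toy gradients for two parameters
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

All gradients are scaled by the same factor, so the update direction is preserved and only its length is limited, which guards against occasional very large steps during fine-tuning.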

def evaluate(model, test_dataloader):
    model.eval()

    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0

    for batch in test_dataloader:
        b_input_ids = batch['input_ids'].to(device)
        b_token_type_ids = batch['token_type_ids'].to(device)
        b_attention_mask = batch['attention_mask'].to(device)
        b_labels = batch['labels'].to(device)

        with torch.no_grad():       
            outputs = model(b_input_ids, 
                            token_type_ids=b_token_type_ids, 
                            attention_mask=b_attention_mask,
                            labels=b_labels)
        loss = outputs.loss
        logits = outputs.logits

        total_eval_loss += loss.item()
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        total_eval_accuracy += flat_accuracy(logits, label_ids)

    avg_val_accuracy = total_eval_accuracy / len(test_dataloader)
    avg_val_loss = total_eval_loss / len(test_dataloader)

    return avg_val_accuracy, avg_val_loss

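After training, we can evaluate the model on the test set. The accuracy helper below is built on scikit-learn's accuracy_score and can be reused in other projects with minor changes. It has to be defined before evaluate is called:

import numpy as np
from sklearn.metrics import accuracy_score
def flat_accuracy(preds, labels):
    # Take the highest-scoring class for each example and compare with the labels
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return accuracy_score(labels_flat, pred_flat)

accuracy, val_loss = evaluate(model, test_dataloader)
# accuracy is a fraction between 0 and 1, so scale it before printing a percentage
print(f'Accuracy: {accuracy * 100:.2f}%')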
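One detail of the evaluation is worth a small sketch. Averaging per-batch accuracies weights every batch equally, yet with 25,000 test reviews and a batch size of 32 the last batch holds only 8 examples, so the result can differ slightly from the exact accuracy. The plain-Python example below, using made-up logits and hypothetical helper names, compares the two ways of aggregating.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def count_correct(logits, labels):
    """Number of examples whose highest-scoring class equals the label."""
    return sum(1 for row, y in zip(logits, labels) if argmax(row) == y)

# Two toy batches of unequal size; each logit row is [negative, positive].
batches = [
    ([[2.0, -1.0], [0.1, 0.9], [1.5, 0.3], [-0.2, 0.4]], [0, 1, 1, 1]),
    ([[0.3, 0.2]], [1]),
]

# Exact accuracy: total correct predictions over total examples.
correct = sum(count_correct(lg, lb) for lg, lb in batches)
total = sum(len(lb) for _, lb in batches)
example_weighted = correct / total

# Mean of per-batch accuracies: the small last batch counts as much as a full one.
batch_averaged = sum(count_correct(lg, lb) / len(lb) for lg, lb in batches) / len(batches)
```

On a full test set the gap is usually tiny, but accumulating correct counts and dividing once at the end avoids it entirely.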

Model Tuning and Optimization

Below are some adjustments that may help improve the performance of a Transformer model.

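(1) Maximum input sequence length: the sequences in a batch must share one length. For the IMDB task we set max_length to 100. Changing this value affects model performance: a longer limit keeps more of each review but costs more training time and GPU memory, and both constrain the choice.

(2) Training hyperparameters such as learning rate, batch size, and number of epochs: common strategies include an exponentially decaying learning rate, a larger batch size, and more training epochs.

(3) Using a pretrained model: as language models have improved, pretrained language models perform better and better across NLP tasks, so starting from a pretrained model can raise accuracy on tasks like this one. How much it helps depends on the size of the labeled dataset.

(4) Model fusion or ensembling: in many competitions, techniques such as model averaging are used to improve accuracy and robustness. This is especially common in contests where the final score matters most.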
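As an illustration of item (4), the sketch below averages class probabilities across several models for one review. It is a minimal example of one ensembling strategy among many, written in plain Python; the function names and the logits of the three models are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(per_model_logits):
    """Average class probabilities across models; return (label, probabilities)."""
    probs = [softmax(l) for l in per_model_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg

# Hypothetical logits from three fine-tuned models for one review: [negative, positive].
label, avg_probs = ensemble_predict([[0.2, 1.1], [0.9, 0.4], [-0.5, 1.5]])
```

Here the second model leans toward the negative class, but the averaged probabilities still favor the positive class, which shows how averaging smooths out the errors of individual models.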

Thanks for reading. That covers how to implement a movie review star-rating classification task with a Transformer. How well the approach works in a specific setting still needs to be verified in practice.
