How to Deploy PyTorch Deep Learning Models to Production

Published: 2021-12-04 18:11:47 | Source: Yisu Cloud (億速云) | Views: 354 | Author: 柒染 | Category: Big Data

This article explains how to deploy PyTorch deep learning models to a production environment, walking through each step of the process.

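Preparing for PyTorch Model Deployment

PyTorch and TensorFlow are currently the two most widely used deep learning frameworks. In the previous article, "Automatically Deploying Deep Neural Network Models from TensorFlow (Keras) to Production", we showed how to deploy TensorFlow models automatically with DaaS (Deployment-as-a-Service), AutoDeployAI's AI model deployment and management system. This article shows how to deploy PyTorch deep neural network models automatically with DaaS. As before, we first need to:

Install the Python DaaS-Client
Initialize DaasClient
Create a project

For the complete code, see the notebook on GitHub: deploy-pytorch.ipynb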

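PyTorch Custom Runtime

DaaS is a Kubernetes-based system for automatically deploying AI models. Models run inside Docker containers, which DaaS calls runtimes. There are two kinds of runtime: the web service runtime (Environment) and the task runtime (Worker). An Environment is used to create web services, while a Worker is used to run job deployments such as model evaluation and batch prediction. DaaS ships with four runtimes by default, covering Environment and Worker for Python 2.7 and Python 3.7. They include most common machine learning and deep learning libraries but, to limit Docker image size, do not currently include PyTorch.

DaaS lets users register a custom Docker image as a runtime, so they can meet requirements for different model types and model versions. Below, taking PyTorch model deployment as the example, we walk through creating a custom runtime:

1. Build the Docker image:

In general there are two ways to create an image: building from a Dockerfile (docker build) or generating one from a container (docker commit). We use the first here. Either way, a base image must be chosen; for convenience we use the official PyTorch image pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime.

A web service runtime needs, in addition to the libraries the model depends on, some base libraries for serving; see requirements-service.txt for the full list. Download requirements-service.txt into the current directory and create a Dockerfile: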

FROM pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime

RUN mkdir -p /daas
WORKDIR /daas

COPY requirements-service.txt /daas

RUN pip install -r requirements-service.txt && rm -rf /root/.cache/pip

Build the image:

docker build -f Dockerfile -t pytorch:1.0 .

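2. Push the Docker image to Kubernetes:

The built Docker image must be pushed to a location that the Kubernetes environment hosting DaaS can access. Different Kubernetes environments use different image access mechanisms, such as local images or a private or public image registry. The example below uses Daas-MicroK8s, which relies on the MicroK8s local image cache: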

docker save pytorch:1.0 > pytorch.tar
microk8s ctr image import pytorch.tar

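3. Create the PyTorch runtime:

After logging in to the DaaS web page, click Environments / Runtime Definitions in the top menu. The page lists all available runtimes, including the four that ship with DaaS: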

[Figure: runtime definitions page listing the four built-in DaaS runtimes]

Click the Create Runtime button and create an Environment runtime based on the pytorch:1.0 image:

[Figure: creating the Environment runtime from the pytorch:1.0 image]

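Default Deployment of a PyTorch Model

Train the PyTorch model

We use the MNIST data from torchvision to recognize digits supplied by the user. The code below follows the official example: Image classification (MNIST) using Convnets.

First, define a zero-argument function that returns an instance of the user-defined model class (a subclass of torch.nn.Module). The function must contain all of its dependencies and be able to run on its own; that is, it includes the third-party imports and any classes, functions, or variables it uses. This is the key to deploying a PyTorch model automatically.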

# Define a function to create an instance of the Net class
def create_net():
    import torch
    import torch.nn as nn  # PyTorch's module wrapper
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output
    return Net()
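
To see why the factory function has to be self-contained, consider what happens when stored source code is re-executed in a fresh namespace, as a remote runtime would have to do. The sketch below is only an illustration in plain Python; the names create_scaler and rebuild are hypothetical, and it makes no claim about how DaaS actually stores or reloads create_net.

```python
# Source of a factory that imports everything it needs inside its own body.
SELF_CONTAINED = '''
def create_scaler():
    import math  # dependency imported inside the function

    class Scaler:
        def __call__(self, x):
            return math.sqrt(x)
    return Scaler()
'''

# Same factory, but it silently relies on a module-level "import math".
NOT_SELF_CONTAINED = '''
def create_scaler():
    class Scaler:
        def __call__(self, x):
            return math.sqrt(x)
    return Scaler()
'''


def rebuild(source, factory_name):
    """Execute stored source in an empty namespace and call the factory."""
    namespace = {}
    exec(source, namespace)
    return namespace[factory_name]()


# The self-contained version works in a fresh namespace.
result = rebuild(SELF_CONTAINED, 'create_scaler')(9.0)

# The other version fails as soon as the missing global is used.
try:
    rebuild(NOT_SELF_CONTAINED, 'create_scaler')(9.0)
    failed = False
except NameError:
    failed = True

print(result, failed)
```

This is the reason create_net above places its imports and the Net class inside the function body rather than at module level.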

To train the model quickly, set epochs=3:

import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

use_cuda = torch.cuda.is_available()
batch_size = 64
test_batch_size = 1000
seed = 1234567
lr = 1.0
gamma = 0.7
log_interval = 10
epochs = 3

torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'batch_size': batch_size}
if use_cuda:
    kwargs.update({'num_workers': 1,
                   'pin_memory': True,
                   'shuffle': True})

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])
dataset1 = datasets.MNIST('./data', train=True, download=True, transform=transform)
dataset2 = datasets.MNIST('./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)

model = create_net().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=lr)

scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)
    scheduler.step()

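Publish the PyTorch Model

Once the model is trained, publish it to the DaaS server with the client's publish function. Given the test datasets x_test and y_test, DaaS automatically detects the model's input format (type and dimensions) and its mining function (classification or regression), evaluates the model, and stores the first row of x_test as sample data for later testing. The source_object parameter is set to the create_net function defined above; the code of that function is stored in DaaS automatically.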

batch_idx, (x_test, y_test) = next(enumerate(test_loader))

# Publish the built model into DaaS
publish_resp = client.publish(model,
                              name='pytorch-mnist',
                              x_test=x_test,
                              y_test=y_test,
                              source_object=create_net,
                              description='A Pytorch MNIST classification model')
pprint(publish_resp)

The result:

{'model_name': 'pytorch-mnist', 'model_version': '1'}

Test the PyTorch Model

Call the test function, specifying the pytorch runtime created earlier:

test_resp = client.test(publish_resp['model_name'], 
                        model_version=publish_resp['model_version'],
                        runtime='pytorch')
pprint(test_resp)

The return value test_resp is a dictionary describing the test API:

The runtime "pytorch" is starting
Waiting for it becomes available... 

{'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJleHAiOjE1OTYwNzkyNzksImlhdCI6MTU5NjAzMjQ3OX0.kLO5R-yiTY6xOo14sAxZGwetQqiq5hDfPs5WZ7epSkDWKeDvyLkVP4VzWQxxlPyUX6SgGeCx0pq-of6SYVLPcOmR54a6W7b4ZfKgllKrssdMqaStclv0S2OFHeVXDIoy4cyoB99MjNaXOc6FCbNB4rae0ufu-eZLLYGlHbvV_c3mJtIIBvMZvonU1WCz6KDU2fEyDOt4hXsqzW4k7IvhyDP2geHWrkk0Jqcob8qag4qCYrNHLWRs8RJXBVXJ1Y9Z5PdhP6CGwt5Qtyf017s7L_BQW3_V9Wq-_qv3_TwcWEyCBTQ45RcCLoqzA-dlCbYgd8seurnI3HlYJZPOcrVY5w',
 'endpoint_url': 'https://192.168.64.7/api/v1/test/deployment-test/pytorch/test',
 'payload': {'args': {'X': [{'tensor_input': [[[[...], [...], ...]]]}],
                      'model_name': 'pytorch-mnist',
                      'model_version': '1'}}}

tensor_input is a nested array of shape (1, 1, 28, 28); the full values are not listed above.
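
As a rough illustration of the payload layout shown above, the following sketch wraps a single 28x28 grayscale image into the (1, 1, 28, 28) nested-list form. The helper names build_payload and nested_shape are made up for this example, and the all-zero image is placeholder data; only the key names (args, X, tensor_input, model_name, model_version) come from the response above.

```python
import json


def build_payload(image_rows, model_name, model_version):
    """Wrap one H x W image (list of rows) as a batch of one single-channel image."""
    tensor = [[image_rows]]  # adds the batch and channel dimensions
    return {'args': {'X': [{'tensor_input': tensor}],
                     'model_name': model_name,
                     'model_version': model_version}}


def nested_shape(value):
    """Return the shape of a regular nested list, e.g. (1, 1, 28, 28)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


# Placeholder image: 28 rows of 28 zeros.
blank_image = [[0.0] * 28 for _ in range(28)]
payload = build_payload(blank_image, 'pytorch-mnist', '1')

shape = nested_shape(payload['args']['X'][0]['tensor_input'])
body = json.dumps(payload)  # the payload is plain JSON-serializable data
print(shape, len(body) > 0)
```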

Call the test API with the requests library:

response = requests.post(test_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=test_resp['access_token'])},
                         json=test_resp['payload'],
                         verify=False)
pprint(response.json())

The response:

{'result': [{'tensor_output': [[-21.444242477416992,
                                -20.39040756225586,
                                -17.134702682495117,
                                -16.960391998291016,
                                -20.394105911254883,
                                -22.380189895629883,
                                -29.211040496826172,
                                -1.311301275563892e-06,
                                -20.16324234008789,
                                -13.592040061950684]]}],
 'stderr': [],
 'stdout': []}

Besides the predictions, the test result includes the standard output and standard error logs, which makes inspection and debugging easier.
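
The values in tensor_output are log-probabilities, because the network ends with log_softmax. A small sketch of turning them into a predicted digit and a probability on the client side, using the numbers from the response above (the function name decode is an assumption for illustration):

```python
import math

# Log-probabilities copied from the response above (one row per input image).
tensor_output = [[-21.444242477416992,
                  -20.39040756225586,
                  -17.134702682495117,
                  -16.960391998291016,
                  -20.394105911254883,
                  -22.380189895629883,
                  -29.211040496826172,
                  -1.311301275563892e-06,
                  -20.16324234008789,
                  -13.592040061950684]]


def decode(log_probs):
    """Return (index of the largest log-probability, its probability)."""
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return best, math.exp(log_probs[best])


digit, prob = decode(tensor_output[0])
print(digit, prob)
```

For this sample the largest log-probability sits at index 7, so the model recognizes the digit 7 with a probability very close to 1.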

Verify the Test Result

Compare the prediction with the local model's result:

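import numpy as np

model.eval()  # disable dropout for inference
with torch.no_grad():
    # move the input to the model's device, and the result back to the CPU
    desired = model(x_test[[0]].to(device)).cpu().numpy()
actual = response.json()['result'][0]['tensor_output']
np.testing.assert_almost_equal(actual, desired)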
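
np.testing.assert_almost_equal compares element-wise to a number of decimal places (7 by default) and passes when abs(desired - actual) < 1.5 * 10**(-decimal). A plain-Python sketch of that rule, with hypothetical helper names, may help when deciding whether a looser tolerance is needed for a particular deployment:

```python
def almost_equal(actual, desired, decimal=7):
    """Scalar version of the rule used by numpy.testing.assert_almost_equal."""
    return abs(desired - actual) < 1.5 * 10 ** (-decimal)


def all_almost_equal(actual_row, desired_row, decimal=7):
    """Apply the scalar rule to two equally long lists of numbers."""
    return (len(actual_row) == len(desired_row) and
            all(almost_equal(a, d, decimal)
                for a, d in zip(actual_row, desired_row)))


# A difference of about 2e-8 passes at the default 7 decimals.
close = almost_equal(-21.444242477416992, -21.4442425)

# A difference of 1e-6 fails at 7 decimals but passes at 5.
strict = almost_equal(1.0, 1.000001)
loose = almost_equal(1.0, 1.000001, decimal=5)

print(close, strict, loose)
```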

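Deploy the PyTorch Model for Production

After a successful test, the model can be deployed for production. As with the test API, the runtime must be set to the pytorch runtime created earlier. To improve performance and stability, you can specify the number of CPU cores, the memory size, and the number of replicas for the runtime environment; all of these are set through parameters of the deploy function.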

deploy_resp = client.deploy(model_name=publish_resp['model_name'], 
                            deployment_name=publish_resp['model_name'] + '-svc',
                            model_version=publish_resp['model_version'],
                            runtime='pytorch')
pprint(deploy_resp)

The response:

The deployment "pytorch-mnist-svc" created successfully
Waiting for it becomes available... 

{'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJwcm9qZWN0TmFtZSI6Ilx1OTBlOFx1N2Y3Mlx1NmQ0Ylx1OGJkNSIsInByb2plY3RMYWJlbCI6ImRlcGxveW1lbnQtdGVzdCIsImlhdCI6MTU5NjAyODU2N30.iBGyYxCjD5mB_o2IbMkSKRlx9YVvfE3Ih-6LOE-cmp9VoDde-t3JLcDdS3Fg7vyVSIbre6XmYDQ_6IDjzy8XEOzxuxxdhwFPnW8Si1P-fbln5HkPhbDukImShM5ZAcfmD6fNWbz2S0JIgs8rM15d1WKGTC3n9yaXiVumWV1lTKImhl1tBF4ay_6YdCqKmLsrLX6UqbcZA5ZTqHaAG76xgK9vSo1aOOstKLTcloEkswpuMtkYo6ByouLznqQ_yklAYTthdrKX623OJdO3__DOkULq8E-am_c6R7FtyRvYwr4O5BKeHjKCxY6pHmc6PI4Yyyd_TJUTbNPX9fPxhZ4CRg',
 'endpoint_url': 'https://192.168.64.7/api/v1/svc/deployment-test/pytorch-mnist-svc/predict',
 'payload': {'args': {'X': [{'tensor_input': [[[[...],[...],...]]]}]}}}

Call the production API with the requests library:

response = requests.post(deploy_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=deploy_resp['access_token'])},
                         json=deploy_resp['payload'],
                         verify=False)
pprint(response.json())

The result:

{'result': [{'tensor_output': [[-21.444242477416992,
                                -20.39040756225586,
                                -17.134702682495117,
                                -16.960391998291016,
                                -20.394105911254883,
                                -22.380189895629883,
                                -29.211040496826172,
                                -1.311301275563892e-06,
                                -20.16324234008789,
                                -13.592040061950684]]}]}

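The production deployment returns the same result as the test. Besides the DaaS-Client program, model testing and deployment can also be done in the DaaS web client, which we do not cover here.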
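
The calls above pass verify=False, which turns off TLS certificate verification; that is convenient for a test cluster with a self-signed certificate but is usually not what a production client wants. The sketch below shows the same request shape (bearer token header plus JSON body) built with the standard library only, without sending anything. The URL, token, and payload values are placeholders, and build_request is a hypothetical helper.

```python
import json
import urllib.request


def build_request(endpoint_url, access_token, payload):
    """Build (but do not send) a POST request with a bearer token and JSON body."""
    body = json.dumps(payload).encode('utf-8')
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            'Authorization': 'Bearer {token}'.format(token=access_token),
            'Content-Type': 'application/json',
        },
        method='POST',
    )


req = build_request(
    'https://daas.example.invalid/api/v1/svc/demo/predict',  # placeholder URL
    'PLACEHOLDER_TOKEN',                                      # placeholder token
    {'args': {'X': [{'tensor_input': [[[[0.0]]]]}]}},         # placeholder data
)
print(req.get_method(), req.get_header('Authorization'))
```

Sending the request with urllib.request.urlopen(req) would verify the server certificate by default, so a cluster with a self-signed certificate needs its CA added to the trust store first.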

Custom Deployment of a PyTorch Model

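In the default deployment above, the model input is a tensor of shape (N, 1, 28, 28) and the output is a tensor of shape (N, 10), where N is the batch size. A client calling the deployed REST API therefore has to preprocess the data and post-process the result itself: read the image file, convert it to the required tensor format, apply the same transformations used during training (such as the Normalize step above), and finally compute the recognized digit from the output tensor.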
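
For reference, the client-side preprocessing described above comes down to simple arithmetic. The sketch below mimics, in plain Python, what ToTensor and Normalize((0.1307,), (0.3081,)) do to an 8-bit grayscale image: scale pixels to the 0..1 range, then subtract the mean and divide by the standard deviation. The function names and the tiny 2x2 image are illustrative assumptions, not part of DaaS or torchvision.

```python
MNIST_MEAN = 0.1307  # mean used in the training transform above
MNIST_STD = 0.3081   # standard deviation used in the training transform above


def to_unit_range(pixel):
    """Scale an 8-bit grayscale value (0..255) to the range 0..1."""
    return pixel / 255.0


def normalize(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply (value - mean) / std, the per-channel Normalize formula."""
    return (value - mean) / std


def preprocess_image(rows):
    """Turn an H x W list of 8-bit pixels into a (1, 1, H, W) nested list."""
    plane = [[normalize(to_unit_range(p)) for p in row] for row in rows]
    return [[plane]]  # add batch and channel dimensions


# Tiny stand-in image: black and white pixels only.
tensor_input = preprocess_image([[0, 255],
                                 [255, 0]])
black = tensor_input[0][0][0][0]
white = tensor_input[0][0][0][1]
print(black, white)
```

A black pixel maps to roughly -0.42 and a white pixel to roughly 2.82, which is the value range the trained network expects.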

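To lighten the client's load, we would like all of this to happen on the deployment server: the client sends an image and the server returns the recognized digit. In DaaS, custom model deployment covers this need by letting users add arbitrary preprocessing and post-processing steps. Below we describe how to create a custom deployment for the PyTorch model above.

Log in to the DaaS web client and open the pytorch-mnist model page: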

[Figure: pytorch-mnist model page in the DaaS web client]

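Switch to the Real-time Prediction tab and click the Generate Custom Real-time Prediction Script command to generate a predefined script: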

[Figure: generated custom real-time prediction script]

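The body of the create_net function is written into the generated prediction script automatically. Click the Advanced Settings command and select pytorch as the web service runtime: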

[Figure: advanced settings with the pytorch web service runtime selected]

Click the Test as API command; the page switches to the test view. Modify the preprocess_files function to apply the image transformations used during training:

def preprocess_files(args):
    """preprocess the uploaded files"""
    files = args.get('files')
    if files is not None:
        # get the first record object in X if it's present
        if 'X' in args:
            record = args['X'][0]
        else:
            record = {}
            args['X'] = [record]
        
        from torchvision import transforms
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
            ])

        import numpy as np
        from PIL import Image
        for key, file in files.items():
            img = Image.open(file)
            normed = transform(img)
            record[key] = normed.numpy()

    return args

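When done, enter the function name predict, choose form-based for the request body, enter the name tensor_input, set the type to File, click Upload and select the test image test.png (the same data used in the test above), then click Submit. The response panel on the right shows the prediction: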

[Figure: response panel showing the prediction for test.png]

The result matches the output of the default deployment. Next, change the postprocess function to:

def postprocess(result):
    """postprocess the predicted results"""
    import numpy as np
    return [int(np.argmax(np.array(result).squeeze(), axis=0))]

Submit again; the response panel on the right now shows:

[Figure: response panel showing the post-processed prediction]

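Once testing is complete, create the production deployment. Switch to the Deployment tab, click the Add Web Service command, enter the service name pytorch-mnist-custom-svc, select pytorch as the web service runtime, leave the other options at their defaults, and click Create. On the deployment page, open the Test tab, which resembles the script test view used earlier. Enter the function name predict, choose form-based for the request body, enter the name tensor_input, set the type to File, upload the test image, and click Submit: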

[Figure: Test tab of the pytorch-mnist-custom-svc deployment]

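The production deployment has now been created and tested, and any client program can call the service. Click the Generate Code command in the view above to see how to call the service with curl. A test run: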

[Figure: calling the deployed service with curl]

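Deploying the PyTorch Model via ONNX

Besides the native deployment above, PyTorch itself can export models to the ONNX format, so deploying via ONNX is another option. Its advantage is that the deployment no longer depends on the PyTorch library, so the pytorch runtime created above is not needed. You can use the built-in DaaS runtime Python 3.7 - Function as a Service, which includes the CPU version of ONNX Runtime for ONNX model prediction.

Convert the PyTorch model to ONNX: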

# Export the model
torch.onnx.export(model,                     # model being run
                  x_test[[0]].to(device),    # model input (or a tuple for multiple inputs), on the model's device
                  'mnist.onnx',              # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX opset version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['tensor_input'],    # the model's input names
                  output_names=['tensor_output'],  # the model's output names
                  # variable-length axes; the keys must match the input/output names above
                  dynamic_axes={'tensor_input': {0: 'batch_size'},
                                'tensor_output': {0: 'batch_size'}})

Publish the ONNX model:

publish_resp = client.publish('mnist.onnx',
                              name='pytorch-mnist-onnx',
                              x_test=x_test,
                              y_test=y_test,
                              description='A Pytorch MNIST classification model in ONNX')
pprint(publish_resp)

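The result:

{'model_name': 'pytorch-mnist-onnx', 'model_version': '1'}

Test the ONNX Model

Above, we tested the model with the client's test function. Here we use another approach and test in the DaaS web page. Log in to the DaaS web client, open the pytorch-mnist-onnx model page, and switch to the Test tab. DaaS has stored one test record automatically; click the Submit command to test it: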

[Figure: test result for the pytorch-mnist-onnx model]

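The ONNX model and the native PyTorch model produce consistent test results.

Default and Custom Deployment of the ONNX Model

For how to create default and custom deployments for an ONNX model in the DaaS web interface, see the article "Deploying Deep Learning and Traditional Machine Learning Models with ONNX". The process is the same and is not repeated here.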

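Try DaaS (Deployment-as-a-Service)

In this article we showed how to deploy a PyTorch model natively in DaaS. The whole process is simple: a default deployment takes only a few API calls, and for custom deployments DaaS provides a convenient test interface where the script can be modified and tested at any time before the production deployment is created. In real deployments, achieving higher prediction performance requires further changes to the custom prediction script, such as more efficient data processing or using the GPU. DaaS provides an easy-to-use deployment framework that users can freely customize and extend.

If you would like to try the DaaS automatic model deployment system, either through our cloud SaaS service or as an on-premises deployment, please email autodeploy.ai#outlook.com (replace # with @) and describe your model deployment needs.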

