Loading Custom Datasets in PyTorch

Published: 2020-08-31 19:17:51 | Source: 腳本之家 | Views: 237 | Author: xholes | Category: Development

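The examples on the official PyTorch site all load data through predefined, dataset-specific interfaces, and the data they use is supplied by the project itself. How do we train a network on a dataset we collected ourselves? In that case we have to define the data-handling interface ourselves. Fortunately, PyTorch provides a dataset interface class (torch.utils.data.Dataset) that we can subclass to implement our own dataset.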

torch.utils.data

This torch module contains classes for working with datasets.

class torch.utils.data.Dataset: An abstract class; every other dataset class should be a subclass of it. Subclasses must override two important methods: __len__ (returns the size of the dataset) and __getitem__ (supports integer indexing).

class torch.utils.data.TensorDataset: A dataset that wraps tensors; each sample is obtained by indexing the tensors along their first dimension.

class torch.utils.data.ConcatDataset: Concatenates several datasets into one larger dataset.

class torch.utils.data.Subset(dataset, indices): The subset of a dataset selected by the given sequence of indices.

class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None): The data loader. It combines a dataset with a sampler and provides an iterator over the data.

torch.utils.data.random_split(dataset, lengths): Randomly splits a dataset into non-overlapping new datasets of the given lengths.
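The dataset utilities listed above can be combined as in the following minimal sketch. The tensors are synthetic and the variable names (ds_a, ds_b, combined, and so on) are made up for illustration; they do not come from the original article.

```python
import torch
from torch.utils.data import ConcatDataset, Subset, TensorDataset, random_split

# Two small synthetic datasets: 6 and 4 samples, each with 3 features and a label.
features_a = torch.arange(18, dtype=torch.float32).reshape(6, 3)
labels_a = torch.zeros(6, dtype=torch.long)
features_b = torch.ones(4, 3)
labels_b = torch.ones(4, dtype=torch.long)

ds_a = TensorDataset(features_a, labels_a)  # ds_a[i] -> (features_a[i], labels_a[i])
ds_b = TensorDataset(features_b, labels_b)

# ConcatDataset chains the datasets: indices 0-5 come from ds_a, 6-9 from ds_b.
combined = ConcatDataset([ds_a, ds_b])

# Subset exposes only the listed indices of the underlying dataset.
evens = Subset(combined, [0, 2, 4, 6, 8])

# random_split partitions the dataset into non-overlapping parts of the given lengths.
# The seeded generator only makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(combined, [8, 2], generator=generator)

print(len(combined), len(evens), len(train_part), len(val_part))
```

Each of these wrappers is itself a Dataset, so any of them can be passed to a DataLoader in the same way as a hand-written dataset class.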

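class torch.utils.data.Sampler(data_source): The base class for all samplers. Every sampler subclass must provide an __iter__ method that iterates over the indices of dataset elements, and a __len__ method that returns the length of the returned iterator.

class torch.utils.data.SequentialSampler(data_source): Samples elements sequentially, always in the same order.

class torch.utils.data.RandomSampler(data_source): Samples elements randomly, without replacement by default.

class torch.utils.data.SubsetRandomSampler(indices): Samples elements randomly, without replacement, from the given list of indices.

class torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True): Samples elements according to the given probabilities (weights).

class torch.utils.data.BatchSampler(sampler, batch_size, drop_last): Wraps another sampler so that it yields mini-batches of indices.

class torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=None, rank=None): A sampler that restricts data loading to a subset of the dataset, so that each process in distributed training works on its own part.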
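To make the sampler descriptions above concrete, here is a small sketch that lists the indices several of them produce. The data source is a plain Python list chosen for illustration, and the weights are picked so that the weighted draw is deterministic.

```python
import torch
from torch.utils.data import (BatchSampler, RandomSampler, SequentialSampler,
                              SubsetRandomSampler, WeightedRandomSampler)

data = list(range(10))  # illustrative data source; samplers only need its length

sequential = list(SequentialSampler(data))     # always 0, 1, ..., 9
shuffled = list(RandomSampler(data))           # a random permutation of 0..9
subset = list(SubsetRandomSampler([1, 3, 5]))  # a random permutation of [1, 3, 5]

# Only index 0 has a non-zero weight, so every draw (with replacement) yields 0.
weights = [1.0] + [0.0] * 9
weighted = list(WeightedRandomSampler(weights, num_samples=4, replacement=True))

# BatchSampler groups the indices of another sampler into lists of batch_size.
batches = list(BatchSampler(SequentialSampler(data), batch_size=4, drop_last=False))

print(sequential)
print(sorted(shuffled), sorted(subset))
print(weighted)
print(batches)
```

A sampler only yields indices; the DataLoader then uses those indices to fetch samples from the dataset.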

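Custom dataset

A dataset you define yourself must inherit from the abstract class torch.utils.data.Dataset and override two important methods: __len__ and __getitem__.

The code below is for reference only. __init__ initializes the basic parameters of the class. __getitem__ is where the data is actually read: the iterator fetches samples from the dataset by index, so the data-reading logic only needs to go into this one method. __len__ returns the size of the whole dataset, and the iterator's index range is derived from it.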

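import torch

class myDataset(torch.utils.data.Dataset):
    def __init__(self, dataSource):
        # dataSource: any indexable collection of samples that supports len()
        self.dataSource = dataSource

    def __getitem__(self, index):
        # Read and return the sample at the given integer index
        element = self.dataSource[index]
        return element

    def __len__(self):
        # Number of samples in the dataset
        return len(self.dataSource)

dataSource = list(range(100))  # placeholder data, for illustration only
train_data = myDataset(dataSource)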
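In practice a dataset usually returns a sample together with its label. The following is a minimal sketch of that pattern, not the author's code: the class name PairDataset, the transform argument, and the data are all hypothetical.

```python
import torch
from torch.utils.data import Dataset


class PairDataset(Dataset):
    """Hypothetical dataset returning (feature, label) pairs with an optional transform."""

    def __init__(self, features, labels, transform=None):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels
        self.transform = transform

    def __getitem__(self, index):
        # All per-sample loading and preprocessing happens here.
        x = torch.as_tensor(self.features[index], dtype=torch.float32)
        y = torch.as_tensor(self.labels[index], dtype=torch.long)
        if self.transform is not None:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return len(self.features)


# Synthetic data, for illustration only.
features = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
labels = [0, 1, 0]
dataset = PairDataset(features, labels, transform=lambda x: x * 2)

x, y = dataset[1]
print(len(dataset), x, y)
```

Keeping the transform inside __getitem__ means preprocessing is applied lazily, one sample at a time, rather than to the whole dataset up front.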

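Custom data loader

class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None): The data loader. It combines a dataset with a sampler and provides an iterator over the data.

dataset (Dataset): the dataset to load (either a custom dataset or a built-in one).

batch_size: the number of samples per batch (optional, default 1).

shuffle: whether to reshuffle the whole dataset at every epoch (default False).

sampler: defines the strategy for drawing samples from the dataset. If specified, shuffle must be False.

num_workers: the number of subprocesses used to load data; 0 means the data is loaded in the main process.

collate_fn: merges a list of samples into a batch.

pin_memory: whether to copy tensors into pinned (page-locked) memory before returning them, which speeds up the later transfer to a CUDA device.

drop_last (bool, optional): whether to drop the last batch if it is incomplete (default False).

timeout: the timeout for collecting a batch from the workers; an error is raised if no data has been read within this time. Must be non-negative.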
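As an illustration of two of the parameters above, collate_fn and drop_last, the sketch below batches variable-length sequences by padding them to a common length. The function name pad_collate and the data are hypothetical; pad_sequence is the padding helper from torch.nn.utils.rnn.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Variable-length sequences; a plain list already supports indexing and len().
sequences = [torch.ones(n) for n in (3, 1, 4, 2, 5)]


def pad_collate(batch):
    """Hypothetical collate_fn: pad a list of 1-D tensors to a common length."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0.0)
    return padded, lengths


# drop_last=True discards the fifth sequence, which cannot fill a batch of 2.
loader = DataLoader(sequences, batch_size=2, shuffle=False,
                    collate_fn=pad_collate, drop_last=True)

for padded, lengths in loader:
    print(padded.shape, lengths.tolist())
```

The default collate function stacks samples and therefore needs them to have equal shapes; a custom collate_fn is one common way around that restriction.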

train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=64, shuffle=True)
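A loader built this way is normally consumed in a loop over epochs. The sketch below assumes a synthetic stand-in for train_data (150 random samples with 8 features each) so that it runs on its own; the training step itself is left as a comment.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for train_data: 150 samples, 8 features, binary labels.
train_data = TensorDataset(torch.randn(150, 8), torch.randint(0, 2, (150,)))
train_loader = DataLoader(dataset=train_data, batch_size=64, shuffle=True)

batch_sizes = []
for epoch in range(2):
    # With shuffle=True the sample order is redrawn at the start of each epoch.
    for inputs, targets in train_loader:
        batch_sizes.append(inputs.shape[0])
        # A real training loop would run the forward and backward pass here.

print(len(train_loader), batch_sizes)
```

Because drop_last defaults to False, the last batch of each epoch holds the remaining 22 samples instead of a full 64.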

