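In this tutorial, you will learn how to implement face-mask detection with YOLOv3, a classic one-stage object detection network. For background on YOLOv3, see the paper "YOLOv3: An Incremental Improvement". All project files for this tutorial can be downloaded from the DiDi Cloud S3 storage service.
DiDi Cloud's Notebook service comes with CUDA, cuDNN, Python, TensorFlow, PyTorch, MXNet, Keras and other deep learning frameworks preinstalled, so you do not need to install them yourself.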
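To confirm which of these frameworks are importable in a freshly created notebook, a quick check like the one below can help. It is a minimal sketch that uses only the standard library's importlib.util.find_spec, which locates a module without importing it. The import names in FRAMEWORKS are our assumption about how each package is exposed (for example cv2 for OpenCV), and available_frameworks is a hypothetical helper name, so adjust the list to your environment.

```python
import importlib.util

# Assumed import names for the preinstalled packages (import names, not pip names).
FRAMEWORKS = ["torch", "torchvision", "tensorflow", "mxnet", "keras", "cv2"]


def available_frameworks(names=FRAMEWORKS):
    """Return {module_name: True/False} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_frameworks().items():
        print(f"{name:12s} {'found' if ok else 'missing'}")
```

Once torch is reported as found, torch.cuda.is_available() tells you whether PyTorch can see the GPU.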
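After registering with DiDi Cloud and completing real-name verification, you can purchase the Notebook service.
Go to the Notebook page of the console and click the Create Notebook Instance button.
Choose the basic configuration.
On the My Notebook page, click Open Notebook in the Actions column.
On the Notebook details page, click Open Notebook.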
import cv2
import math
import matplotlib.pyplot as plt
import numpy as np
import os
import random
import time
import torch
import torchvision
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
import xml.etree.ElementTree as ET

from torch.utils.data import Dataset, DataLoader
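Download the face-mask detection dataset and upload it to the Notebook server. We use the open-source AIZOO dataset as an example. Download link: https://pan.baidu.com/s/1nsQf_Py5YyKm87-8HiyJeQ , extraction code: eyfz. Uploading large files may require the DiDi Cloud S3 storage service. To make downloading easier, we have already uploaded the dataset to the public-dataset S3 object storage, so you only need to run the following shell commands. Skip this step if you upload the data yourself.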
!wget https://dataset-public.s3.didiyunapi.com/detection/人脸口罩检测/part1.tgz
!wget https://dataset-public.s3.didiyunapi.com/detection/人脸口罩检测/part2.tgz
!wget https://dataset-public.s3.didiyunapi.com/detection/人脸口罩检测/val.tgz
!tar -zxf part1.tgz
!tar -zxf part2.tgz
!tar -zxf val.tgz

--2020-03-05 14:59:25--  https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/part1.tgz
Resolving dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)... 125.94.54.9
Connecting to dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)|125.94.54.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 270682151 (258M) [application/gzip]
Saving to: ‘part1.tgz’
100%[======================================>] 270,682,151 6.15MB/s in 42s
2020-03-05 15:00:06 (6.18 MB/s) - ‘part1.tgz’ saved [270682151/270682151]

--2020-03-05 15:00:07--  https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/part2.tgz
Resolving dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)... 125.94.54.9
Connecting to dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)|125.94.54.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 337432016 (322M) [application/gzip]
Saving to: ‘part2.tgz’
100%[======================================>] 337,432,016 6.41MB/s in 53s
2020-03-05 15:00:59 (6.10 MB/s) - ‘part2.tgz’ saved [337432016/337432016]

--2020-03-05 15:01:00--  https://dataset-public.s3.didiyunapi.com/detection/%E4%BA%BA%E8%84%B8%E5%8F%A3%E7%BD%A9%E6%A3%80%E6%B5%8B/val.tgz
Resolving dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)... 125.94.54.9
Connecting to dataset-public.s3.didiyunapi.com (dataset-public.s3.didiyunapi.com)|125.94.54.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184383116 (176M) [application/gzip]
Saving to: ‘val.tgz’
100%[======================================>] 184,383,116 6.25MB/s in 29s
2020-03-05 15:01:28 (6.12 MB/s) - ‘val.tgz’ saved [184383116/184383116]
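Before building the dataset, it can be worth checking that the extracted folders are complete, because the dataset class below expects every image to have an annotation of the same name (one .jpg and one .xml per sample). The following is a minimal sketch using only the standard library; pair_stems and check_split are hypothetical helper names, and the folder names part1, part2 and val are taken from the archives above.

```python
import os
from collections import defaultdict


def pair_stems(filenames):
    """Split file names into stems that have both .jpg and .xml, and stems missing one.

    Extensions are compared case-sensitively because the dataset code opens
    exactly '<stem>.jpg' and '<stem>.xml'.
    """
    exts_by_stem = defaultdict(set)
    for name in filenames:
        stem, ext = os.path.splitext(name)
        exts_by_stem[stem].add(ext)
    required = {".jpg", ".xml"}
    paired = sorted(s for s, exts in exts_by_stem.items() if required <= exts)
    incomplete = sorted(s for s, exts in exts_by_stem.items() if not required <= exts)
    return paired, incomplete


def check_split(folder):
    """Apply pair_stems to every file in an extracted folder such as 'part1'."""
    return pair_stems(os.listdir(folder))


if __name__ == "__main__":
    for split in ("part1", "part2", "val"):
        if os.path.isdir(split):
            paired, incomplete = check_split(split)
            print(f"{split}: {len(paired)} pairs, {len(incomplete)} incomplete")
```

Any stem listed as incomplete would make the dataset class fail when it tries to open the missing file.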
Load the data. For this we need to write a custom Dataset class for our dataset.
VOC_CLASSES = ('face', 'face_mask')


class AnnotationTransform(object):
    """Convert a VOC-style XML annotation into an array of [xmin, ymin, xmax, ymax, label_ind] rows."""

    def __init__(self, class_to_ind=None, keep_difficult=True):
        self.class_to_ind = class_to_ind or dict(
            zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target):
        res = np.empty((0, 5))
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip()
            bbox = obj.find('bndbox')
            bndbox = []
            for pt in ['xmin', 'ymin', 'xmax', 'ymax']:
                # VOC coordinates are 1-based; convert to 0-based pixels
                cur_pt = int(bbox.find(pt).text) - 1
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res = np.vstack((res, bndbox))  # [xmin, ymin, xmax, ymax, label_ind]
        return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]


def preproc_for_test(image, input_size, mean, std):
    # resize with a randomly chosen interpolation method (a mild augmentation)
    interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA,
                      cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
    interp_method = interp_methods[random.randrange(5)]
    image = cv2.resize(image, input_size, interpolation=interp_method)
    image = image.astype(np.float32)
    image = image[:, :, ::-1]  # BGR (OpenCV) -> RGB
    image /= 255.
    if mean is not None:
        image -= mean
    if std is not None:
        image /= std
    return image.transpose(2, 0, 1)  # HWC -> CHW


class TrainTransform(object):
    """Resize the image and turn boxes into padded [label, cx, cy, w, h] targets in img_size pixels."""

    def __init__(self, rgb_means=None, std=None, max_labels=50):
        self.means = rgb_means
        self.std = std
        self.max_labels = max_labels

    def __call__(self, image, targets, img_size):
        boxes = targets[:, :4].copy()  # Nx4, (xmin, ymin, xmax, ymax)
        labels = targets[:, 4].copy()
        if len(boxes) == 0:
            # no objects in this image: return an all-zero label array
            targets = np.zeros((self.max_labels, 5), dtype=np.float32)
            image = preproc_for_test(image, img_size, self.means, self.std)
            image = np.ascontiguousarray(image, dtype=np.float32)
            return torch.from_numpy(image), torch.from_numpy(targets)

        height, width, _ = image.shape
        # corner format -> center format (cx, cy, w, h)
        b_x_o = (boxes[:, 2] + boxes[:, 0]) * .5
        b_y_o = (boxes[:, 3] + boxes[:, 1]) * .5
        b_w_o = (boxes[:, 2] - boxes[:, 0]) * 1.
        b_h_o = (boxes[:, 3] - boxes[:, 1]) * 1.
        boxes[:, 0] = b_x_o
        boxes[:, 1] = b_y_o
        boxes[:, 2] = b_w_o
        boxes[:, 3] = b_h_o

        # scale the boxes from the original image size to img_size
        boxes[:, 0::2] /= width
        boxes[:, 1::2] /= height
        boxes[:, 0::2] *= img_size[0]
        boxes[:, 1::2] *= img_size[1]
        # preproc_for_test resizes the image with a random interpolation method
        image_t = preproc_for_test(image, img_size, self.means, self.std)

        # pad the targets to max_labels rows of [label, cx, cy, w, h]
        labels = np.expand_dims(labels, 1)
        targets_t = np.hstack((labels, boxes))
        padded_labels = np.zeros((self.max_labels, 5))
        padded_labels[range(len(targets_t))[:self.max_labels]] = targets_t[:self.max_labels]
        padded_labels = np.ascontiguousarray(padded_labels, dtype=np.float32)
        image_t = np.ascontiguousarray(image_t, dtype=np.float32)
        return torch.from_numpy(image_t), torch.from_numpy(padded_labels)


# Dataset class definition
class VOCDetection(Dataset):
    def __init__(self, root, preproc=None, target_transform=AnnotationTransform(),
                 img_size=(416, 416), split='train'):
        super().__init__()
        self.root = root
        self.preproc = preproc
        self.target_transform = target_transform
        self.img_size = img_size
        self._classes = VOC_CLASSES
        # each sample is stored as <stem>.jpg + <stem>.xml; collect the paths without extension
        folders = ['part1', 'part2'] if split == 'train' else ['val']
        self.item_container = set()
        for folder in folders:
            for item in os.listdir(os.path.join(self.root, folder)):
                stem = os.path.splitext(item)[0]
                self.item_container.add(os.path.join(self.root, folder, stem))
        self.item_container = sorted(self.item_container)

    def __getitem__(self, index):
        item = self.item_container[index]
        target = ET.parse(item + '.xml').getroot()
        img = cv2.imread(item + '.jpg')  # BGR, H x W x 3
        height, width, _ = img.shape
        if self.target_transform is not None:
            target = self.target_transform(target)
        if self.preproc is not None:
            img, target = self.preproc(img, target, self.img_size)
        img_info = (width, height)
        return img, target, img_info, item

    def __len__(self):
        return len(self.item_container)


dataset = VOCDetection(root='./', preproc=TrainTransform(), split='train')
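To see what a single training target looks like, the sketch below reproduces the two label conversions above for one object using only the standard library: AnnotationTransform's parsing into [xmin, ymin, xmax, ymax, label] (with the shift to 0-based pixels), then TrainTransform's conversion to [label, cx, cy, w, h] in 416x416 input pixels. The XML snippet and its numbers are made up for illustration, and parse_objects and to_train_target are hypothetical helper names, not part of the tutorial's code.

```python
import xml.etree.ElementTree as ET

CLASSES = ("face", "face_mask")  # same order as VOC_CLASSES

# Made-up annotation in the Pascal VOC layout (illustrative values only).
SAMPLE_XML = """<annotation>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>face_mask</name><difficult>0</difficult>
    <bndbox><xmin>101</xmin><ymin>51</ymin><xmax>201</xmax><ymax>171</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(root):
    """Like AnnotationTransform: one [xmin, ymin, xmax, ymax, label] row per object, 0-based."""
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.find(k).text) - 1 for k in ("xmin", "ymin", "xmax", "ymax")]
        label = CLASSES.index(obj.find("name").text.strip().lower())
        rows.append(coords + [label])
    return rows


def to_train_target(row, width, height, img_size=(416, 416)):
    """Like TrainTransform: [label, cx, cy, w, h] scaled to the resized input."""
    xmin, ymin, xmax, ymax, label = row
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    w, h = xmax - xmin, ymax - ymin
    sx, sy = img_size[0] / width, img_size[1] / height
    return [label, cx * sx, cy * sy, w * sx, h * sy]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    for row in parse_objects(root):
        print(row, "->", to_train_target(row, width, height))
```

Note that the label moves from the last column to the first, and that x and y are scaled by different factors because the image is stretched to a square input.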
Define the model and initialize its parameters. This step imports the custom yolo library, which can be downloaded from the DiDi Cloud S3 storage service.
from yolo import YOLOv3  # custom module from the tutorial's project files

model = YOLOv3(num_classes=len(VOC_CLASSES))


def init_yolo(M):
    """Initialize every Conv2d, BatchNorm2d and Linear layer inside module M."""
    for m in M.modules():
        if isinstance(m, nn.Conv2d):
            init.kaiming_normal_(m.weight, a=0.1, mode='fan_in')
            if m.bias is not None:
                init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            init.ones_(m.weight)
            init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            init.normal_(m.weight, 0, 0.01)
            init.zeros_(m.bias)


# init_yolo already walks all submodules, so call it once on the whole model
init_yolo(model)
model.train()
torch.backends.cudnn.benchmark = True
device = torch.device("cuda")
model = model.to(device)
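For intuition about init.kaiming_normal_(m.weight, a=0.1, mode='fan_in'), the standard deviation it samples from can be computed by hand. Per the PyTorch documentation, the default nonlinearity is 'leaky_relu' with gain sqrt(2 / (1 + a^2)), and std = gain / sqrt(fan_in), where fan_in for a Conv2d weight is (input channels / groups) x kernel height x kernel width. The sketch below assumes ungrouped convolutions; kaiming_normal_std is a hypothetical helper, and a=0.1 presumably matches the LeakyReLU(0.1) activations conventional in YOLOv3, although we have not checked the yolo module itself.

```python
import math


def kaiming_normal_std(in_channels, kernel_h, kernel_w, a=0.1):
    """Std used by kaiming_normal_(w, a=a, mode='fan_in') for an ungrouped Conv2d weight."""
    fan_in = in_channels * kernel_h * kernel_w
    gain = math.sqrt(2.0 / (1.0 + a ** 2))  # leaky_relu gain with negative slope a
    return gain / math.sqrt(fan_in)


if __name__ == "__main__":
    # first 3x3 conv on an RGB image vs. a 1x1 conv with 1024 input channels
    print(kaiming_normal_std(3, 3, 3))
    print(kaiming_normal_std(1024, 1, 1))
```

For the 3x3 convolution on an RGB image the std is about 0.271; layers with more input channels get proportionally smaller initial weights, which keeps activation variance roughly constant through the network.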
Now we train the model with the Adam optimizer.
# Hyperparameters
batch_size = 8        # samples per batch; should not be too small, limited by GPU memory
base_lr = 0.0001      # base learning rate
warmup_epochs = 10    # epochs over which the learning rate ramps up to base_lr
epochs = 70           # total number of training epochs
save_interval = 10    # save a checkpoint every save_interval epochs
steps = [50, 60]      # epochs at which the learning rate is multiplied by 0.1

dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=2, pin_memory=True)
optimizer = optim.Adam(model.parameters(), lr=base_lr, weight_decay=0.0005)
epoch_size = len(dataset) // batch_size


def set_lr(tmp_lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = tmp_lr


for epoch in range(1, epochs + 1):
    print('\n[Epoch {} started]'.format(epoch))
    for iter_i, (imgs, targets, _, _) in enumerate(dataloader):
        start = time.time()
        # update the learning rate: linear warmup, then step decay
        if epoch < warmup_epochs:
            tmp_lr = base_lr * (iter_i + epoch * epoch_size) / (warmup_epochs * epoch_size)
            set_lr(tmp_lr)
        elif epoch == warmup_epochs:
            tmp_lr = base_lr
            set_lr(tmp_lr)
        elif epoch in steps and iter_i == 0:
            tmp_lr = tmp_lr * 0.1
            set_lr(tmp_lr)

        optimizer.zero_grad()
        imgs = imgs.to(device, dtype=torch.float32)
        targets = targets.to(device, dtype=torch.float32)
        loss_dict = model(imgs, targets, epoch)
        loss = sum(loss_dict['losses'])
        loss.backward()
        optimizer.step()
        end = time.time()

        # print training progress, overwriting the same line
        print('\r[Epoch %d/%d][Iter %d/%d][LR %.6f]'
              '[Loss: l1 %.2f, conf %.6f, cls %.6f][Time: %.2f s]......'
              % (epoch, epochs, iter_i + 1, epoch_size, tmp_lr,
                 sum(loss_dict['l1_losses']).item(),
                 sum(loss_dict['conf_losses']).item(),
                 sum(loss_dict['cls_losses']).item(),
                 end - start), end='')

    # save a checkpoint once every save_interval epochs (not at every iteration)
    if epoch % save_interval == 0:
        torch.save(model.state_dict(), 'yolov3_mask_detection_{}.pth'.format(epoch))

torch.save(model.state_dict(), 'yolov3_mask_detection_final.pth')
[Epoch 1 started]
[Epoch 1/70][Iter 765/765][LR 0.000020][Loss: l1 4.36, conf 1747.822632, cls 1.573278][Time: 0.61 s]........
[Epoch 2 started]
[Epoch 2/70][Iter 765/765][LR 0.000030][Loss: l1 5.01, conf 781.659180, cls 1.723707][Time: 0.63 s].........
[Epoch 3 started]
[Epoch 3/70][Iter 765/765][LR 0.000040][Loss: l1 18.94, conf 331.378754, cls 5.523039][Time: 0.69 s].......
[Epoch 4 started]
[Epoch 4/70][Iter 78/765][LR 0.000041][Loss: l1 6.72, conf 293.010193, cls 1.820950][Time: 0.66 s].......
KeyboardInterrupt:
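The learning-rate logic in the training loop is stateful (tmp_lr is carried from one iteration to the next), so it is not obvious what value applies at a given point. A stateless equivalent is handy for plotting the schedule or checking a log. The sketch below uses a hypothetical lr_at helper with the hyperparameters above as defaults; with epoch_size = 765 it reproduces the LR column printed above (0.000020, 0.000030 and 0.000040 at the end of epochs 1 to 3, and 0.000041 at iteration 78 of epoch 4).

```python
def lr_at(epoch, iter_i, epoch_size, base_lr=1e-4, warmup_epochs=10,
          steps=(50, 60), gamma=0.1):
    """Learning rate used at (epoch, iter_i); epochs are 1-based, iter_i is 0-based."""
    if epoch < warmup_epochs:
        # linear warmup, same expression as in the training loop
        return base_lr * (iter_i + epoch * epoch_size) / (warmup_epochs * epoch_size)
    # after warmup: multiply by gamma once for every milestone already reached
    n_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    for epoch, iter_i in [(1, 764), (2, 764), (3, 764), (4, 77), (10, 0), (55, 0), (65, 0)]:
        print(epoch, iter_i + 1, "%.6f" % lr_at(epoch, iter_i, 765))
```

Writing the schedule as a pure function of (epoch, iteration) also makes it safe to resume training from a checkpoint, since no running variable has to be restored.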
Once the model is trained, we can start testing.
class ValTransform(object):
    """Test-time preprocessing: resize, BGR -> RGB, scale to [0, 1], optional normalization, HWC -> CHW."""

    def __init__(self, rgb_means=None, std=None, swap=(2, 0, 1)):
        self.means = rgb_means
        self.swap = swap
        self.std = std

    def __call__(self, img, res, input_size):
        # always use bilinear interpolation at test time (no random choice)
        img = cv2.resize(np.array(img), input_size,
                         interpolation=cv2.INTER_LINEAR).astype(np.float32)
        img = img[:, :, ::-1]
        img /= 255.
        if self.means is not None:
            img -= self.means
        if self.std is not None:
            img /= self.std
        img = img.transpose(self.swap)
        img = np.ascontiguousarray(img, dtype=np.float32)
        return torch.from_numpy(img), torch.zeros(1, 5)


device = torch.device("cuda")
# load the trained weights
model.load_state_dict(torch.load('yolov3_mask_detection_final.pth', map_location=device))
model = model.to(device)
model.eval()

transform = ValTransform()
im = cv2.imread("val/test_00000760.jpg")  # input image
ori_im = im.copy()
height, width, _ = im.shape
test_size = (416, 416)
im_input, _ = transform(im, None, test_size)
im_input = im_input.to(device).unsqueeze(0)  # add the batch dimension

with torch.no_grad():
    outputs = model(im_input)
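Because ValTransform stretches the image to 416x416 without preserving its aspect ratio, a box predicted on the network input has to be scaled back separately along each axis: x by width / 416 and y by height / 416. The helper below is a minimal pure-Python sketch of that mapping for a single (x1, y1, x2, y2) box; the name to_original_xywh is hypothetical. It also converts to the (x, y, w, h) layout that the vis function used for visualization expects.

```python
def to_original_xywh(box, orig_w, orig_h, input_size=(416, 416)):
    """Map an (x1, y1, x2, y2) box on the resized input back to the original image as (x, y, w, h)."""
    sx = orig_w / input_size[0]  # horizontal stretch factor
    sy = orig_h / input_size[1]  # vertical stretch factor
    x1, y1 = box[0] * sx, box[1] * sy
    x2, y2 = box[2] * sx, box[3] * sy
    return (x1, y1, x2 - x1, y2 - y1)


if __name__ == "__main__":
    # a box on the 416x416 input, mapped back to an 832x624 photo
    print(to_original_xywh((104, 104, 208, 312), 832, 624))
```

If you later switch to letterbox resizing (padding to keep the aspect ratio), this mapping would also have to subtract the padding offset.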
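Post-process the model output: discard bboxes with low confidence, then apply non-maximum suppression (NMS) to remove bboxes of the same class that overlap heavily (high IoU).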
def postprocess(prediction, num_classes=2, conf_thre=0.3, nms_thre=0.45):
    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    box_corner = prediction.new_empty(prediction.shape)
    box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
    box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
    box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
    box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
    prediction[:, :, :4] = box_corner[:, :, :4]

    output = [None for _ in range(len(prediction))]
    for i, image_pred in enumerate(prediction):
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # Get score and class with highest confidence
        class_conf, class_pred = torch.max(
            image_pred[:, 5:5 + num_classes], 1, keepdim=True)
        conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
        # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
        detections = torch.cat(
            (image_pred[:, :5], class_conf, class_pred.float()), 1)
        detections = detections[conf_mask]
        if not detections.size(0):
            continue

        # Iterate through all predicted classes
        unique_labels = detections[:, -1].unique()
        for c in unique_labels:
            # Get the detections with the particular class
            detections_class = detections[detections[:, -1] == c]
            nms_out_index = torchvision.ops.nms(
                detections_class[:, :4],
                detections_class[:, 4] * detections_class[:, 5],
                nms_thre)
            detections_class = detections_class[nms_out_index]
            if output[i] is None:
                output[i] = detections_class
            else:
                output[i] = torch.cat((output[i], detections_class))

    return output


outputs = postprocess(outputs, 2, 0.01, 0.35)
# detections for the first (only) image; this is None if nothing passed the threshold
outputs = outputs[0].detach().cpu()

bboxes = outputs[:, 0:4]
# rescale from the 416x416 network input back to the original image size
bboxes[:, 0::2] *= width / test_size[0]
bboxes[:, 1::2] *= height / test_size[1]
# (x1, y1, x2, y2) -> (x, y, w, h)
bboxes[:, 2] = bboxes[:, 2] - bboxes[:, 0]
bboxes[:, 3] = bboxes[:, 3] - bboxes[:, 1]
cls = outputs[:, 6]
scores = outputs[:, 4] * outputs[:, 5]
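torchvision.ops.nms performs greedy non-maximum suppression: it visits boxes in decreasing score order and discards any box whose IoU with an already kept box exceeds the threshold, returning the kept indices sorted by score. A pure-Python sketch of the same idea follows, for understanding only; iou and greedy_nms are our own helper names, and tie-breaking between equal scores may differ from torchvision.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap any already kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    print(greedy_nms(boxes, [0.9, 0.8, 0.7], iou_thresh=0.35))
```

In postprocess this runs separately for each class, with obj_conf x class_conf as the score, so a face box never suppresses an overlapping face_mask box.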
Finally, we visualize the results.
def vis(img, boxes, scores, cls_ids, conf=0.5, class_names=None, color=None):
    """Draw (x, y, w, h) boxes with class names and scores on a BGR image."""
    colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]])

    def get_color(c, x, max_val):
        # interpolate between entries of the color table
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    for i in range(len(boxes)):
        box = boxes[i]
        cls_conf = scores[i]
        if cls_conf < conf:
            continue  # skip low-confidence detections
        x1 = int(box[0])
        y1 = int(box[1])
        x2 = int(box[0] + box[2])
        y2 = int(box[1] + box[3])
        rgb = color if color else (255, 0, 0)
        if class_names is not None:
            cls_id = int(cls_ids[i])
            class_name = class_names[cls_id]
            classes = len(class_names)
            offset = cls_id * 123456 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, '%s: %.2f' % (class_name, cls_conf), (x1, y1 - 5),
                              cv2.FONT_HERSHEY_SIMPLEX, 0.6, rgb, 2)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 1)
    return img


pred_im = vis(ori_im, bboxes.numpy(), scores.numpy(), cls.numpy(), conf=0.3, class_names=VOC_CLASSES)
plt.rcParams['figure.figsize'] = (20, 12)
plt.imshow(pred_im[:, :, ::-1])  # BGR -> RGB for matplotlib
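Note that this tutorial is only meant as an introduction: the input size is fixed at 416x416, and no sophisticated data augmentation or training tricks are used. As a result, the model misses faces in crowded scenes and misses small heads. For better results, you can train a face detection model with this mask dataset mixed in, which lets you draw on much more face data, and then run a binary (mask / no mask) classifier on the detected face crops.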
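The two-stage alternative described above can be outlined as follows: run a face detector, crop each detected face with a little surrounding context, and hand the crop to a binary face / face_mask classifier. This is a sketch only. It assumes the image is an H x W x C NumPy-style array (as returned by cv2.imread); classify stands for whatever crop classifier you train and is hypothetical here, as are the helper names, and the default 20% margin is an arbitrary illustrative choice.

```python
def expand_and_clip(box, img_w, img_h, margin=0.2):
    """Grow an (x1, y1, x2, y2) box by `margin` of its size on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    dw, dh = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0, int(x1 - dw)), max(0, int(y1 - dh)),
            min(img_w, int(x2 + dw)), min(img_h, int(y2 + dh)))


def classify_faces(image, face_boxes, classify, margin=0.2):
    """Second stage: crop each detected face and label it with the (hypothetical) classifier."""
    h, w = image.shape[:2]
    results = []
    for box in face_boxes:
        x1, y1, x2, y2 = expand_and_clip(box, w, h, margin)
        crop = image[y1:y2, x1:x2]  # rows are y, columns are x
        results.append((box, classify(crop)))
    return results
```

Keeping some context around the face matters for this task, since a mask often extends below the chin and to the ears, beyond a tight face box.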
Author: Zhu Zhongtao [Product Expert, DiDi Chuxing]