
YOLOv7 Label Conversion

Guangzhou, China

In the previous step we cloned the YOLOv7 repository and ran predictions using the testing weights. I then trained YOLOv7 with a custom dataset that was pre-labeled the "YOLO Way".

Another annotation format that LabelImg can generate is PASCAL VOC. I now want to see how to bring a dataset that was labeled that way into a YOLO training workflow. I am going to use the following Kaggle dataset:

For the conversion code I had help from Convert PASCAL VOC XML to YOLO for Object Detection by Ng Wai Foong.

Getting YOLOv7

Training Weights

I already went through all the steps to download and test-run YOLOv7. My graphics card (Nvidia GTX 1060) only has 6 GB of VRAM, which forced me to reduce the batch size to 1. A batch size of 2 was too much, while 1 underutilizes the card and leaves a lot of memory free. To make use of that memory I want to try the following, larger weights:

wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e_training.pt

YOLOv7 Data Conversion

The e6e model increases the number of parameters from ~40 million to ~150 million... let's see what happens...

Configuration

Now we need to configure YOLO. First create a copy of the following file and call it something like:

cp yolov7/cfg/training/yolov7-e6e.yaml yolov7/cfg/training/yolov7-e6e-ppe.yaml

Here we only need to adjust the number of classes we expect in our dataset - 3:

# parameters
nc: 3 # number of classes

Error: I get an IndexError: list index out of range when I train with the yolov7-e6e.yaml file. When I switch to yolov7-custom.yaml the training works, even though I am still using the e6e weights:

Traceback (most recent call last):
  File "yolov7/train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "yolov7/train.py", line 363, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs) # loss scaled by batch_size
  File "yolov7/utils/loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "yolov7/utils/loss.py", line 677, in build_targets
    b, a, gj, gi = indices[i]
IndexError: list index out of range

UPDATE: Apparently you need to use train_aux.py instead of train.py to work with e6e weights. I will test this next.

Next create the directories yolov7/custom_data/ppe and add a data file ppe-data.yml:

train: ./custom_data/ppe/train
val: ./custom_data/ppe/validation
test: ./custom_data/ppe/test

# Classes
nc: 3 # number of classes
names: ['with_mask', 'without_mask', 'mask_worn_incorrect']

The original class name in the dataset was mask_weared_incorrect - I used search and replace to change it to mask_worn_incorrect in all annotation files.
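The search and replace can also be scripted. The following is only a minimal sketch, assuming the annotations are plain XML text; the helper name `rename_class` is hypothetical and not part of LabelImg or YOLOv7:

```python
import re

OLD_NAME = "mask_weared_incorrect"
NEW_NAME = "mask_worn_incorrect"


def rename_class(xml_text, old=OLD_NAME, new=NEW_NAME):
    """Replace the class name, but only inside <name>...</name> tags."""
    pattern = re.compile(r"<name>\s*" + re.escape(old) + r"\s*</name>")
    return pattern.sub(f"<name>{new}</name>", xml_text)


sample = "<object><name>mask_weared_incorrect</name></object>"
print(rename_class(sample))
```

To apply it to the dataset you would read each file in the annotations folder, pass its text through `rename_class` and write the result back. Restricting the match to the `<name>` tag avoids touching file names or other fields by accident.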

Preparing the Dataset

This dataset contains 853 images belonging to the 3 classes, as well as their bounding boxes in the PASCAL VOC format. The classes are:

  • With mask
  • Without mask
  • Mask worn incorrectly

The annotations are in an XML format:

<annotation>
    <folder>images</folder>
    <filename>maksssksksss852.png</filename>
    <size>
        <width>267</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>with_mask</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <occluded>0</occluded>
        <difficult>0</difficult>
        <bndbox>
            <xmin>139</xmin>
            <ymin>94</ymin>
            <xmax>198</xmax>
            <ymax>147</ymax>
        </bndbox>
    </object>
</annotation>
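To get a feeling for the format, the fields that matter for YOLO (image size, class name and the four box corners) can be read with the standard library. This is a small sketch on a shortened copy of the annotation above; the function name `read_voc_boxes` is made up for illustration:

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<annotation>
  <size><width>267</width><height>400</height><depth>3</depth></size>
  <object>
    <name>with_mask</name>
    <bndbox><xmin>139</xmin><ymin>94</ymin><xmax>198</xmax><ymax>147</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return (width, height, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Look up each coordinate by tag name instead of relying on child order
        coords = tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text,) + coords)
    return width, height, boxes


print(read_voc_boxes(SAMPLE_XML))
# (267, 400, [('with_mask', 139, 94, 198, 147)])
```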

The downloaded dataset is only split into images and annotations:

yolov7/custom_data
└── ppe
    ├── ppe-data.yml
    ├── images
    └── annotations

We now need to convert the annotation data from the PASCAL VOC format to the YOLO format and split the data into train, validation and test folders:

yolov7/convert_xml_to_yolo.py

import os
import shutil
import xml.etree.ElementTree as ET

# class order must match the names list in ppe-data.yml
classes = ['with_mask', 'without_mask', 'mask_worn_incorrect']


# Function for conversion XML to YOLO
# based on https://towardsdatascience.com/convert-pascal-voc-xml-to-yolo-for-object-detection-f969811ccba5
def xml_to_yolo_bbox(bbox, w, h):
    # bbox is [xmin, ymin, xmax, ymax] in pixels, w and h are the image size
    x_center = ((bbox[2] + bbox[0]) / 2) / w
    y_center = ((bbox[3] + bbox[1]) / 2) / h
    width = (bbox[2] - bbox[0]) / w
    height = (bbox[3] - bbox[1]) / h
    return [x_center, y_center, width, height]


# create folders
def create_folder(path):
    if not os.path.exists(path):
        os.makedirs(path)
        print("INFO :: Path '%s' created" % path)


create_folder('custom_data/ppe/train/images')
create_folder('custom_data/ppe/train/labels')
create_folder('custom_data/ppe/validation/images')
create_folder('custom_data/ppe/validation/labels')
create_folder('custom_data/ppe/test/images')
create_folder('custom_data/ppe/test/labels')

# get all source files
img_src_folder = 'custom_data/ppe/images'
label_src_folder = 'custom_data/ppe/annotations'

# note: the file order is whatever the OS returns, so the split can differ between machines
_, _, files = next(os.walk(img_src_folder))
for pos, f in enumerate(files):
    # first 700 images -> train, next 100 -> validation, rest -> test
    source_img = os.path.join(img_src_folder, f)
    if pos < 700:
        dest_folder = 'custom_data/ppe/train'
    elif pos < 800:
        dest_folder = 'custom_data/ppe/validation'
    else:
        dest_folder = 'custom_data/ppe/test'
    destination_img = os.path.join(dest_folder, 'images', f)
    shutil.copy(source_img, destination_img)

    # check for corresponding label
    label_file_basename = os.path.splitext(f)[0]
    label_source_file = f"{label_file_basename}.xml"
    label_dest_file = f"{label_file_basename}.txt"

    label_source_path = os.path.join(label_src_folder, label_source_file)
    label_dest_path = os.path.join(dest_folder, 'labels', label_dest_file)
    # if the annotation exists, convert it and write it to the target folder
    if os.path.exists(label_source_path):
        # parse the content of the xml file
        tree = ET.parse(label_source_path)
        root = tree.getroot()
        width = int(root.find("size").find("width").text)
        height = int(root.find("size").find("height").text)
        result = []
        for obj in root.findall('object'):
            label = obj.find("name").text
            # look up the class id; raises a ValueError for unknown class names
            index = classes.index(label)
            # bndbox children are expected in the order xmin, ymin, xmax, ymax
            pil_bbox = [int(x.text) for x in obj.find("bndbox")]
            yolo_bbox = xml_to_yolo_bbox(pil_bbox, width, height)
            # convert data to string
            bbox_string = " ".join([str(x) for x in yolo_bbox])
            result.append(f"{index} {bbox_string}")
        if result:
            # generate a YOLO format text file for each xml file
            with open(label_dest_path, "w", encoding="utf-8") as label_file:
                label_file.write("\n".join(result))

Make sure that all the paths in the script match your directory structure (they are relative to the yolov7 directory) - also replace the / with \\ in the path variables if you are on Windows. Then run the script:

python convert_xml_to_yolo.py
INFO :: Path 'custom_data/ppe/train/images' created
INFO :: Path 'custom_data/ppe/train/labels' created
INFO :: Path 'custom_data/ppe/validation/images' created
INFO :: Path 'custom_data/ppe/validation/labels' created
INFO :: Path 'custom_data/ppe/test/images' created
INFO :: Path 'custom_data/ppe/test/labels' created

Check the custom_data folder - all the data should now be split into the three training, validation and testing directories:

yolov7/custom_data
└── ppe
    ├── ppe-data.yml
    ├── test
    │   ├── images
    │   │   └── 53 items
    │   └── labels
    │       └── 53 items
    ├── train
    │   ├── images
    │   │   └── 700 items
    │   └── labels
    │       └── 700 items
    └── validation
        ├── images
        │   └── 100 items
        └── labels
            └── 100 items
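Instead of counting the files by hand, the split can be checked with a few lines of Python. This is a sketch under the assumption that it is run from the yolov7 directory and that every image should have a label file with the same base name; `check_split` is a hypothetical helper, not part of YOLOv7:

```python
from pathlib import Path


def check_split(split_dir):
    """Return (image_count, label_count, stems of images without a label file)."""
    split_dir = Path(split_dir)
    images = {p.stem for p in (split_dir / "images").iterdir() if p.is_file()}
    labels = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return len(images), len(labels), sorted(images - labels)


for split in ("train", "validation", "test"):
    path = Path("custom_data/ppe") / split
    if path.is_dir():  # nothing is printed when run outside the yolov7 directory
        print(split, check_split(path))
```

The conversion script only writes a label file when the XML exists and contains at least one object, so a non-empty third value would point to images that ended up without labels.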

And all the labels are now in the expected YOLO format:

0 0.2425 0.36099585062240663 0.115 0.2074688796680498
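Each line reads `class x_center y_center width height`, with all four box values divided by the image width or height so that they fall between 0 and 1. As a worked example, here is the conversion for the box from the XML sample shown earlier (267x400 image), together with the inverse as a sanity check. The function names are illustrative and not taken from the conversion script:

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


def yolo_to_voc(x_center, y_center, width, height, img_w, img_h):
    """Normalized YOLO box -> pixel corners, rounded to whole pixels."""
    xmin = (x_center - width / 2) * img_w
    ymin = (y_center - height / 2) * img_h
    xmax = (x_center + width / 2) * img_w
    ymax = (y_center + height / 2) * img_h
    return tuple(round(v) for v in (xmin, ymin, xmax, ymax))


yolo_box = voc_to_yolo(139, 94, 198, 147, 267, 400)
print(yolo_box)                          # y_center is 0.30125, height is 0.1325
print(yolo_to_voc(*yolo_box, 267, 400))  # (139, 94, 198, 147)
```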

Model Fitting

The data is now compatible with YOLOv7 and can be used to train a model to recognize personal protection equipment for us:

python train.py --epochs 100 --weights weights/yolov7-e6e_training.pt --data custom_data/ppe/ppe-data.yml --workers 4 --batch-size 1 --img 416 --cfg cfg/training/yolov7_custom.yaml --name yolov7-ppe

The run took around 4hrs with the following - disappointing - results:

Epoch   gpu_mem       box       obj       cls     total
99/99     2.67G   0.02546   0.01013  0.003265   0.03886

Class                 Images  Labels      P      R
all                      100     442  0.399  0.372
with_mask                100     368  0.765  0.802
without_mask             100      60  0.431  0.315
mask_worn_incorrect      100      14      0      0
100 epochs completed in 3.783 hours.

Optimizer stripped from runs/train/yolov7-ppe4/weights/last.pt, 74.8MB
Optimizer stripped from runs/train/yolov7-ppe4/weights/best.pt, 74.8MB

Testing

python test.py --weights runs/train/yolov7-ppe4/weights/best.pt \
--task test \
--data custom_data/ppe/ppe-data.yml
Class                 Images  Labels      P      R
all                       53     223  0.357  0.476
with_mask                 53     191  0.676  0.749
without_mask              53      25  0.395  0.679
mask_worn_incorrect       53       7      0      0
Speed: 56.5/1.1/57.6 ms inference/NMS/total per 640x640 image at batch-size 32

Just terrible...
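One thing the label counts in the validation table above show is how uneven the classes are. The short calculation below only restates those counts as shares; whether this imbalance explains the zero scores for mask_worn_incorrect is an assumption, not something the run proves:

```python
# Label counts copied from the validation table of the first run
val_labels = {"with_mask": 368, "without_mask": 60, "mask_worn_incorrect": 14}

total = sum(val_labels.values())  # 442, matches the "all" row
shares = {name: count / total for name, count in val_labels.items()}

for name, share in shares.items():
    print(f"{name:<20} {share:6.1%}")
```

Roughly 83% of the validation labels are with_mask, while mask_worn_incorrect accounts for only about 3%.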


Correction

python train_aux.py --device 0 --epochs 20 --weights weights/yolov7-e6e_training.pt --data custom_data/ppe/ppe-data.yml --workers 4 --batch-size 1 --img 416 --cfg cfg/training/yolov7-e6e-ppe.yaml --name yolov7-aux-ppe

Here I am getting the following error message:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Hmm, there are several proposed solutions: GitHub Issue, Stack Overflow. But none of them appear to be working for me right now. So let's drop the e6e and try the regular YOLOv7 model:

python train.py --epochs 20 --weights weights/yolov7_training.pt --data custom_data/ppe/ppe-data.yml --workers 4 --batch-size 1 --img 640 640 --cfg cfg/training/yolov7_custom.yaml --name yolov7-regular-ppe

And this looks more promising - already after 20 epochs:

Class                 Images  Labels      P      R
all                      100     442  0.942  0.127
with_mask                100     368  0.827   0.38
without_mask             100      60      1      0
mask_worn_incorrect      100      14      1      0
20 epochs completed in 1.116 hours.

Optimizer stripped from runs/train/yolov7-regular-ppe/weights/last.pt, 74.8MB
Optimizer stripped from runs/train/yolov7-regular-ppe/weights/best.pt, 74.8MB

Testing

python test.py --weights runs/train/yolov7-regular-ppe/weights/best.pt \
--task test \
--data custom_data/ppe/ppe-data.yml


Class                 Images  Labels      P      R
all                       53     223  0.935  0.166
with_mask                 53     191  0.805  0.497
without_mask              53      25      1      0
mask_worn_incorrect       53       7      1      0

Prediction

The prediction works "best" for frontal shots of large groups of people. But there are a lot of false positives. The model fails completely when dealing with close-ups:

python detect.py --weights runs/train/yolov7-regular-ppe/weights/best.pt \
--conf 0.1 \
--img-size 640 \
--source custom_data/ppe/test/images/maksssksksss836.png


But it is far from great yet - the model has about a 50:50 chance of detecting a mask, and zero chance of detecting a missing or incorrectly worn mask. The next step is to extend the training and see if the R-value improves over time.

Extended Run

Night-shift run... yolov7x_training.pt

I decided to give YOLOv7x a try. It is a much larger model than YOLOv7 (71.3M vs 36.9M parameters). And - to my knowledge - it does not require train_aux.py, which was causing the issues earlier. So all I needed was to download the matching training weights and create a copy of cfg/training/yolov7x.yaml with the correct number of classes:

python train.py --device 0 --epochs 200 --weights weights/yolov7x_training.pt --data custom_data/ppe/ppe-data.yml --workers 4 --batch-size 1 --img 640 640 --cfg cfg/training/yolov7x-ppe.yaml --name yolov7x-ppe

The run over 200 epochs took my machine about 14.7 hours and ended with the following results:

199/199     3.34G   0.01889  0.008947  0.001511   0.02935

Class                 Images  Labels       P      R
all                      100     442   0.135  0.744
with_mask                100     368   0.104  0.902
without_mask             100      60   0.211  0.617
mask_worn_incorrect      100      14  0.0908  0.714
200 epochs completed in 14.672 hours.

Optimizer stripped from runs/train/yolov7x-ppe/weights/last.pt, 142.1MB
Optimizer stripped from runs/train/yolov7x-ppe/weights/best.pt, 142.1MB

Testing

python test.py --weights runs/train/yolov7x-ppe/weights/best.pt \
--task test \
--data custom_data/ppe/ppe-data.yml


Class                 Images  Labels      P      R
all                       53     223  0.978  0.712
with_mask                 53     191  0.935   0.77
without_mask              53      25      1    0.8
mask_worn_incorrect       53       7      1  0.566
Speed: 81.4/0.8/82.2 ms inference/NMS/total per 640x640 image at batch-size 32

The confusion matrix correlates nicely with the test predictions. 87% of masks were identified correctly. For faces without masks we get 85%. For incorrectly worn masks - a much more complicated case - we are already at 67%. The model is still seeing a lot of masks in the background that do not exist, but overall - a successful training 👍
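To compare the three runs with a single number, precision and recall can be combined into the F1 score, F1 = 2PR / (P + R). The sketch below uses the "all" rows of the three test tables above; the run labels are just shorthand for this comparison:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) of the "all" row from each test run
test_results = {
    "e6e weights, 100 epochs": (0.357, 0.476),
    "yolov7, 20 epochs": (0.935, 0.166),
    "yolov7x, 200 epochs": (0.978, 0.712),
}

for run, (p, r) in test_results.items():
    print(f"{run:<26} F1 = {f1_score(p, r):.3f}")
```

This gives roughly 0.41, 0.28 and 0.82. The high precision of the 20-epoch run is offset by its very low recall, while the YOLOv7x run is strong on both.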

Predictions

For a comparison I want to run the same predictions I used with the previous model - maksssksksss836.png and maksssksksss99.png:

python detect.py --weights runs/train/yolov7x-ppe/weights/best.pt \
--conf 0.5 \
--img-size 640 \
--source custom_data/ppe/test/images/maksssksksss836.png

For the close-up image I lowered the confidence threshold to 0.1 to see if the person on the right would show up with the mask in side profile. And he does - it is really impressive.