Tensorflow2 Crash Course - Part II

Mong Kok, Hongkong

Training

This set of Notebooks provides a complete set of code to be able to train and leverage your own custom object detection model using the Tensorflow Object Detection API.

This article is based on a Tutorial by @nicknochnack.

Training

Github Repository

Download Pre-trained Model

Begin training process by opening training_the_model.py.ipynb, this notebook will walk you through installing Tensorflow Object Detection, making detections, saving and exporting your model:

Precision: True Positiv / (True Positiv + False Positive)
Recall: True Positiv / (True Positiv + False Negative)

Detection Models: Tensorflow provides a collection of detection models pre-trained on the COCO 2017 dataset. These models can be useful for out-of-the-box inference if you are interested in categories already in those datasets.

The model table gives you both a Speed parameter as well as a COCO mAP that correlates with the accuracy of the model. The higher the speed the less accurate the detection will become.

Install TFOD

During this process the Notebook will install Tensorflow Object Detection. You should ideally receive a notification indicating that the API has installed successfully with the last line stating:

Tensorflow Object Detection Walkthrough

If not, resolve installation errors by referring to the Error Guide.md in this folder.

Prepare your Model

The Step Get your pretrained Model downloads the model you defined under PRETRAINED_MODEL_PATH and unzips it. The following steps provide a Label Map with the labels you assigned to your images as well as the location of all your training and testing images for the TF Records.

The model that was downloaded now has to be configured with the correct paths according to our project structure. This can be done in the following file:

Tensorflow/workspace/pre-trained-models/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config

The file is going to be copied to the trainings folder and all the missing pieces will be automatically filled out. Here we have the 5 instances that need to be defined:

fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  num_steps: 50000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "classification"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }

As well as the number of classes the model has to handle. By default this is set to 90. In our case we are only looking for 4 different hand gestures:

num_classes: 4

Train the Model

You may choose to train the model from within the notebook. I have noticed however that training inside of a separate terminal you're able to display live loss metrics. To do so run the following command from the root dir of your app (and within the virtual environment):

python Tensorflow/models/research/object_detection/model_main_tf2.py --model_dir=Tensorflow/workspace/models/my_ssd_mobnet --pipeline_config_path=Tensorflow/workspace/models/my_ssd_mobnet/pipeline.config --num_train_steps=2000

This will run a 2000 steps trainings run for the initial training:

Tensorflow Object Detection Walkthrough

How to check the GPU load on Linux ? Mem load went up during the run. CPU load stayed low - I guess this means the calculations were run on my GPU ? There is a strang warning in the logs returning NUMA node zero - though apparently you can ignore it:

2022-01-01 19:28:12.478293: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-01-01 19:28:12.508530: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-01 19:28:13.763340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4691 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1

UPDATE run the nvidia-smi command and you will see the python process occupying the majority of your GPU resources:

Tensorflow Object Detection Walkthrough

The training will print out an intermediate result every 100 steps during which you should see a constant decrease of the Loss Parameter:

INFO:tensorflow:Step 2000 per-step time 0.119s
I0101 19:33:06.292452 140223989212992 model_lib_v2.py:705] Step 2000 per-step time 0.119s
INFO:tensorflow:{'Loss/classification_loss': 0.09995355,

All trainings data is stored in /opt/Python/TFODCourse/Tensorflow/workspace/models/my_ssd_mobnet!

Evaluate the Model

Again, you can use the Jupyter notebook a=or just copy the command into your terminal:

python Tensorflow/models/research/object_detection/model_main_tf2.py --model_dir=Tensorflow/workspace/models/my_ssd_mobnet --pipeline_config_path=Tensorflow/workspace/models/my_ssd_mobnet/pipeline.config --checkpoint_dir=Tensorflow/workspace/models/my_ssd_mobnet

This will give you the average Precision and Recall for your model - both should ideally be 1 and are already ok-ish with 0.706 and 0.713 in my case:

Accumulating evaluation results...
DONE (t=0.02s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.706
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.881
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.706
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.713
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.725
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.744
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.744

You can optionally evaluate your model inside of Tensorboard. Once the model has been trained and you have run the evaluation command. Open Tensorboard for the Evaluation Metrics with the following command:

tensorboard --logdir=Tensorflow/workspace/models/my_ssd_mobnet/eval

Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.7.0 at http://localhost:6006/ (Press CTRL+C to quit)

Tensorflow Object Detection Walkthrough

Tensorboard will be accessible through your browser and you will be able to see metrics including mAP - mean Average Precision, and Recall. Currently we only have one data point. But the metrics will be tracked over every trainings run.

The board also gives us access to result images:

Tensorflow Object Detection Walkthrough

Your Loss Metrics are available from:

tensorboard --logdir=Tensorflow/workspace/models/my_ssd_mobnet/train

Tensorflow Object Detection Walkthrough

Training​

Download Pre-trained Model​

Install TFOD​

Prepare your Model​

Train the Model​

Evaluate the Model​

Training

Download Pre-trained Model

Install TFOD

Prepare your Model

Train the Model

Evaluate the Model