Breast Histopathology Image Segmentation Part 5

Guangzhou, China


Model Evaluation
Serializing a Model
Training Re-run

Based on Breast Histopathology Images by Paul Mooney. Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions that contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC within a whole mount slide. (See also: "Can recurring breast cancer be spotted with AI tech?", BBC News.)

Citation: Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases
Dataset: 198,738 IDC (negative) image patches; 78,786 IDC (positive) image patches

Model Evaluation

Classification Report

Generate a Classification Report from your trained model:

Precision: True Positives / (True Positives + False Positives)
- How certain can you be that a positive prediction is actually positive?
Recall: True Positives / (True Positives + False Negatives)
- What fraction of the actual positives does the model find?
F1 Score: Harmonic mean of precision and recall, 2 * Precision * Recall / (Precision + Recall)
- Only high when both precision and recall are high; reaches its maximum of 1 when both are 1
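As a quick check of these formulas, the sketch below plugs in the counts from the Custom model's confusion matrix further down, with class 1 (IDC positive) as the positive class: TP = 13232, FP = 6178, FN = 2498. The helper functions are illustrative only and are not part of utils/modelEval.py.

```python
def precision(tp, fp):
    # Share of positive predictions that are actually positive
    return tp / (tp + fp)


def recall(tp, fn):
    # Share of actual positives that the model finds
    return tp / (tp + fn)


def f1_score(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)


# Custom model, class 1 (IDC positive) as the positive class
tp, fp, fn = 13232, 6178, 2498
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.68 recall=0.84 f1=0.75
```

These values match the class 1 row of the Custom model's classification report.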

utils/modelEval.py

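# Imports needed by this part of utils/modelEval.py
# (config, testGen, totalTrain, totalVal and totalTest are assumed to be set up earlier in the script)
import numpy as np
from sklearn.metrics import classification_report
from tensorflow.keras.models import load_model

# Loading the best performing model
# Please specify the model name from the output folder which has the lowest val_loss

# ResNet50
path1 = config.OUTPUT_PATH + '/resnet50_weights-025-0.6333.hdf5'

# Custom
# path1 = config.OUTPUT_PATH + '/custom_weights-009-0.4244.hdf5'

fModel = load_model(path1)

# Predicting on the test data
print("Predicting on the test data")
print("totalTrain: " , totalTrain , ", totalVal: " , totalVal , ", totalTest: " , totalTest)
# totalTrain:  199818 , totalVal:  22201 , totalTest:  55505
# Start from the first batch so the predictions line up with testGen.classes
# (this also requires testGen to be created with shuffle=False)
testGen.reset()
# Round up so the last, partial batch is included when totalTest is not a multiple of BATCH_SIZE
predTest = fModel.predict(x=testGen, steps=int(np.ceil(totalTest / config.BATCH_SIZE)))
# Turn the per-class probabilities into predicted class indices
predTest = np.argmax(predTest, axis=1)

# Printing the Classification Report
print(classification_report(testGen.classes, predTest, target_names=list(testGen.class_indices.keys())))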
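On the number of prediction steps: `predict` needs enough batches to cover every test image exactly once, which is the ceiling of totalTest / BATCH_SIZE. The `totalTest // BATCH_SIZE + 1` form gives the same result except when totalTest is an exact multiple of the batch size, where it asks for one batch too many; whether totalTest is odd does not matter. The helper names below are hypothetical, and the batch size of 24 is the one from the re-run config.

```python
import math


def predict_steps(n_samples, batch_size):
    # Batches needed to cover every sample exactly once (the last batch may be partial)
    return math.ceil(n_samples / batch_size)


def floor_plus_one(n_samples, batch_size):
    # The `n // batch_size + 1` form
    return n_samples // batch_size + 1


# 55505 test patches with a batch size of 24: both forms agree
print(predict_steps(55505, 24), floor_plus_one(55505, 24))  # 2313 2313
# An exact multiple of the batch size: the forms disagree
print(predict_steps(48, 24), floor_plus_one(48, 24))  # 2 3
```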

Run the script to generate the evaluation report:

ResNet50

              precision    recall  f1-score   support

           0       0.84      0.60      0.70     39775
           1       0.42      0.72      0.53     15730

    accuracy                           0.64     55505
   macro avg       0.63      0.66      0.62     55505
weighted avg       0.72      0.64      0.65     55505

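Now this looks pretty terrible :) Only 42% of the patches the ResNet50 model flags as IDC positive actually are, and the overall accuracy is just 0.64.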

Custom

              precision    recall  f1-score   support

           0       0.93      0.84      0.89     39775
           1       0.68      0.84      0.75     15730

    accuracy                           0.84     55505
   macro avg       0.81      0.84      0.82     55505
weighted avg       0.86      0.84      0.85     55505

Already a lot better!
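The two summary rows at the bottom of each report average the per-class scores differently: `macro avg` is the plain mean over both classes, while `weighted avg` weights each class by its support. Because class 0 has about 2.5 times as many test patches as class 1 (39775 vs. 15730), the weighted average leans towards the class 0 scores. This plain-Python sketch recomputes the Custom model's averages from its confusion-matrix counts (shown in the Confusion Matrix section); it does not call scikit-learn.

```python
# Custom model confusion matrix (rows = true class, columns = predicted class)
cm = [[33597, 6178],
      [2498, 13232]]

support = [sum(row) for row in cm]                    # true patches per class
predicted = [cm[0][j] + cm[1][j] for j in range(2)]   # predictions per class
total = sum(support)

precision = [cm[k][k] / predicted[k] for k in range(2)]
recall = [cm[k][k] / support[k] for k in range(2)]

macro_precision = sum(precision) / 2
weighted_precision = sum(p * n for p, n in zip(precision, support)) / total
macro_recall = sum(recall) / 2
weighted_recall = sum(r * n for r, n in zip(recall, support)) / total
accuracy = (cm[0][0] + cm[1][1]) / total

print(f"macro precision:    {macro_precision:.2f}")     # 0.81
print(f"weighted precision: {weighted_precision:.2f}")  # 0.86
print(f"macro recall:       {macro_recall:.2f}")        # 0.84
print(f"weighted recall:    {weighted_recall:.2f}")     # 0.84
```

The weighted recall always equals the accuracy: weighting each class's recall by its support simply adds up the correct predictions.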

Confusion Matrix

The Confusion Matrix can be used to visualize the performance of a classification model. Since we only have two classes, benign (IDC negative) and malignant (IDC positive), the matrix has four cells:

True Positive: malignant tissue correctly predicted as malignant
False Positive: benign tissue falsely predicted as malignant
True Negative: benign tissue correctly predicted as benign
False Negative: malignant tissue falsely predicted as benign
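scikit-learn's `confusion_matrix(y_true, y_pred)` puts the true classes in the rows and the predicted classes in the columns, so with IDC positive as class 1 the cells read [[TN, FP], [FN, TP]]; for a binary problem, `confusion_matrix(...).ravel()` returns them in the order tn, fp, fn, tp. The plain-Python sketch below builds the same layout for a few made-up labels to make the cell positions concrete; `binary_confusion_matrix` is a hypothetical helper.

```python
def binary_confusion_matrix(y_true, y_pred):
    # Same layout as sklearn.metrics.confusion_matrix for labels [0, 1]:
    # row = true class, column = predicted class
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


# Made-up labels: 0 = benign (IDC negative), 1 = malignant (IDC positive)
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

cm = binary_confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm
print(cm)              # [[2, 1], [1, 3]]
print(tn, fp, fn, tp)  # 2 1 1 3
```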


utils/modelEval.py

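# Computing the confusion matrix and using it to derive the
# accuracy, sensitivity, and specificity
from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns the predicted classes (0 = IDC negative, 1 = IDC positive)
cm = confusion_matrix(testGen.classes, predTest)
total = cm.sum()
acc = (cm[0, 0] + cm[1, 1]) / total
# NOTE: these two formulas treat class 0 (IDC negative) as the "positive" class,
# i.e. "sensitivity" is the recall of class 0 and "specificity" the recall of class 1
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# Printing the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))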

ResNet50

[[23954 15821]
 [ 4425 11305]]
acc: 0.6352
sensitivity: 0.6022
specificity: 0.7187

Custom

[[33597  6178]
 [ 2498 13232]]
acc: 0.8437
sensitivity: 0.8447
specificity: 0.8412
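In `modelEval.py`, `sensitivity` is computed from the first row of the matrix, which makes it the recall of class 0 (IDC negative). If IDC positive (class 1) is taken as the positive class, the usual convention when detecting disease, the two numbers swap places. A plain-Python recomputation from the Custom model's matrix above:

```python
# Custom model confusion matrix from above (rows = true, columns = predicted)
cm = [[33597, 6178],
      [2498, 13232]]

(tn, fp), (fn, tp) = cm  # class 1 (IDC positive) as the positive class

sensitivity = tp / (tp + fn)  # share of IDC-positive patches that are detected
specificity = tn / (tn + fp)  # share of IDC-negative patches correctly cleared
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"acc: {accuracy:.4f}")             # acc: 0.8437
print(f"sensitivity: {sensitivity:.4f}")  # sensitivity: 0.8412
print(f"specificity: {specificity:.4f}")  # specificity: 0.8447
```

For the ResNet50 matrix the same swap gives a sensitivity of 0.7187 and a specificity of 0.6022.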

AUC-ROC

Area Under the Curve (AUC) of Receiver Operating Characteristic (ROC):

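ROC: a curve that plots the true positive rate (recall) against the false positive rate as the classification threshold is varied, for binary classification problems.
AUC: the area under the ROC curve, a measure of how well the classifier separates the two classes across all thresholds. A value of 1.0 means perfect separation, while 0.5 is no better than random guessing.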
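To make the two definitions concrete, the plain-Python sketch below sweeps the decision threshold over a classifier's scores, records one (false positive rate, true positive rate) point per threshold, and integrates the resulting ROC curve with the trapezoidal rule. The labels and scores are made up. It also shows why AUC should be computed from the predicted probabilities rather than from argmax labels: hard labels leave only a single threshold, so the ranking information is lost. `sklearn.metrics.roc_auc_score` returns the same two values for these inputs.

```python
def roc_points(y_true, scores):
    # (FPR, TPR) pairs, sweeping the threshold from the highest score down
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    # Trapezoidal rule over consecutive ROC points
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.4, 0.9]                    # predicted probability of class 1
labels = [1 if p >= 0.5 else 0 for p in probs]  # what argmax would return

print(auc(roc_points(y_true, probs)))   # 0.75
print(auc(roc_points(y_true, labels)))  # 0.5
```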

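# Import for the ROC AUC computation (np and load_model are imported earlier in the script)
from sklearn.metrics import roc_auc_score

# Load the latest weights
path1 = config.OUTPUT_PATH + '/resnet50_weights.hdf5'

fModel = load_model(path1)

# Predicting on the test data
print("Predicting on the test data")
print("totalTrain: " , totalTrain , ", totalVal: " , totalVal , ", totalTest: " , totalTest)
# totalTrain:  199818 , totalVal:  22201 , totalTest:  55505
testGen.reset()
probTest = fModel.predict(x=testGen, steps=int(np.ceil(totalTest / config.BATCH_SIZE)))
predTest = np.argmax(probTest, axis=1)

# Calculate the ROC AUC from the predicted probability of class 1 (IDC positive),
# assuming a two-unit softmax output; passing the argmax labels instead would
# reduce the ROC curve to a single threshold
roc_value = roc_auc_score(testGen.classes, probTest[:, 1])
print("ROC AUC: {:.4f}".format(roc_value))

# Plotting the training history
# (MF is assumed to be the History object returned by model.fit() during training)
training_plot(MF, config.EPOCHS, config.PLOT_PATH)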

Serializing a Model

Once you are satisfied with the training results, save the model to disk so it can be reloaded or transferred later:

# Serialize (write) the model to disk in HDF5 format
print("Serializing network...")
fModel.save(config.MODEL_PATH, save_format="h5")

Training Re-run

I changed the following variables:

utils/config.py

BATCH_SIZE = 24
INIT_LR = 1e-3
EPOCHS = 20
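The batch size directly sets how many steps one epoch takes. With the split sizes printed by modelEval.py (199,818 training and 22,201 validation patches) and BATCH_SIZE = 24, floor division gives 8325 training steps, which matches the `8325/8325` in the epoch logs. That suggests the training script computes its steps as `totalTrain // BATCH_SIZE`, but this is an inference from the numbers, not something shown in the post.

```python
BATCH_SIZE = 24                         # from utils/config.py
total_train, total_val = 199818, 22201  # split sizes printed by modelEval.py

# Assumed: steps computed with floor division, dropping the last partial batch
steps_per_epoch = total_train // BATCH_SIZE
validation_steps = total_val // BATCH_SIZE
# Training patches that do not fit into a full batch
leftover_train = total_train - steps_per_epoch * BATCH_SIZE

print(steps_per_epoch, validation_steps, leftover_train)  # 8325 925 18
```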

For the Custom Model, the validation loss keeps improving slightly even in the later epochs, which looks much better:

Epoch 18/20
8324/8325 [============================>.] - ETA: 0s - loss: 0.4561 - accuracy: 0.8652  
Epoch 18: val_loss improved from 0.31292 to 0.31132, saving model to ./output/custom_weights.hdf5
8325/8325 [==============================] - 247s 30ms/step - loss: 0.4561 - accuracy: 0.8652 - val_loss: 0.3113 - val_accuracy: 0.8663

The ResNet50 Model also improves compared to the earlier run, but it still gets stuck around the 10th epoch and even gets much worse after that. Given the fluctuation, I probably should have let it run a bit longer:

Epoch 9/20
8323/8325 [============================>.] - ETA: 0s - loss: 0.8274 - accuracy: 0.7163  
Epoch 9: val_loss improved from 0.60565 to 0.58112, saving model to ./output/resnet50_weights.hdf5
8325/8325 [==============================] - 249s 30ms/step - loss: 0.8274 - accuracy: 0.7163 - val_loss: 0.5811 - val_accuracy: 0.7097


I did another, much longer run (220 epochs) for the Custom Model and I am still seeing improvements up to the 54th epoch. However, the validation loss is still slightly higher than in the 20-epoch run (0.3661 here vs. 0.3113 at epoch 18 there):

Epoch 54/220
8324/8325 [============================>.] - ETA: 0s - loss: 0.4869 - accuracy: 0.8566  
Epoch 54: val_loss improved from 0.37685 to 0.36609, saving model to ./output/custom_weights.hdf5
8325/8325 [==============================] - 222s 27ms/step - loss: 0.4869 - accuracy: 0.8566 - val_loss: 0.3661 - val_accuracy: 0.8482
