
Breast Histopathology Image Segmentation Part 5

Guangzhou, China


Based on Breast Histopathology Images by Paul Mooney. Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.

Can recurring breast cancer be spotted with AI tech? - BBC News

Model Evaluation

Classification Report

Generate a Classification Report from your trained model:

  • Precision: True Positives / (True Positives + False Positives)
    • How certain are you that a positive prediction is actually positive?
  • Recall: True Positives / (True Positives + False Negatives)
    • How many of the actual positives can be predicted with our model?
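  • F1 Score: Harmonic mean of precision and recall: 2 * Precision * Recall / (Precision + Recall)
    • Reaches its maximum of 1 only when both precision and recall are 1, and is pulled toward the lower of the two values when they differ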
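
As a quick sanity check, the three metrics can be recomputed by hand. The sketch below is plain Python with a hypothetical helper named `prf1`. It plugs in the counts for class 1 taken from the Custom model's confusion matrix printed later in this post (TP = 13232, FP = 6178, FN = 2498), assuming class 1 is the positive class.

```python
def prf1(tp, fp, fn):
    """Return (precision, recall, f1) computed from raw counts."""
    precision = tp / (tp + fp)   # share of positive predictions that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts for class 1 of the Custom model (see the confusion matrix section)
precision, recall, f1 = prf1(tp=13232, fp=6178, fn=2498)
print(f"precision: {precision:.2f}, recall: {recall:.2f}, f1: {f1:.2f}")
# prints: precision: 0.68, recall: 0.84, f1: 0.75
```

The rounded values match the row for class 1 in the Custom model's classification report.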

utils/modelEval.py

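import numpy as np
from sklearn.metrics import classification_report
# assumption: the project uses the Keras API bundled with TensorFlow
from tensorflow.keras.models import load_model

# config, testGen and the total* counters are defined earlier in the project

# Loading the best performing model
# Please specify the model name from the output folder which has the lowest val_loss

# ResNet50
path1 = config.OUTPUT_PATH + '/resnet50_weights-025-0.6333.hdf5'

# Custom
# path1 = config.OUTPUT_PATH + '/custom_weights-009-0.4244.hdf5'

fModel = load_model(path1)

# Predicting on the test data
print("Predicting on the test data")
print("totalTrain: " , totalTrain , ", totalVal: " , totalVal , ", totalTest: " , totalTest)
# totalTrain: 199818 , totalVal: 22201 , totalTest: 55505
# The extra `+1` step covers the last partial batch when totalTest is not a
# multiple of config.BATCH_SIZE. The labels in testGen.classes only line up
# with the predictions if testGen was created with shuffle=False.
predTest = fModel.predict(x=testGen, steps=(totalTest // config.BATCH_SIZE)+1)
predTest = np.argmax(predTest, axis=1)

# Printing the Classification Report
print(classification_report(testGen.classes, predTest, target_names=list(testGen.class_indices.keys())))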
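
The `+1` in the `steps` argument is only correct while the sample count is not an exact multiple of the batch size. A rounding-up division handles both cases. The sketch below uses a hypothetical helper `predict_steps` and a hypothetical batch size of 32 (the value of `config.BATCH_SIZE` for this run is not shown here); only `totalTest = 55505` comes from the output above.

```python
import math


def predict_steps(total_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    return math.ceil(total_samples / batch_size)


total_test = 55505   # from the script output above
batch_size = 32      # hypothetical value, for illustration only

steps_ceil = predict_steps(total_test, batch_size)
steps_plus_one = total_test // batch_size + 1
print(steps_ceil, steps_plus_one)   # 1735 1735

# When the count divides evenly, `// + 1` asks for one batch too many
print(predict_steps(64, 32), 64 // 32 + 1)   # 2 3
```

With an extra step the generator may be asked for more batches than it holds, which, depending on the Keras version, can produce more predictions than there are labels in `testGen.classes`.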

Run the script to generate the evaluation report:

ResNet50

              precision    recall  f1-score   support

           0       0.84      0.60      0.70     39775
           1       0.42      0.72      0.53     15730

    accuracy                           0.64     55505
   macro avg       0.63      0.66      0.62     55505
weighted avg       0.72      0.64      0.65     55505

Now this looks pretty terrible :)

Custom

              precision    recall  f1-score   support

           0       0.93      0.84      0.89     39775
           1       0.68      0.84      0.75     15730

    accuracy                           0.84     55505
   macro avg       0.81      0.84      0.82     55505
weighted avg       0.86      0.84      0.85     55505

Already a lot better!

Confusion Matrix

The Confusion Matrix can be used to visualize the performance of a classification model. Since we only have two classes - benign and malignant - the matrix has the following four cells:

  • True Positive: Malignant sample correctly predicted as malignant
  • False Positive: Benign sample falsely predicted as malignant
  • True Negative: Benign sample correctly predicted as benign
  • False Negative: Malignant sample falsely predicted as benign
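
To make the four cells concrete, here is a minimal plain-Python sketch that counts them for a tiny, made-up set of labels. The helper name `confusion_counts` and the sample data are hypothetical, and class 1 is assumed to be the positive (malignant) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN and FN for a binary classification result."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the sample really is positive
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the sample really is positive
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Made-up labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)   # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}

# Same layout as scikit-learn's confusion_matrix for labels 0 and 1:
# rows are true classes, columns are predicted classes
matrix = [[counts["TN"], counts["FP"]],
          [counts["FN"], counts["TP"]]]
print(matrix)   # [[3, 1], [1, 3]]
```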


utils/modelEval.py

from sklearn.metrics import confusion_matrix

# Computing the confusion matrix and using it to derive the
# accuracy, sensitivity, and specificity.
# Rows are the true classes, columns the predicted classes:
# [[TN, FP], [FN, TP]] with class 1 (malignant) as the positive class
cm = confusion_matrix(testGen.classes, predTest)
total = cm.sum()
acc = (cm[0, 0] + cm[1, 1]) / total
# sensitivity = TP / (TP + FN), the recall of class 1
sensitivity = cm[1, 1] / (cm[1, 0] + cm[1, 1])
# specificity = TN / (TN + FP), the recall of class 0
specificity = cm[0, 0] / (cm[0, 0] + cm[0, 1])

# Printing the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))

ResNet50

[[23954 15821]
 [ 4425 11305]]
acc: 0.6352
sensitivity: 0.7187
specificity: 0.6022

Custom

[[33597  6178]
 [ 2498 13232]]
acc: 0.8437
sensitivity: 0.8412
specificity: 0.8447

AUC-ROC

Area Under the Curve (AUC) of Receiver Operating Characteristic (ROC):

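  • ROC: A curve for binary classification problems that plots the true positive rate against the false positive rate over all decision thresholds. It shows how well the model separates signal from noise.
  • AUC: The area under the ROC curve, a measurement of the ability of a classifier to distinguish between classes. A value of 0.5 corresponds to random guessing and 1.0 to a perfect separation, so a higher value shows a better performance.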
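
The AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The plain-Python sketch below computes it that way; the helper `pairwise_auc` and the sample scores are made up for illustration. It also shows why the AUC should be computed from predicted probabilities: thresholding the scores into hard 0/1 labels first throws away the ranking information.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.7, 0.9]

# Hard labels, as produced by a 0.5 threshold (or argmax over two classes)
hard = [1 if s >= 0.5 else 0 for s in scores]

auc_scores = pairwise_auc(y_true, scores)
auc_hard = pairwise_auc(y_true, hard)
print(auc_scores)   # 1.0
print(auc_hard)     # 0.75
```

Every positive sample is ranked above every negative one, so the score-based AUC is 1.0, while the hard labels only reach 0.75.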

from sklearn.metrics import roc_auc_score

# Load the latest weights
path1 = config.OUTPUT_PATH + '/resnet50_weights.hdf5'

fModel = load_model(path1)

# Predicting on the test data
print("Predicting on the test data")
print("totalTrain: " , totalTrain , ", totalVal: " , totalVal , ", totalTest: " , totalTest)
# totalTrain: 199818 , totalVal: 22201 , totalTest: 55505
# The extra `+1` step covers the last partial batch when totalTest is not a
# multiple of config.BATCH_SIZE
probTest = fModel.predict(x=testGen, steps=(totalTest // config.BATCH_SIZE)+1)
predTest = np.argmax(probTest, axis=1)

# Calculate the ROC AUC from the predicted probability of the positive class
# (column 1, assuming a two-class softmax output). Scoring the argmax labels
# instead would reduce the curve to a single threshold.
roc_value = roc_auc_score(testGen.classes, probTest[:, 1])
print("ROC AUC: {0}".format(roc_value))

# Plotting the graph
training_plot(MF, config.EPOCHS, config.PLOT_PATH)

Serializing a Model

When you are satisfied with the training you can save the model so it is ready for transfer:

# Serialize/Writing the model to disk
print("Serializing network...")
fModel.save(config.MODEL_PATH, save_format="h5")

Training Re-run

I changed the following variables:

utils/config.py

BATCH_SIZE = 24
INIT_LR = 1e-3
EPOCHS = 20

For the Custom Model I am still seeing small improvements in the later epochs - looking much better:

Epoch 18/20
8324/8325 [============================>.] - ETA: 0s - loss: 0.4561 - accuracy: 0.8652
Epoch 18: val_loss improved from 0.31292 to 0.31132, saving model to ./output/custom_weights.hdf5
8325/8325 [==============================] - 247s 30ms/step - loss: 0.4561 - accuracy: 0.8652 - val_loss: 0.3113 - val_accuracy: 0.8663
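
The step count and timing in this log can be cross-checked with a little arithmetic. The sketch below assumes that the training script derives the steps per epoch by integer division of `totalTrain` by `BATCH_SIZE` and that the training split is unchanged from the earlier output (199818 samples). Both are assumptions, though they agree with the `8325/8325` shown in the log.

```python
total_train = 199818   # totalTrain from the earlier script output
batch_size = 24        # BATCH_SIZE from utils/config.py

# Assumed: steps per epoch = floor(totalTrain / BATCH_SIZE)
steps_per_epoch = total_train // batch_size
print(steps_per_epoch)   # 8325

# Epoch 18 took 247 s according to the log
epoch_seconds = 247
ms_per_step = epoch_seconds / steps_per_epoch * 1000
print(round(ms_per_step))   # 30
```

Both numbers match the progress bar (`8325/8325`) and the reported `30ms/step`.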

The ResNet50 Model also improves compared to the earlier run. But it still gets stuck around the 10th epoch and even gets much worse after that. Given the fluctuation, I probably should have let it run a bit longer:

Epoch 9/20
8323/8325 [============================>.] - ETA: 0s - loss: 0.8274 - accuracy: 0.7163
Epoch 9: val_loss improved from 0.60565 to 0.58112, saving model to ./output/resnet50_weights.hdf5
8325/8325 [==============================] - 249s 30ms/step - loss: 0.8274 - accuracy: 0.7163 - val_loss: 0.5811 - val_accuracy: 0.7097


I did another - much longer - run for the Custom Model and I am still seeing improvements up to the 54th epoch, though the loss is still slightly higher than in the previous run:

Epoch 54/220
8324/8325 [============================>.] - ETA: 0s - loss: 0.4869 - accuracy: 0.8566
Epoch 54: val_loss improved from 0.37685 to 0.36609, saving model to ./output/custom_weights.hdf5
8325/8325 [==============================] - 222s 27ms/step - loss: 0.4869 - accuracy: 0.8566 - val_loss: 0.3661 - val_accuracy: 0.8482
