
Breast Histopathology Image Segmentation Part 4

Guangzhou, China


Based on the Breast Histopathology Images dataset by Paul Mooney. Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole-mount sample, pathologists typically focus on the regions that contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside a whole-mount slide. Related: "Can recurring breast cancer be spotted with AI tech?" (BBC News)

Model Compilation

  • Optimizer: An optimizer is the algorithm that updates the model's weights after each training batch so that the loss function decreases. Adam (Adaptive Moment Estimation), which adapts the step size for every weight individually, is a popular default choice and is used for both models here.
  • Loss Function: A loss function measures how well an algorithm models a given dataset: the further the predictions deviate from the actual results, the larger the number it returns. The goal of training is to minimize the selected loss function. Binary cross-entropy compares the predicted probability to the actual class, 0 or 1, which in our case is either benign or malignant.
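To make the loss concrete, here is a minimal pure-Python sketch of binary cross-entropy averaged over a batch, using loss = -(y * log(p) + (1 - y) * log(1 - p)) per sample. The function name and the clipping epsilon of 1e-7 are my own assumptions for illustration, not the Keras implementation.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch.

    y_true: class labels, 0 (benign) or 1 (malignant)
    y_pred: predicted probabilities of class 1
    eps:    clips probabilities away from 0 and 1 so log() stays finite (assumed value)
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss ...
good = binary_cross_entropy([1, 0], [0.9, 0.1])
# ... confident but wrong predictions give a large one.
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(round(good, 4), round(bad, 4))
```

Both batches contain the same labels; only the predicted probabilities differ, and the wrong-but-confident batch is penalized roughly twenty times more heavily.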

ResNet50 Model

./train_ResNet50_32_20k.py

# Compiling the model
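## `decay` (legacy Keras optimizers only; from TF 2.11 on use tf.keras.optimizers.legacy.Adam)
## applies time-based decay per batch: lr = INIT_LR / (1 + decay * iterations)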
print("Compiling model")
opt = Adam(learning_rate=config.INIT_LR, decay=config.INIT_LR / config.EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
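The `decay` argument belongs to the legacy Keras optimizers, which apply time-based decay at every update step: lr = INIT_LR / (1 + decay * iterations), where `iterations` counts batches, not epochs. The sketch below assumes a hypothetical INIT_LR of 1e-2 (config.INIT_LR is not shown in this post) and takes the 60 epochs and 6244 steps per epoch from the ResNet50 training log further down. It prints the resulting learning rate at a few epochs.

```python
def legacy_decayed_lr(init_lr, decay, iterations):
    """Learning rate of a legacy Keras optimizer after `iterations` update steps.

    Time-based decay: the rate shrinks with every batch, not once per epoch.
    """
    return init_lr / (1.0 + decay * iterations)

INIT_LR = 1e-2            # hypothetical; the real config.INIT_LR is not shown in this post
EPOCHS = 60               # from the "Epoch 25/60" line in the ResNet50 log
STEPS_PER_EPOCH = 6244    # from the "6244/6244" progress bar in the same log
decay = INIT_LR / EPOCHS  # same formula as in the compile step

for epoch in (1, 10, 25, 60):
    lr = legacy_decayed_lr(INIT_LR, decay, epoch * STEPS_PER_EPOCH)
    print(f"end of epoch {epoch:2d}: lr = {lr:.2e}")
```

With these assumed values the learning rate already roughly halves during the first epoch, which is a much steeper schedule than a per-epoch decay would give.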

Custom CNN Model

./train_CustomModel_32_conv_20k.py

# Compiling the model
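## `decay` (legacy Keras optimizers only; from TF 2.11 on use tf.keras.optimizers.legacy.Adam)
## applies time-based decay per batch: lr = INIT_LR / (1 + decay * iterations)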
print("Compiling the model")
opt = Adam(learning_rate=config.INIT_LR, decay=config.INIT_LR / config.EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])
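Both models hand their weight updates to Adam. For intuition, here is a minimal sketch of the Adam update rule from the original paper (Kingma and Ba) applied to a single scalar weight, with the paper's default beta1, beta2 and epsilon. Keras' own implementation differs in details, for example its default epsilon is 1e-7. The toy loss (w - 3)^2 is purely illustrative.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight.

    m, v: running averages of the gradient and of the squared gradient
    t:    1-based step counter, used for bias correction
    """
    m = beta1 * m + (1 - beta1) * grad          # 1st moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # 2nd moment (per-weight scale)
    m_hat = m / (1 - beta1 ** t)                # undo the bias towards 0
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.1)
print(round(w, 3))
```

Because the step is divided by the running root-mean-square of the gradient, the very first update moves the weight by roughly `lr` regardless of how large the gradient is.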

Model Checkpoints

Model checkpoints are callbacks that save the Keras model or its weights at a given interval. The saved files can be loaded later to continue training from that state. Configuration options include:

  • Save a checkpoint for every epoch, or only when the monitored metric improves (`save_best_only`)
  • Save after each epoch or only after a fixed number of training batches (`save_freq`)
  • Save the entire model or only its weights (`save_weights_only`)
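The `save_best_only` decision can be sketched in a few lines of plain Python. This is a simplified model of what ModelCheckpoint(monitor="val_loss", mode="min") decides, not the Keras source, and the validation losses below are made up.

```python
def epochs_saved(val_losses, save_best_only=True):
    """Return the 1-based epochs at which a checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:          # mode="min": a lower val_loss counts as better
            best = val_loss
            saved.append(epoch)
        elif not save_best_only:     # without save_best_only every epoch is written
            saved.append(epoch)
    return saved

# Hypothetical validation losses for five epochs
losses = [0.71, 0.66, 0.68, 0.64, 0.65]
print(epochs_saved(losses))                        # only epochs that improved
print(epochs_saved(losses, save_best_only=False))  # every epoch
```

This is what produces the "val_loss improved from ..." and "val_loss did not improve from ..." lines in the training logs.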

./train_ResNet50_32_20k.py

# Using ModelCheckpoint to store the best performing model based on val_loss
MCName = os.path.sep.join([config.OUTPUT_PATH, "resnet50_weights-{epoch:03d}-{val_loss:.4f}.hdf5"])
checkpoint = ModelCheckpoint(MCName, monitor="val_loss", mode="min", save_best_only=True, verbose=1)
callbacks = [checkpoint]

./train_CustomModel_32_conv_20k.py

# Using ModelCheckpoint to store the best performing model based on val_loss
MCName = os.path.sep.join([config.OUTPUT_PATH, "custom_weights-{epoch:03d}-{val_loss:.4f}.hdf5"])
checkpoint = ModelCheckpoint(MCName, monitor="val_loss", mode="min", save_best_only=True, verbose=1)
callbacks = [checkpoint]
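Keras fills in the checkpoint file name with `str.format`, passing the 1-based epoch number and the epoch's logged metrics such as `val_loss`. Because both values are part of the name, every improvement writes a new file instead of overwriting the previous best. The sketch below formats a few hypothetical file names with the custom model's template and picks the one with the lowest `val_loss`, for example to resume training from it. `best_checkpoint` is a hypothetical helper, and the TensorFlow calls appear only as comments.

```python
import re

# Same template as the custom model's ModelCheckpoint file name
TEMPLATE = "custom_weights-{epoch:03d}-{val_loss:.4f}.hdf5"
PATTERN = re.compile(r"-(\d{3})-(\d+\.\d{4})\.hdf5$")

def best_checkpoint(filenames):
    """Return (filename, epoch) for the file with the lowest val_loss in its name."""
    parsed = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            parsed.append((float(match.group(2)), int(match.group(1)), name))
    if not parsed:
        return None
    _, epoch, name = min(parsed)
    return name, epoch

# Hypothetical (epoch, val_loss) pairs that improved during training
files = [TEMPLATE.format(epoch=e, val_loss=l) for e, l in [(1, 0.61), (3, 0.52), (6, 0.47)]]
print(files[-1])
print(best_checkpoint(files))
# Resuming could then look roughly like this (not executed in this sketch):
#   model = tf.keras.models.load_model(os.path.join(config.OUTPUT_PATH, name))
#   model.fit(..., initial_epoch=epoch)
```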

Model Fitting

An epoch is one complete pass of the algorithm over the entire training dataset. The dataset is not processed all at once but divided into small portions (batches). The number of samples passed through the neural network in one training step is the batch size.
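The `steps_per_epoch` argument of the fit call ties the two terms together. One epoch is made up of `totalTrain // config.BATCH_SIZE` batches, which is the 6244 in the "6244/6244" progress bar of the ResNet50 log. A small sketch with hypothetical numbers:

```python
import math

def steps_per_epoch(total_samples, batch_size, full_batches_only=True):
    """Number of training steps (batches) that make up one epoch."""
    if full_batches_only:
        # What the training scripts do: totalTrain // config.BATCH_SIZE
        return total_samples // batch_size
    # Alternative: also count a final, smaller batch
    return math.ceil(total_samples / batch_size)

# Hypothetical numbers: 1,000 training images, batch size 32
print(steps_per_epoch(1000, 32))         # 31 full batches cover 992 images
print(steps_per_epoch(1000, 32, False))  # 32 batches, the last one holds 8 images
```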

./train_ResNet50_32_20k.py and ./train_CustomModel_32_conv_20k.py

# Fitting the model on training data
print("Model Fitting")
MF = model.fit(
    x=trainGen,
    steps_per_epoch=totalTrain // config.BATCH_SIZE,
    validation_data=valGen,
    validation_steps=totalVal // config.BATCH_SIZE,
    class_weight=classWeight,
    callbacks=callbacks,
    epochs=config.EPOCHS)
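The `classWeight` passed to `model.fit` is computed elsewhere in the scripts and is not shown in this part. One common approach, assumed here, is inverse-frequency weighting. Each class is weighted by the size of the largest class divided by its own size, so that errors on the rarer class count more. Keras expects `class_weight` as a dict that maps each class index to its weight.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by (size of the largest class) / (size of this class)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}

# Hypothetical imbalanced labels: 0 = benign, 1 = malignant
labels = [0] * 700 + [1] * 300
print(inverse_frequency_weights(labels))
```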

ResNet50 Model

pipenv run python ./train_ResNet50_32_20k.py

I kept the training running overnight. But after epoch 25 I could not see any further improvement:

Epoch 25/60
6243/6244 [============================>.] - ETA: 0s - loss: 0.9039 - accuracy: 0.6454
Epoch 25: val_loss improved from 0.63957 to 0.63328, saving model to ./output/weights-025-0.6333.hdf5
6244/6244 [==============================] - 215s 34ms/step - loss: 0.9039 - accuracy: 0.6454 - val_loss: 0.6333 - val_accuracy: 0.6313
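Instead of letting a run continue overnight without progress, Keras' `EarlyStopping` callback could be appended to `callbacks`, for example `EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)`. The patience of 5 is only an assumption. The plain-Python sketch below mirrors the idea behind the patience rule on a made-up loss curve. It is not the Keras source.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch after which a patience rule would stop training.

    Training stops once val_loss has not improved by more than `min_delta`
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best - min_delta:
            best = val_loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs run

# Hypothetical curve that stops improving after epoch 4
losses = [0.70, 0.66, 0.64, 0.633, 0.64, 0.635, 0.65, 0.641]
print(early_stopping_epoch(losses, patience=3))
```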

Hmm, that is interesting. I am coding along with someone else's solution, and my loss is a lot higher even though we are using the same data and augmentations. Their training had already reached its minimum by the 10th epoch:

Epoch 10: val_loss did not improve from 0.39576
7995/7995 [==============================] - 410s 51ms/step - loss: 0.5612 - accuracy: 0.8289 - val_loss: 0.4257 - val_accuracy: 0.8085

Custom Model

I now limit the number of epochs for the custom model to 10:

pipenv run python ./train_CustomModel_32_conv_20k.py

Since most of the epochs before the last one still improved the validation loss, I assume that this is not yet the minimum:

Epoch 10/10
6243/6244 [============================>.] - ETA: 0s - loss: 0.5423 - accuracy: 0.8398
Epoch 10: val_loss did not improve from 0.42444
6244/6244 [==============================] - 223s 36ms/step - loss: 0.5423 - accuracy: 0.8398 - val_loss: 0.4738 - val_accuracy: 0.8039