Distributed training with TensorFlow

Guangzhou, China

Github Repository

tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. You can use tf.distribute.Strategy with very few changes to your code, because the underlying components of TensorFlow have been changed to become strategy-aware. This includes variables, layers, models, optimizers, metrics, summaries, and checkpoints.

Types of Strategies

TensorFlow has the following strategy options:

  • MirroredStrategy
  • TPUStrategy
  • MultiWorkerMirroredStrategy
  • ParameterServerStrategy
  • CentralStorageStrategy


tf.distribute.MirroredStrategy supports synchronous distributed training on multiple GPUs on one machine. It creates one replica per GPU device. Each variable in the model is mirrored across all the replicas. Together, these variables form a single conceptual variable called MirroredVariable. Efficient all-reduce algorithms are used to communicate the variable updates across the devices.

mirrored_strategy = tf.distribute.MirroredStrategy()

This creates a MirroredStrategy instance that uses all the GPUs visible to TensorFlow, with NCCL for cross-device communication.

If you wish to use only some of the GPUs on your machine, you can do so like this:

mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
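
To make the mirrored-variable idea more concrete, here is a toy simulation of one synchronous training step in plain Python. It is only a sketch and does not use TensorFlow: the one-weight model, the function names, and the data are all made up for illustration, and the "all-reduce" is just a mean over a list rather than an efficient collective like NCCL.

```python
# Toy simulation of synchronous data-parallel training (no TensorFlow involved).
# Model: y = w * x with a squared-error loss. All names are illustrative.

def local_gradient(w, batch):
    """Mean gradient of (w*x - y)**2 with respect to w over one replica's batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def all_reduce_mean(values):
    """Stand-in for all-reduce: every replica receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def mirrored_step(replica_weights, replica_batches, lr=0.1):
    """Each replica computes a gradient on its own shard, then applies the reduced update."""
    grads = [local_gradient(w, b) for w, b in zip(replica_weights, replica_batches)]
    reduced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(replica_weights, reduced)]

# Two replicas start with identical copies of the variable.
weights = [0.0, 0.0]
batches = [[(1.0, 2.0)], [(2.0, 4.0)]]  # data from y = 2x, one shard per replica
for _ in range(50):
    weights = mirrored_step(weights, batches)
```

Because every replica applies the same reduced gradient, the copies never drift apart, which is what lets them behave like one conceptual variable.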


tf.distribute.TPUStrategy lets you run your TensorFlow training on Tensor Processing Units (TPUs). TPUs are Google's specialized ASICs designed to dramatically accelerate machine learning workloads. In terms of distributed training architecture, TPUStrategy is the same as MirroredStrategy: it implements synchronous distributed training. TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, and TPUStrategy uses these.
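
Creating a TPUStrategy takes a few more steps than the one-liner for MirroredStrategy. The following is a minimal sketch, assuming TensorFlow 2.3 or later and a reachable TPU. The helper name make_tpu_strategy and its tpu_address parameter are my own and not part of TensorFlow; an empty address is what is typically used on Colab.

```python
def make_tpu_strategy(tpu_address=""):
    """Connect to a TPU and return a TPUStrategy (hypothetical helper)."""
    # Imported here so that defining the helper does not require TensorFlow;
    # calling it needs TensorFlow 2.3+ and an available TPU.
    import tensorflow as tf

    # Locate the TPU; tpu_address is an assumption that depends on your environment.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    # The TPU system has to be initialized before the strategy is created.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)
```

Once the strategy exists, the model is built under strategy.scope() just as with MirroredStrategy.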


tf.distribute.MultiWorkerMirroredStrategy is very similar to MirroredStrategy. It implements synchronous distributed training across multiple workers, each with potentially multiple GPUs. Similar to tf.distribute.MirroredStrategy, it creates copies of all variables in the model on each device across all workers.
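
With several machines, each worker needs to know about the others. TensorFlow usually reads this from the TF_CONFIG environment variable, a JSON string describing the cluster and the role of the current process. Below is a sketch of how it might be built; the host names, the port, and the make_tf_config helper are placeholders I made up, and a real setup would use your own addresses.

```python
import json
import os

def make_tf_config(worker_hosts, task_index):
    """Build the TF_CONFIG dictionary for one worker (hypothetical helper)."""
    return {
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    }

# Hypothetical addresses; every worker lists the same hosts in the same order.
workers = ["host1.example.com:12345", "host2.example.com:12345"]

# Each machine sets its own index (0 here) before creating the strategy.
os.environ["TF_CONFIG"] = json.dumps(make_tf_config(workers, task_index=0))
```

The second machine would run the same code with task_index=1, and then each process creates tf.distribute.MultiWorkerMirroredStrategy() as usual.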


tf.distribute.experimental.CentralStorageStrategy does synchronous training as well. Variables are not mirrored; instead, they are placed on the CPU, and operations are replicated across all local GPUs. If there is only one GPU, all variables and operations will be placed on that GPU.
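
The placement rule can be written down as a small plain-Python function. This is only an illustration of the description above, not TensorFlow's actual implementation: the function name is made up, and the zero-GPU branch is my assumption, since that case is not covered in the text.

```python
def central_storage_placement(num_gpus):
    """Return (variable_device, compute_devices) following the rule described above."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        # Assumption: without a GPU, everything runs on the CPU.
        return "/cpu:0", ["/cpu:0"]
    compute_devices = [f"/gpu:{i}" for i in range(num_gpus)]
    if num_gpus == 1:
        # A single GPU holds both the variables and the operations.
        return "/gpu:0", compute_devices
    # Several GPUs: variables stay on the CPU, operations are replicated per GPU.
    return "/cpu:0", compute_devices
```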


I am going to reuse my previous test and add the mirrored strategy to distribute the training workload over multiple GPUs:

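import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Build the model inside the scope so that its variables are mirrored across the GPUs
with strategy.scope():
    classifier = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.Flatten(),  # added: flatten the feature maps before the Dense layers
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])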

