Recurrent Neural Networks

Generate text using a character-based RNN - see the TensorFlow Tutorial. The dataset used is a set of Shakespeare's writing from Andrej Karpathy's The Unreasonable Effectiveness of Recurrent Neural Networks. The objective is to train a recurrent network (a GRU in the code below) to predict the next character in a sequence of characters.

Get the Dataset

The Shakespeare excerpt we want to train our network on is directly available from TensorFlow:

# load dataset
import os
import numpy as np
import tensorflow as tf

## get_file downloads the text and returns the local file path
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
dataset_text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
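
As a quick check (not shown in the original listing) we can print how much text we just downloaded; the figure matters later when we estimate the number of training sequences:

## optional: inspect the raw text
print('Length of text: {} characters'.format(len(dataset_text)))
print(dataset_text[:250])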

Data Preparation

TensorFlow is not going to work with raw characters, so we need to provide an index that maps each character to a number. For this we first find out which unique characters the training material contains, assign them ascending integer indices, and keep the reverse mapping in a NumPy array:

# map text to numbers
## obtain the unique characters in the dataset
vocab = sorted(set(dataset_text))
## create a mapping from unique characters to indices
char2idx = {char:index for index, char in enumerate(vocab)}
idx2char = np.array(vocab)
# convert dataset from 'characters' to 'integers'
text_as_int = np.array([char2idx[char] for char in dataset_text])
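
To see what the mapping looks like, we can optionally print a few character-to-index pairs and the integer encoding of the first characters (purely illustrative, not required for training):

## optional: inspect the character mapping
for char, _ in zip(char2idx, range(5)):
    print('  {:4s}: {:3d}'.format(repr(char), char2idx[char]))
print('{} -- mapped to --> {}'.format(repr(dataset_text[:13]), text_as_int[:13]))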

We can now break up the dataset into sequences of 100 characters:

# breaking up data into sequences
SEQ_LENGTH = 100
## calculate number of examples per epoch for a sequence length of 100 characters
examples_per_epoch = len(dataset_text)//SEQ_LENGTH
## the dataset holds around 1 million characters,
## so this generates around 10k sequences
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(SEQ_LENGTH+1, drop_remainder=True)
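
We can verify the batching by decoding the first sequence back into readable text (an optional check):

## optional: decode the first sequence back into text
for item in sequences.take(1):
  print(repr(''.join(idx2char[item.numpy()])))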

The network is supposed to predict the character that follows a given sequence. So if we feed it the 10-character sequence Hello Worl, we want it to predict that the next character will be a d.

For the training we are going to duplicate each sequence and shift the copy by one character, so that the input set is missing the final character that is present in the target set. The network can then be trained on the input set and its performance verified by comparing its predictions against the target set:

## duplicate each sequence and shift +1/-1 it to form the input and target text:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

We can take a look at an excerpt from both sets:

## visualize dataset
for input_example, target_example in dataset.take(1):
  print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
  print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
Input data:  're too dear: the leanness that\nafflicts us, the object of our misery, is as an\ninventory to particul'
Target data: 'e too dear: the leanness that\nafflicts us, the object of our misery, is as an\ninventory to particula'

Before starting the training we should also shuffle the data and split it up into consumable batches:

## shuffle the dataset and create batches
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
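
Printing the dataset object is a quick way to confirm that each element is now an (input, target) pair of shape (BATCH_SIZE, SEQ_LENGTH):

## optional: each element should be an (input, target) pair of shape (64, 100)
print(dataset)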

Building the Model

We can use tf.keras.Sequential to define the model with the following three layers:

  • tf.keras.layers.Embedding: The input layer. A trainable lookup table that will map each character's integer index to a vector with embedding_dim dimensions
  • tf.keras.layers.GRU: A recurrent layer with rnn_units units (an LSTM layer would also work here)
  • tf.keras.layers.Dense: The output layer, with vocab_size (number of unique characters) outputs

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.GRU(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
  ])

  return model


## layer sizes (the values used in the TensorFlow tutorial)
EMBED_DIM = 256
RNN_UNITS = 1024

model = build_model(
  vocab_size=len(vocab),
  embedding_dim=EMBED_DIM,
  rnn_units=RNN_UNITS,
  batch_size=BATCH_SIZE)
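
Printing a model summary is a convenient way to verify the layer shapes and parameter counts (the exact numbers depend on the EMBED_DIM and RNN_UNITS values chosen above):

## optional: inspect the model architecture
model.summary()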

Defining the Loss Function

Before compiling the model we need to define a loss function that allows us to quantify the performance of our model:

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

## run a single batch through the (still untrained) model to check the loss
for input_example_batch, target_example_batch in dataset.take(1):
  example_batch_predictions = model(input_example_batch)
example_batch_loss = loss(target_example_batch, example_batch_predictions)

model.compile(optimizer='adam', loss=loss)
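
As a plausibility check (not part of the original post): the untrained model should predict the vocab_size characters roughly uniformly, so the exponential of the mean loss should come out close to the vocabulary size:

## optional: sanity-check the loss of the untrained model
mean_loss = example_batch_loss.numpy().mean()
print('scalar loss:   ', mean_loss)
print('exp(mean loss):', np.exp(mean_loss), '-- should be close to vocab size', len(vocab))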

Configure Checkpoints

Create checkpoints during training so that the trained weights can be restored later for the prediction step:

# directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

Model Training

Now we can fit our model to the training data:

EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
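
The returned history object can be used to check how the loss developed over the epochs, for example:

## optional: inspect the per-epoch training loss
print(history.history['loss'])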

Make Predictions

Restore the Latest Checkpoint

To keep this prediction step simple, use a batch size of 1. Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built. To run the model with a different batch_size, we need to rebuild the model and restore the weights from the checkpoint.

## locate the latest checkpoint
checkpoint_dir = './training_checkpoints'
latest_checkpoint = tf.train.latest_checkpoint(checkpoint_dir)

## rebuild the model with a batch size of 1 and load the trained weights
model = build_model(len(vocab), embedding_dim=EMBED_DIM, rnn_units=RNN_UNITS, batch_size=1)
model.load_weights(latest_checkpoint).expect_partial()
model.build(tf.TensorShape([1, None]))

Prediction Loop

Looking at the generated text below, you'll see the model knows when to capitalize, when to make paragraphs, and imitates a Shakespeare-like vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences:

# prediction loop
def generate_text(model, start_string):
  # evaluation step (generating text using the learned model)

  # number of characters to generate
  num_generate = 1000

  # converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # lsd acts as the sampling 'temperature':
  # a low value results in more predictable text,
  # a higher value results in more surprising text.
  # experiment to find the best setting.
  lsd = 1.0

  # here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / lsd
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))


print(generate_text(model, start_string=u"ROMEO: "))

This results in a Shakespearean screenplay:

ROMEO: the heavy liege,
I have forced to him by him? what of right worship by her furied blood;
Therefore we do well, that alive Upon your deep spack
There I have found'd Aumerle.

WARWICK: I'll cun yield.

GREMIO: The time have done the truth! do you not till come.

VONUMNIA: I think, his wrong. But then; ig! Would I
must, gentlemen, dies thee to my body.
O, behold my chamber:
Go, bight out some nature fiest their perbary; he kill'd
from the meat complaing of my poor hand; as much.

QUEEN MARGARD: My Lord of Norfolk. Kate, I hear,
Were not the child if you fall,
Under'd by a child, on Warwick was even, Angelo?
He do one kiss how awaken home. Can altigal my love to propership all these
Your ewemst to speak.

First Citizen: Out, my got you sick and sly by his purpose.

KING RICHARD II: Then shall with virgue Of Edward withstarks, brotherss
Of the madry thuse her black full of 'twas news
Tyat will put it not.

JULIET: Thou art a dester.

FLORIZEL: I thought with senses
Away.
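
The scene above was generated with lsd = 1.0. To experiment with other settings without editing the function body, the sampling loop can be wrapped so that lsd becomes a parameter; generate_text_with_lsd below is just an illustrative sketch of that idea, not part of the original code:

## sketch: same sampling loop, but with lsd exposed as a parameter
def generate_text_with_lsd(model, start_string, lsd=1.0, num_generate=1000):
  input_eval = tf.expand_dims([char2idx[s] for s in start_string], 0)
  text_generated = []
  model.reset_states()
  for i in range(num_generate):
      predictions = tf.squeeze(model(input_eval), 0) / lsd
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()
      input_eval = tf.expand_dims([predicted_id], 0)
      text_generated.append(idx2char[predicted_id])
  return start_string + ''.join(text_generated)

print(generate_text_with_lsd(model, start_string=u"ROMEO: ", lsd=0.1))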

Reducing the amount of LSD consumed by the algorithm by a factor of 10 (lsd = 0.1) results in a much less confusing scene:

KATHERINE: the duke is not the prince's death,
And therefore for the people, the death of the market-place.

KING RICHARD II:
What means the death of the people,
And therefore be so much to my fortune and the prouderous bones,
And shall be so far off the market-place.

CLARENCE:
The matter of the proudest of the people,
And therefore for the people, the prouder than the season be a poor boy.

KING RICHARD II:
What means the death of the market-place;
And therefore be so soon as I say.

KING RICHARD II:
What means the season be a poor princely father's son,
The shame of the people, the present peace
With the people and the heart of the people,
And therefore be so far off the death of the heart.

KING RICHARD II:
What man that the matter?

CORIOLANUS:
What means the day of the market-place.

CLARENCE:
The matter of the people and the bear me but the season be a proudly sentence
That we shall be here all the world is dead.

KING RICHARD II:
What man that we shall be so far off for the people