TensorFlow Serving API
I already looked into the official TensorFlow Model Server Docker image and managed to get it running with CUDA support. The next step is to use this container to serve my own Keras-trained model.
Starting the Model Server
Export your TensorFlow model in the saved_model format and point the tensorflow/serving container to the model path, e.g.:
docker run --gpus all -p 8501:8501 -p 8500:8500 --name tf-model-server \
--mount type=bind,source=$(pwd)/saved_model,target=/models \
-e MODEL_NAME=efficientv2b0_model -t tensorflow/serving:latest-gpu
The container exposes port 8500 for the gRPC API and port 8501 for the REST API.
ERROR
message: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:252] No versions of servable efficientv2b0_model found under base path /models/efficientv2b0_model. Did you forget to name your leaf directory as a number (eg. '/1/')?
The server expects the model path to contain numbered subdirectories: create a subdirectory named 1 inside the saved model directory and copy everything that TensorFlow saved into it. Subsequently, every time you retrain the model, place the updated files in a folder with the next higher version number. The model server will automatically switch to the latest version for you.
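You can also export directly into a versioned path from Python. A minimal sketch, assuming model is the trained Keras model from the training step:

import tensorflow as tf

# 'model' is assumed to be the trained Keras model
version = 1
export_path = f"./saved_model/efficientv2b0_model/{version}"

# save_format="tf" writes the SavedModel files (saved_model.pb, variables/, assets/)
model.save(export_path, save_format="tf")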
cd saved_model && tree -L 3
.
└── efficientv2b0_model
    └── 1
        ├── assets
        ├── fingerprint.pb
        ├── keras_metadata.pb
        ├── saved_model.pb
        └── variables
Verify the MetaGraph and signature definitions with the saved_model_cli utility; the input and output names and shapes shown here are exactly what the prediction requests later need:
saved_model_cli show --dir ./saved_model/efficientv2b0_model/1/ --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
      name: serving_default_input_2:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['dense_2'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 48)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
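The same information is also accessible from Python if saved_model_cli is not at hand (a small sketch):

import tensorflow as tf

# load the exported model and inspect the serving signature
loaded = tf.saved_model.load("./saved_model/efficientv2b0_model/1")
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)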
Now the server starts successfully:
I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: efficientv2b0_model version: 1}
I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
I tensorflow_serving/model_servers/server.cc:118] Using InsecureServerCredentials
I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
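A quick health check before sending predictions is the model status endpoint, which should report version 1 as AVAILABLE:

curl http://localhost:8501/v1/models/efficientv2b0_model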
You can verify the REST API by retrieving your model's metadata:
curl http://localhost:8501/v1/models/efficientv2b0_model/metadata
{
  "model_spec": {
    "name": "efficientv2b0_model",
    "signature_name": "",
    "version": "1"
  }
...
Inference Server
REST API
For testing I will just start a simple Python container and install the dependencies manually:
docker run --rm -ti --network host \
--mount type=bind,source=$(pwd)/api_request,target=/opt/app \
python:alpine /bin/ash
pip install pillow requests numpy
/opt/app/rest_request.py
import json
import numpy as np
from PIL import Image
import requests

labels = ['Gladiolus', 'Adenium', 'Alpinia_Purpurata', 'Alstroemeria', 'Amaryllis', 'Anthurium_Andraeanum', 'Antirrhinum', 'Aquilegia', 'Billbergia_Pyramidalis', 'Cattleya', 'Cirsium', 'Coccinia_Grandis', 'Crocus', 'Cyclamen', 'Dahlia', 'Datura_Metel', 'Dianthus_Barbatus', 'Digitalis', 'Echinacea_Purpurea', 'Echinops_Bannaticus', 'Fritillaria_Meleagris', 'Gaura', 'Gazania', 'Gerbera', 'Guzmania', 'Helianthus_Annuus', 'Iris_Pseudacorus', 'Leucanthemum', 'Malvaceae', 'Narcissus_Pseudonarcissus', 'Nerine', 'Nymphaea_Tetragona', 'Paphiopedilum', 'Passiflora', 'Pelargonium', 'Petunia', 'Platycodon_Grandiflorus', 'Plumeria', 'Poinsettia', 'Primula', 'Protea_Cynaroides', 'Rose', 'Rudbeckia', 'Strelitzia_Reginae', 'Tropaeolum_Majus', 'Tussilago', 'Viola', 'Zantedeschia_Aethiopica']

url1 = 'http://localhost:8501/v1/models/efficientv2b0_model:predict'

test_img1 = "/opt/app/snapshots/Viola_Tricolor.jpg"
test_img2 = "/opt/app/snapshots/Water_Lilly.jpg"
test_img3 = "/opt/app/snapshots/Strelitzia.jpg"

# load a test image and bring it into the expected input shape (1, 224, 224, 3)
with Image.open(test_img1) as im:
    preprocess_img = im.resize((224, 224))

batched_img = np.expand_dims(preprocess_img, axis=0)
batched_img = np.float32(batched_img)

# the REST API expects a JSON body with the signature name and a list of instances
data = json.dumps(
    {"signature_name": "serving_default", "instances": batched_img.tolist()}
)

def predict_rest(json_data, url):
    json_response = requests.post(url, data=json_data)
    response = json.loads(json_response.text)
    rest_outputs = np.array(response["predictions"])
    return rest_outputs

# get prediction from efficientv2b0_model
rest_outputs = predict_rest(data, url1)
index = np.argmax(rest_outputs, axis=-1)[0]  # index of the highest probability
print("Prediction Results: EfficientV2B0")
print("Class probabilities: ", rest_outputs)
print("Predicted class: ", labels[index])
Executing the API request script /opt/app/rest_request.py inside the container sends one of the three test images to the TensorFlow model API and returns a prediction:
python /opt/app/rest_request.py
Class probabilities: [[2.02370361e-13 5.45808624e-12 3.14568647e-17 4.50543422e-11
1.74268600e-09 2.22335952e-12 5.15965439e-12 2.28333991e-10
3.17855503e-18 3.61456546e-12 1.40493947e-17 1.46841839e-09
3.42843321e-13 2.59899831e-16 2.68869540e-12 1.53930095e-08
1.36200578e-12 6.06594810e-16 2.21194929e-14 5.79839779e-17
1.05216942e-12 6.55278443e-10 2.30210545e-13 6.22206000e-15
5.16498033e-16 1.86334712e-15 7.34451477e-09 9.92521278e-13
1.40660292e-08 5.47506651e-10 3.36575397e-16 1.56563315e-12
4.54165000e-09 4.07618221e-13 1.69515952e-05 1.08003778e-05
2.42027980e-08 1.65058089e-09 1.25125591e-13 4.95898966e-09
1.62804418e-16 5.25978046e-17 1.91704139e-14 2.93358880e-18
3.04848768e-08 1.63559369e-14 9.99972224e-01 2.25344784e-10]]
Predicted class: Viola
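The gRPC API on port 8500 can be queried in the same way. A minimal sketch, assuming the tensorflow-serving-api package (which pulls in tensorflow) is installed and reusing batched_img and labels from the script above:

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# open an insecure channel to the gRPC port of the model server
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# build the request using the input name from the serving_default signature
request = predict_pb2.PredictRequest()
request.model_spec.name = "efficientv2b0_model"
request.model_spec.signature_name = "serving_default"
request.inputs["input_2"].CopyFrom(tf.make_tensor_proto(batched_img))

result = stub.Predict(request, 10.0)  # 10 s timeout
grpc_outputs = tf.make_ndarray(result.outputs["dense_2"])
print("Predicted class: ", labels[np.argmax(grpc_outputs, axis=-1)[0]])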
Serving Multiple Models
OK, with this working, I want to configure the Model Server to serve all of the trained models.
tree -L 2 saved_model
saved_model
├── deit_model
│   └── 1
├── efficients_model
│   └── 1
├── efficientv2b0_model
│   └── 1
├── inception_model_model_ft
│   └── 1
├── mobilenet2_model_ft
│   └── 1
├── mobilenetv3L_model_ft
│   └── 1
├── mobilenetv3S_model
│   └── 1
├── nasnetmobile_model_ft
│   └── 1
├── vit_model
│   └── 1
└── xception_model
    └── 1
For this we have to add a models.config file inside the mounted /models directory. The configuration file is then passed to the server with the following flags (the periodic configuration reload is optional):
docker run -t --rm -p 8501:8501 --name tf-serve \
--mount type=bind,source=$(pwd)/saved_model,target=/models \
tensorflow/serving:latest-gpu \
--model_config_file=/models/models.config \
--model_config_file_poll_wait_seconds=60
./saved_model/models.config
model_config_list {
  config {
    name: 'deit_model'
    base_path: '/models/deit_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'efficients_model'
    base_path: '/models/efficients_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'efficientv2b0_model'
    base_path: '/models/efficientv2b0_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'inception_model_model_ft'
    base_path: '/models/inception_model_model_ft/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'mobilenet2_model_ft'
    base_path: '/models/mobilenet2_model_ft/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'mobilenetv3L_model_ft'
    base_path: '/models/mobilenetv3L_model_ft/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'mobilenetv3S_model'
    base_path: '/models/mobilenetv3S_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'nasnetmobile_model_ft'
    base_path: '/models/nasnetmobile_model_ft/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'vit_model'
    base_path: '/models/vit_model/'
    model_platform: 'tensorflow'
  }
  config {
    name: 'xception_model'
    base_path: '/models/xception_model/'
    model_platform: 'tensorflow'
  }
}
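By default the model server serves only the latest (highest-numbered) version found in each base path. To pin or serve specific versions instead, a model_version_policy block can be added per entry, e.g. (a sketch for a single entry, assuming a version 2 has also been exported):

config {
  name: 'efficientv2b0_model'
  base_path: '/models/efficientv2b0_model/'
  model_platform: 'tensorflow'
  model_version_policy {
    specific {
      versions: 1
      versions: 2
    }
  }
}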
Starting up the container I can now see that TensorFlow Serving is (re-)loading the configured models at a 60 second interval (log excerpt):
tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...
tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
tensorflow_serving/model_servers/server_core.cc:594] (Re-)adding model: efficientv2b0_model
tensorflow_serving/model_servers/server_core.cc:594] (Re-)adding model: mobilenetv3S_model
tensorflow_serving/model_servers/server_core.cc:594] (Re-)adding model: vit_model
tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
I now added the URLs for three of the served models to the Python request script:
url1 = 'http://localhost:8501/v1/models/efficientv2b0_model:predict'
url2 = 'http://localhost:8501/v1/models/mobilenetv3S_model:predict'
url3 = 'http://localhost:8501/v1/models/vit_model:predict'
/opt/app/rest_request.py
import json
import numpy as np
from PIL import Image
import requests

labels = ['Gladiolus', 'Adenium', 'Alpinia_Purpurata', 'Alstroemeria', 'Amaryllis', 'Anthurium_Andraeanum', 'Antirrhinum', 'Aquilegia', 'Billbergia_Pyramidalis', 'Cattleya', 'Cirsium', 'Coccinia_Grandis', 'Crocus', 'Cyclamen', 'Dahlia', 'Datura_Metel', 'Dianthus_Barbatus', 'Digitalis', 'Echinacea_Purpurea', 'Echinops_Bannaticus', 'Fritillaria_Meleagris', 'Gaura', 'Gazania', 'Gerbera', 'Guzmania', 'Helianthus_Annuus', 'Iris_Pseudacorus', 'Leucanthemum', 'Malvaceae', 'Narcissus_Pseudonarcissus', 'Nerine', 'Nymphaea_Tetragona', 'Paphiopedilum', 'Passiflora', 'Pelargonium', 'Petunia', 'Platycodon_Grandiflorus', 'Plumeria', 'Poinsettia', 'Primula', 'Protea_Cynaroides', 'Rose', 'Rudbeckia', 'Strelitzia_Reginae', 'Tropaeolum_Majus', 'Tussilago', 'Viola', 'Zantedeschia_Aethiopica']

url1 = 'http://localhost:8501/v1/models/efficientv2b0_model:predict'
url2 = 'http://localhost:8501/v1/models/mobilenetv3S_model:predict'
url3 = 'http://localhost:8501/v1/models/vit_model:predict'

test_img1 = "/opt/app/snapshots/Viola_Tricolor.jpg"
test_img2 = "/opt/app/snapshots/Water_Lilly.jpg"
test_img3 = "/opt/app/snapshots/Strelitzia.jpg"

# load a test image and bring it into the expected input shape (1, 224, 224, 3)
with Image.open(test_img2) as im:
    preprocess_img = im.resize((224, 224))

batched_img = np.expand_dims(preprocess_img, axis=0)
batched_img = np.float32(batched_img)

data = json.dumps(
    {"signature_name": "serving_default", "instances": batched_img.tolist()}
)

def predict_rest(json_data, url):
    json_response = requests.post(url, data=json_data)
    response = json.loads(json_response.text)
    rest_outputs = np.array(response["predictions"])
    return rest_outputs

# get prediction from efficientv2b0_model
rest_outputs = predict_rest(data, url1)
index = np.argmax(rest_outputs, axis=-1)[0]  # index of the highest probability
print("Prediction Results: EfficientV2B0")
print("Class probabilities: ", rest_outputs)
print("Predicted class: ", labels[index])
percentage = round((rest_outputs[0][index] * 100), 3)
print(f'Certainty: {percentage} %')

# get prediction from mobilenetv3S_model
rest_outputs = predict_rest(data, url2)
index = np.argmax(rest_outputs, axis=-1)[0]  # index of the highest probability
print("Prediction Results: MobileNetV3S")
print("Class probabilities: ", rest_outputs)
print("Predicted class: ", labels[index])
percentage = round((rest_outputs[0][index] * 100), 3)
print(f'Certainty: {percentage} %')

# get prediction from vit_model
rest_outputs = predict_rest(data, url3)
index = np.argmax(rest_outputs, axis=-1)[0]  # index of the highest probability
print("Prediction Results: ViT")
print("Class probabilities: ", rest_outputs)
print("Predicted class: ", labels[index])
percentage = round((rest_outputs[0][index] * 100), 3)
print(f'Certainty: {percentage} %')
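Since the three request blocks only differ in URL and display name, the same can be written more compactly as a loop (an equivalent sketch; the models dict is my own naming):

models = {
    "EfficientV2B0": url1,
    "MobileNetV3S": url2,
    "ViT": url3,
}

for name, url in models.items():
    outputs = predict_rest(data, url)
    index = np.argmax(outputs, axis=-1)[0]
    print(f"Prediction Results: {name}")
    print("Predicted class: ", labels[index])
    print(f"Certainty: {round(outputs[0][index] * 100, 3)} %")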
Either way, the script now returns three predictions:
python /opt/app/rest_request.py
Prediction Results: EfficientV2B0
Class probabilities: [[1.27231669e-18 7.36642785e-15 2.12142088e-16 8.37840160e-13
2.54633266e-15 2.23082670e-22 1.22582740e-17 1.58766519e-16
3.15969443e-21 3.40760905e-12 9.31879706e-21 1.35364190e-16
4.19998346e-13 6.28031038e-19 1.42876893e-08 1.52733778e-16
1.71126649e-18 6.26449727e-18 1.70084369e-22 5.93363685e-27
1.35457736e-23 9.82926604e-26 1.07540425e-15 1.03456081e-16
5.33486490e-14 1.70107328e-19 1.25875951e-20 1.54503871e-19
2.05770212e-19 9.31224634e-16 2.43002143e-25 1.00000000e+00
1.49300737e-20 6.64273082e-17 4.00534170e-18 3.18333764e-19
1.38794318e-24 5.08237766e-13 4.06667683e-19 4.50689589e-13
4.09000394e-16 6.34139226e-13 2.21711468e-24 3.38089155e-23
1.83935487e-19 3.32891393e-19 1.46283768e-16 3.42905371e-23]]
Predicted class: Nymphaea_Tetragona
Certainty: 100.0 %
Prediction Results: MobileNetV3S
Class probabilities: [[6.27168000e-08 9.36711274e-07 3.32008640e-05 1.82103206e-04
3.65090000e-05 7.08905601e-10 5.29715000e-09 2.18803660e-08
1.43549421e-08 2.40992620e-07 2.09935107e-12 9.32755886e-11
1.55253754e-10 2.58531685e-08 1.72480277e-03 9.44796508e-09
1.51912500e-12 3.97989908e-07 4.73708963e-13 2.97169041e-14
4.57825137e-14 4.23965169e-11 4.12751433e-07 1.92947700e-05
8.95965513e-06 5.97457550e-09 4.81428591e-13 3.20082150e-13
1.89814697e-09 9.56469748e-09 3.24247695e-09 9.97930884e-01
9.90472593e-09 2.25990516e-06 2.97242941e-09 4.48806965e-08
8.23452157e-12 5.94276535e-05 3.16433564e-08 3.98971480e-07
2.16912586e-08 8.35711322e-09 1.56445000e-12 1.42842169e-10
2.86222768e-10 7.43138450e-12 1.27389072e-10 1.44366144e-10]]
Predicted class: Nymphaea_Tetragona
Certainty: 99.793 %
Prediction Results: ViT
Class probabilities: [[2.62611400e-04 9.45560227e-04 7.97024090e-03 2.50866893e-03
5.62246714e-04 9.96018527e-04 5.78884617e-04 1.15711347e-03
1.87621685e-03 2.56323745e-03 1.19275635e-03 5.13695000e-04
8.98167782e-04 4.11458139e-04 1.77495480e-02 3.71844682e-04
3.45975481e-04 1.64183730e-04 1.62366749e-04 4.10321372e-04
5.85561967e-04 4.59756848e-04 7.18721712e-04 2.03839969e-03
2.18398985e-03 8.30425473e-04 5.62683621e-04 1.05744123e-03
1.08664425e-03 8.36106890e-04 4.69557708e-04 9.25359428e-01
7.82242860e-04 8.19175097e-04 4.58333000e-04 2.90713477e-04
2.36424108e-04 8.55224300e-03 6.25506684e-04 9.37757781e-04
5.16826578e-04 4.17304225e-03 5.67917000e-04 4.71120235e-04
7.65961187e-04 7.77638000e-04 1.47661043e-03 7.18727824e-04]]
Predicted class: Nymphaea_Tetragona
Certainty: 92.536 %