MLflow 2.1 Introduction

TST, Hong Kong

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components:

  • MLflow Tracking: Record and query experiments: code, data, config, and results
  • MLflow Projects: Package data science code in a format to reproduce runs on any platform
  • MLflow Models: Deploy machine learning models in diverse serving environments
  • Model Registry: Store, annotate, discover, and manage models in a central repository

Installing MLflow (PIP)

# Install MLflow
pip install mlflow
# Install MLflow with extra ML libraries and 3rd-party tools
pip install 'mlflow[extras]'
# Install a lightweight version of MLflow
pip install mlflow-skinny

To use certain MLflow modules and functionality (ML model persistence/inference, artifact storage options, etc), you may need to install extra libraries. For example, the mlflow.tensorflow module requires TensorFlow to be installed. See Extra Dependencies for more details.

When using MLflow skinny, you may need to install additional dependencies if you wish to use certain MLflow modules and functionality. For example, usage of SQL-based storage for MLflow Tracking (e.g. mlflow.set_tracking_uri("sqlite:///my.db")) requires pip install mlflow-skinny sqlalchemy alembic sqlparse. If using MLflow skinny for serving, a minimally functional installation would require pip install mlflow-skinny flask.
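With the SQL dependencies in place, pointing the tracking API at a SQLite database is a one-liner. A minimal sketch (the database file name my.db is taken from the example above):

import mlflow

# Store runs in a local SQLite database instead of the default ./mlruns file store
# (with mlflow-skinny this requires the sqlalchemy, alembic and sqlparse packages)
mlflow.set_tracking_uri("sqlite:///my.db")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.75)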

The MLflow installation with extras contains the following packages:

Successfully installed Mako-1.2.4 PyJWT-2.6.0 PySocks-1.7.1 SecretStorage-3.3.3 adal-1.2.7 aiokafka-0.8.0 alembic-1.9.3 azure-common-1.1.28 azure-core-1.26.3 azure-graphrbac-0.61.1 azure-mgmt-authorization-3.0.0 azure-mgmt-containerregistry-10.0.0 azure-mgmt-core-1.3.2 azure-mgmt-keyvault-10.1.0 azure-mgmt-resource-21.2.1 azure-mgmt-storage-20.1.0 azureml-core-1.48.0 backports.tempfile-1.0 backports.weakref-1.0.post1 bcrypt-4.0.1 cryptography-38.0.4 databricks-cli-0.17.4 docker-6.0.1 fastapi-0.88.0 gevent-22.10.2 geventhttpclient-2.0.2 google-cloud-core-2.3.2 google-cloud-storage-2.7.0 google-crc32c-1.5.0 google-resumable-media-2.4.1 gunicorn-20.1.0 isodate-0.6.1 jeepney-0.8.0 jsonpickle-2.2.0 kafka-python-2.0.2 knack-0.10.1 kubernetes-25.3.0 llvmlite-0.39.1 mlflow-2.1.1 mlserver-1.3.0.dev2 mlserver-mlflow-1.3.0.dev2 msal-1.21.0 msal-extensions-1.0.0 msrest-0.7.1 msrestazure-0.6.4 ndg-httpsclient-0.5.1 numba-0.56.4 numpy-1.23.5 orjson-3.8.5 paramiko-2.12.0 pkginfo-1.9.6 portalocker-2.7.0 prometheus-flask-exporter-0.21.0 py-grpc-prometheus-0.7.0 pynacl-1.5.0 pyopenssl-22.1.0 pysftp-0.2.9 python-dotenv-0.21.1 python-rapidjson-1.9 querystring-parser-1.2.4 requests-auth-aws-sigv4-0.7 shap-0.41.0 slicer-0.0.7 sqlalchemy-1.4.46 sqlparse-0.4.3 starlette-exporter-0.15.1 tritonclient-2.30.0 uvloop-0.17.0 zope.event-4.6 zope.interface-5.5.2

Downloading the Quickstart

Download the quickstart code by cloning MLflow and cd into the examples subdirectory of the repository. We’ll use this working directory for running the quickstart:

git clone https://github.com/mlflow/mlflow
cd mlflow/examples

Using the Tracking API

The MLflow Tracking API lets you log metrics, parameters, and artifacts (files) from your data science code and see a history of your runs. Run the quickstart script to log some example data:

python quickstart/mlflow_tracking.py

Viewing the Tracking UI

By default, wherever you run your program, the tracking API writes data into files in a local ./mlruns directory. You can then run MLflow’s Tracking UI on http://localhost:5000:

mlflow ui
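For context, quickstart/mlflow_tracking.py does little more than log a parameter, a few metric values, and an artifact. A minimal sketch of such a script (not the verbatim file):

import os
from random import random, randint

from mlflow import log_artifacts, log_metric, log_param

# Log a parameter (key-value pair)
log_param("param1", randint(0, 100))

# Log a metric; metrics can be updated throughout the run
log_metric("foo", random())
log_metric("foo", random() + 1)
log_metric("foo", random() + 2)

# Log all files in the outputs directory as artifacts
os.makedirs("outputs", exist_ok=True)
with open("outputs/test.txt", "w") as f:
    f.write("hello world!")
log_artifacts("outputs")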

(Screenshot: the MLflow Tracking UI)

Running MLflow Projects

MLflow allows you to package code and its dependencies as a project that can be run in a reproducible fashion on other data. Each project includes its code and an MLproject file that defines its dependencies (for example, the Python environment) as well as which commands can be run in the project and what arguments they take.

You can run existing projects with the mlflow run command, which runs a project from either a local directory:

mlflow run sklearn_elasticnet_wine -P alpha=0.5

or a GitHub URI:

mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0

There’s a sample project in tutorial, including an MLproject file that specifies its dependencies. If you haven’t configured a tracking server yet, projects log their Tracking API data in the local mlruns directory, so you can see these runs using mlflow ui.
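For reference, the MLproject file of a project like sklearn_elasticnet_wine looks roughly like this (a sketch, not the verbatim file; the parameter names and defaults are illustrative):

name: tutorial

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"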

If I try to run mlflow run sklearn_elasticnet_wine -P alpha=0.5 I get the error message FileNotFoundError: [Errno 2] No such file or directory: 'pyenv'. It seems that MLflow expected to find a virtual environment with the dependencies installed. The directory contains a python_env.yaml file - I assume all those dependencies would have been installed automatically if a virtual environment had been present? Ah OK, there is the info:

By default mlflow run installs all dependencies using virtualenv. To run a project without using virtualenv, you can provide the --env-manager=local option to mlflow run. In this case, you must ensure that the necessary dependencies are already installed in your Python environment.

mlflow run --env-manager=local sklearn_elasticnet_wine -P alpha=0.5
Elasticnet model (alpha=0.500000, l1_ratio=0.100000):
RMSE: 0.7460550348172179
MAE: 0.576381895873763
R2: 0.21136606570632266
2023/02/09 13:32:42 INFO mlflow.projects: === Run (ID '886087767c9646b9824ec9315e63ac84') succeeded ===

The run statistics are stored in the mlruns dir inside the examples dir that we started this training from. So we can run mlflow ui from here and should be able to see a new entry:

(Screenshot: the new run in the MLflow Tracking UI)

After the run MLflow generated a requirements.txt and a python_env.yaml file - based on my local environment:

mlruns/0/886087767c9646b9824ec9315e63ac84/artifacts/model/python_env.yaml

python: 3.10.9
build_dependencies:
- pip==22.3.1
- setuptools==66.1.1
- wheel==0.38.4
dependencies:
- -r requirements.txt

mlruns/0/886087767c9646b9824ec9315e63ac84/artifacts/model/requirements.txt

mlflow<3,>=2.1
cloudpickle==2.2.0
psutil==5.9.4
scikit-learn==1.2.0
typing-extensions==4.4.0

And also saved the pickled artifacts from our training:

(Screenshot: the pickled model artifacts of the run in the Tracking UI)

Saving and Serving Models

MLflow includes a generic MLmodel format for saving models from a variety of tools in diverse flavors. For example, many models can be served as Python functions, so an MLmodel file can declare how each model should be interpreted as a Python function in order to let various tools serve it. MLflow also includes tools for running such models locally and exporting them to Docker containers or commercial serving platforms.
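An MLmodel file is a small YAML descriptor that lists the "flavors" a model can be loaded with. For a scikit-learn model it looks roughly like this (a sketch; the exact fields vary with the MLflow version):

artifact_path: model
flavors:
  python_function:
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.10.9
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.2.0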

To illustrate this functionality, the mlflow.sklearn package can log scikit-learn models as MLflow artifacts and then load them again for serving. There is an example training application in sklearn_logistic_regression/train.py that you can run as follows:

mlflow run --env-manager=local sklearn_logistic_regression

When you run the example, it outputs an MLflow run ID for that experiment. If you look at mlflow ui, you will also see that the run saved a model folder containing an MLmodel description file and a pickled scikit-learn model:

=== Running command 'python train.py' in run with ID 'ee566cd8d86b411f978c6e3db5d161cb' ===
Score: 0.6666666666666666
Model saved in run ee566cd8d86b411f978c6e3db5d161cb
=== Run (ID 'ee566cd8d86b411f978c6e3db5d161cb') succeeded ===
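The training script itself is short. Roughly, sklearn_logistic_regression/train.py does the following (a sketch, not the verbatim code):

import numpy as np
from sklearn.linear_model import LogisticRegression

import mlflow
import mlflow.sklearn

if __name__ == "__main__":
    # Tiny toy dataset
    X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
    y = np.array([0, 0, 1, 1, 1, 0])

    lr = LogisticRegression()
    lr.fit(X, y)
    score = lr.score(X, y)
    print(f"Score: {score}")

    # log_metric starts a run implicitly if none is active
    mlflow.log_metric("score", score)
    mlflow.sklearn.log_model(lr, "model")
    print(f"Model saved in run {mlflow.active_run().info.run_id}")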

(Screenshot: the logged model in the Tracking UI)

You can pass the run ID and the path of the model within the artifacts directory (here “model”) to various tools. For example, MLflow includes a simple REST server for Python-based models - mlflow models serve -m runs:/<RUN_ID>/model:

mlflow models serve --env-manager=local -m runs:/ee566cd8d86b411f978c6e3db5d161cb/model --port 8080
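If you prefer not to run a server, the same model can also be loaded directly in Python via the generic pyfunc flavor. A minimal sketch (substitute your own run ID):

import pandas as pd

import mlflow.pyfunc

# Load the model logged in the run above
model = mlflow.pyfunc.load_model("runs:/ee566cd8d86b411f978c6e3db5d161cb/model")

# Score a small DataFrame with the same column used for serving below
print(model.predict(pd.DataFrame({"x": [1, -1]})))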

The REST API defines four endpoints:

  • /ping: used for health check
  • /health: (same as /ping)
  • /version: used for getting the mlflow version
  • /invocations: used for scoring

Once you have started the server, you can pass it some sample data and see the predictions:

curl -d '{"dataframe_split": {"columns": ["x"], "data": [[1], [-1]]}}' -H 'Content-Type: application/json' -X POST localhost:8080/invocations

This returns:

{"predictions": [1, 0]}