
DVC Model Access

Guangzhou, China

We've learned how to track data and models with DVC and how to commit their versions to Git. The next questions are: How can we use these artifacts outside of the project? How do we download a model to deploy it? How do we download a specific version of a model? And how do we reuse datasets across different projects?

Data and Model Access

DVC's remote storage config is also saved in Git and contains all the information needed to access and download any version of datasets, files, and models. This means that a Git repository with DVC files becomes an entry point: you can go through the repository instead of accessing the files in storage directly.

Find a File or Directory

You can use dvc list to explore a DVC repository hosted on any Git server:

dvc list https://github.com/mpolinowski/dvc-demo-project.git
.dvcignore
data

The benefit of this command over browsing a Git hosting website is that the list includes files and directories tracked by both Git and DVC.
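
By default the listing only shows the top level of the repository. As a minimal sketch, assuming DVC is installed and the repository is reachable, dvc list can also walk subdirectories and hide everything that is tracked by Git alone. The helper name list_dvc_tracked is hypothetical and only wraps the CLI call:

```shell
# Hypothetical helper: list only the DVC-tracked files of a repository,
# descending into subdirectories.
#   --recursive (-R) : walk directories instead of showing only the top level
#   --dvc-only       : hide entries that are tracked by Git alone
list_dvc_tracked() {
    repo_url="$1"   # Git URL of the DVC project
    dvc list --recursive --dvc-only "$repo_url"
}

# Usage (needs network access and an installed dvc):
#   list_dvc_tracked https://github.com/mpolinowski/dvc-demo-project.git
```

This is handy for checking which data files or models a project offers before downloading any of them.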

Download

One way to use the data is to simply download it with dvc get. This is useful when working outside of a DVC project environment, for example in an automated ML model deployment task:

dvc get https://github.com/mpolinowski/dvc-demo-project \
data

And now to the magic part: the Git repository only contains the .dvc file that points to our data, yet the dvc get command we used above automatically pulled the actual data in the version that was committed to Git:

ls -la data

256 Jan 5 17:25 .
232 Jan 5 17:25 ..
14445097 Jan 5 17:25 data.xml
10 Jan 5 17:25 .gitignore
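
The same command can also pin a version. dvc get takes a --rev option that accepts a Git revision (branch, tag, or commit hash) and an -o option for the output path. The following is a sketch for a deployment script, assuming DVC is installed. The helper name fetch_version, the tag v1.0, and the output directory are hypothetical and not taken from the demo repository:

```shell
# Hypothetical helper: download one path from a DVC repository
# at a fixed Git revision.
fetch_version() {
    repo_url="$1"   # Git URL of the DVC project
    path="$2"       # file or directory inside the repository
    rev="$3"        # branch, tag, or commit hash
    out="$4"        # local destination
    dvc get "$repo_url" "$path" --rev "$rev" -o "$out"
}

# Usage (v1.0 is a placeholder; replace it with a revision that exists):
#   fetch_version https://github.com/mpolinowski/dvc-demo-project data v1.0 ./data-v1
```

Pinning a revision keeps a deployment reproducible: the job always receives the data that was committed at that point, even if the default branch moves on.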

Data Pipelines

Work in progress.