Sneak peek at Model Serving as Code for PyTorch

Nov 30, 2022

Note: This blog post is part of my ongoing experiments with model training, deployment, and monitoring in the bitbeast repository. If you liked this blog post, please upvote it on Hacker News.

Source Code: GitHub

Rise of X as Code

Software Engineering is built on many levels of abstraction: abstractions in languages, abstractions in code as APIs, abstractions in environments, and more. Over the last decade, this race to build things more easily, more concisely, and faster has given rise to several "X as Code" concepts.

In Software 2.0, modern Machine Learning frameworks are the highest level of abstraction over the underlying complex math. Serving models deserves similar abstractions. A variety of model serving frameworks are discussed in my last post; they usually demand an understanding of complex software engineering concepts and a significant amount of time.

Introducing TorchLego


TorchLego is a server for running inference with PyTorch models, inspired by the concept of X as Code. With TorchLego, one can define the preprocessing, postprocessing, and PyTorch TorchScript module location as a config for execution. It is a heavily simplified take, only loosely similar to NVIDIA Triton Inference Server.

Note: TorchLego is currently in its alpha stage and is expected to undergo a lot of development to support more PyTorch operations and tasks like NLP and audio.
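Since TorchLego loads TorchScript modules, the model behind the download link has to be exported first. Here is a minimal sketch of exporting a pretrained torchvision model to TorchScript; the model choice and file name are illustrative:

import torch
import torchvision.models as models

# Load a pretrained model and put it in inference mode
model = models.resnet18(pretrained=True)
model.eval()

# Trace the model with an example input to produce a TorchScript module
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save the module; host the .pt file wherever the download link points
traced.save("resnet18.pt")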


Model Serving as Code

Writing model configuration

models:
  - name: model-name  # unique name/slug for the model
    download: http://download-link  # module download link
    gpu: false
    stages:
      input: file  # support for file upload as input while running inference
      preprocess:
        default: image_classification  # default torchvision transforms for preprocessing
  - name: custom-model-name
    download: http://download-link  # module download link
    gpu: false
    stages:
      input: file
      # custom pytorch transforms for preprocessing the input
      preprocess:
        resize: 299
        center_crop: 299
        to_tensor: true
        normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        unsqueeze: 0
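The configuration above is plain YAML, so it can be sanity checked with any YAML parser before starting the server. A quick sketch, assuming the config is saved as models.yaml:

import yaml  # PyYAML

# Load the model configuration and list the declared models
with open("models.yaml") as f:
    config = yaml.safe_load(f)

for model in config["models"]:
    print(model["name"], "->", model["stages"]["preprocess"])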

For example, the following Python preprocessing code translates into YAML as follows:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# unsqueeze: 0 corresponds to preprocess(image).unsqueeze(0),
# which adds the batch dimension after the transforms run

to

preprocess:
    resize: 299
    center_crop: 299
    to_tensor: true
    normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
    unsqueeze: 0
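Under the hood, a server can map these YAML keys back onto their torchvision counterparts. The following is a rough sketch of such a mapping; the function name and structure are my illustration, not TorchLego's actual implementation:

from torchvision import transforms

def build_preprocess(config: dict) -> transforms.Compose:
    # Translate YAML keys into the matching torchvision transforms
    steps = []
    if "resize" in config:
        steps.append(transforms.Resize(config["resize"]))
    if "center_crop" in config:
        steps.append(transforms.CenterCrop(config["center_crop"]))
    if config.get("to_tensor"):
        steps.append(transforms.ToTensor())
    if "normalize" in config:
        norm = config["normalize"]
        steps.append(transforms.Normalize(mean=norm["mean"], std=norm["std"]))
    # unsqueeze is applied to the resulting tensor, not inside Compose
    return transforms.Compose(steps)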


Using TorchLego for serving models

Pull the image, run the container with your configuration mounted, and send an inference request:

docker pull prabhuomkar/torchlego:<version>
docker run --rm --net=host -v ${PWD}/examples:/model-config prabhuomkar/torchlego:<version>
curl -X POST 'http://localhost:8080/v1/models/model-name' --form 'input=@"examples/input.jpg"'

Note: the model and system configuration go in a folder named model-config at the root of the container. That is why we mount a volume mapping the folder containing the YAML configuration and the .env configuration into the container.

To list all the models currently loaded by the server:

curl -X GET 'http://localhost:8080/v1/models'
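The same endpoints can also be called programmatically. A minimal sketch using the requests library; the response handling assumes the server replies with JSON:

import requests

BASE_URL = "http://localhost:8080/v1"

# List the models loaded by the server
print(requests.get(f"{BASE_URL}/models").json())

# Run inference by uploading a file to a named model
with open("examples/input.jpg", "rb") as f:
    response = requests.post(f"{BASE_URL}/models/model-name", files={"input": f})
print(response.json())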


Possible Improvements / Roadmap

If you like the idea of TorchLego, go check it out on GitHub. If you have an idea or a suggestion for improvement, feel free to contribute via Issues/Pull Requests!