Sneak peek at Model Serving as Code for PyTorch

Nov 30, 2022

Note: This blog post is part of my ongoing experiments with model training, deployment, and monitoring in the bitbeast repository. If you liked this blog post, please upvote it on Hacker News.

Source Code: GitHub

Rise of X as Code

Software Engineering is built on many levels of abstraction: abstractions in languages, abstractions in code as APIs, abstractions in environments, and more. Over the last decade, this race to build things more easily, more concisely, and faster has given rise to several "X as Code" concepts.

In Software 2.0, modern Machine Learning frameworks are the highest level of abstraction over the underlying complex math. Serving models deserves similar abstractions. A variety of model serving frameworks are discussed in my last post; they usually demand an understanding of complex software engineering concepts and a significant amount of time.

Introducing TorchLego


TorchLego is a server for running inference with PyTorch models, inspired by the concept of X as Code. With TorchLego, one can define the preprocessing, postprocessing, and PyTorch TorchScript module location as a config for execution. It is a heavily simplified take, only loosely similar to NVIDIA Triton Inference Server.

Note: TorchLego is currently in its alpha stage and is expected to undergo a lot of development to support more PyTorch operations and tasks like NLP and audio.
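Since TorchLego loads TorchScript modules, the model behind the download link has to be exported first. Here is a minimal sketch of exporting a pretrained torchvision model to TorchScript; the model choice and file name are illustrative:

import torch
import torchvision.models as models

# Load a pretrained model and put it in inference mode
model = models.resnet18(pretrained=True)
model.eval()

# Trace the model with an example input to produce a TorchScript module
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save the module; host the .pt file wherever the download link points
traced.save("resnet18.pt")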


Model Serving as Code

Writing model configuration

models:
  - name: model-name  # unique name/slug for the model
    download: http://download-link  # module download link
    gpu: false
    stages:
      input: file  # support for file upload as input while running inference
      preprocess:
        default: image_classification  # default torchvision transforms for preprocessing
  - name: custom-model-name
    download: http://download-link  # module download link
    gpu: false
    stages:
      input: file
      # custom pytorch transforms for preprocessing the input
      preprocess:
        resize: 299
        center_crop: 299
        to_tensor: true
        normalize:
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
        unsqueeze: 0
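The configuration above is plain YAML, so it can be sanity checked with any YAML parser before starting the server. A quick sketch, assuming the config is saved as models.yaml:

import yaml  # PyYAML

# Load the model configuration and list the declared models
with open("models.yaml") as f:
    config = yaml.safe_load(f)

for model in config["models"]:
    print(model["name"], "->", model["stages"]["preprocess"])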

For example, the following Python preprocessing code translates into YAML as follows:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(299),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# unsqueeze: 0 corresponds to preprocess(image).unsqueeze(0),
# which adds the batch dimension after the transforms run

to

preprocess:
    resize: 299
    center_crop: 299
    to_tensor: true
    normalize:
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
    unsqueeze: 0
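Under the hood, a server can map these YAML keys back onto their torchvision counterparts. The following is a rough sketch of such a mapping; the function name and structure are my illustration, not TorchLego's actual implementation:

from torchvision import transforms

def build_preprocess(config: dict) -> transforms.Compose:
    # Translate YAML keys into the matching torchvision transforms
    steps = []
    if "resize" in config:
        steps.append(transforms.Resize(config["resize"]))
    if "center_crop" in config:
        steps.append(transforms.CenterCrop(config["center_crop"]))
    if config.get("to_tensor"):
        steps.append(transforms.ToTensor())
    if "normalize" in config:
        norm = config["normalize"]
        steps.append(transforms.Normalize(mean=norm["mean"], std=norm["std"]))
    # unsqueeze is applied to the resulting tensor, not inside Compose
    return transforms.Compose(steps)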


Using TorchLego for serving models

Pull the image, run the container with your configuration mounted, and send an inference request:

docker pull prabhuomkar/torchlego:<version>
docker run --rm --net=host -v ${PWD}/examples:/model-config prabhuomkar/torchlego:<version>
curl -X POST 'http://localhost:8080/v1/models/model-name' --form 'input=@"examples/input.jpg"'

Note: the model and system configuration go in a folder named model-config at the root of the container. That is why we mount a volume mapping the folder containing the YAML configuration and the .env configuration into the container.

To list all the models currently loaded by the server:

curl -X GET 'http://localhost:8080/v1/models'
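The same endpoints can also be called programmatically. A minimal sketch using the requests library; the response handling assumes the server replies with JSON:

import requests

BASE_URL = "http://localhost:8080/v1"

# List the models loaded by the server
print(requests.get(f"{BASE_URL}/models").json())

# Run inference by uploading a file to a named model
with open("examples/input.jpg", "rb") as f:
    response = requests.post(f"{BASE_URL}/models/model-name", files={"input": f})
print(response.json())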


Possible Improvements / Roadmap

If you like the idea of TorchLego, go check it out on GitHub. If you have an idea or a suggestion for improvement, feel free to contribute via Issues/Pull Requests!