From PyTorch to Golang: Using TorchScript and CGO for Model Inference

Mar 27, 2023

Note: This blog post is part of my ongoing experiments with model training, deployment, and monitoring in the bitbeast repository.

Source Code: GitHub

Introduction

Efficient model inference pipelines are crucial for machine learning applications, as they enable real-time predictions, cost savings, scalability, and improved user experience. However, large-scale inference pipelines tend to move from Python to C++ because of Python's performance and memory usage limitations, as well as the need to run on specialized hardware.

PyTorch provides a C++ API called LibTorch that can be used to build inference pipelines in C++. To use it from other languages such as Golang (or Rust, Elixir), developers need language-specific bindings to the C++ code. This can be a complex task that requires a good understanding of both LibTorch and Golang. CGO is the part of the Go toolchain that lets Go programs call C code. Since it speaks C rather than C++, the LibTorch code has to be exposed through a thin C-compatible wrapper (extern "C"), which Golang can then call for efficient model inference.

The motivation for the blog post is to provide a simple guide for developers who want to use LibTorch C++ models in their Golang applications using CGO.


Setting up TorchScript Module and LibTorch

To create a TorchScript module, a PyTorch model is converted to a serialized script module that can be executed independently of the Python runtime. My previous blog post gives a detailed guide on how to create a TorchScript module, which involves defining the model’s forward function using PyTorch’s scripting language and compiling the script module using the torch.jit.script function. The resulting script module can be saved and loaded for efficient model inference in C++ or Python applications.

Using LibTorch for Model Inference

The following simple C++ program loads a serialized TorchScript module and performs inference on a dummy input:

#include <torch/script.h> // One-stop header.
#include <iostream>
#include <memory>

int main() {
    torch::jit::script::Module module;
    try {
        // Load the serialized script module using torch::jit::load function
        module = torch::jit::load("model.pt");
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    // Prepare the input data in the appropriate format required by the model
    auto input = torch::ones({1, 3, 224, 224});
    // Perform inference using module.forward function
    auto output = module.forward({input}).toTensor();
    std::cout << output << '\n';
}

You can refer to the PyTorch documentation for detailed instructions and examples on how to use the PyTorch C++ library: https://pytorch.org/cppdocs.

For this experiment, run make deps after cloning the source code to install LibTorch and the Golang dependencies.


Writing CGO Wrappers

Why don't you explain this to me like I'm five?

CGO 101

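Here’s a very simple example for understanding CGO:
Suppose you have a C library (or a C++ library exposed through an extern "C" interface) that contains a function add, which takes two integers as input and returns their sum. With CGO, you declare or include the function in a comment block placed directly above import "C", and then call it from Golang as C.add.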

golibtorch: C Wrapper for LibTorch and CGO

Here’s the C wrapper for the LibTorch C++ code, which loads the module, runs inference, and returns the result. Note that this code will need to change depending on the data type of your C++ output. In this experiment, the generated TorchScript module returns a dictionary {"label": score}, so the GetResult function contains code to return those values to Golang in a readable format.

#ifndef __GOLIBTORCH_H__
#define __GOLIBTORCH_H__
#ifdef __cplusplus
extern "C" {
#endif  // __cplusplus
  typedef void *mModel;
  typedef struct {
    char** labels;
    float* scores;
  } Result;
  mModel NewModel(char *modelFile);
  Result *GetResult(mModel model, float *inputData, int *channels, int *width, int *height);
  void DeleteModel(mModel model);
#ifdef __cplusplus
}
#endif  // __cplusplus
#endif  // __GOLIBTORCH_H__

You can find the C++ implementation of the header file here.

In your Golang code, you can include the following comments for CGO:

// #cgo LDFLAGS: -lstdc++ -L/usr/lib/libtorch/lib -ltorch_cpu -lc10
// #cgo CXXFLAGS: -std=c++17 -I${SRCDIR} -g -O3
// #cgo CFLAGS: -D_GLIBCXX_USE_CXX11_ABI=1
// #include <stdio.h>
// #include <stdlib.h>
// #include "golibtorch.h"
import "C"

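Comment lines 1 to 3 configure the build: LDFLAGS tells the linker where to find LibTorch and which libraries to link, CXXFLAGS selects the C++ standard and optimization level, and CFLAGS defines _GLIBCXX_USE_CXX11_ABI. Note that CGO applies CFLAGS only to C files, so if this macro needs to affect the C++ wrapper, it may have to go in CXXFLAGS (or CPPFLAGS) instead.
Comment lines 4 to 6 include the necessary headers for your C wrapper.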
You can find the Go implementation here.

In your Golang code, you can access the C wrapper variables and functions using C.xxx:

type Model struct {
  model C.mModel
}

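func NewModel(modelFile string) (*Model, error) {
  // C.CString allocates with malloc, so the copy must be freed explicitly.
  // This assumes NewModel does not keep the pointer after it returns.
  cModelFile := C.CString(modelFile)
  defer C.free(unsafe.Pointer(cModelFile))

  model := C.NewModel(cModelFile)
  if model == nil {
    return nil, errors.New("error in loading model")
  }
  return &Model{model: model}, nil
}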


Performing Inference

Preparing Input

In Python, it is very easy to preprocess images with TorchVision. In Golang, the image read from disk has to be decoded and its pixel values converted to a float32 array that can back the input tensor. You can refer to the preProcess function in golibtorch.go, which converts the image to a float32 array and returns the size of the image. This is also a good opportunity to explore scriptable transforms or to create a TorchVision wrapper for Golang.
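To give a rough idea of what such a conversion involves, here’s a minimal sketch using only the standard library. It is not the repository’s preProcess function: the names toCHW and demoImage are hypothetical, and the channel-first layout and [0, 1] scaling are assumptions about what the model expects. Resizing and mean/std normalization are left out.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// toCHW flattens an image into a channel-first (C, H, W) float32 slice with
// values scaled to [0, 1]. It returns the data and {channels, width, height}.
func toCHW(img image.Image) ([]float32, [3]int32) {
	b := img.Bounds()
	w, h := b.Dx(), b.Dy()
	plane := w * h
	data := make([]float32, 3*plane)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			// RGBA returns alpha-premultiplied 16-bit values in [0, 65535].
			r, g, bl, _ := img.At(b.Min.X+x, b.Min.Y+y).RGBA()
			i := y*w + x
			data[0*plane+i] = float32(r) / 65535
			data[1*plane+i] = float32(g) / 65535
			data[2*plane+i] = float32(bl) / 65535
		}
	}
	return data, [3]int32{3, int32(w), int32(h)}
}

// demoImage builds a 2x2 image whose top-left pixel is pure red, standing in
// for an image decoded from disk with image.Decode.
func demoImage() image.Image {
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	img.Set(0, 0, color.RGBA{R: 255, A: 255})
	return img
}

func main() {
	data, dims := toCHW(demoImage())
	fmt.Println(len(data), dims, data[0])
}
```

The dimensions are returned as int32 on purpose, so that their addresses can later be handed to C as int pointers.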

Running Forward Pass

In CGO, data is exchanged between C and Golang through pointers, which let both languages access the same block of memory. When calling a C function from Golang, the function parameters are passed by value, but a pointer can be used to pass a memory address instead. Similarly, when returning data to Golang, the C function can either write the result to memory that Golang passed in or return a pointer to memory it allocated itself. Go's garbage collector does not track memory allocated on the C side, so that memory has to be freed explicitly.

You can prepare the inputs which can be passed to the CGO functions:

inputPtr := (*C.float)(unsafe.Pointer(&data[0]))
channelPtr := (*C.int)(unsafe.Pointer(&vals[0]))
widthPtr := (*C.int)(unsafe.Pointer(&vals[1]))
heightPtr := (*C.int)(unsafe.Pointer(&vals[2]))

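Here, we are converting the Go float32 slice to a C float pointer, and the number of channels and the width and height of the image to C int pointers. This assumes vals holds 32-bit integers (for example []int32 or []C.int), since Go's int is 64 bits wide on most platforms while C's int is typically 32.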

Now, we can run the forward pass by passing the required input pointers like:

cResult := C.GetResult(m.model, inputPtr, channelPtr, widthPtr, heightPtr)
if cResult == nil {
    return nil, errors.New("error in getting result")
}
// Release the Result struct once its contents have been copied into Go values.
defer C.free(unsafe.Pointer(cResult))

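Note: it's recommended to always use C.free to release the memory taken by the result pointers. Keep in mind that C.free(unsafe.Pointer(cResult)) releases only the Result struct itself. If the wrapper allocates the labels and scores arrays separately, they need to be freed as well, ideally by a dedicated function on the C side.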

Similarly, you can now use the result pointer to parse the output into Go-specific types:

// length of slice for labels and scores
const TopK = 5
// create go specific slice of strings for labels and slice of floats for scores
labels := make([]string, TopK)
scores := make([]float32, TopK)
// convert C pointer to slice of floats
scoresHeader := (*[1 << 30]float32)(unsafe.Pointer(cResult.scores))[:TopK:TopK]
copy(scores, scoresHeader)
// convert C pointer to slice of strings
ptr := uintptr(unsafe.Pointer(cResult.labels))
for i := 0; i < TopK; i++ {
    labelPtr := (**C.char)(unsafe.Pointer(ptr))
    labels[i] = C.GoString(*labelPtr)
    ptr += unsafe.Sizeof(uintptr(0))
}

This blog post has explained how to call a LibTorch C++ model from Golang using CGO. By following the steps outlined in this post, developers can create fast and efficient inference pipelines using Golang and LibTorch C++.

If you liked this experiment, go check it out on GitHub. If you have an idea or a suggestion for improvement, feel free to contribute via Issues/Pull Requests!