mlpack

Deploy to a Docker container

In many machine learning applications, it can be useful to deploy a model as a standalone Docker container that can return predictions. This tutorial shows how to build a Docker container with an mlpack model serving predictions. Here the model serves predictions in a very primitive way: it reads input directly from a terminal and prints its predictions. It would be straightforward to adapt the container to provide a full REST API or similar.

mlpack applications inside of Docker containers can be built in a way that the resulting container is extremely small—sometimes even less than 1 MB!


🔗 General workflow

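To make a Docker container that serves predictions, we must first train a model. Therefore, our workflow to build this container will be:

- Train a model with a standalone training program and save it to disk.
- Compile a standalone, statically-linked prediction program that loads the saved model and produces predictions.
- Build a Docker container that contains only the prediction program and the saved model.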

🔗 Problem statement

For this simple example, we will solve a problem from the cybersecurity world: DGA detection. The example here is based on (and heavily uses the code from) mlpack’s DGA detection LSTM example.

Malware authors often write malware that communicates with a centralized command-and-control server. This can allow the malware to update itself, or to receive commands from an operator (e.g., ‘start Bitcoin mining’, or ‘lock system and display ransomware message’).

The malware author cannot hardcode a single domain name into their malware, because it would easily be blocked by any antivirus software. So, instead, a domain generation algorithm (DGA) is used to generate a series of domain names. The malware will try to contact a server at each of these domain names. Individual DGA domain names do not always look random. The matsnu malware family, for instance, generates domain names built from dictionary words.

Other malware families may generate very random looking names, like i828ywu0ywqs.net or similar. Since the set of domain names that can be generated by a DGA is huge, blocking individual domain names is not a realistic mitigation strategy. However, we can use machine learning techniques to detect DGA-generated domain names with a high level of accuracy!

This strategy has been shown to be effective in some previous work.

Adapting a simplified version of that approach, we will train two simple recurrent neural networks with LSTM layers: one on benign domains and one on DGA domains. The Docker container application will then read domain names as input and output a score indicating the likelihood that each domain name was generated by a DGA.
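To make the DGA idea concrete, the C++ sketch below implements a toy DGA. It is purely illustrative and is our own assumption. It is not the algorithm of matsnu or any real malware family. A seed known to both the malware and its operator (for example, one derived from the current date) drives a simple linear congruential generator, which produces random-looking names such as i828ywu0ywqs.net. The function name ToyDga and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy domain generation algorithm, for illustration only (hypothetical; not
// the DGA of any real malware family). The malware and its operator share
// `seed`, so both can compute the same candidate domains without either one
// hardcoding a domain name.
std::vector<std::string> ToyDga(std::uint32_t seed, std::size_t count,
                                std::size_t labelLength)
{
  assert(labelLength > 0 && "domain labels must not be empty");
  const std::string alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  std::vector<std::string> domains;
  std::uint32_t state = seed;
  for (std::size_t i = 0; i < count; ++i)
  {
    std::string label;
    for (std::size_t j = 0; j < labelLength; ++j)
    {
      // One linear congruential generator step (Numerical Recipes constants);
      // unsigned arithmetic wraps modulo 2^32.
      state = state * 1664525u + 1013904223u;
      // Use the higher bits, which are more random than the low bits of an LCG.
      label += alphabet[(state >> 16) % alphabet.size()];
    }
    domains.push_back(label + ".net");
  }
  return domains;
}
```

Both sides can run ToyDga with the same seed. The operator then only needs to register one of the generated domains, while a defender would have to block all of them. This is why the detector in this tutorial scores individual names instead of relying on blocklists.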

🔗 Model training program

Training a model can be done as a separate standalone program; since our goal is just to provide a container that produces predictions, the program in the container does not need to support training.

The C++ code for training a DGA detector is available in the mlpack examples repository as the standalone program lstm_dga_detection_train.cpp. We can compile it with a call to g++, following the instructions from the compilation guide:

g++ -O3 -o lstm_dga_detection_train lstm_dga_detection_train.cpp -fopenmp -larmadillo

Some modification of the command above may be necessary if mlpack is installed on your system in a nonstandard location, or if you are not using the Armadillo wrapper. See Configuring mlpack with compile-time definitions and Linking without the Armadillo wrapper for more details.

Once the program is compiled, we can train on a dataset of domain names. The commands below will download the prepared dataset from the mlpack website, and then run the training process.

wget https://datasets.mlpack.org/dga_domains.csv.gz
gunzip dga_domains.csv.gz
./lstm_dga_detection_train dga_domains.csv

The training process may take a while. When it is finished, performance statistics about the models will be printed, and the models will be saved to lstm_dga_detector_benign.bin and lstm_dga_detector_malicious.bin. The models should achieve 99%+ accuracy on the held-out test data.
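An LSTM consumes a sequence of vectors, so a training or prediction program has to turn each domain name into numbers first. The sketch below shows one common choice: one-hot encoding each character over a fixed alphabet. This encoding is an assumption for illustration, and lstm_dga_detection_train.cpp may encode characters differently. The alphabet kAlphabet and the function OneHotEncode are hypothetical. Plain std::vector is used only to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alphabet for domain names: lowercase letters, digits, '-' and
// '.'. One extra index (kAlphabet.size()) is reserved for any other character.
const std::string kAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789-.";

// One-hot encode a domain name: one row per character (LSTM time step), each
// with kAlphabet.size() + 1 entries, exactly one of which is 1.
std::vector<std::vector<float>> OneHotEncode(const std::string& domain)
{
  assert(!domain.empty() && "an empty domain name has no time steps");
  const std::size_t dims = kAlphabet.size() + 1;
  std::vector<std::vector<float>> steps(domain.size(),
                                        std::vector<float>(dims, 0.0f));
  for (std::size_t t = 0; t < domain.size(); ++t)
  {
    // Domain names are case-insensitive, so fold to lowercase first.
    const char c = static_cast<char>(
        std::tolower(static_cast<unsigned char>(domain[t])));
    const std::size_t pos = kAlphabet.find(c);
    // Characters outside the alphabet map to the last (catch-all) entry.
    steps[t][pos == std::string::npos ? dims - 1 : pos] = 1.0f;
  }
  return steps;
}
```

For example, OneHotEncode("mlpack.org") returns 10 time steps of 39 entries each.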

🔗 Prediction program

The examples repository also provides the standalone prediction program lstm_dga_detection_predict.cpp. We can also compile this with a call to g++, following the instructions from the compilation guide:

g++ -O3 -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp -fopenmp -larmadillo -static

As with the training program above, some modification of the compilation command may be necessary depending on your configuration.

We also specified the -static option here, so that the produced program is statically linked. This will help us deploy into a Docker container, since we can just run the program directly and do not need to ensure that supporting libraries are available. Note that you will need a statically-compiled version of Armadillo available to link against, or instead define the compiler option -DARMA_DONT_USE_WRAPPER and link with static OpenBLAS using -lopenblas.

The prediction program reads from stdin, and once a domain is entered, a prediction is computed and the word malicious or benign is emitted, along with a score indicating the model’s “confidence”. A sample transcript of the program is below:

$ ./lstm_dga_detection_predict lstm_dga_detector_benign.bin lstm_dga_detector_malicious.bin
www.mlpack.org
benign (score 44.5417)
mdfvkejbqoxg.ru
malicious (score 22.2786)
arma.sourceforge.net
benign (score 19.4883)
11b5n854ublnv152.net
malicious (score 7.89951)
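The transcript shows a simple line-oriented protocol: one domain name per input line, and one label and score per output line. The sketch below separates that loop from the model so it can be reused or tested on its own. The names Prediction, Scorer, ServePredictions, and ServeString are hypothetical. In the real lstm_dga_detection_predict.cpp, scoring is done by the two LSTM models. How their outputs are combined into one score is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical result of scoring one domain name.
struct Prediction
{
  bool malicious;
  double score;
};

// Any callable mapping a domain name to a Prediction; in a real program this
// is where the trained models would be evaluated.
using Scorer = std::function<Prediction(const std::string&)>;

// Read one domain per line until end of input, and write
// "malicious (score X)" or "benign (score X)" for each. Returns the number
// of domains scored.
std::size_t ServePredictions(std::istream& in, std::ostream& out,
                             const Scorer& scorer)
{
  assert(scorer && "a scoring function is required");
  std::size_t count = 0;
  std::string line;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue; // Skip blank lines instead of scoring an empty name.
    const Prediction p = scorer(line);
    // std::endl flushes, so each answer appears immediately even when
    // stdout is a pipe rather than a terminal.
    out << (p.malicious ? "malicious" : "benign")
        << " (score " << p.score << ")" << std::endl;
    ++count;
  }
  return count;
}

// Convenience wrapper that runs the loop on a string, e.g. for unit tests.
std::string ServeString(const std::string& input, const Scorer& scorer)
{
  std::istringstream in(input);
  std::ostringstream out;
  ServePredictions(in, out, scorer);
  return out.str();
}
```

Inside a program's main(), the loop would be called as ServePredictions(std::cin, std::cout, scorer), with scorer wrapping the loaded models.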

🔗 Building the container

Building a Docker container that runs lstm_dga_detection_predict is very simple; we only need to put the prediction program and the two model files in the container. Because the program is statically linked, it can run on a minimal distroless base image, which saves additional space. This code can be used as the Dockerfile:

FROM gcr.io/distroless/static-debian12

COPY lstm_dga_detection_predict .
COPY lstm_dga_detector_benign.bin .
COPY lstm_dga_detector_malicious.bin .

ENTRYPOINT ["./lstm_dga_detection_predict", \
            "lstm_dga_detector_benign.bin", \
            "lstm_dga_detector_malicious.bin"]

Building the container is simple (and nearly instantaneous):

docker build -t lstm_dga_detector .

And once it is built, it is easy to see that the container is relatively small:

$ docker images lstm_dga_detector
REPOSITORY                          TAG       IMAGE ID       CREATED          SIZE
lstm_dga_detector                   latest    9fc931a2cdae   35 seconds ago   33.7MB

However, we have not yet tried to optimize for size at all; see the "Reducing the size of the container further" section below.

🔗 Run the container

The container can now be deployed or run in any standard Docker environment (including on Kubernetes, although that seems like overkill for such a simple example).

Running the container locally (and interacting with the prediction service) is a simple command:

docker run --rm -it lstm_dga_detector

Once the container has started, simply type a domain name, hit enter, and a prediction (plus score) will be printed.

To build the example into a more complex application, you can use lstm_dga_detection_predict.cpp as a starting point.
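As one example of such an extension, a service that accepts real-world input would likely want to normalize domain names before scoring them. The hypothetical helper NormalizeDomain below does three things. It trims surrounding whitespace, including a stray '\r' from Windows line endings. It lowercases the name, since DNS names are case-insensitive. It also drops a single trailing dot, as in the fully-qualified form www.mlpack.org. It is an assumption that the models expect exactly this normalization. Check how the names in dga_domains.csv are formatted before adopting it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical input normalization for a prediction service.
std::string NormalizeDomain(std::string domain)
{
  const auto isSpace = [](unsigned char c) { return std::isspace(c) != 0; };
  // Trim trailing whitespace (including '\r'), then leading whitespace.
  while (!domain.empty() && isSpace(domain.back()))
    domain.pop_back();
  domain.erase(domain.begin(),
               std::find_if_not(domain.begin(), domain.end(), isSpace));
  // DNS names are case-insensitive, so fold everything to lowercase.
  std::transform(domain.begin(), domain.end(), domain.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  // Drop the trailing dot of a fully-qualified name.
  if (!domain.empty() && domain.back() == '.')
    domain.pop_back();
  return domain;
}
```

In a serving loop, each input line would be passed through NormalizeDomain before it is given to the models.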

🔗 Reducing the size of the container further

Although 33.7 MB for a container is already orders of magnitude smaller than an equivalent unoptimized Python container, compilation options and a few other tricks can make it significantly smaller still. The vast majority of the container's size is simply the size of the compiled lstm_dga_detection_predict program:

$ ls -lh
-rwxrwxr-x 1 ryan ryan  31M Mar  6 20:46 lstm_dga_detection_predict
-rw-rw-r-- 1 ryan ryan  80K Mar  5 19:10 lstm_dga_detector_benign.bin
-rw-rw-r-- 1 ryan ryan  80K Mar  5 19:10 lstm_dga_detector_malicious.bin

31 MB for a compiled program is quite large! But, as it turns out, most of this is due to dependencies. We can see this by running a command that does not perform linking, like this:

g++ -O3 -c -o lstm_dga_detection_predict.o lstm_dga_detection_predict.cpp

This compiled (but not linked) object file is much smaller:

$ ls -lh lstm_dga_detection_predict.o
-rw-rw-r-- 1 ryan ryan 1.6M Mar  6 20:48 lstm_dga_detection_predict.o

That implies that our primary size issue is not with our own code, but with our dependencies. Specifically, on most systems the packaged OpenBLAS is built with optimized kernels for many different CPU architectures, selected at runtime. This makes the statically-linked OpenBLAS library quite large. On a Debian system:

$ cd /usr/lib/x86_64-linux-gnu/openblas-pthread/
$ ls -lh libopenblasp-r0.3.28.a
-rw-r--r-- 1 root root 61M Nov 20 05:52 libopenblasp-r0.3.28.a

61 MB is very large! Although not all of OpenBLAS is used by our DGA detection program, a significant portion is, and this is the primary culprit for our large program size. Two alternatives to reduce this size are:

Use reference BLAS and LAPACK instead

The reference BLAS and LAPACK implementations are often slower, but much smaller in size.

Compile OpenBLAS manually

Another option is to compile OpenBLAS manually from source to reduce its size. OpenBLAS is compiled with a make command; options for this can be found in the OpenBLAS manual.

As an example (note: do not use this command directly on your system; see the description of the options and decide which is right for your system), the following command produces an OpenBLAS library that is only 16 MB.

make \
    TARGET=NEHALEM \
    DYNAMIC_ARCH=0 \
    COMMON_OPTS="-Os -ffunction-sections -fdata-sections" \
    NO_SHARED=1 \
    BUILD_DOUBLE=0 \
    BUILD_COMPLEX=0 \
    BUILD_COMPLEX16=0 \
    BUILD_BFLOAT16=0

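Looking at each option:

- TARGET=NEHALEM: build kernels for one fixed CPU target (Intel Nehalem) instead of many. Nehalem is an older x86-64 microarchitecture, so the library still runs on newer x86-64 CPUs, but without using their newer instruction sets.
- DYNAMIC_ARCH=0: disable runtime CPU detection, so that only the kernels for TARGET are built into the library.
- -Os -ffunction-sections -fdata-sections: optimize for size, and place each function and data object in its own section so that unused ones can be discarded at link time (with -Wl,--gc-sections).
- NO_SHARED=1: build only the static library, not a shared library.
- BUILD_DOUBLE=0, BUILD_COMPLEX=0, BUILD_COMPLEX16=0, BUILD_BFLOAT16=0: skip the double-precision, single-precision complex, double-precision complex, and bfloat16 routines, leaving only single-precision real routines. Only disable types that your program does not use.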

With this stripped-down version of OpenBLAS, the size of lstm_dga_detection_predict with no further compilation modifications is reduced to 3.5 MB.

Note: when compiling lstm_dga_detection_predict, linking against the hand-compiled OpenBLAS will require specifying the -L/path/to/openblas/ option so that the linker finds the manually-compiled OpenBLAS version.

🔗 Compilation options for further size reduction

We have gone from 31 MB to roughly 3.5 MB just by replacing the system OpenBLAS with a hand-compiled version (reference BLAS/LAPACK is the other option). However, some additional compiler options can reduce the size further. Assuming that we have placed the manually-compiled libopenblas.a in the same directory as lstm_dga_detection_predict.cpp, we can compile with the following command:

g++ -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp \
    -Os \
    -DNDEBUG \
    -DARMA_DONT_USE_WRAPPER \
    -ffunction-sections -fdata-sections \
    -static \
    -L. -lopenblas \
    -Wl,--gc-sections \
    -Wl,--strip-all

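Looking at each option:

- -Os: optimize for code size rather than speed.
- -DNDEBUG: disable assert() checks (and other code guarded by NDEBUG).
- -DARMA_DONT_USE_WRAPPER: have Armadillo call BLAS/LAPACK (here, OpenBLAS) directly instead of going through the Armadillo runtime wrapper library.
- -ffunction-sections -fdata-sections: place each function and data object in its own section, so that the linker can drop unused ones.
- -static: produce a statically-linked program, as before.
- -L. -lopenblas: link against the hand-compiled libopenblas.a in the current directory.
- -Wl,--gc-sections: tell the linker to discard unreferenced sections.
- -Wl,--strip-all: tell the linker to remove all symbol information from the output.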

This shaves off more than 1 MB:

$ ls -lh lstm_dga_detection_predict
-rwxrwxr-x 1 ryan ryan 1.9M Mar  6 21:37 lstm_dga_detection_predict

Compared to the original 31 MB program, that is a size reduction of roughly 16x, obtained entirely through build options without changing any code. When the Docker container is rebuilt with the new version of lstm_dga_detection_predict, its size is significantly smaller:

$ docker images lstm_dga_detector
REPOSITORY                          TAG       IMAGE ID       CREATED          SIZE
lstm_dga_detector                   latest    0be5a677fd9c   6 seconds ago   4.09MB
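The two reductions can be checked directly from the reported numbers. The sizes are used exactly as printed by ls and docker, which round and use slightly different units.

```latex
\frac{31\ \text{MB}}{1.9\ \text{MB}} \approx 16.3
\qquad\qquad
\frac{33.7\ \text{MB}}{4.09\ \text{MB}} \approx 8.2
```

The container shrinks by a smaller factor than the program itself. The distroless base image and the two model files are a fixed overhead that the compiler and linker options do not touch.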

But there is still room for additional size reduction, either through further compiler options or through more intrusive code modifications. After the optimizations above, the largest contributors to code size are likely the serialization code and the C++ standard library.

These further optimizations, however, are beyond the scope of this tutorial.