Deploy to a Docker container
In many machine learning applications, it can be useful to deploy a model as a standalone Docker container that can return predictions. This tutorial shows how to build a Docker container with an mlpack model serving predictions. Here the model returns predictions in a very primitive way: directly as input from a terminal, but it would be straightforward to adapt the container to provide a full REST API or similar.
mlpack applications inside of Docker containers can be built in a way that the resulting container is extremely small—sometimes even less than 1 MB!
See also:
- Installing mlpack
- Compile an mlpack program
- Deploying mlpack on Windows
- Setting up an mlpack cross-compilation environment
🔗 General workflow
To make a Docker container that serves predictions, we must first train a model. Therefore, our workflow to build this container will be:
- Decide on the problem to solve
- Write a program to train the model
- Write a program to make predictions with the trained model
- Build the Docker container with the prediction program
- Run the Docker container
🔗 Problem statement
For this simple example, we will solve a problem from the cybersecurity world: DGA detection. The example here is based on (and heavily uses the code from) mlpack’s DGA detection LSTM example.
Malware authors often write malware that communicates with a centralized command-and-control server. This can allow the malware to update itself, or to receive commands from an operator (e.g., ‘start Bitcoin mining’, or ‘lock system and display ransomware message’).
The malware author cannot hardcode a domain name into their malware, because this would be easily blocked by any antivirus software. So, instead, a domain generation algorithm (DGA) is used to generate a series of domain names. The malware will try to contact a server at each of these domain names. The individual DGA domain names can look very random; for instance, some DGA domains generated by the matsnu malware family are:
brothernerveplacebringconsult.com
screencatchdishtellproposed.com
balladoptwelladdinfluence.com
capitalhuntdealsmokeboxclue.com
Other malware families may generate very random looking names, like i828ywu0ywqs.net or similar. Since the set of domain names that can be generated by a DGA is huge, blocking individual domain names is not a realistic mitigation strategy. However, we can use machine learning techniques to detect DGA-generated domain names with a high level of accuracy!
This strategy has been shown to be effective in some previous work.
Adapting that approach for simplicity, we will train simple recurrent neural networks with LSTMs to detect benign domains and DGA domains, and then the Docker container application will read domain names as input and output a score indicating the likelihood that the domain name was generated by a DGA.
🔗 Model training program
Training a model can be done as a separate standalone program; since our goal is just to provide a container that produces predictions, the program in the container does not need to support training.
The C++ code for training a DGA detector is available as the standalone program lstm_dga_detection_train.cpp. We can compile it with a call to g++, following the instructions from the compilation guide:
g++ -O3 -o lstm_dga_detection_train lstm_dga_detection_train.cpp -fopenmp -larmadillo
Some modification of the command above may be necessary if mlpack is installed on your system in a nonstandard location, or if you are not using the Armadillo wrapper. See Configuring mlpack with compile-time definitions and Linking without the Armadillo wrapper for more details.
Once the program is compiled, we can train on a dataset of domain names. The commands below will download the prepared dataset from the mlpack website, and then run the training process.
wget https://datasets.mlpack.org/dga_domains.csv.gz
gunzip dga_domains.csv.gz
./lstm_dga_detection_train dga_domains.csv
The training process may take a while, but when it is finished, performance statistics about the models will be printed, and the models will be saved to lstm_dga_detector_benign.bin and lstm_dga_detector_malicious.bin. The models should achieve 99%+ accuracy on the held-out test data.
🔗 Prediction program
The examples repository also provides the standalone prediction program lstm_dga_detection_predict.cpp. We can also compile this with a call to g++, following the instructions from the compilation guide:
g++ -O3 -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp -fopenmp -larmadillo -static
As with the training program above, some modification of the compilation command may be necessary depending on your configuration.
We also specified the -static option here, so that the produced program is statically linked. This will help us deploy into a Docker container, since we can just run the program directly and do not need to ensure that supporting libraries are available. Note that you will need a statically-compiled version of Armadillo available to link against, or instead define the compiler option -DARMA_DONT_USE_WRAPPER and link with static OpenBLAS using -lopenblas.
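A statically linked binary has no shared-library dependencies, and it can be worth confirming this before building the container. The helper below is a minimal sketch and not part of the mlpack examples: is_static is a hypothetical name, and the check relies on the messages that ldd typically prints for static binaries, so treat it as a heuristic.

```shell
# is_static (hypothetical helper): succeed if ldd reports that the given
# binary has no dynamic dependencies.
is_static() {
  local out
  # ldd exits non-zero for static binaries, so ignore its exit status.
  out=$(ldd "$1" 2>&1) || true
  case "$out" in
    *"not a dynamic executable"*|*"statically linked"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (assumes the binary was built as shown above):
#   is_static ./lstm_dga_detection_predict && echo "statically linked"
```

If the check fails, the most likely cause is that a static version of one of the linked libraries was not found and the -static option did not take full effect.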
The prediction program reads from stdin, and once a domain is entered, a prediction is computed and the word malicious or benign is emitted, along with a score indicating the model’s “confidence”. A sample transcript of the program is below:
$ ./lstm_dga_detection_predict lstm_dga_detector_benign.bin lstm_dga_detector_malicious.bin
www.mlpack.org
benign (score 44.5417)
mdfvkejbqoxg.ru
malicious (score 22.2786)
arma.sourceforge.net
benign (score 19.4883)
11b5n854ublnv152.net
malicious (score 7.89951)
🔗 Building the container
Building a Docker container that runs lstm_dga_detection_predict is very simple; we only need to put the prediction program and models in the container. In fact, to save additional time, we can use a distroless container. This code can be used as the Dockerfile:
FROM gcr.io/distroless/static-debian12
ADD lstm_dga_detection_predict .
ADD lstm_dga_detector_benign.bin .
ADD lstm_dga_detector_malicious.bin .
ENTRYPOINT ["./lstm_dga_detection_predict", \
"lstm_dga_detector_benign.bin", \
"lstm_dga_detector_malicious.bin"]
Building the container is simple (and nearly instantaneous):
docker build -t lstm_dga_detector .
And once it is built, it is easy to see that the container is relatively small:
$ docker images | grep -B 1 lstm_dga_detector
REPOSITORY TAG IMAGE ID CREATED SIZE
lstm_dga_detector latest 9fc931a2cdae 35 seconds ago 33.7MB
However, we haven’t even tried to optimize for size; see the section “Reducing the size of the container further” below.
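For scripting, the image size can also be queried directly rather than read from the docker images table. This is a sketch under the assumption that docker image inspect reports the size in bytes through its Size field and that the table above uses decimal megabytes; the helper names bytes_to_mb and image_size_mb are hypothetical.

```shell
# bytes_to_mb BYTES: convert a byte count to decimal megabytes (10^6 bytes).
bytes_to_mb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000 }'
}

# image_size_mb IMAGE: print the size of a local Docker image in MB.
image_size_mb() {
  bytes_to_mb "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Example usage (requires the image built above):
#   image_size_mb lstm_dga_detector
```

Querying the size this way avoids depending on the column layout of docker images, which makes it easier to track the effect of each size optimization.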
🔗 Run the container
The container can now be deployed or run in any standard Docker environment (including on Kubernetes, although that seems like overkill for such a simple example).
Running the container locally (and interacting with the prediction service) is a simple command:
docker run --rm -it lstm_dga_detector
Once the container has started, simply type a domain name, hit enter, and a prediction (plus score) will be printed.
To build the example into a more complex application, you can use lstm_dga_detection_predict.cpp as a starting point.
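The same container can also be driven non-interactively, which is often more convenient in scripts. The sketch below assumes that the prediction program prints exactly one line per input domain (as in the sample transcript) and exits when its input ends; classify_domains and PREDICT_CMD are hypothetical names introduced for illustration. Dropping -t and keeping -i lets Docker forward piped standard input to the container.

```shell
# Command used to run the predictor; overridable so the helper can be tried
# without Docker (for example, PREDICT_CMD=cat simply echoes its input).
PREDICT_CMD="${PREDICT_CMD:-docker run --rm -i lstm_dga_detector}"

# classify_domains FILE: feed one domain per line to the predictor on stdin
# and print "domain<TAB>prediction" pairs.
classify_domains() {
  # Assumes exactly one output line per input line; $PREDICT_CMD is left
  # unquoted on purpose so that it splits into a command and its arguments.
  $PREDICT_CMD < "$1" | paste "$1" -
}

# Example usage (domains.txt is a hypothetical file with one domain per line):
#   classify_domains domains.txt
```

If the program ever emits extra lines (for instance, warnings), the pairing done by paste would drift, so this approach is only suitable while the one-line-per-domain assumption holds.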
🔗 Reducing the size of the container further
Although 33.7 MB for a container is already orders of magnitude smaller than an equivalent unoptimized Python container, it is readily possible via compilation options and a few other tricks to get the size to be significantly smaller. The vast majority of the size of the container is just the size of the compiled lstm_dga_detection_predict:
$ ls -lh
-rwxrwxr-x 1 ryan ryan 31M Mar 6 20:46 lstm_dga_detection_predict
-rw-rw-r-- 1 ryan ryan 80K Mar 5 19:10 lstm_dga_detector_benign.bin
-rw-rw-r-- 1 ryan ryan 80K Mar 5 19:10 lstm_dga_detector_malicious.bin
31 MB for a compiled program is quite large! But, as it turns out, most of this is due to dependencies. We can see this by running a command that does not perform linking, like this:
g++ -O3 -c -o lstm_dga_detection_predict.o lstm_dga_detection_predict.cpp
This compiled (but not linked) object file is much smaller:
$ ls -lh lstm_dga_detection_predict.o
-rw-rw-r-- 1 ryan ryan 1.6M Mar 6 20:48 lstm_dga_detection_predict.o
That implies that our primary size issue is not with our own code, but instead with our dependencies. Specifically, on most systems, OpenBLAS is compiled to support any architecture and this results in the statically-linked OpenBLAS library being quite large. On a Debian system:
$ cd /usr/lib/x86_64-linux-gnu/openblas-pthread/
$ ls -lh libopenblasp-r0.3.28.a
-rw-r--r-- 1 root root 61M Nov 20 05:52 libopenblasp-r0.3.28.a
61 MB is very large! Although not all of OpenBLAS is used by our DGA detection program, a significant portion is, and this is the primary culprit for our large program size. Two alternatives to reduce this size are:
Use reference BLAS and LAPACK instead
The reference BLAS and LAPACK implementations are often slower, but much smaller in size.
- On a Debian or Ubuntu system, install libblas-dev and liblapack-dev instead of libopenblas-dev.
- This step alone reduces the size of the statically-linked lstm_dga_detection_predict to 3.4 MB, but the use of reference BLAS and LAPACK is likely to be slower than OpenBLAS.
Compile OpenBLAS manually
Another option is to compile OpenBLAS manually from source to reduce its size. OpenBLAS is compiled with a make command; options for this can be found in the OpenBLAS manual.
As an example (note: do not use this command directly on your system, see the description of the options and decide which is right for your system), the following command produces an OpenBLAS library that is only 16MB.
make \
TARGET=NEHALEM \
DYNAMIC_ARCH=0 \
COMMON_OPTS="-Os -ffunction-sections -fdata-sections" \
NO_SHARED=1 \
BUILD_DOUBLE=0 \
BUILD_COMPLEX=0 \
BUILD_COMPLEX16=0 \
BUILD_BFLOAT16=0
Looking at each option:
- TARGET=NEHALEM and DYNAMIC_ARCH=0 specify that this version of OpenBLAS can only be run on Nehalem or newer Intel processors.
  - This causes the code to be significantly less portable.
  - However, the DYNAMIC_ARCH=0 option significantly reduces the size of OpenBLAS, and therefore of downstream code too, by compiling only for the processor of interest.
  - For other TARGET options, see TargetList.txt in the OpenBLAS source code.
- COMMON_OPTS="-Os -ffunction-sections -fdata-sections" specifies that the compiler should aim to keep the compiled code as small as possible, and should place each function and data item in its own section so that unused ones can be stripped later.
- NO_SHARED=1 specifies that only the static version of OpenBLAS (libopenblas.a) should be built.
- BUILD_DOUBLE=0 BUILD_COMPLEX=0 BUILD_COMPLEX16=0 BUILD_BFLOAT16=0 disables all OpenBLAS functions for data types we are not using in our program.
  - The lstm_dga_detection_predict.cpp code only uses arma::fmat (i.e., matrices with float elements), so we can omit the other functions.
With this stripped-down version of OpenBLAS, the size of lstm_dga_detection_predict with no further compilation modifications is reduced to 3.5 MB.
Note: when compiling lstm_dga_detection_predict, linking against the hand-compiled OpenBLAS will require specifying the -L/path/to/openblas/ option so that the linker finds the manually-compiled OpenBLAS version.
🔗 Compilation options for further size reduction
We have gone from 31 MB to roughly 3 MB just by replacing our OpenBLAS implementation with either reference BLAS/LAPACK or a hand-compiled version. However, we can specify some additional compiler options to reduce the size further. Assuming that we have placed the manually-compiled libopenblas.a in the same directory as lstm_dga_detection_predict.cpp, we can compile with the following command to reduce size even further:
g++ -o lstm_dga_detection_predict lstm_dga_detection_predict.cpp \
-Os \
-DNDEBUG \
-DARMA_DONT_USE_WRAPPER \
-ffunction-sections -fdata-sections \
-static \
-L. -lopenblas \
-Wl,--gc-sections \
-Wl,--strip-all
Looking at each option:
- -Os tells the compiler to optimize for small code size.
- -DNDEBUG disables assertions and other debug-only code paths.
- -DARMA_DONT_USE_WRAPPER is an Armadillo directive that tells Armadillo not to link against the Armadillo runtime library but instead directly against OpenBLAS.
  - This is why we use -lopenblas later instead of -larmadillo.
- -ffunction-sections -fdata-sections are used so that the linker can later remove unused sections from the code.
- -Wl,--gc-sections -Wl,--strip-all tell the linker to discard unused sections and strip all symbol information from the final program.
- Note that we have omitted -fopenmp; this will cause compiled code to be somewhat smaller, but it will execute serially. Given that our prediction program only predicts a single domain at a time, this is acceptable.
This shaves off more than 1 MB:
$ ls -lh lstm_dga_detection_predict
-rwxrwxr-x 1 ryan ryan 1.9M Mar 6 21:37 lstm_dga_detection_predict
That is a size reduction of roughly 16x, obtained entirely through build configuration and compiler options. When the Docker container is rebuilt with the new version of lstm_dga_detection_predict, the size is significantly improved:
$ docker images | grep -B 1 lstm
REPOSITORY TAG IMAGE ID CREATED SIZE
lstm_dga_detector latest 0be5a677fd9c 6 seconds ago 4.09MB
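The reduction factors implied by the listings in this tutorial can be double-checked with a line of arithmetic. The helper name reduction_factor is hypothetical; the inputs are the sizes reported in the listings.

```shell
# reduction_factor BEFORE AFTER: print how many times smaller AFTER is.
reduction_factor() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

reduction_factor 31 1.9      # binary size in MB, before and after
reduction_factor 33.7 4.09   # image size in MB, before and after
```

The first call prints 16.3 for the binary, and the second prints 8.2 for the image as a whole. The image shrinks by a smaller factor, presumably because the base image and model files are the same in both builds.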
But there is still room for additional size reduction, either through further compiler options or more intrusive code modifications. Likely the largest contributors to code size after the optimizations above are serialization and the standard libraries:
- Avoiding the use of data::Load() (and thus the Cereal serialization library) by directly saving the weights of the RNN using Armadillo’s built-in save functionality would be effective.
- Replacing the standard libc and libstdc++ implementations with lightweight alternatives, such as musl, would also reduce the size.
These further optimizations, however, are beyond the scope of this tutorial.