Apr 19

Docker Containers: Smaller is not always better

Categories:

cloud, cloud computing, Docker, gotchas, Raspberry Pi

Generally smaller Docker containers are preferred to larger ones. However, a smaller container is not always as performant as a larger one. By using a (slightly) larger container, performance improved over 30x.

TL;DR

The grep included in busybox is painfully slow. When doing using grep to process lots of data, add a (real) grep to the container.

Background

As discussed in Naive Substructure Substance Matching on the Raspberry Pi » Ramblings, I am exploring the limits of the Raspberry Pi for processing data. I chose SubStructure searching as a problem set as it is a non-trivial problem and a decent demonstration for co-workers of the processing power of the Pi.

I’ve pre-processed the NIH Pubchem Compounds database to extract SMILES data — this is a language for describing the structure of chemical compounds. As a relatively naive first implementation I’m using grep to match substructures. I have split the files amongst five Pi 2s; each is processing ~840M in ~730 files. xargs is used to do concurrent processing across multiple cores. After a few cycles, the entire data is read into cache and the Pi is able to process it in 1-2 seconds for realistic searches. A ridiculous search, finding all of the carbon containing compounds (over 13 million) takes 8-10 seconds.

Having developed a solution, I then set about dockerizing it.

I chose voxxit/alpine-rpi for my base — it’s quite small, about 5mb and has almost everything needed. I discovered that the version of xargs which ships with the container does not support -P. So xargs is added via:

apk --update add findutils

1 2	apk --update add findutils

I ran my test and found that the performance was horrid.

I decided to drop into an interactive shell so that I could tweak. You can see the performance below in the ‘Before’.

Before:

/opt/smiles # date;time /bin/ash -c " ls | xargs -P 4 -n 50 grep -h 'C1CCCCC1C=O'| wc -l ";date
Sun Apr 19 14:25:54 GMT 2015
19
real    1m 4.21s
user    3m 57.52s
sys 0m 3.52s
Sun Apr 19 14:26:58 GMT 2015

/opt/smiles # date;time /bin/ash -c " ls | xargs -P 4 -n 50 grep -h 'C1CCCCC1C=O'| wc -l ";date

Sun Apr 19 14:25:54 GMT 2015

real 1m 4.21s

user 3m 57.52s

sys 0m 3.52s

Sun Apr 19 14:26:58 GMT 2015

Typically the performance of a large IO operation will improve after a few cycles; the system is able to cache disk reads. It generally takes 3 cycles before all of the data is in the cache. However, the numbers above did not improve. I did verify that multiple cores were, indeed, being used.

I proceeded down a rabbit hole, looking at IO and VM statistics. Horrible. From there I googled to see if, indeed, Docker uses the disk cache (it does) and/or if there was a flag I needed to set (I didn’t). Admittedly, I couldn’t believe that IO using Docker could be that much slower, but I am a firm believer in testing my assumptions.

After poking about in /proc and /sys and running the search outside of Docker, I decided to see if there might be a faster grep. As it turns out, the container uses busybox:

/opt/smiles # ls -li /bin/grep
 501101 lrwxrwxrwx    1 root     root            12 Mar  6 13:27 /bin/grep -> /bin/busybox

/opt/smiles # ls -li /bin/grep

501101 lrwxrwxrwx 1 root root 12 Mar 6 13:27 /bin/grep -> /bin/busybox

This is generally a good choice in terms of size. However, it appears that the embedded grep is considerably slower than molasses in January. On a whim I decided to install grep:

/opt/smiles # apk search grep
ngrep-1.45-r1
grep-doc-2.20-r1
grep-2.20-r1
/opt/smiles # apk --update add grep
fetch http://repos.lax-noc.com/alpine/v3.1/main/armhf/APKINDEX.tar.gz
(1/2) Installing pcre (8.36-r1)
(2/2) Installing grep (2.20-r1)
Executing busybox-1.22.1-r14.trigger
OK: 6 MiB in 18 packages
/opt/smiles # which grep
/usr/bin/grep
/opt/smiles # ls -li /usr/bin/grep
  66417 -rwxr-xr-x    1 root     root        189840 Feb  2 11:05 /usr/bin/grep

/opt/smiles # apk search grep

ngrep-1.45-r1

grep-doc-2.20-r1

grep-2.20-r1

/opt/smiles # apk --update add grep

fetch http://repos.lax-noc.com/alpine/v3.1/main/armhf/APKINDEX.tar.gz

(1/2) Installing pcre (8.36-r1)

(2/2) Installing grep (2.20-r1)

Executing busybox-1.22.1-r14.trigger

OK: 6 MiB in 18 packages

/opt/smiles # which grep

/usr/bin/grep

/opt/smiles # ls -li /usr/bin/grep

66417 -rwxr-xr-x 1 root root 189840 Feb 2 11:05 /usr/bin/grep

I then re-ran the test and did a Snoopy Dance.

After:

/opt/smiles # date;time /bin/ash -c " ls | xargs -P 4 -n 50 grep -h 'C1CCCCC1C=O'| wc -l ";date
Sun Apr 19 14:30:35 GMT 2015
19
real    0m 1.81s
user    0m 4.39s
sys 0m 2.38s
Sun Apr 19 14:30:36 GMT 2015

/opt/smiles # date;time /bin/ash -c " ls | xargs -P 4 -n 50 grep -h 'C1CCCCC1C=O'| wc -l ";date

Sun Apr 19 14:30:35 GMT 2015

real 0m 1.81s

user 0m 4.39s

sys 0m 2.38s

Sun Apr 19 14:30:36 GMT 2015

Lessons Learned

This episode drove home the need to question assumptions. In this case the assumption is that a smaller sized container is inherently better. I believe that smaller and lighter containers are a Good Practice and an admirable goal. However, as seen here, smaller is not always better.

I also habitually look at a container’s Dockerfile before pulling it. In this case it wasn’t enough. It reinforced the lesson that I need to know what’s running in a container before I try to use it.

This post has no tag

4 comments

2 pings

Skip to comment form ↓

Andreas Heissenberger

April 21, 2015 at 6:28 am (UTC -4) Link to this comment

It is more important to use the same container for all your projects instead of using different small containers. Docker shares the resources of images and this way you save disk space. If you need a data container – use the same image you used for your application
Matt Williams

April 21, 2015 at 8:35 am (UTC -4) Link to this comment

Thank you for your comment.

It depends, in my opinion. The case that you use, namely a data container, should start from scratch — the empty filesystem. That way saves the most space ;-). Likewise for go executables living as a single file in a container.

That said, I think you can make a case for less layers in a container — there’s a cost in maintaining the layers.

However, I’m also a pragmatist. If there’s a utility tool, such as a database or a monitor, is the relatively small amount of space saved in the footprint of the container with the price of rebuilding a the tool from a base container? I’d argue not.

Also, if I am to put everything in every container it goes against the grain of the unix philosophy — having lots of tools which do one thing well.

Starting from the same base, such as voxxit/alpine-rpi I could see for new development. but again, in many instances I don’t think there is a sufficient return on the investment of my time and energy.
jonnalley

April 26, 2015 at 12:41 pm (UTC -4) Link to this comment

I think the title of your post is a bit misleading. Your problem is with busybox grep, and has nothing to do with container size. You would have suffered the same performance issue if you were using busybox grep on bare metal. BTW, if you are interested in a performant grep alternative check out the silver searcher.

http://geoff.greer.fm/ag/
Matt Williams

April 27, 2015 at 3:06 am (UTC -4) Link to this comment

Well…. I’d gotten busybox’s grep by trying to make a smaller container… so at least in the headspace I was in at the time it made sense to me. I can see where you might find it misleading.

I’ll definitely check out silver searcher, though. Thanks for the tip!

?? Docker ???? - ???????

April 15, 2016 at 10:53 am (UTC -4) Link to this comment

[…] Docker Containers: Smaller is not always better […]
Docker??????????-???

April 29, 2016 at 1:10 am (UTC -4) Link to this comment

[…] ?????Docker Containers: Smaller is not always better??????? ??????? […]

Ramblings

Musings of Matt Williams

Docker Containers: Smaller is not always better

TL;DR

Background

Lessons Learned

Like this:

Related

4 comments

2 pings

Andreas Heissenberger

Matt Williams

jonnalley

Matt Williams

?? Docker ???? - ???????

Docker??????????-???

Leave a Reply Cancel reply

Subscribe to Blog via Email

Recent Posts

Top Posts & Pages

Archives

Categories

Copyright

Ramblings

Musings of Matt Williams

Docker Containers: Smaller is not always better

TL;DR

Background

Lessons Learned

Share this:

Like this:

Related

4 comments

2 pings

Andreas Heissenberger

Matt Williams

jonnalley

Matt Williams

?? Docker ???? - ???????

Docker??????????-???

Leave a Reply Cancel reply

Subscribe to Blog via Email

Recent Posts

Top Posts & Pages

Archives

Categories

Copyright