‘Piping’ Hot Docker Containers

One of the possibly lesser-used flags for docker run is -a (--attach), which attaches the container’s STDIN, STDOUT, or STDERR to the shell that invoked the container. This lets you construct pipelines of commands, just as you can with UNIX processes. For instance, to count the number of entries in a directory with plain UNIX commands, you would do something like:

ls /dev | wc -l


Because a Docker container behaves like a command, it has its own STDIN, STDOUT, and STDERR, so you can string together multiple commands and containers.
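
As an illustration, here is a minimal sketch that runs each half of the ls /dev | wc -l pipeline in its own container. It assumes a local Docker daemon and the public alpine image, neither of which comes from the original post; DOCKER and count_dev_entries are hypothetical names. When docker run runs in the foreground without -a, it attaches both STDOUT and STDERR; -a stdout narrows that to STDOUT only. A container that reads from a pipe also needs -i so its STDIN stays open.

```shell
#!/bin/sh
# Hypothetical helper: each command of the pipeline runs in its own container.
# Assumes the public "alpine" image is available to the local daemon.
DOCKER="${DOCKER:-docker}"   # client binary; overridable for testing

count_dev_entries() {
  # First container: only its STDOUT is attached, and it feeds the pipe.
  # Second container: -i keeps STDIN open so it can read the pipe, and
  # -a stdin -a stdout attaches both ends explicitly.
  # --rm removes each container when it exits.
  "$DOCKER" run --rm -a stdout alpine ls /dev |
    "$DOCKER" run --rm -i -a stdin -a stdout alpine wc -l
}
```

Calling count_dev_entries prints the number of entries in the container's /dev, not the host's, because ls runs inside the first container.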

After I ‘docker’ized the ‘grep’ discussed in Naive Substructure Substance Matching on the Raspberry Pi, I was able to pipe the container’s STDOUT into wc -l to get a count of the matching substances:

docker run -a stdout nimblestratus/rpi-substructure-grepper "$PATTERN" | wc -l


This works just fine. In fact, it opens up opportunities for all sorts of other commands and suites running inside a container. Pandoc running in a container to generate PDFs comes to mind. Or ImageMagick. Or any number of other commands. All of the advantages of Docker containers with all of the fun of UNIX pipes.

Then the imp of the perverse struck. If I could redirect the STDOUT of a container running on the local host, would it work as well for a container on another host? In short: yes.

You can attach to the streams of a Docker container running on a different host. The Docker daemon on each remote host needs to listen on a TCP port (2375 in the run below).
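
A sketch of both ends of that connection, assuming a plain, unencrypted TCP listener on port 2375 (the port in the run below). The daemon side appears only as a comment, because how dockerd is launched (systemd unit, init script, daemon.json) varies by distribution and Docker version. On the client, docker -H selects which daemon a command talks to. The remote_docker wrapper and the example address are hypothetical.

```shell
#!/bin/sh
# Remote host (run as root). Keep the local socket and add a TCP listener:
#   dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
# Port 2375 is unencrypted and unauthenticated; recent Docker releases warn
# about such listeners, so check your version's dockerd documentation.

DOCKER="${DOCKER:-docker}"   # client binary; overridable for testing

# remote_docker HOST ARGS... : run any docker subcommand against HOST:2375.
remote_docker() {
  host="$1"
  shift
  "$DOCKER" -H "tcp://${host}:2375" "$@"
}

# Example (hypothetical address): a remote container's STDOUT lands locally.
#   remote_docker 192.168.1.101 run --rm -a stdout alpine uname -a | wc -l
```

Exporting DOCKER_HOST=tcp://192.168.1.101:2375 has the same effect as passing -H on every command.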

So, if I can run one at a time, why not five? I knocked out a couple of one-line shell scripts (harness and runner) and, for grins and giggles, added the ‘-x’ magick cookie (shell tracing) to show what’s happening. The lines below prefixed with ‘+’ show the commands being performed behind the scenes:

matt@argentum:~/projects/apis/chem-swarm$ time ./harness 'CC1CCCCC=O' |tee /tmp/mout3 | wc -l
+ xargs -P 5 -n 1 ./runner CC1CCCCC=O
+ seq 1 5
+ docker -H tcp://192.168.1.101:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.103:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.104:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.102:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.105:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
22

real    0m5.776s
user    0m0.099s
sys     0m0.091s
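
The two scripts themselves are not shown, so here is a hypothetical reconstruction inferred from the -x trace: runner PATTERN N targets 192.168.1.10N, and harness PATTERN fans out to hosts 1 through 5. To keep both pieces in one testable file, this sketch writes them as shell functions and uses background jobs plus wait in place of the trace's seq 1 5 | xargs -P 5 -n 1 ./runner PATTERN. The effect is the same: five concurrent docker runs whose STDOUT all lands on the harness's STDOUT.

```shell
#!/bin/sh
# Hypothetical reconstruction of ./runner and ./harness, inferred from the
# -x trace. Host addresses 192.168.1.101-105, port 2375, the /opt/smiles
# volume, and the image name are copied from that trace.
DOCKER="${DOCKER:-docker}"   # client binary; overridable for testing

# runner PATTERN N: search on host 192.168.1.10N; only STDOUT is attached.
runner() {
  "$DOCKER" -H "tcp://192.168.1.10$2:2375" run -a STDOUT \
    -v /opt/smiles:/data -e "PATTERN=$1" \
    nimblestratus/rpi-substructure-grepper
}

# harness PATTERN: start one runner per host concurrently, then wait for all.
# (The original does this with: seq 1 5 | xargs -P 5 -n 1 ./runner PATTERN)
harness() {
  for n in $(seq 1 5); do
    runner "$1" "$n" &
  done
  wait
}
```

Used the same way as the original: harness 'CC1CCCCC=O' | tee /tmp/mout3 | wc -l. Even if output from different hosts interleaves, wc -l still counts newlines correctly.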

In less than six seconds, it has spawned Docker containers on five other hosts. Each of these containers performs a substructure (read: grep) search of ~13.7 million chemical compounds, for a total of ~69M compounds. The results are sent back to the initiating host, which writes them to a file (tee) and counts them (wc -l). Not too shabby. And it scales as O(n), too; I/O is the main limiting factor here.
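
As a rough back-of-the-envelope check on those figures, treat the 5.776 s wall-clock time as covering the whole run, container start-up included. That makes the result a lower bound on the actual scan rate:

```latex
5 \times 13.7\times10^{6} = 6.85\times10^{7} \approx 6.9\times10^{7}\ \text{compounds},
\qquad
\frac{6.85\times10^{7}\ \text{compounds}}{5.776\ \text{s}} \approx 1.19\times10^{7}\ \text{compounds/s}
\;\approx\; 2.4\times10^{6}\ \text{compounds/s per host}.
```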

I can think of lots of uses for this. Poor man’s parallel processing. Map/Reduce. Many more.
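
A minimal map/reduce sketch along those lines. The map step runs a command once per host concurrently and counts each host's output lines; the reduce step sums those counts on the initiating host. map_reduce is a hypothetical name. It accepts any command that takes the host number as its last argument, the same calling convention xargs uses with ./runner in the trace above.

```shell
#!/bin/sh
# Poor man's map/reduce (hypothetical sketch).
#   map:    run CMD... N for each host number N concurrently and count the
#           lines each run prints (e.g. CMD = ./runner 'CC1CCCCC=O');
#   reduce: sum the per-host counts locally.
# map_reduce CMD [ARGS...]: prints "N COUNT" per host, then "total COUNT".
map_reduce() {
  for n in 1 2 3 4 5; do
    # Each background job writes one short line, so lines do not interleave.
    printf '%s %s\n' "$n" "$("$@" "$n" | wc -l)" &
  done | sort -n | awk '{ print $1, $2; total += $2 } END { print "total", total + 0 }'
}
```

For example, map_reduce ./runner 'CC1CCCCC=O' gives a per-host breakdown as well as the overall count. Note that the matches themselves still cross the network; only the counting moves to the end of each host's stream.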

The disadvantage of this quick-and-dirty method is that you need to know the IP addresses of the hosts that will run the commands. Swarm removes the need to know the addresses or to come up with your own scheme for distributing the workload, which is always a plus.

It’s not necessarily something I’d take to production, but for testing or experimentation, it works quite well. It also leads to other experiments.

Docker is really awesome; I’m learning new things to do with it all the time.

Ramblings

Musings of Matt Williams