One of the possibly lesser-used flags for docker run is -a, which allows you to attach the container’s STDIN, STDOUT or STDERR and pipe it back to the shell which invoked the container. This allows you to construct pipelines of commands, just as you can with UNIX processes. For instance, using UNIX commands to count the number of files in a directory, you would do something like:
ls /dev | wc -l
Since the Docker container acts as a command, it has its own STDIN, STDOUT, and STDERR. You can string together multiple commands and containers.
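As a sketch of stringing two containers together, assuming the public alpine image (which this post doesn't otherwise use): the upstream container's STDOUT is attached with -a stdout, and the downstream container is started with -i so that its STDIN stays open to read from the pipe. The function name count_dev_entries is purely illustrative. Note that this counts the entries in the container's own /dev, not the host's.

```shell
#!/usr/bin/env bash
# Hypothetical example: container-to-container pipe, assuming the "alpine" image.
count_dev_entries() {
  # Upstream: -a stdout streams the container's STDOUT back to this shell.
  # Downstream: -i keeps the container's STDIN open so it can read the pipe.
  docker run --rm -a stdout alpine ls /dev \
    | docker run --rm -i alpine wc -l
}
```

Calling count_dev_entries prints a single number, just like the plain ls /dev | wc -l pipeline, except that both stages run inside containers.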
After I ‘docker’ized the ‘grep’ discussed in Naive Substructure Substance Matching on the Raspberry Pi, I was able to attach the STDOUT from the grep to wc -l to get a count of the matching substances.
docker run -a stdout nimblestratus/rpi-substructure-grepper $PATTERN | wc -l
This works just fine. In fact, it opens up opportunities for all sorts of other commands and suites running inside a container. Pandoc running in a container to generate PDFs comes to mind. Or ImageMagick. Or any of a number of other commands. All of the advantages of Docker containers with all of the fun of UNIX pipes.
Then the imp of the perverse struck. If I could redirect the STDOUT of a container running on the local host, would it work as well on another host? In short: yes.
You can attach to the streams of a Docker container running on a different host. The Docker daemon on each remote host needs to be listening on a TCP port (port 2375 in the run below).
So, if I can run one at a time, why not five? I knocked out a couple of one-line shell scripts (harness and runner) and, for grins and giggles, added a ‘-x’ magic cookie to show what’s happening. The lines below that start with ‘+’ are the commands being executed behind the scenes:
matt@argentum:~/projects/apis/chem-swarm$ time ./harness 'CC1CCCCC=O' |tee /tmp/mout3 | wc -l
+ xargs -P 5 -n 1 ./runner CC1CCCCC=O
+ seq 1 5
+ docker -H tcp://192.168.1.101:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.103:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.104:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.102:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
+ docker -H tcp://192.168.1.105:2375 run -a STDOUT -v /opt/smiles:/data -e PATTERN=CC1CCCCC=O nimblestratus/rpi-substructure-grepper
22

real    0m5.776s
user    0m0.099s
sys     0m0.091s
In less than six seconds, it spawned Docker containers on five other hosts. Each of these containers performs a substructure (read: grep) search of ~13.7 million chemical compounds, for a total of ~69M compounds. The results are then sent back to the initiating host, which dumps them to a file as well as counting them. Not too shabby. And it scales roughly linearly, O(n), with the number of hosts; IO is the main limiting factor here.
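The harness and runner scripts themselves aren't reproduced in this post, but the ‘-x’ trace shows what they do. What follows is a hypothetical reconstruction, written as bash functions so it fits in one file: runner takes the pattern and a host index from 1 to 5 and maps it onto 192.168.1.101 through 192.168.1.105, while harness feeds seq 1 5 into xargs -P 5 -n 1 so all five runners go at once. The addresses, port 2375, the /opt/smiles volume and the image name come straight from the trace; the function form and the export -f / bash -c plumbing (needed because xargs can only launch programs, not shell functions) are assumptions, not the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of harness/runner, inferred from the -x trace.

# runner: run the grepper on one remote Docker daemon and stream its STDOUT back.
#   $1 = SMILES pattern, $2 = host index (1..5 -> 192.168.1.101..105)
runner() {
  local pattern=$1 idx=$2
  docker -H "tcp://192.168.1.10${idx}:2375" run -a STDOUT \
    -v /opt/smiles:/data -e "PATTERN=$pattern" \
    nimblestratus/rpi-substructure-grepper
}
# Export runner so the bash processes started by xargs can see it.
export -f runner

# harness: fan the pattern out to all five hosts, up to five runners in parallel.
harness() {
  seq 1 5 | xargs -P 5 -n 1 bash -c 'runner "$@"' _ "$1"
}
```

Mirroring the run above: time harness 'CC1CCCCC=O' | tee /tmp/mout3 | wc -l. Because the five streams are merged as they arrive, the order of result lines is not deterministic.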
I can think of lots of uses for this. Poor man’s parallel processing. Map/Reduce. Many more.
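To make the Map/Reduce idea a little more concrete, here is a hypothetical variation on the same fan-out that counts each host's stream separately and then sums the counts. count_on_host and map_reduce_count are illustrative names; the host range, port, volume and image are taken from the trace above. The filtering (the "map") happens inside each remote container, while the per-stream wc -l and the awk sum (the "reduce") run on the initiating host, so the matching lines still travel over the network.

```shell
#!/usr/bin/env bash
# Hypothetical map/reduce sketch built on the same remote-attach trick.

# Map: run the grepper on host $2 (1..5) and count the lines it streams back.
count_on_host() {
  local pattern=$1 idx=$2
  docker -H "tcp://192.168.1.10${idx}:2375" run -a STDOUT \
    -v /opt/smiles:/data -e "PATTERN=$pattern" \
    nimblestratus/rpi-substructure-grepper | wc -l
}
# Export so the bash processes started by xargs can call it.
export -f count_on_host

# Reduce: run all five hosts in parallel and sum their per-host counts.
map_reduce_count() {
  seq 1 5 \
    | xargs -P 5 -n 1 bash -c 'count_on_host "$@"' _ "$1" \
    | awk '{ total += $1 } END { print total }'
}
```

Running map_reduce_count 'CC1CCCCC=O' prints a single total, which should match the wc -l result from the harness run.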
The disadvantage of this quick-and-dirty method is that you need to know the IP addresses of the hosts on which to run the commands. Swarm removes the need to know the addresses or to come up with a way of distributing the workload, which is always a plus.
It’s not necessarily something I’d take to production, but for testing or experimentation it works quite well. It also leads to other experiments.
Docker is really awesome; I’m learning new things to do with it all the time.