
Docker, Cgroups, Memory Constraints, and Java: A Cautionary Tale, or Here be Reapers (sometimes)

TL;DR: Java and cgroups/Docker memory constraints don’t always
behave as you might expect. Always explicitly specify JVM heap
sizes. Also be aware that kernel features may not be enabled. And Linux… lies.

I’ve recently discovered an interesting “quirk” in potential
interactions between Java, cgroups, Docker, and the kernel which can
cause some surprising results.

Unless you explicitly state heap sizes, the JVM makes guesses about
sizing based on the host on which it runs. In general, any “server
class” machine — which now refers to just about anything other than a
Windows desktop or a Raspberry Pi — defaults to a maximum
heap size of approximately 1/4 of the RAM on the host. Where this
becomes interesting is that specifying the amount of memory available
to a container does not affect what the JVM believes is available.

Last year I wrote in
Looking Inside a JVM: -XX:+PrintFlagsFinal
about finding the values configured in the JVM at runtime. By not
specifying a heap size, I get the following on a host with 12G of RAM:
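That check boils down to something along these lines (the grep is just to trim the flag dump):

# Dump the JVM's final flag values and pick out the heap sizing.
java -XX:+PrintFlagsFinal -version | grep -i maxheapsize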

Notice that the MaxHeapSize is ~3GB.

Running the same check inside a Docker container (see You ever look inside of Java …. in Docker? — Half Brewed), it’s the same. Ok, let’s set the max memory size of the container to
256m (-m 256m) and try again:
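The run was something like this sketch (java:latest is the image used throughout):

# Same check, but inside a container capped at 256m of memory.
docker run --rm -m 256m java:latest java -XX:+PrintFlagsFinal -version | grep -i maxheapsize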

Note the Warning…. we’ll come back to it later (much later)

And… it’s the same.

Fabio Kung has written an interesting discussion of
Memory inside Linux containers
and the reasons why system calls do not return the amount of
memory available inside a container. In short, the various tools and system
calls (including those the JVM invokes) were created before
cgroups existed and have no concept that such limits might exist.

So, how much memory is actually available to the
JVM? Let’s start with a class which eats memory. I found the following
code at
Java memory test – How to consume all the memory (RAM) on a computer:
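The idea is a small class that allocates memory in blocks and never lets any of it go; this sketch captures the gist (the class name and the 1 MB block size are my choices, not necessarily those of the original):

import java.util.Vector;

// A class which eats memory: allocate 1 MB at a time and keep a reference
// to every block so the garbage collector can never reclaim any of it.
public class MemoryFiller {
    public static void main(String[] args) throws InterruptedException {
        Vector<byte[]> hog = new Vector<byte[]>();
        while (true) {
            hog.add(new byte[1024 * 1024]); // grab another 1 MB
            long used = Runtime.getRuntime().totalMemory()
                      - Runtime.getRuntime().freeMemory();
            System.out.println(hog.size() + " MB allocated, heap in use: "
                    + (used / (1024 * 1024)) + " MB");
            Thread.sleep(10); // slow the death march enough to read the logs
        }
    }
}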

We can use the Docker Java container to compile it:
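Something along these lines, assuming the source file sits in the current directory (the volume mount and working directory are my assumptions):

docker run --rm -v "$PWD":/work -w /work java:latest javac MemoryFiller.java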

Now that it is compiled, let’s test:
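A sketch of the test run: cap the container at 256m and pass the flags described below to the JVM (mount, workdir, and class name are the same assumptions as before):

docker run --rm -m 256m -v "$PWD":/work -w /work java:latest \
    java -XX:OnOutOfMemoryError="echo Out of Memory" -XX:ErrorFile=fatal.log MemoryFiller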

There are a few interesting flags:

| Flag | Explanation |
| --- | --- |
| -XX:OnOutOfMemoryError="echo Out of Memory" | Run the given command (here, echo a message) when the first [OutOfMemoryError](https://docs.oracle.com/javase/7/docs/api/java/lang/OutOfMemoryError.html) is thrown |
| -XX:ErrorFile=fatal.log | When a fatal error occurs, an error log is created with information and the state obtained at the time of the fatal error ([Fatal Error Log – Troubleshooting Guide for Java SE 6 with HotSpot VM](http://www.oracle.com/technetwork/java/javase/felog-138657.html)) |

Betwixt the two flags, we should get some indication of an error….

Testing, Testing….

The tests were performed in a variety of scenarios:

| Environment | Docker Version | RAM | Swap | Docker Memory Constraint | Note(s) |
| --- | --- | --- | --- | --- | --- |
| 4 core, OpenStack instance | 1.8.3 | 24G | 0 | --memory=256m | [HCF](https://en.wikipedia.org/wiki/Halt_and_Catch_Fire) within seconds; the OOMKiller kills the process. |
| 4 core, physical | 1.10.3 | 12G | 15G | --memory=256m | Runs for a while and ends with OutOfMemoryError. |
| 8 core, physical | 1.9.1 | 32G | 32G | --memory=256m | Runs for about 5 minutes and exits with OutOfMemoryError. |
| 8 core, physical | 1.9.1 | 32G | 32G | --memory=255m --memory-swap=256m | Runs for about 5 minutes and exits with OutOfMemoryError. |
| 8 core, physical | 1.9.1 | 32G | 32G | --memory=255m --memory-swap=256m | Kernel-level swap accounting turned on; the OOMKiller strikes almost immediately. |

In each case, the OS is Ubuntu 14.04 and the Docker container is java:latest.

I was expecting that the JVM would quickly attempt to grow beyond the container constraints and be killed. In the first test, it behaved as I expected. The container starts and then the logs abruptly end:

Upon inspection of the container, I see that it was killed by the OOMKiller:
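One way to check (a sketch; substitute the actual container id or name):

docker inspect --format '{{.State.OOMKilled}}' <container>   # prints true when the OOM killer fired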

Odd behavior, but just as I expected: cgroups enforces the amount
of memory a container may use, but when the JVM or any other program
queries for the available memory, the cgroup limit never enters into the answer:
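You can see both sides of that directly (a sketch; the cgroup v1 path below is what these Ubuntu 14.04 hosts use):

# The limit cgroups will enforce for this container, in bytes...
docker run --rm -m 256m java:latest cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# ...versus /proc/meminfo, which is where tools like free and top get their numbers:
# from inside the container it still shows the host's memory.
docker run --rm -m 256m java:latest head -3 /proc/meminfo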

At this point I decided that I had an interesting enough topic to write about. Little did I know but that I was about to go…..

Down the Rabbit Hole

I sat down to diligently write about my findings; re-running the test on my laptop (the second entry in the table above), I was surprised to find that it behaved differently.

At first I thought it might be due to differences in Docker versions, so I tried on the 3rd host, where it ran even longer than on the laptop!

(note the insane length of the garbage collection; this should have been my clue that something was seriously weird!)

I didn’t find anything indicating that memory constraints behaved differently between Docker 1.8.3 and more current versions.

I then wondered if it might be related to HugePageTables. As of 2011, Documentation/cgroups/memory.txt [LWN.net] states:

Kernel memory and Hugepages are not under control yet. We just manage
pages on LRU.

Ok… let’s see if it’s enabled:
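One way to check (a sketch; not necessarily the exact commands behind the original screenshots):

grep -i huge /proc/meminfo                        # HugePages_* counters and Hugepagesize
cat /sys/kernel/mm/transparent_hugepage/enabled   # e.g. [always] madvise never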

Yup… I had them.

I then disabled HugePages on the 8 core host:
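Along these lines, as root (a sketch; note that a sysfs change like this does not survive a reboot on its own, so a boot parameter or init script is needed to make it stick):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag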

Ok, disabled. I rebooted for paranoia and re-ran my test. Still failed. Grump.

It was time to….

The Docker Run Reference section on memory constraints specifies that there are four scenarios for setting user memory usage:

  1. No memory limits; the container can use as much as it likes. (Default behavior)

  2. Specify memory, but no memory-swap — the container’s RAM is limited, and it may use an amount of swap equal to the memory setting.

  3. Specify memory and infinite (-1) memory-swap — the container is limited in RAM, but not in swap.

  4. Specify memory and memory-swap to set the total amount. In this case, memory-swap needs to be larger than memory:

The total amount is denoted by memory-swap.
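Spelled out with the values from the table above, case 4 looks something like this (mount, workdir, and class name are the same assumptions as before):

docker run --rm --memory=255m --memory-swap=256m -v "$PWD":/work -w /work java:latest \
    java -XX:OnOutOfMemoryError="echo Out of Memory" -XX:ErrorFile=fatal.log MemoryFiller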

Aha! I’ll just set these flags and run my container again…. Drat.

It still isn’t working.

And swap keeps growing and growing….

By now, it’s going on 3AM, but I’m definitely going to figure this out.

At this point I remembered the warning from the docker run output: the kernel did not support swap limit capabilities.

A little bit of googling and I find that I need to set a kernel parameter. This can be done via grub.

You will need to edit /etc/default/grub — it is owned by root, so you will likely need to sudo.

Edit the GRUB_CMDLINE_LINUX line to add:

  • cgroup_enable=memory
  • swapaccount=1

If there are no other arguments, it will look like this:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

If there are other arguments, then just add the above; you’ll end up
with something along the lines of:

GRUB_CMDLINE_LINUX="acpi=off noapic cgroup_enable=memory swapaccount=1"

Next, run sudo update-grub && sudo reboot.

Once the host reboots, the warning disappears and the JVM is killed as expected.

Conclusion

The reason it behaved as expected on the OpenStack instance was that
there is no swap on the instance. Since there is no swap to be had,
the container is, by necessity, limited to the size of the memory
specified, and the JVM instance was reaped by the OOMKiller, as I’d expected it would be.


This was definitely an instance of accidental success!

The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…’ (Isaac Asimov)

I’m glad I went down the rabbit hole on this one; I learned a good bit even if it took considerably longer than I’d expected.

A few caveats with which to leave you:

  1. It is best to always specify heap sizes when using the JVM. Don’t depend on heuristics; they can, have, and do change from version to version, to say nothing of the operating system and a host of other variables. (See the sketch after this list.)
  2. Assume that the OS lies and there’s less memory than it tells you. I haven’t even mentioned Linux’s “optimistic malloc” yet.
  3. Know thy system. Understand how the different pieces work together.
  4. And remember…. No software, just like no plan, survives contact with the …. user.
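For caveat 1, that looks something like the sketch below: pin the heap explicitly instead of letting the JVM guess from the host’s RAM (the sizes are illustrative):

# With -Xmx set, MaxHeapSize is whatever you asked for, not 1/4 of the host's RAM.
docker run --rm -m 256m java:latest java -Xms160m -Xmx160m -XX:+PrintFlagsFinal -version | grep -i maxheapsize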

Comments


  1. Ruslan Synytsky (@siruslan)

    Matt, very nice findings! Do you know any use cases when undefined heap sizes / defaults are useful? It’s interesting to understand what kind of real world apps may have such issue inside “lying” containers. Maybe it’s not a big issue?..
    Thanks

  2. Naveen (@snaveen)

    Thanks for the insights!

  3. Matt Williams

    Actually… I can’t really think of any instances where you wouldn’t want to define heap sizes in a production environment on a “modern” machine. The default heuristics are going to trigger a lot of garbage collections. That said, it might be useful if you’re trying to get an idea in development of the sizing needed for the JVM.

    Of course, it isn’t just the JVM which is potentially affected — almost any program of sufficient size could have the same sort of issue. The one which I think is really fun is where swap accounting is turned off and memory in excess of the container size goes to swap… It’s really painful when it’s a JVM, but I expect other memory intensive processes would be amusing in a “don’t try this at home, kids” sort of fashion.

  4. Matt Williams

    My pleasure. It started with a “hmmm, that’s strange” and went from there!

  5. Ruslan

    Matt, thanks again for your findings. We figured out that the issue affects not only the heap usage, but also the native (off-heap) memory. The team just published one more article related to this issue http://blog.jelastic.com/2017/04/13/java-ram-usage-in-containers-top-5-tips-not-to-lose-your-memory/

  6. Charlie Hunt

    Matt, since Java 8u131, there is a new JVM command line option called -XX:+UseCGroupMemoryLimitForHeap that can be used when you run a Docker container with a memory limit, i.e. $ docker run -m= [other run options] [command]. -XX:+UseCGroupMemoryLimitForHeap will use the Docker memory limit to ergonomically set a max Java heap size in cases where you have not specified -Xmx as a Java command line option. If you do specify -Xmx, the JVM will use the -Xmx value for the max Java heap. -XX:+UseCGroupMemoryLimitForHeap is categorized as an “experimental” JVM command line option, which means it also requires -XX:+UnlockExperimentalVMOptions. It is categorized as experimental because, in the future, the JVM is expected to transparently (without requiring a command line option) figure out whether a Docker memory limit is in use. When that enhancement is available, you can expect -XX:+UseCGroupMemoryLimitForHeap to be deprecated.
    So, to use this command line option, you must use both:
    -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
    A side note: also as of Java 8u131, the JVM will transparently identify whether a Docker container or cgroups are running with a limit on the number of CPUs.

