At elastic.io, we follow the principle of “one process per Docker container”. Naturally, we apply this principle to running integration components as well. Each of our integration components is a single process inside a single Docker container, and each of these Docker containers runs on Mesosphere or Kubernetes.
Recently, though, we had been seeing some unexplainable issues with how exactly these processes were terminated. Somehow, the orchestrators thought that our integration components were failing all the time.
As soon as we located and fixed the issue, our KPIs improved, as you can see on the graphs below.
So, what was the reason for such a change in the KPIs above? If you are interested in the technical details, read on…
Why did this happen?
It turns out that NodeJS is not able to receive signals and handle them appropriately when it runs as PID 1. By signals, I mean kernel signals such as SIGTERM, SIGINT, etc.
The following code wouldn’t work at all if you run NodeJS as PID 1:
process.on('SIGTERM', function onSigterm() {
  // do the cleaning job here, but it wouldn't be called at all
  process.exit(0);
});
As a result, you end up with a process that hangs around until it is terminated forcefully via the SIGKILL signal, meaning that your “clean up” code will not be called at all.
So what, you might say. I’ll describe a real case.
Where does this occur?
At elastic.io, we use Mesosphere and Kubernetes as our orchestrators. When Mesos or Kubernetes decides to kill a task, the following happens.
Mesos sends SIGTERM and waits for some time for the process to die. If this doesn’t happen, it sends SIGKILL (which force-kills the task) and marks the task as failed. The same flow applies to Kubernetes.
If you have a NodeJS application that listens for RabbitMQ messages and does not close all its listeners on SIGTERM, it will keep listening, the process will never exit, and SIGKILL will arrive to do the job.
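For illustration, here is a minimal sketch of the graceful shutdown we expect on SIGTERM, assuming the amqplib package and using a placeholder connection URL and queue name. When Node runs as PID 1, the SIGTERM handler below is simply never reached:

const amqp = require('amqplib');

async function main() {
  // placeholder connection URL and queue name
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('tasks');

  await channel.consume('tasks', (msg) => {
    // ... process the message ...
    channel.ack(msg);
  });

  process.on('SIGTERM', async () => {
    // clean up: close the channel and the connection, then exit with code 0;
    // as PID 1, Node never gets here and SIGKILL ends the process instead
    await channel.close();
    await connection.close();
    process.exit(0);
  });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});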
Since our platform relies on the statuses returned from Mesos and Kubernetes, we make false assumptions about the state of the task, which leads to issues unknown to us and to wrong behaviour of the platform. We never wanted to have unexpected behaviour, did we?
What do best practices say about the PID 1 case?
Node.js was not designed to run as PID 1, which leads to an unexpected behaviour when running inside of Docker. For example, a Node.js process running as PID 1 will not respond to SIGINT (CTRL-C) and similar signals. (source)
Boom!
Imagine you have an app written in NodeJS which does some job as a daemon on Mesos or Kubernetes, waiting for the signal to kill it.
You have a listener for SIGTERM, so on SIGTERM you can close all the connections the daemon uses. The daemon would then exit with code 0, signalling that everything is OK.
Only in this particular case it would not. The NodeJS app is not even able to understand that someone wants to close it, so it just continues to work, waiting for the SIGKILL signal to come and do the massacre.
What is the explanation from the UNIX perspective?
I found a great explanation in this article.
But there is a special case. Suppose the parent process terminates, either intentionally (because the program logic has determined that it should exit), or caused by a user action (e.g. the user killed the process). What happens then to its children? They no longer have a parent process, so they become “orphaned” (this is the actual technical term).
And this is where the init process kicks in. The init process — PID 1 — has a special task. Its task is to “adopt” orphaned child processes (again, this is the actual technical term). This means that the init process becomes the parent of such processes, even though those processes were never created directly by the init process.
And, of course, NodeJS is not designed to be an init system. This means that any of our applications must run under some init process, which will either spawn our app as its child or become the parent of such processes later on.
What is the solution? How do we fix the problem? How can we propagate kernel signals to our app?
Docker init
You can solve the issue by simply adding the --init flag when running a Docker image:
docker run --init your_image_here
It wraps your process with a tiny init system, which forwards all kernel signals to its child and makes sure that any orphaned processes are reaped.
Well, that’s OK, but what if we need to remap exit codes? For instance, when a Java process exits because of a SIGTERM signal, it returns exit code 143, not 0.
When reporting the exit status with the special parameter ‘?’, the shell shall report the full eight bits of exit status available. The exit status of a command that terminated because it received a signal shall be reported as greater than 128. (source)
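To make the 128 + signal convention concrete, here is a small Node illustration (assuming a Unix-like system where the sleep command is available): a child killed by SIGTERM (signal 15) is exactly what a shell reports as exit status 128 + 15 = 143.

const { spawn } = require('child_process');

// Spawn a long-running child and terminate it with SIGTERM.
const child = spawn('sleep', ['60']);

child.on('exit', (code, signal) => {
  // code is null and signal is 'SIGTERM'; a shell reports this as 128 + 15 = 143
  console.log('exit code:', code, 'signal:', signal);
});

child.kill('SIGTERM');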
Docker’s init is not able to handle such cases. That is why we kept looking and found our ideal solution for these cases: Tini.
Tini
Tini is the simplest init you could think of. All Tini does is spawn a single child (Tini is meant to be run in a container), and wait for it to exit all the while reaping zombies and performing signal forwarding. (source)
With a recent Tini release, we were able to remap exit code 143 to 0, so we can run our Java and NodeJS processes under Docker with the following ENTRYPOINT:
ENTRYPOINT ["/tini", "-v", "-e", "143", "--", "/runner/init"]
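For context, here is a minimal Dockerfile sketch of how Tini might end up in the image (the base image, Tini version and the /runner/init entry script are illustrative placeholders; the download pattern follows Tini’s README):

FROM node:8
# Download a Tini release binary into the image (pin and verify the version you use)
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
# -v enables verbose logging, -e 143 remaps exit code 143 to 0,
# and everything after -- is the child command Tini runs and forwards signals to
ENTRYPOINT ["/tini", "-v", "-e", "143", "--", "/runner/init"]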
Epilogue
That way, we fixed all the issues related to processing kernel signals in our applications, so that they are able to handle them and respond.
As a bonus, we got the ability to remap exit codes in cases when a child process exits with 128 + SIGNAL. For example, when an application gets SIGTERM (signal 15), it will in some cases exit with code 143 (128 + 15), which for us means a normal exit of the process.
I hope this article helps you track down some unexpected behaviour in your own applications.
References
The article was originally published on Eugene’s own blog and is republished here by courtesy of the author.