A coworker and I discovered an issue with JBoss’ run.sh (which starts the app server). The problem is that different flavours of Unix (or Unix-like) shells return different values for wait.
The relevant code is:
    # Wait until the background process exits
    WAIT_STATUS=0
    while [ "$WAIT_STATUS" -ne 127 ]; do
       JBOSS_STATUS=$WAIT_STATUS
       wait $JBOSS_PID 2>/dev/null
       WAIT_STATUS=$?
    done
This is all well and good on Linux: Red Hat uses /bin/bash and Ubuntu uses /bin/dash for /bin/sh, both of which return 127 when waiting for a process which does not exist. However, Solaris’ /bin/sh returns 0 (/bin/ksh returns 127).
So, on Solaris, run.sh goes into an infinite loop: wait returns 0 straight away, WAIT_STATUS never reaches 127, and the loop spins without ever blocking. CPU gets pegged and all that fun stuff.
How to fix? Well, in order to make the check adapt to the OS/shell, we’ll determine the value which wait returns when a process does not exist. We’re guaranteed that there is one process ID which will never belong to a process we started: 0. So, we wait on PID 0, and use the return value, $?, to determine how the environment handles the wait. The “fixed” code looks like:
    wait 0 2>/dev/null
    NO_SUCH_PID=$?

    # Wait until the background process exits
    WAIT_STATUS=0
    while [ "$WAIT_STATUS" -ne $NO_SUCH_PID ]; do
       JBOSS_STATUS=$WAIT_STATUS
       wait $JBOSS_PID 2>/dev/null
       WAIT_STATUS=$?
    done
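If you want to see which value NO_SUCH_PID would take on a given box, the same wait 0 trick can be run against each installed shell. This is only a rough sketch: probe_wait_status is a made-up helper name (it is not part of run.sh), and the list of shell paths is an assumption about where the shells live.

```shell
#!/bin/sh
# Hypothetical helper (not part of run.sh): ask the shell at path $1
# what status `wait` gives for a PID that is not one of its children.
probe_wait_status() {
    "$1" -c 'wait 0 2>/dev/null; echo $?'
}

# Assumed shell locations; adjust for your system.
for candidate in /bin/sh /bin/bash /bin/dash /bin/ksh; do
    if [ -x "$candidate" ]; then
        echo "$candidate: `probe_wait_status "$candidate"`"
    fi
done
```

Going by the behaviour described above, bash, dash and ksh should print 127, while Solaris’ /bin/sh should print 0.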
EDIT:
This was fixed in 4.2.3 GA with the following code:
    # Wait until the background process exits
    WAIT_STATUS=128
    while [ "$WAIT_STATUS" -ge 128 ]; do
       wait $JBOSS_PID 2>/dev/null
       WAIT_STATUS=$?
       if [ "${WAIT_STATUS}" -gt 128 ]; then
          SIGNAL=`expr ${WAIT_STATUS} - 128`
          SIGNAL_NAME=`kill -l ${SIGNAL}`
          echo "*** JBossAS process (${JBOSS_PID}) received ${SIGNAL_NAME} signal. ***" >&2
       fi
    done
    if [ "${WAIT_STATUS}" -lt 127 ]; then
       JBOSS_STATUS=$WAIT_STATUS
    else
       JBOSS_STATUS=0
    fi
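The signal handling in that version relies on wait reporting a status above 128 when a signal is involved, with the signal number being the status minus 128. Here is a small sketch of that arithmetic, using a sleep as a stand-in for the JBoss process (CHILD_PID is just an illustrative name, and the sleep duration is arbitrary):

```shell
#!/bin/sh
# Stand-in for the app server: a background sleep we can kill.
sleep 30 &
CHILD_PID=$!

# Terminate the child with SIGTERM (signal 15).
kill -TERM "$CHILD_PID"

# wait reports how the child ended; a status above 128 means a signal.
wait "$CHILD_PID" 2>/dev/null
WAIT_STATUS=$?

if [ "$WAIT_STATUS" -gt 128 ]; then
    # Same decoding as the 4.2.3 code: status - 128 = signal number.
    SIGNAL=`expr "$WAIT_STATUS" - 128`
    SIGNAL_NAME=`kill -l "$SIGNAL"`
    echo "child ended with wait status $WAIT_STATUS: signal $SIGNAL ($SIGNAL_NAME)"
fi
```

In bash and dash the wait status here should be 143, which decodes to signal 15, TERM. Other shells may encode signals differently, so treat the exact number as shell specific.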