A coworker and I discovered an issue with JBoss's run.sh (the script that starts the app server). Different Unix and Unix-like shells return different values from wait when the process being waited on does not exist.
The relevant code is:
# Wait until the background process exits
WAIT_STATUS=0
while [ "$WAIT_STATUS" -ne 127 ]; do
    JBOSS_STATUS=$WAIT_STATUS
    wait $JBOSS_PID 2>/dev/null
    WAIT_STATUS=$?
done
This works on Linux: Red Hat uses /bin/bash and Ubuntu uses /bin/dash for /bin/sh, and both return 127 when waiting for a process that does not exist. However, Solaris' /bin/sh returns 0 (/bin/ksh returns 127).
So on Solaris, run.sh never sees 127 and goes into an infinite loop, pegging the CPU.
How to fix it? To make the sentinel value depend on the OS and shell rather than hard-coding 127, we determine at runtime the value that wait returns when a process does not exist. There is one process ID that is guaranteed never to belong to a user process in Unix: 0. So we wait on PID 0 and use the return value, $?, to see how the environment handles the wait. The "fixed" code looks like:
wait 0 2>/dev/null
NO_SUCH_PID=$?
# Wait until the background process exits
WAIT_STATUS=0
while [ "$WAIT_STATUS" -ne $NO_SUCH_PID ]; do
    JBOSS_STATUS=$WAIT_STATUS
    wait $JBOSS_PID 2>/dev/null
    WAIT_STATUS=$?
done
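To check what a particular shell reports, the probe can be run on its own. This is a minimal sketch, assuming PID 0 is never a child of the current shell. The variable name comes from the code above; the message text is only illustrative.

```shell
# Probe the status this shell's wait returns for a PID it does not know.
# Assumption: PID 0 is never a child of the current shell.
NO_SUCH_PID=0
wait 0 2>/dev/null || NO_SUCH_PID=$?
echo "wait on an unknown PID returns: $NO_SUCH_PID"
```

Saving this to a file and running it under each candidate shell (/bin/sh, /bin/ksh, and so on) shows whether that shell uses 127 or some other value.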
EDIT:
This was fixed in 4.2.3 GA with the following code:
# Wait until the background process exits
WAIT_STATUS=128
while [ "$WAIT_STATUS" -ge 128 ]; do
    wait $JBOSS_PID 2>/dev/null
    WAIT_STATUS=$?
    if [ "${WAIT_STATUS}" -gt 128 ]; then
        SIGNAL=`expr ${WAIT_STATUS} - 128`
        SIGNAL_NAME=`kill -l ${SIGNAL}`
        echo "*** JBossAS process (${JBOSS_PID}) received ${SIGNAL_NAME} signal. ***" >&2
    fi
done
if [ "${WAIT_STATUS}" -lt 127 ]; then
    JBOSS_STATUS=$WAIT_STATUS
else
    JBOSS_STATUS=0
fi
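The thresholds in that loop are easier to see in isolation. Here is a rough sketch, not part of run.sh: classify_wait_status is a hypothetical helper name, and the sleep/kill demo is only an illustration. It relies on the usual shell convention that a status above 128 means 128 plus a signal number.

```shell
# Hypothetical helper, not part of run.sh: classify a wait status
# using the same thresholds as the 4.2.3 loop.
classify_wait_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # 128 + N conventionally means "signal N"
        signal=`expr "$status" - 128`
        name=`kill -l "$signal"`
        echo "signal $name"
    elif [ "$status" -eq 128 ]; then
        # The 4.2.3 loop simply waits again on exactly 128
        echo "retry"
    elif [ "$status" -eq 127 ]; then
        # 127 is treated as "no such process" and reported as 0
        echo "unknown pid, report 0"
    else
        echo "exit $status"
    fi
}

# Demo: a background child killed by SIGTERM.
sleep 30 &
CHILD_PID=$!
kill -TERM $CHILD_PID
DEMO_STATUS=0
wait $CHILD_PID 2>/dev/null || DEMO_STATUS=$?
classify_wait_status "$DEMO_STATUS"
```

SIGTERM is signal 15, so the demo child's status should be 128 + 15 = 143, which lands in the first branch.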