Version of smsd: 3.1.15
Smsd installed from: package repository
Name and model of a modem / phone: MultiModem GPRS
Interface: USB to Serial
We are running SMSD in conjunction with Nagios to send notifications to our admins about systems and services that are down. My version of Nagios is 3.4.4.
I have the following perms and ownerships on the spool directories:
And within the /var/spool/sms directory, I have:
I have sms3 running as user nagios so that nagios can write to the spool directories. The sms3 startup script looks like this:
# Set USER and GROUP, if necessary:
USER="nagios"
GROUP="dialer"
# If an unpriviledged user is selected, make sure that next two
# files are writable by that user:
PIDFILE="/var/run/smsd/smsd.pid"
INFOFILE="/var/run/smsd/smsd.working"
# Logfile can also be defined in here:
LOGFILE="/var/log/smsd/smsd.log"
DAEMON=/usr/local/bin/smsd
# A program which turns power off for couple of seconds:
RESETMODEMS=/usr/local/bin/smsd_resetmodems
NAME=smsd
PSOPT="-e"
ECHO=echo
case `uname` in
  *BSD|Darwin)
    PSOPT="axc"
    ;;
  SunOS)
    ECHO=/usr/ucb/echo
    ;;
esac
# Maximum time to stop smsd, after that it gets killed hardly:
MAXWAIT=45
case "$1" in
  start)
    test -x $DAEMON || exit 0
    $ECHO -n "Starting SMS Daemon: "
    MSG="."
    ARGS="-n MAINPROCESS -p$PIDFILE -i$INFOFILE"
    [ "x$USER" != x ] && ARGS="$ARGS -u$USER"
    [ "x$GROUP" != x ] && ARGS="$ARGS -g$GROUP"
    [ "x$LOGFILE" != x ] && ARGS="$ARGS -l$LOGFILE"
    PID=`cat $PIDFILE 2>/dev/null`
    if [ "x$PID" != x ]; then
      if kill -0 $PID 2>/dev/null; then
        MSG=" already running ($PID)."
      else
        PID=""
      fi
    fi
    if [ "x$PID" = x ]; then
      if ps $PSOPT | grep $NAME | grep -v grep >/dev/null; then
        MSG=" already running."
      else
        $DAEMON $ARGS
        sleep 1
        PIDS=`ps $PSOPT | grep $NAME | grep -v grep`
        [ "x$PIDS" = x ] && MSG=" failed."
      fi
    fi
    echo "$NAME$MSG"
    ;;
  stop)
    if ps $PSOPT | grep $NAME | grep -v grep >/dev/null; then
      PID=`cat $PIDFILE 2>/dev/null`
      if [ "x$PID" != x ]; then
        P=`kill -0 $PID 2>/dev/null`
        [ "x$P" != x ] && PID=""
      fi
      if [ "x$PID" != x ]; then
        kill $PID
      else
        kill `ps $PSOPT | grep $NAME | grep -v grep | awk '{print $1}'` >/dev/null 2>&1
      fi
      sleep 1
      if ps $PSOPT | grep $NAME | grep -v grep >/dev/null; then
        echo "Allowing $NAME to terminate gracefully within $MAXWAIT seconds"
        infofound=0
        dots=0
        seconds=0
        while ps $PSOPT | grep $NAME | grep -v grep >/dev/null; do
          if [ $infofound -lt 1 ]; then
            if [ -f $INFOFILE ]; then
              infofound=1
              if [ $dots -gt 0 ]; then
                echo ""
                dots=0
              fi
              $ECHO -n "$NAME is currently "
              cat $INFOFILE
              echo "Time counting is now disabled and we will wait until this job is complete."
              echo "If you are very hasty, use \"$0 force-stop\" to kill $NAME hardly (not recommended)."
            fi
          fi
          [ $infofound -lt 1 ] && seconds=`expr $seconds + 1`
          $ECHO -n "."
          dots=`expr $dots + 1`
          if [ "$seconds" -ge $MAXWAIT ]; then
            if [ $dots -gt 0 ]; then
              echo ""
              dots=0
            fi
            echo "Timeout occured, killing $NAME hardly."
            kill -9 `ps $PSOPT | grep $NAME | grep -v grep | awk '{print $1}'` >/dev/null 2>&1
            [ -f $PIDFILE ] && rm $PIDFILE
            seconds=0
          fi
          sleep 1
        done
        [ $dots -gt 0 ] && echo ""
        #echo "$NAME is stopped."
      fi
    fi
    ;;
  restart|reload)
    $0 stop
    $0 start
    ;;
  force-stop)
    if ps $PSOPT | grep $NAME | grep -v grep >/dev/null; then
      echo "Killing $NAME."
      kill -9 `ps $PSOPT | grep $NAME | grep -v grep | awk '{print $1}'` >/dev/null 2>&1
    fi
    [ -f $PIDFILE ] && rm $PIDFILE
    ;;
  reset)
    $0 stop
    [ -f "$RESETMODEMS" ] && "$RESETMODEMS"
    sleep 30
    $0 start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|force-stop|reset}"
    exit 1
esac
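The script's comment asks that the PID file and info file be writable by the unprivileged user. A minimal sketch of how that could be verified is shown here; the function name check_parent_writable is hypothetical and not part of smsd, and it only tests the directory that holds each file, on the assumption that smsd creates the files itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of smsd): for each file path given,
# report whether the directory holding it exists and is writable by
# the user running this script.
check_parent_writable() {
    rc=0
    for f in "$@"; do
        d=$(dirname "$f")
        if [ -d "$d" ] && [ -w "$d" ]; then
            echo "ok: $d"
        else
            echo "NOT writable: $d"
            rc=1
        fi
    done
    # Non-zero exit status if at least one directory failed the check.
    return $rc
}
```

It would need to be run as the user smsd runs as (nagios here), for example as check_parent_writable /var/run/smsd/smsd.pid /var/run/smsd/smsd.working /var/log/smsd/smsd.log, since a check run as root passes regardless of ownership.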
My smsd.conf file has the following ownership and perms:
And it looks like this:
# Example smsd.conf. Read the manual for a description
devices = GSM1
outgoing = /var/spool/sms/outgoing
checked = /var/spool/sms/checked
failed = /var/spool/sms/failed
logfile = /var/log/smsd/smsd.log
blocktime = 300
#loglevel = 7
loglevel = 5
ignore_outgoing_priority = yes
trust_outgoing = yes
[GSM1]
device = /dev/cuaU0
incoming = 0 # no or 0 means that smsd does not receive messages
baudrate = 115200
pin = ignore
#pin = 1111
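Since the spool paths matter for this problem, here is a rough sketch of how the outgoing, checked and failed settings could be read back from the config file and compared with what exists on disk. Both function names are hypothetical, and the parsing is a simplification (plain "key = value" lines before the first [section], with "#" comments stripped), not smsd's own parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of smsd): print the value of one
# "key = value" setting from the global part of an smsd.conf-style file.
# $1 = config file, $2 = key
conf_value() {
    awk -v key="$2" '
        /^[ \t]*\[/ { exit }    # stop at the first [section] header
        /^[ \t]*#/  { next }    # skip comment lines
        {
            line = $0
            sub(/#.*/, "", line)            # drop trailing comments
            eq = index(line, "=")
            if (eq == 0) next
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Report each spool setting, its configured path, and whether that
# directory exists. $1 = config file
check_spool_dirs() {
    for key in outgoing checked failed; do
        dir=$(conf_value "$1" "$key")
        if [ -z "$dir" ]; then
            echo "$key: not set"
        elif [ -d "$dir" ]; then
            echo "$key: $dir (exists)"
        else
            echo "$key: $dir (missing)"
        fi
    done
}
```

Calling check_spool_dirs with the path of the smsd.conf in use prints one line per spool directory, which makes it easy to see whether Nagios and smsd are really pointed at the same place.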
Nagios writes its messages to the SMS spool directory (/var/spool/sms/outgoing) without problems. Most of the time, smsd moves the send_XXXXX files from /var/spool/sms/outgoing to /var/spool/sms/checked as expected. However, on occasion (and quite frequently lately), smsd terminates abnormally with these messages in the log file:
My question is: why would smsd run without problems for periods of up to a month, moving files from the outgoing to the checked spool directory, and then terminate abnormally like this for no apparent reason? Every time this has happened, I have looked at the send_XXXXX file it choked on, and the message contains a destination phone number in the correct format. The file permissions are also the same as those of earlier messages that were processed without any problems. I don't understand why this keeps happening, and I have run out of ideas for what to check. Could anyone please help me?
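To at least notice quickly when smsd has died, the same kill -0 test the startup script uses could be wrapped in a small check. This is only a sketch: the function name pid_alive is hypothetical, and it assumes the check runs as the same user as smsd (or as root), because kill -0 also fails when the caller lacks permission to signal the process.

```shell
#!/bin/sh
# Hypothetical helper (not part of smsd): succeed only if the PID file
# is readable, holds nothing but digits, and that process can be signalled.
# $1 = path to the PID file
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    # Signal 0 sends nothing; it only tests whether the process exists
    # and whether we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

For example, pid_alive /var/run/smsd/smsd.pid || echo "smsd is not running" would report a dead daemon even when a stale PID file was left behind.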
Thanks in advance,
Alton