OpenSSH Coredump and "Bad packet length" errors seen on 5.10 sparc sun4v (Generic_125100-10)
Yu He
heyu at nortel.com
Thu Apr 1 17:47:44 EST 2010
Hi,
An OpenSSH core dump was seen at a customer site. It left ssh logins slow
and made manual commands unusable.
We need help identifying the root cause. Thanks!
>> Background:
1) server info:
# uname -a
SunOS owtnmncccm0cnmo 5.10 Generic_125100-10 sun4v sparc
SUNW,Netra-CP3060
bash-3.00# /usr/local/bin/ssh -v
OpenSSH_4.6p1, OpenSSL 0.9.8e 23 Feb 2007
bash-3.00# cat /usr/local/etc/sshd_config | grep -v "^#" | grep -v "^$"
Subsystem sftp /usr/local/libexec/sftp-server
2) There was no user activity around the time of the issue. A daemon script
was running in the background to sync certain files between our server pair
(from unit A to unit B; both have the same configuration and OS level).
3) After the error occurred, the system hung and responded very slowly to
requests such as ssh logins.
4) The server had to be rebooted, and everything looked fine afterwards.
>> Coredump:
# cd /var/core
# ls -ltr
-rw------- 1 root root 5199389 Feb 24 07:01 core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
# pstack core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
core 'core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729' of 6729: /usr/local/bin/ssh root@server0-unit1 rm -f /etc/init.d/staticroutes
 ff1ee314 AES_decrypt (3c, d1, aaa5d0a5, 314, 74, 3b0) + 2f4
 ff1ee66c AES_cbc_encrypt (74490, 774a8, 10, 6a358, 61fb8, 61fb8) + 2c
 ff238abc aes_128_cbc_cipher (1, 774a8, 74490, 10, f0, ff2d9a18) + 1c
 ff23dfb8 EVP_Cipher (61f98, 774a8, 74490, 10, 61800, 62400) + 18
 0002f3e4 cipher_crypt (61f94, 774a8, 74490, 10, f0, 7b528) + 34
 000338a4 packet_read_poll_seqnr (ffbfe474, 62000, 62000, 620f0, 61800, 62400) + 258
 00033f94 packet_read_seqnr (0, 6, ffbfe510, 628a8, f0, 3c) + 40
 00038bbc dispatch_run (0, ffbfe524, ffbfe510, ffbfe4f0, 624ac, ff) + 1c
 00025988 ssh_userauth2 (64568, 65250, 72e08, 628a8, 1, 0) + 52c
 00021a20 ssh_login (72e08, 4, 45400, 14, 45400, a) + 3a4
 000196b4 main (62b14, 647e4, 42a60, 42a58, 42800, 62800) + 8a4
 00017e48 _start (0, 0, 0, 0, 0, 0) + 5
Around the time of the core dump, many "Bad packet length" errors appear in
/var/adm/messages, for example:
Disconnecting: Bad packet length 2298694383.
Feb 24 07:00:36 owtnmncccm0cnmo sshd[860]: [ID 800047 auth.info] Disconnecting: Bad packet length 604783901.
Feb 24 07:00:36 owtnmncccm0cnmo sshd[873]: [ID 800047 auth.info] Disconnecting: Bad packet length 2577232018.
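
For context, the number in a "Bad packet length" message is the first four bytes of the decrypted packet, read as a big-endian 32-bit integer; sshd disconnects when that value falls outside the range it accepts. The sketch below converts the reported values to hex and compares them with a 256 KiB upper bound. That bound is an assumption recalled from OpenSSH sources of this era and should be checked against packet.c in 4.6p1. The function names are illustrative only, and bash is assumed to be available, as the prompts above suggest.

```shell
#!/bin/bash
# Assumed upper bound on an SSH packet length (256 KiB); verify against packet.c.
MAX_PACKET_LEN=262144

# Print a decimal length as the 32-bit hex word it was decoded from.
len_to_hex() {
    printf '0x%08x\n' "$1"
}

# Report whether a length falls inside the assumed valid range (5..MAX_PACKET_LEN).
check_len() {
    if [ "$1" -ge 5 ] && [ "$1" -le "$MAX_PACKET_LEN" ]; then
        echo "plausible"
    else
        echo "bad"
    fi
}

# The three lengths reported in the log excerpt above.
for len in 2298694383 604783901 2577232018; do
    echo "$len $(len_to_hex "$len") $(check_len "$len")"
done
```

All three values are far above any legitimate packet size. That is consistent with the length field having been decrypted from a corrupted or desynchronised stream, although it does not by itself show where the corruption happened.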
>> More:
The ssh call is made by the sync daemon script (from server0-unit0 to
server0-unit1). It was trying to remove /etc/init.d/staticroutes, which
exists on unit1 but not on unit0.
A snippet:
message_out ${MSG_TRACE} "The replicated file $a_file does not exist; remove it from the other unit: $OTHERUNIT." yes
/usr/local/bin/ssh root@${OTHERUNIT} "rm -f ${a_file}"
rc=$?
if [ $rc -ne 0 ]; then
    message_out $MSG_ERROR "Failed to delete ${a_file} on ${OTHERUNIT} with return code $rc." yes
    return 1
fi
FYI: There is no record of whether this call (rm -f
/etc/init.d/staticroutes) succeeded.
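
One way to keep such a record in future is to route the ssh call through a small wrapper that logs the exit code and any ssh output. This is only a sketch: remote_run, SSH_BIN and SYNC_LOG are hypothetical names that do not appear in the original script, and the log path is a placeholder. BatchMode and ConnectTimeout are standard ssh_config options; ConnectTimeout limits only the connection phase, not a hang later in the session.

```shell
#!/bin/bash
# Hypothetical helper: run a remote command over ssh and record the outcome.
# SSH_BIN and SYNC_LOG are illustrative names, not part of the original script.
SSH_BIN=${SSH_BIN:-/usr/local/bin/ssh}
SYNC_LOG=${SYNC_LOG:-/var/tmp/sync_ssh.log}

remote_run() {
    local host=$1 out rc
    shift
    # BatchMode avoids blocking on a password prompt;
    # ConnectTimeout bounds the time spent establishing the connection.
    out=$("$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=10 "root@${host}" "$@" 2>&1)
    rc=$?
    # One line per call: timestamp, target, command, exit code, ssh output.
    echo "$(date '+%Y-%m-%d %H:%M:%S') host=${host} cmd='$*' rc=${rc} out='${out}'" >> "$SYNC_LOG"
    return $rc
}
```

In the snippet above, the direct ssh line would become remote_run "${OTHERUNIT}" "rm -f ${a_file}", with the existing rc=$? check left unchanged. If the client is killed by a signal, as with a core dump, the logged exit code will normally be above 128.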
The core dump and messages files have been backed up. Let me know if I
should upload them somewhere.
Looking forward to your help and advice.
Regards,
Yu