OpenSSH Coredump and "Bad packet length" errors seen on 5.10 sparc sun4v (Generic_125100-10)

Yu He heyu at nortel.com
Thu Apr 1 17:47:44 EST 2010


 
Hi,
 
An OpenSSH core dump occurred at our customer's site. Afterwards, ssh
logins were slow and manual commands could not be run.
We need help identifying the root cause. Thanks!
 
>> Background:
1) server info:
# uname -a
SunOS owtnmncccm0cnmo 5.10 Generic_125100-10 sun4v sparc SUNW,Netra-CP3060
 
bash-3.00# /usr/local/bin/ssh -v
OpenSSH_4.6p1, OpenSSL 0.9.8e 23 Feb 2007
 
bash-3.00# cat /usr/local/etc/sshd_config | grep -v "^#" | grep -v "^$"
Subsystem       sftp    /usr/local/libexec/sftp-server
 
2) There was no user activity around the time of the issue. A daemon
script was running in the background to sync certain files between our
server pair (from unit A to unit B; both have the same configuration and
OS level).
 
3) After the error happened, the system hung and responded very slowly
to operations such as ssh login.
 
4) We had to reboot the server, and everything looked fine afterwards.
 
>> Coredump:

	  # cd /var/core
	  # ls -ltr
	  -rw-------   1 root     root     5199389 Feb 24 07:01 core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
	
	  # pstack core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729
	  core 'core_owtnmncccm0cnmo_ssh_0_0_1267016512_6729' of 6729:
	        /usr/local/bin/ssh root@server0-unit1 rm -f /etc/init.d/staticroutes
	   ff1ee314 AES_decrypt (3c, d1, aaa5d0a5, 314, 74, 3b0) + 2f4
	   ff1ee66c AES_cbc_encrypt (74490, 774a8, 10, 6a358, 61fb8, 61fb8) + 2c
	   ff238abc aes_128_cbc_cipher (1, 774a8, 74490, 10, f0, ff2d9a18) + 1c
	   ff23dfb8 EVP_Cipher (61f98, 774a8, 74490, 10, 61800, 62400) + 18
	   0002f3e4 cipher_crypt (61f94, 774a8, 74490, 10, f0, 7b528) + 34
	   000338a4 packet_read_poll_seqnr (ffbfe474, 62000, 62000, 620f0, 61800, 62400) + 258
	   00033f94 packet_read_seqnr (0, 6, ffbfe510, 628a8, f0, 3c) + 40
	   00038bbc dispatch_run (0, ffbfe524, ffbfe510, ffbfe4f0, 624ac, ff) + 1c
	   00025988 ssh_userauth2 (64568, 65250, 72e08, 628a8, 1, 0) + 52c
	   00021a20 ssh_login (72e08, 4, 45400, 14, 45400, a) + 3a4
	   000196b4 main     (62b14, 647e4, 42a60, 42a58, 42800, 62800) + 8a4
	   00017e48 _start   (0, 0, 0, 0, 0, 0) + 5

Around the time of the core dump, many "Bad packet length" errors
appear in /var/adm/messages, for example:

	  Disconnecting: Bad packet length 2298694383.
	  Feb 24 07:00:36 owtnmncccm0cnmo sshd[860]: [ID 800047 auth.info] Disconnecting: Bad packet length 604783901.
	  Feb 24 07:00:36 owtnmncccm0cnmo sshd[873]: [ID 800047 auth.info] Disconnecting: Bad packet length 2577232018.
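For what it's worth, the logged lengths can be compared with the bounds
sshd checks before it disconnects. A rough sketch, assuming the check in
OpenSSH 4.x packet.c (length must be at least 1 + 4 and at most
256 * 1024 bytes); the helper name check_len is made up. All three values
are far above the ceiling, so they look like garbage rather than real
packet sizes, which may point to data that did not decrypt as expected
(an assumption, not a diagnosis):

```shell
#!/bin/bash
# Hypothetical helper (not part of OpenSSH): classify a "Bad packet length"
# value against the bounds assumed from OpenSSH 4.x packet.c.
MAX_PACKET=$((256 * 1024))   # 262144-byte ceiling
MIN_PACKET=$((1 + 4))        # lower bound used in the same check

check_len() {
  # Prints "bad" if $1 is outside [MIN_PACKET, MAX_PACKET], else "ok".
  if [ "$1" -lt "$MIN_PACKET" ] || [ "$1" -gt "$MAX_PACKET" ]; then
    echo "bad"
  else
    echo "ok"
  fi
}

# Values copied from the /var/adm/messages excerpt above.
for len in 2298694383 604783901 2577232018; do
  printf '%s = 0x%08x -> %s\n' "$len" "$len" "$(check_len "$len")"
done
```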

>> More:
The ssh call is made by the sync daemon script (from server0-unit0 to
server0-unit1), which tries to remove the /etc/init.d/staticroutes file;
that file exists on unit1 but not on unit0.
Here is a snippet:
      message_out ${MSG_TRACE} "The replicated file $a_file does not exist; remove it from the other unit: $OTHERUNIT." yes
      /usr/local/bin/ssh root@${OTHERUNIT} "rm -f ${a_file}"
      rc=$?
      if [ $rc -ne 0 ]; then
         message_out $MSG_ERROR "Failed to delete ${a_file} on ${OTHERUNIT} with return code $rc." yes
         return 1
      fi
FYI: There is currently no record of whether this call
(rm -f /etc/init.d/staticroutes) succeeded.
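To have a record next time, the ssh call in the snippet could log its
exit status and stderr. A minimal sketch, assuming bash; the function
name remote_rm, the log path /var/tmp/sync_ssh.log and the SSH override
variable are hypothetical. BatchMode=yes and ConnectTimeout=30 are
standard ssh client options meant to stop the call from prompting for a
password or waiting indefinitely while connecting:

```shell
#!/bin/bash
# Hypothetical variant of the sync-script fragment above.
SSH=${SSH:-/usr/local/bin/ssh}        # overridable (e.g. for testing)
LOG=${LOG:-/var/tmp/sync_ssh.log}     # hypothetical log location

remote_rm() {
  # $1 = other unit, $2 = file to remove there
  local errfile rc
  errfile="$LOG.err.$$"
  "$SSH" -o BatchMode=yes -o ConnectTimeout=30 "root@$1" "rm -f $2" 2>"$errfile"
  rc=$?
  {
    # One line per call with the exit status, then any stderr lines.
    echo "$(date '+%Y-%m-%d %H:%M:%S') rm -f $2 on $1 rc=$rc"
    sed 's/^/  stderr: /' "$errfile"
  } >>"$LOG"
  rm -f "$errfile"
  return $rc
}
```

In the script, `remote_rm "$OTHERUNIT" "$a_file"` would take the place of
the ssh line, with `rc=$?` checked as before.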
 
The core dump and message files are backed up. Let me know if I should
upload them somewhere.
 
Looking forward to your help and advice.
 
Regards,
Yu
 
 
 
 
 
 


More information about the openssh-unix-dev mailing list