[netflow-tools] softflowd keeps crashing

alex k xela at mailinglist.at
Mon Apr 13 22:20:57 EST 2009


Hi,

> On Mon, 13 Apr 2009, alex k wrote:
>
>> > Are all the flows incorrectly dated, or just the ones from around the
>> > time softflowd exited?
>> >
>>
>> It seems to me that the first one or two flows after the crash
>> (softflowd gets restarted automatically) are the incorrectly dated ones.
>> It crashed at 00:04 and was restarted a few seconds after that (I found a
>> very fast way to do that).
>
> Just to be clear: it is the first flows out of softflowd after a restart
> and not the last couple before a crash that have invalid times? Are both
> the start time and the end time incorrect?
>

Yes, the first flows AFTER the restart are the suspicious ones.
Softflowd was (re)started with my normal init script, so it didn't start
in debug mode and I have nothing to compare the output of nfdump with.

All the dates in the nohup.out file (i.e. before the crash, including those
after the shutdown log entry) are correct.

> Could you try to find the details of this flow in the softflowd debug log
> and see if the times are incorrect there too? The flow start time comes
> from libpcap, so it is possible that it is giving us bad data.
>
>> >> What happened? Network error? Corrupted file? Socket problem?
>> >
>> > Is anything restarting (bringing down and back up) the network
>> > interface on which softflowd is listening? That can cause this sort
>> > of problem.
>> > This line:
>> >
>> >> Shutting down after pcap EOF
>> >
>> > Indicates that libpcap has closed itself.
>>
>> As far as I can see, the network interface had no problem at that time.
>> The host is monitored and was never unreachable.
>> It could have been a problem with VMware. (The IP with the incorrectly
>> dated entries is a virtual machine.)
>>
>> How can I find out if it's a libpcap problem? It all happens in memory,
>> right?
>
> Are you running softflowd with a pcap filter on the commandline?

No, I don't even know how to do that. ;)

The commandline is:
softflowd -i eth0 -T proto -v 5 -t maxlife=1m -n 127.0.0.1:4711

>
> You might also want to try this diff:
>
> Index: softflowd.c
> ===================================================================
> RCS file: /var/cvs/softflowd/softflowd.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 softflowd.c
> --- softflowd.c	3 Sep 2007 10:50:05 -0000	1.98
> +++ softflowd.c	13 Apr 2009 11:04:10 -0000
> @@ -1916,7 +1916,7 @@ main(int argc, char **argv)
>  				logit(LOG_ERR, "Exiting on pcap_dispatch: %s",
>  				    pcap_geterr(pcap));
>  				break;
> -			} else if (r == 0) {
> +			} else if (r == 0 && capfile != NULL) {
>  				logit(LOG_NOTICE, "Shutting down after "
>  				    "pcap EOF");
>  				graceful_shutdown_request = 1;
>

O.K., I applied the patch, recompiled, and started softflowd with nohup.
We might have to wait for a week now. ;)
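
As far as I understand the diff, a return value of 0 from pcap_dispatch()
is now treated as end-of-file only when softflowd is reading from a capture
file. On a live interface, 0 can simply mean that no packets arrived before
the read timeout, so the loop should keep going. Below is a minimal
standalone sketch of that decision, not the actual softflowd code: the enum
and the function name classify_dispatch_result are made up for
illustration, and I am assuming capfile is NULL whenever softflowd listens
on an interface instead of reading a file.

```c
#include <stddef.h>

/* Hypothetical names for illustration; softflowd itself has no such enum. */
enum loop_action {
	LOOP_CONTINUE,      /* keep capturing */
	LOOP_SHUTDOWN_EOF,  /* capture file fully read: shut down gracefully */
	LOOP_EXIT_ERROR     /* pcap reported an error: leave the main loop */
};

/*
 * r:       value returned by pcap_dispatch()
 * capfile: capture file name, or NULL when listening on a live interface
 *          (assumption about how softflowd uses this variable)
 */
enum loop_action
classify_dispatch_result(int r, const char *capfile)
{
	if (r == -1)
		return LOOP_EXIT_ERROR;
	/* Patched check: 0 only means EOF when reading from a file. */
	if (r == 0 && capfile != NULL)
		return LOOP_SHUTDOWN_EOF;
	/* Packets were processed, or a live capture was merely idle. */
	return LOOP_CONTINUE;
}
```

With the old check (r == 0 alone), an idle live capture would have taken
the LOOP_SHUTDOWN_EOF branch, which would match the "Shutting down after
pcap EOF" line I saw in the log.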

xela



