From trisowens at wrcra.com  Thu Apr  2 04:59:41 2009
From: trisowens at wrcra.com (Tris Owens)
Date: Wed, 01 Apr 2009 18:59:41 +0100
Subject: [netflow-tools] Problem with flowd.log
Message-ID: <20090401175941.CBFCF3000207@bmail00.one.com>

Hi,

I've just installed flowd (redhat). It seems to be running ok, I did have a few problems at first but looking at previous posts I noticed someone else has the same problem and I included the 'pidfile' line in the flowd.conf and now I see the processes running and it seems to have started up ok. 

My problem is the flowd.log file is not storing any data. I did a tcpdump and can see my test device sending 'something' every few minutes but the log file stays empty. It's sending on port 2055, which is the default for the device I'm trying to get stats from (Riverbed WAN optimisation device). 

I'm not all that with Linux, it's not my thing really but I didn't notice any issues while going through the install. Also, if I run with -d I don't see any errors.

I have these lines in my flowd.conf, although I have tried many options.

pidfile "/var/run/flowd.pid
logfile "/var/log/flowd.log"
listen on [ : : ]:2055
flow source 10.0.0.0/8
store ALL

Any help appreciated
Thanks
Tris Owens

From djm at mindrot.org  Thu Apr  9 16:59:09 2009
From: djm at mindrot.org (Damien Miller)
Date: Thu, 9 Apr 2009 16:59:09 +1000 (EST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
Message-ID: <alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>

On Fri, 27 Mar 2009, alex k wrote:

> hi there,
> 
> first of all, softflowd is a cool piece of software.
> we have it on other linux machines (gateways) and it runs perfectly stable
> there.
> i use softflowd to collect data and nfsen to capture and evaluate.
> 
> but there is one host, where softflowd keeps crashing.
> i am a bit clueless as instability doesn't seem to be a problem of softflowd.
> at least i didn't find anything in the web or this list.
> 
> some information about the host:
> 
> kernel (64 bit):
> 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64
> GNU/Linux
> libpcap version:
> libpcap0-0.9.8-47.41
> softflowd version:
> softflowd-0.9.8 (compiled without problems on that machine)
> 
> on this host with one network card runs vmware-server with several guests.
> the guests use bridged networking, every has its own ip address, but as
> mentioned - there is only one network card.
> 
> softflowd crashes occasionally. sometimes once in two weeks, sometimes
> twice a day.
> the process disappears, the pid file stays.
> 
> the only thing i recognized is, that at the same time there are often
> flows with completely wrong date (about 6 weeks in the future).
> not exactly the same time, of course. when softflowd crashes, the possibly
> critical data is lost.
> 
> so my questions are:
> 1) where does softflowd get its time from?
> 2) can the wrong time be a problem?
> 3) what else could cause the crashes
> 
> especially: how can i find it out.
> softflowd is _very_ quiet. nothing in the syslog, no message at all.

Try running it manually in debug mode:

nohup flowd -dg &

and see what is in nohup.out if/when it crashes.

-d

From djm at mindrot.org  Thu Apr  9 17:01:00 2009
From: djm at mindrot.org (Damien Miller)
Date: Thu, 9 Apr 2009 17:01:00 +1000 (EST)
Subject: [netflow-tools] Problem with flowd.log
In-Reply-To: <20090401175941.CBFCF3000207@bmail00.one.com>
References: <20090401175941.CBFCF3000207@bmail00.one.com>
Message-ID: <alpine.BSO.2.00.0904091659570.8718@fuyu.mindrot.org>

On Wed, 1 Apr 2009, Tris Owens wrote:

> Hi,
>
> I've just installed flowd (redhat). It seems to be running ok, I did
> have a few problems at first but looking at previous posts I noticed
> someone else has the same problem and I included the 'pidfile' line
> in the flowd.conf and now I see the processes running and it seems to
> have started up ok.
>
> My problem is the flowd.log file is not storing any data. I did a
> tcpdump and can see my test device sending 'something' every few
> minutes but the log file stays empty. It's sending on port 2055, which
> is the default for the device I'm trying to get stats from (Riverbed
> WAN optimisation device).
>
> I'm not all that with Linux, it's not my thing really but I didn't
> notice any issues while going through the install. Also, if I run with
> -d I don't see any errors.
>
> I have these lines in my flowd.conf, although I have tried many
> options.
>
> pidfile "/var/run/flowd.pid
> logfile "/var/log/flowd.log"
> listen on [ : : ]:2055

This will make flows listen on an IPv6 wildcard address, which is probably
not what you want. I'd suggest:

listen on [0.0.0.0]:2055

-d

From xela at mailinglist.at  Fri Apr 10 00:41:15 2009
From: xela at mailinglist.at (alex k)
Date: Thu, 9 Apr 2009 16:41:15 +0200 (CEST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
Message-ID: <c134022413f5bda34c6b6c85c59fb67c.squirrel@www.mailinglist.at>

> On Fri, 27 Mar 2009, alex k wrote:
>
>> hi there,
>>
>> first of all, softflowd is a cool piece of software.
>> we have it on other linux machines (gateways) and it runs perfectly
>> stable
>> there.
>> i use softflowd to collect data and nfsen to capture and evaluate.
>>
>> but there is one host, where softflowd keeps crashing.
>> i am a bit clueless as instability doesn't seem to be a problem of
>> softflowd.
>> at least i didn't find anything in the web or this list.
>>
>> some information about the host:
>>
>> kernel (64 bit):
>> 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64
>> GNU/Linux
>> libpcap version:
>> libpcap0-0.9.8-47.41
>> softflowd version:
>> softflowd-0.9.8 (compiled without problems on that machine)
>>
>> on this host with one network card runs vmware-server with several
>> guests.
>> the guests use bridged networking, every has its own ip address, but as
>> mentioned - there is only one network card.
>>
>> softflowd crashes occasionally. sometimes once in two weeks, sometimes
>> twice a day.
>> the process disappears, the pid file stays.
>>
>> the only thing i recognized is, that at the same time there are often
>> flows with completely wrong date (about 6 weeks in the future).
>> not exactly the same time, of course. when softflowd crashes, the
>> possibly
>> critical data is lost.
>>
>> so my questions are:
>> 1) where does softflowd get its time from?
>> 2) can the wrong time be a problem?
>> 3) what else could cause the crashes
>>
>> especially: how can i find it out.
>> softflowd is _very_ quiet. nothing in the syslog, no message at all.
>
> Try running it manually in debug mode:
>
> nohup flowd -dg &
>
> and see what is in nohup.out if/when it crashes.
>
> -d
>

Hi Damien,
Thank you for the hint.
I don't have a problem with flowd but with _softflowd_. ;)
Anyway.
I started softflowd with:
nohup /usr/local/sbin/softflowd -D -i eth0 -T proto -v 5 -t maxlife=1m -n
127.0.0.1:4711 &

"-D" for softflowd's debugging mode.
It's really chatty then.
And now I wait and watch the nohup.out file growing.

Funny detail: after about 50 days of crashing irregularly, sometimes three
times a day, softflowd runs stable on that host for two weeks now.

xela


From xela at mailinglist.at  Mon Apr 13 01:03:19 2009
From: xela at mailinglist.at (alex k)
Date: Sun, 12 Apr 2009 17:03:19 +0200 (CEST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
Message-ID: <d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>

> On Fri, 27 Mar 2009, alex k wrote:
>
>> hi there,
>>
>> first of all, softflowd is a cool piece of software.
>> we have it on other linux machines (gateways) and it runs perfectly
>> stable
>> there.
>> i use softflowd to collect data and nfsen to capture and evaluate.
>>
>> but there is one host, where softflowd keeps crashing.
>> i am a bit clueless as instability doesn't seem to be a problem of
>> softflowd.
>> at least i didn't find anything in the web or this list.
>>
>> some information about the host:
>>
>> kernel (64 bit):
>> 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64
>> GNU/Linux
>> libpcap version:
>> libpcap0-0.9.8-47.41
>> softflowd version:
>> softflowd-0.9.8 (compiled without problems on that machine)
>>
>> on this host with one network card runs vmware-server with several
>> guests.
>> the guests use bridged networking, every has its own ip address, but as
>> mentioned - there is only one network card.
>>
>> softflowd crashes occasionally. sometimes once in two weeks, sometimes
>> twice a day.
>> the process disappears, the pid file stays.
>>
>> the only thing i recognized is, that at the same time there are often
>> flows with completely wrong date (about 6 weeks in the future).
>> not exactly the same time, of course. when softflowd crashes, the
>> possibly
>> critical data is lost.
>>
>> so my questions are:
>> 1) where does softflowd get its time from?
>> 2) can the wrong time be a problem?
>> 3) what else could cause the crashes
>>
>> especially: how can i find it out.
>> softflowd is _very_ quiet. nothing in the syslog, no message at all.
>
> Try running it manually in debug mode:
>
> nohup flowd -dg &
>
> and see what is in nohup.out if/when it crashes.
>
> -d
>

Hi Damien,

Softflowd crashed again.

In the nohup.out file I found the following:

Shutting down after pcap EOF
Shutting down on user request
Starting expiry scan: mode -1
...then a lot of...
Queuing flow seq:102338 ... for expiry reason 6 (or 5, 1, 4)
...then a lot of...
EXPIRED: seq:...

With nfdump I found again (at that time) entries with date about six weeks
in the future.
But no wrong date in nohup.out.

What happened? Network error? Corrupted file? Socket problem?

The host gets monitored an was reachable (pingable) at that time.
The nohup.out file is 40MB big and contains a lot of IP addresses.
That's why I hesitate to send it to you and especially the list.

Any idea, what I can try next?

xela


From djm at mindrot.org  Mon Apr 13 09:14:52 2009
From: djm at mindrot.org (Damien Miller)
Date: Mon, 13 Apr 2009 09:14:52 +1000 (EST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
	<d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
Message-ID: <alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>

On Sun, 12 Apr 2009, alex k wrote:

> Hi Damien,
> 
> Softflowd crashed again.
> 
> In the nohup.out file I found the following:
> 
> Shutting down after pcap EOF
> Shutting down on user request
> Starting expiry scan: mode -1
> ...then a lot of...
> Queuing flow seq:102338 ... for expiry reason 6 (or 5, 1, 4)
> ...then a lot of...
> EXPIRED: seq:...
> 
> With nfdump I found again (at that time) entries with date about six weeks
> in the future.
> But no wrong date in nohup.out.

Are all the flows incorrectly dated, or just the ones from around the time
softflowd exited?

> What happened? Network error? Corrupted file? Socket problem?

Is anything restarting (bringing down and back up) the network interface
on which softflowd is listening? That can cause this sort of problem.
This line:

> Shutting down after pcap EOF

Indicates that libpcap has closed itself.

-d

From xela at mailinglist.at  Mon Apr 13 20:14:45 2009
From: xela at mailinglist.at (alex k)
Date: Mon, 13 Apr 2009 12:14:45 +0200 (CEST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
	<d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>
Message-ID: <f860aba1362f6257553d7a85465e2bfa.squirrel@www.mailinglist.at>

Hi,

> On Sun, 12 Apr 2009, alex k wrote:
>
>> Hi Damien,
>>
>> Softflowd crashed again.
>>
>> In the nohup.out file I found the following:
>>
>> Shutting down after pcap EOF
>> Shutting down on user request
>> Starting expiry scan: mode -1
>> ...then a lot of...
>> Queuing flow seq:102338 ... for expiry reason 6 (or 5, 1, 4)
>> ...then a lot of...
>> EXPIRED: seq:...
>>
>> With nfdump I found again (at that time) entries with date about six
>> weeks
>> in the future.
>> But no wrong date in nohup.out.
>
> Are all the flows incorrectly dated, or just the ones from around the time
> softflowd exited?
>

It seems to me, that the first one or two flows after the crash (softflowd
gets started automatically) are the wrong dated ones.
It crashed at 00:04 and was started a few seconds after that (I found a
very fast way to do that).

nfdump -M /usr/local/nfsen/profiles-data/live/eth0  -T  -r
2009/04/11/nfcapd.200904110005 -o long -c 1000
nfdump filter: any

Date flow start          Duration Proto     ...
2009-05-30 17:07:11.487    60.820 TCP      ...
2009-05-30 17:07:11.487    60.820 TCP     ...
2009-04-11 00:04:24.256    60.735 TCP      ...
2009-04-11 00:04:24.256    60.735 TCP   ...

It doesn't have to do anything with the crash itself but is the only
abnormal thing I can see.

>> What happened? Network error? Corrupted file? Socket problem?
>
> Is anything restarting (bringing down and back up) the network interface
> on which softflowd is listening? That can cause this sort of problem.
> This line:
>
>> Shutting down after pcap EOF
>
> Indicates that libpcap has closed itself.

As far as I can see, the network interface had no problem at that time.
The host is monitored and was never unreachable.
It could have been a problem with VMware. (The IP with the wrong dated
entries is a virtual machine.)

How can I find out, if it's a libpcap problem? It all happens in memory,
right?

xela


>
> -d
>


From djm at mindrot.org  Mon Apr 13 21:04:58 2009
From: djm at mindrot.org (Damien Miller)
Date: Mon, 13 Apr 2009 21:04:58 +1000 (EST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <f860aba1362f6257553d7a85465e2bfa.squirrel@www.mailinglist.at>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
	<d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>
	<f860aba1362f6257553d7a85465e2bfa.squirrel@www.mailinglist.at>
Message-ID: <alpine.BSO.2.00.0904132049090.2999@fuyu.mindrot.org>

On Mon, 13 Apr 2009, alex k wrote:

> > Are all the flows incorrectly dated, or just the ones from around the time
> > softflowd exited?
> >
> 
> It seems to me, that the first one or two flows after the crash (softflowd
> gets started automatically) are the wrong dated ones.
> It crashed at 00:04 and was started a few seconds after that (I found a
> very fast way to do that).

Just to be clear: it is the first flows out of softflowd after a restart
and not the last couple before a crash that have invalid times? Are both
the start time and the end time incorrect?

Could you try to find the details of this flow in the softflowd debug log
and see if the times are incorrect there too? The flow start time comes from
libpcap, so it is possible that it is giving us bad data.

> >> What happened? Network error? Corrupted file? Socket problem?
> >
> > Is anything restarting (bringing down and back up) the network interface
> > on which softflowd is listening? That can cause this sort of problem.
> > This line:
> >
> >> Shutting down after pcap EOF
> >
> > Indicates that libpcap has closed itself.
> 
> As far as I can see, the network interface had no problem at that time.
> The host is monitored and was never unreachable.
> It could have been a problem with VMware. (The IP with the wrong dated
> entries is a virtual machine.)
> 
> How can I find out, if it's a libpcap problem? It all happens in memory,
> right?

Are you running softflowd with a pcap filter on the commandline?

You might also want to try this diff:

Index: softflowd.c
===================================================================
RCS file: /var/cvs/softflowd/softflowd.c,v
retrieving revision 1.98
diff -u -p -r1.98 softflowd.c
--- softflowd.c	3 Sep 2007 10:50:05 -0000	1.98
+++ softflowd.c	13 Apr 2009 11:04:10 -0000
@@ -1916,7 +1916,7 @@ main(int argc, char **argv)
 				logit(LOG_ERR, "Exiting on pcap_dispatch: %s", 
 				    pcap_geterr(pcap));
 				break;
-			} else if (r == 0) {
+			} else if (r == 0 && capfile != NULL) {
 				logit(LOG_NOTICE, "Shutting down after "
 				    "pcap EOF");
 				graceful_shutdown_request = 1;

From xela at mailinglist.at  Mon Apr 13 22:20:57 2009
From: xela at mailinglist.at (alex k)
Date: Mon, 13 Apr 2009 14:20:57 +0200 (CEST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <alpine.BSO.2.00.0904132049090.2999@fuyu.mindrot.org>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
	<d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>
	<f860aba1362f6257553d7a85465e2bfa.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904132049090.2999@fuyu.mindrot.org>
Message-ID: <8a6155faaf6eb8f9510494f59410cbd2.squirrel@www.mailinglist.at>

Hi,

> On Mon, 13 Apr 2009, alex k wrote:
>
>> > Are all the flows incorrectly dated, or just the ones from around the
>> time
>> > softflowd exited?
>> >
>>
>> It seems to me, that the first one or two flows after the crash
>> (softflowd
>> gets started automatically) are the wrong dated ones.
>> It crashed at 00:04 and was started a few seconds after that (I found a
>> very fast way to do that).
>
> Just to be clear: it is the first flows out of softflowd after a restart
> and not the last couple before a crash that have invalid times? Are both
> the start time and the end time incorrect?
>

Yes, the first flows AFTER the restart are the suspicious ones.
Softflowd was (re)started with my normal init script, so it didn't start
in debug mode and I have nothing to compare the output of nfdump with.

All the dates in nohup.out file (=before the crash, even after the
shutdown log entry) are correct.

> Could you try to find the details of this flow in the softflowd debug log
> and see if the times are incorrect there too? The flow start time comes
> from
> libpcap, so it is possible that it is giving us bad data.
>
>> >> What happened? Network error? Corrupted file? Socket problem?
>> >
>> > Is anything restarting (bringing down and back up) the network
>> interface
>> > on which softflowd is listening? That can cause this sort of problem.
>> > This line:
>> >
>> >> Shutting down after pcap EOF
>> >
>> > Indicates that libpcap has closed itself.
>>
>> As far as I can see, the network interface had no problem at that time.
>> The host is monitored and was never unreachable.
>> It could have been a problem with VMware. (The IP with the wrong dated
>> entries is a virtual machine.)
>>
>> How can I find out, if it's a libpcap problem? It all happens in memory,
>> right?
>
> Are you running softflowd with a pcap filter on the commandline?

No, I don't even know how to do that. ;)

The commandline is:
softflowd -i eth0 -T proto -v 5 -t maxlife=1m -n 127.0.0.1:4711

>
> You might also want to try this diff:
>
> Index: softflowd.c
> ===================================================================
> RCS file: /var/cvs/softflowd/softflowd.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 softflowd.c
> --- softflowd.c	3 Sep 2007 10:50:05 -0000	1.98
> +++ softflowd.c	13 Apr 2009 11:04:10 -0000
> @@ -1916,7 +1916,7 @@ main(int argc, char **argv)
>  				logit(LOG_ERR, "Exiting on pcap_dispatch: %s",
>  				    pcap_geterr(pcap));
>  				break;
> -			} else if (r == 0) {
> +			} else if (r == 0 && capfile != NULL) {
>  				logit(LOG_NOTICE, "Shutting down after "
>  				    "pcap EOF");
>  				graceful_shutdown_request = 1;
>

O.K. I patched and recompiled and started softflowd with nohup.
We might have to wait for a week now. ;)

xela


From smajko at wp.pl  Wed Apr 22 21:51:52 2009
From: smajko at wp.pl (Sebastian Majkowski)
Date: Wed, 22 Apr 2009 13:51:52 +0200
Subject: [netflow-tools] softflowd performance and time in netflow
Message-ID: <49EF04D8.3060201@wp.pl>

Hi

First of all - thanks for very nice tool, keep up the good work!
I have few questions about your software:

1) what is the maximum performance for softflowd - I need to have 
netflow from around 300Mbits per second, is it possible to have it? 
(what hardware needed?)
2) my current flowd collects for me:
FLOW src [192.168.2.161]:22117 dst [60.28.197.35]:28225
FLOW src [60.28.197.35]:0 dst [192.168.2.161]:771
FLOW src [74.125.79.102]:80 dst [192.168.2.128]:4327
FLOW src [192.168.2.128]:4327 dst [74.125.79.102]:80
FLOW src [192.168.2.248]:137 dst [192.168.3.255]:137
FLOW src [60.28.197.35]:28221 dst [192.168.2.161]:4407
FLOW src [192.168.2.161]:4407 dst [60.28.197.35]:28221
FLOW src [192.168.2.240]:138 dst [192.168.3.255]:138
FLOW src [192.168.2.194]:138 dst [192.168.3.255]:138
FLOW src [192.168.2.161]:22117 dst [80.239.200.111]:3004
FLOW src [192.168.2.149]:137 dst [192.168.3.255]:137
FLOW src [192.168.2.161]:22117 dst [219.232.241.91]:57
FLOW src [219.232.241.91]:57 dst [192.168.2.161]:22117
FLOW src [60.28.197.35]:28221 dst [192.168.2.161]:4431
FLOW src [192.168.2.161]:4431 dst [60.28.197.35]:28221
FLOW src [192.168.3.215]:1778 dst [192.168.3.255]:137

is it possible to have also exact time for each flow??


Regards,


Sebastian


From xela at mailinglist.at  Sat Apr 25 02:13:50 2009
From: xela at mailinglist.at (alex k)
Date: Fri, 24 Apr 2009 18:13:50 +0200 (CEST)
Subject: [netflow-tools] softflowd keeps crashing
In-Reply-To: <alpine.BSO.2.00.0904132049090.2999@fuyu.mindrot.org>
References: <fcc3d0724c787ce66aa853cdc4314f91.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904091658050.8718@fuyu.mindrot.org>
	<d3e57a83dd68b4fbcf90029203a10b74.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904130911570.13424@fuyu.mindrot.org>
	<f860aba1362f6257553d7a85465e2bfa.squirrel@www.mailinglist.at>
	<alpine.BSO.2.00.0904132049090.2999@fuyu.mindrot.org>
Message-ID: <e5377b4f6fe172dfaf2fe84db4ff5327.squirrel@www.mailinglist.at>

> On Mon, 13 Apr 2009, alex k wrote:
>
>> > Are all the flows incorrectly dated, or just the ones from around the
>> time
>> > softflowd exited?
>> >
>>
>> It seems to me, that the first one or two flows after the crash
>> (softflowd
>> gets started automatically) are the wrong dated ones.
>> It crashed at 00:04 and was started a few seconds after that (I found a
>> very fast way to do that).
>
> Just to be clear: it is the first flows out of softflowd after a restart
> and not the last couple before a crash that have invalid times? Are both
> the start time and the end time incorrect?
>
> Could you try to find the details of this flow in the softflowd debug log
> and see if the times are incorrect there too? The flow start time comes
> from
> libpcap, so it is possible that it is giving us bad data.
>
>> >> What happened? Network error? Corrupted file? Socket problem?
>> >
>> > Is anything restarting (bringing down and back up) the network
>> interface
>> > on which softflowd is listening? That can cause this sort of problem.
>> > This line:
>> >
>> >> Shutting down after pcap EOF
>> >
>> > Indicates that libpcap has closed itself.
>>
>> As far as I can see, the network interface had no problem at that time.
>> The host is monitored and was never unreachable.
>> It could have been a problem with VMware. (The IP with the wrong dated
>> entries is a virtual machine.)
>>
>> How can I find out, if it's a libpcap problem? It all happens in memory,
>> right?
>
> Are you running softflowd with a pcap filter on the commandline?
>
> You might also want to try this diff:
>
> Index: softflowd.c
> ===================================================================
> RCS file: /var/cvs/softflowd/softflowd.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 softflowd.c
> --- softflowd.c	3 Sep 2007 10:50:05 -0000	1.98
> +++ softflowd.c	13 Apr 2009 11:04:10 -0000
> @@ -1916,7 +1916,7 @@ main(int argc, char **argv)
>  				logit(LOG_ERR, "Exiting on pcap_dispatch: %s",
>  				    pcap_geterr(pcap));
>  				break;
> -			} else if (r == 0) {
> +			} else if (r == 0 && capfile != NULL) {
>  				logit(LOG_NOTICE, "Shutting down after "
>  				    "pcap EOF");
>  				graceful_shutdown_request = 1;
>

Hi Damien,

Something _really_ ugly happened. I was very busy this week, so I placed
reliance on my monitoring. Bad idea... softflowd didn't crash, the process
stayed, but it stopped to capture packets and to expire flows last sunday.
Absolutly nothing in the nohup.out file (no "Shutting down..." message).

What happened? "capfile" was NULL?

Before I finally killed the leftover (and useless) softflowd process, I
tried "softflowctl statistics". It gave me:

softflowd[26288]: Accumulated statistics:
Number of active flows: 0
Packets processed: 53254480
Fragments: 280
Ignored packets: 198188 (198188 non-IP, 0 too short)
Flows expired: 434564 (0 forced)
Flows exported: 841806 in 132789 packets (0 failures)
Packets received by libpcap: 109063926
Packets dropped by libpcap: 55601898
Packets dropped by interface: 0

Expired flow statistics:  minimum       average       maximum
  Flow bytes:                  40         64771     104883460
  Flow packets:                 1           123        105328
  Duration:                  0.00s        47.66s        77.08s

Expired flow reasons:
       tcp =    145544   tcp.rst =         0   tcp.fin =         0
       udp =      4239      icmp =     36829   general =         0
   maxlife =    247952
  over 2Gb =         0
  maxflows =         0
   flushed =         0

Per-protocol statistics:     Octets      Packets   Avg Life    Max Life
           icmp (1):      263824161      2779572      57.46s      74.46s
            tcp (6):    27819369585     49610926      39.47s      77.08s
           udp (17):       64032137       863982      44.32s      72.45s

It seems, everything was fine, then something happened with "capfile" and
softflowd couldn't proceed. No more active flows.

What could we try next?

xela