[netflow-tools] Python bindings on Linux slow
Thomas Ronner
thomas at ronner.org
Mon Feb 1 05:51:00 EST 2010
Hello,
I've run into the problem that the Python bindings are quite slow on
Linux, using massive amounts of CPU time, all 'system' time according
to top. On my FreeBSD system I don't see this behaviour. When I strace
a simple program[1] that iterates through a flow file I see lots of
PF_NETLINK sockets being opened, sendto()'d, recvmsg()'d and closed.
See below for an excerpt of this output[2] and a summary from 'strace -
c'[3].
The problem seems to be that glibc on Linux does this during the
getaddrinfo() function. This function is called indirectly from the
flowobj_normalise() function in flowd_python.c. As this is called
through addr_pton() in addr.c there are probably more places in the
flowd code where this problem occurs.
I'm unable to find a quick workaround for this, other than copying
parts of getaddrinfo() from FreeBSD and incorporating that into the
python module. But as I'm not a C programmer it will take weeks if not
months to really understand all the smart little tricks in
getaddrinfo(), recognizing what I need, and implementing a non-
crashing version of it...
Does somebody have a better idea?
Regards,
Thomas
Footnotes:
[1] Simple python program:
from flowd import FlowLog
flowlog = FlowLog("flowd")
f = file ('simple.out', 'w')
for flow in flowlog:
f.write( flow.format() )
f.write('\n')
f.close()
[2] strace output:
(...)
open("flowd.klein", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=11600, ...}) = 0
read(3, "`\33\0\0@\7\372\256K_##\0\0064\335\0\6\0\0\177\0\0\1\n\0031dF/
r\337\0"..., 8192) = 8192
socket(PF_NETLINK, SOCK_RAW, 0) = 4
bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(4, {sa_family=AF_NETLINK, pid=7674, groups=00000000},
[157869786481557516]) = 0
time(NULL) = 1264852034
sendto(4, "\24\0\0\0\26\0\1\3B\34dK\0\0\0\0\0\0\0\0"..., 20, 0,
{sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0B\34dK
\372\35\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10"..., 4096}],
msg_controllen=0, msg_flags=0}, 0) = 108
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0B\34dK\372\35\0\0\n
\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}], msg_controllen=0,
msg_flags=0}, 0) = 128
recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0B\34dK
\372\35\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}],
msg_controllen=0, msg_flags=0}, 0) = 20
close(4) = 0
(...)
[3] strace -c output; don't look at the timings, they are wrong. This
process ran for a couple of minutes:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
34.78 0.330054 0 10770888 recvmsg
14.46 0.137189 0 3590338 close
13.25 0.125723 0 3590296 socket
13.01 0.123506 0 3590296 sendto
7.86 0.074605 0 3590296 bind
6.22 0.059050 0 3590296 getsockname
5.14 0.048759 0 3590296 time
5.11 0.048531 0 1795219 43 stat
0.09 0.000846 0 12768 read
0.08 0.000733 0 34991 write
0.00 0.000000 0 137 95 open
0.00 0.000000 0 64 fstat
0.00 0.000000 0 1 lstat
0.00 0.000000 0 3 lseek
0.00 0.000000 0 41 mmap
0.00 0.000000 0 11 mprotect
0.00 0.000000 0 18 munmap
0.00 0.000000 0 6 brk
0.00 0.000000 0 68 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 6 3 ioctl
0.00 0.000000 0 2 2 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 1 fcntl
0.00 0.000000 0 4 getdents
0.00 0.000000 0 1 getcwd
0.00 0.000000 0 3 2 readlink
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 2 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.948996 34156058 145 total
More information about the netflow-tools
mailing list