From thomas at ronner.org Mon Feb 1 05:51:00 2010 From: thomas at ronner.org (Thomas Ronner) Date: Sun, 31 Jan 2010 19:51:00 +0100 Subject: [netflow-tools] Python bindings on Linux slow Message-ID: <80BE5B38-8B62-4381-914F-FB568E306467@ronner.org> Hello, I've run into the problem that the Python bindings are quite slow on Linux, using massive amounts of CPU time, all 'system' time according to top. On my FreeBSD system I don't see this behaviour. When I strace a simple program[1] that iterates through a flow file I see lots of PF_NETLINK sockets being opened, sendto()'d, recvmsg()'d and closed. See below for an excerpt of this output[2] and a summary from 'strace - c'[3]. The problem seems to be that glibc on Linux does this during the getaddrinfo() function. This function is called indirectly from the flowobj_normalise() function in flowd_python.c. As this is called through addr_pton() in addr.c there are probably more places in the flowd code where this problem occurs. I'm unable to find a quick workaround for this, other than copying parts of getaddrinfo() from FreeBSD and incorporating that into the python module. But as I'm not a C programmer it will take weeks if not months to really understand all the smart little tricks in getaddrinfo(), recognizing what I need, and implementing a non- crashing version of it... Does somebody have a better idea? Regards, Thomas Footnotes: [1] Simple python program: from flowd import FlowLog flowlog = FlowLog("flowd") f = file ('simple.out', 'w') for flow in flowlog: f.write( flow.format() ) f.write('\n') f.close() [2] strace output: (...) open("flowd.klein", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0600, st_size=11600, ...}) = 0 read(3, "`\33\0\0@\7\372\256K_##\0\0064\335\0\6\0\0\177\0\0\1\n\0031dF/ r\337\0"..., 8192) = 8192 socket(PF_NETLINK, SOCK_RAW, 0) = 4 bind(4, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0 getsockname(4, {sa_family=AF_NETLINK, pid=7674, groups=00000000}, [157869786481557516]) = 0 time(NULL) = 1264852034 sendto(4, "\24\0\0\0\26\0\1\3B\34dK\0\0\0\0\0\0\0\0"..., 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0B\34dK \372\35\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1\10"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108 recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0B\34dK\372\35\0\0\n \200\200\376\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128 recvmsg(4, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0B\34dK \372\35\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20 close(4) = 0 (...) [3] strace -c output; don't look at the timings, they are wrong. This process ran for a couple of minutes: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 34.78 0.330054 0 10770888 recvmsg 14.46 0.137189 0 3590338 close 13.25 0.125723 0 3590296 socket 13.01 0.123506 0 3590296 sendto 7.86 0.074605 0 3590296 bind 6.22 0.059050 0 3590296 getsockname 5.14 0.048759 0 3590296 time 5.11 0.048531 0 1795219 43 stat 0.09 0.000846 0 12768 read 0.08 0.000733 0 34991 write 0.00 0.000000 0 137 95 open 0.00 0.000000 0 64 fstat 0.00 0.000000 0 1 lstat 0.00 0.000000 0 3 lseek 0.00 0.000000 0 41 mmap 0.00 0.000000 0 11 mprotect 0.00 0.000000 0 18 munmap 0.00 0.000000 0 6 brk 0.00 0.000000 0 68 rt_sigaction 0.00 0.000000 0 1 rt_sigprocmask 0.00 0.000000 0 6 3 ioctl 0.00 0.000000 0 2 2 access 0.00 0.000000 0 1 execve 0.00 0.000000 0 1 fcntl 0.00 0.000000 0 4 getdents 0.00 0.000000 0 1 getcwd 0.00 0.000000 0 3 2 readlink 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 2 futex 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.948996 34156058 145 total