tcpdump mailing list archives
Re: Libpcap on VMWare
From: Guy Harris <guy () alum mit edu>
Date: Tue, 12 Jan 2010 14:54:51 -0800
On Jan 12, 2010, at 1:38 AM, Vikram Roopchand wrote:
This is similar in nature to the http://article.gmane.org/gmane.network.tcpdump.devel/4256 posting (which is unfortunately unsolved). We are using jnetpcap, which is a wrapper over libpcap. Mark Bednarczyk posted the original query (4256).
--------------------------------------
We are experiencing massive packet drops in libpcap while working with non-Windows guests on VMWare ESXi Server. The same thing happens on VMplayer (Host OS - Windows). We have tested on Ubuntu 8.04, FC11 and Debian; the library seems to drop packets everywhere. The load is not heavy but is constant (TCP packets of 1200 - 1500 bytes consistently). The packet drops DO NOT occur on Windows guest OSs (both via ESXi and VMPlayer). They only happen when we are working with non-Windows guests.
Do they happen if you're running with Linux on bare hardware, rather than under VMware? I.e., is there any reason to believe that this is a problem with libpcap on VMware, rather than, for example, libpcap on Linux?
Libpcap version from Ubuntu:- Libpcap (by dpkg) : ii libpcap0.8 0.9.8-2 System interface for user-level packet capture.
That means you're using a version of libpcap based on the 0.9.8 release. The package
As a temporary measure, we initially thought we might need to increase the socket receive buffer size, as someone did here: http://www.winpcap.org/pipermail/winpcap-users/2006-October/001521.html . We tried the configuration given in the link and it reduced packet drops substantially, to about 2% from over 20% earlier, but still not to zero. Being new to libpcap (and Linux), we are still struggling with some basic understanding and would be grateful if someone could set us on track. 1. What we did with these commands
    sysctl -w net.core.rmem_max=4194304
    sysctl -w net.core.rmem_default=4194304
was to increase the Linux socket size so that when libpcap opens a socket to the BPF device
There are no BPF devices on Linux. libpcap opens a PF_PACKET socket and later binds it to a *networking* device.
it uses this size (of 4M here). Is this understanding correct?
From a quick look at the Linux 2.6.29 kernel, rmem_default will be used as the default receive buffer size when any socket is created; this includes PF_PACKET sockets, as well as PF_INET sockets, and....
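One way to check which default a newly created socket actually receives is to read SO_RCVBUF back right after socket(). The following is a minimal sketch, not libpcap code: the function name default_rcvbuf_bytes is made up for illustration, and it uses an unprivileged UDP socket rather than a PF_PACKET socket, on the assumption that the same rmem_default applies to both (a PF_PACKET socket would normally need root).

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: returns the receive buffer size (in bytes) that
 * the kernel assigned to a freshly created socket, or -1 on error.
 * A UDP socket is used here only because it needs no privileges. */
int default_rcvbuf_bytes(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int size = 0;
    socklen_t len = sizeof size;
    int rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);

    close(fd);
    return rc == 0 ? size : -1;
}
```

Calling this before and after changing net.core.rmem_default should show whether the new default is being picked up.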
2. In the libpcap source file "pcap-bpf.c",
As I said, there are no BPF devices in Linux, so pcap-bpf.c is irrelevant to Linux. The pcap-*.c file relevant to Linux is called... ...pcap-linux.c.
at line 1618 (from http://github.com/mcr/libpcap/blob/117cb5eb2eb4fe212d3851f1205bb0b8f57873c6/pcap-bpf.c), it says "We don't have a zero copy BPF, set the buffer size". May I know what this means?
"We don't have a zero copy BPF" means you're running on one of: FreeBSD prior to 8.0; NetBSD; OpenBSD; DragonFly BSD; Mac OS X; AIX; or you're running on FreeBSD 8.0 or later but the attempt to turn on zero-copy mode failed. Note that, as per the previous paragraph, that list does *NOT* include Linux. The fact that that comment is there *at all* means that you're using libpcap 1.0.0 or later. At least on Ubuntu, you're using libpcap 0.9.8, as per the "Libpcap version from Ubuntu". Run "tcpdump -h" to find out what version of libpcap you're using on any particular machine.
What does this buffer size mentioned in the comment represent? Does libpcap have its own buffer other than the socket buffer?
On the OSes in the list above, there is *NO* socket involved; instead, there's a BPF device. The BPF device has its own buffer; the size of that buffer is the buffer size being set. (Libpcap also may have its own buffer, into which packets are read from the kernel; that read would be done from: a PF_PACKET socket on Linux - unless you're capturing from, for example, a USB interface with libpcap 1.0.0 or later; a BPF device on the OSes listed above; a DLPI device in Solaris, HP-UX, or some other UN*Xes; etc.) On Linux, however, there *is* a socket.
And on the subsequent lines it says
    /*
     * No buffer size was explicitly specified.
     *
     * Try finding a good size for the buffer;
     * DEFAULT_BUFSIZE may be too big, so keep
     * cutting it in half until we find a size
     * that works, or run out of sizes to try.
     * If the default is larger, don't make it smaller.
     */
DEFAULT_BUFSIZE is 512K.
As indicated, that code is completely irrelevant to Linux, as it's not used on Linux.
... When we used the commands "sysctl -w net.core.rmem_max=4194304" and "sysctl -w net.core.rmem_default=4194304", what is it that we did?
You set the default size for socket receive buffers on Linux (as well as the maximum socket receive buffer size).
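The difference between the two settings can be seen from a program that asks for a specific receive buffer size on one socket. This is a rough sketch with a made-up helper name (request_rcvbuf_bytes); it is not something libpcap 0.9.x does for you. It requests a size with setsockopt() and reads back what the kernel actually granted. Per socket(7), Linux caps the request at net.core.rmem_max and reports roughly double the stored value to account for bookkeeping overhead, so the returned number is not expected to equal the request exactly.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask for `requested` bytes of receive buffer on
 * an existing socket `fd` and return the size the kernel reports
 * afterwards, or -1 on error. The kernel may cap the request (at
 * net.core.rmem_max on Linux) and may report a larger, adjusted value. */
int request_rcvbuf_bytes(int fd, int requested)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof requested) != 0)
        return -1;

    int effective = 0;
    socklen_t len = sizeof effective;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) != 0)
        return -1;

    return effective;
}
```

With rmem_max raised to 4194304, a request for 4 MB on a socket should no longer be silently reduced; with a smaller rmem_max it would be.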
Does libpcap have its own buffer into which it copies packet frames from the Linux socket?
Yes, but...
If so, how do we configure it from outside so that we can increase its size also?
...it's irrelevant to the problem you're having. The problem is probably that libpcap, and your program, aren't reading packets fast enough, so, given that the socket buffer has a finite size, that buffer can eventually fill up, at which point any more packets that arrive will be dropped. Making the socket buffer bigger will help there *IF* the program+libpcap is capable, on average, of reading and processing packets as fast as, or faster than, they arrive - the buffer only helps if the inability to process packets at full speed is temporary (program gets temporarily slowed down by, for example, having to write the packets to a file, or a short burst of packets arrives too fast) and the program can later catch up. The buffer in libpcap only has to be big enough for the chunk of packets libpcap reads - and, in versions of libpcap prior to 1.0.0, it does a recvfrom() on a PF_PACKET socket, and gets one packet at a time, so the buffer in libpcap only needs to be big enough for one packet.
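The point about bursts versus sustained overload can be illustrated with a toy model. The sketch below is not libpcap or kernel code; simulate_drops and its parameters are invented for illustration, and it treats time as discrete ticks with a fixed number of packets read per tick.

```c
#include <stddef.h>

/* Toy model of a finite receive buffer (all names are hypothetical).
 * arrivals[t] packets arrive in tick t; packets that do not fit in the
 * remaining buffer space are dropped. The reader then removes up to
 * service_per_tick packets. Returns the total number of drops. */
long simulate_drops(const int *arrivals, size_t n_ticks,
                    int service_per_tick, long capacity)
{
    long queued = 0;
    long dropped = 0;

    for (size_t t = 0; t < n_ticks; t++) {
        long room = capacity - queued;
        long accepted = arrivals[t] < room ? arrivals[t] : room;

        dropped += arrivals[t] - accepted;   /* buffer full: lost */
        queued += accepted;

        long served = queued < service_per_tick ? queued : service_per_tick;
        queued -= served;                    /* reader catches up */
    }
    return dropped;
}
```

In this model, a single burst of 10 packets with a reader that handles 2 per tick loses 6 packets with a 4-packet buffer and none with a 10-packet buffer. If 3 packets arrive every tick and the reader still handles only 2, the buffer fills no matter how large it is, and a bigger buffer only delays the first drop.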
We got this link http://public.lanl.gov/cpw/README.ring.html which talks about various environment variables (PCAP_FRAMES to be precise) that can be used to configure libpcap but I am not sure if this gentleman compiled his own libpcap version or this is applicable to standard distro as well.
It's his own version, so those environment variables don't apply to the standard version. *HOWEVER*, the main thing that his version of libpcap does is support Linux's zero-copy (memory-mapped) capture mechanism.
Using that mechanism (or the zero-copy mechanism in FreeBSD 8.0 and later) means that there is a buffer that's in both the kernel's address space and the application's address space, so that data doesn't need to be copied from a kernel-mode buffer to a user-mode buffer. Packets *are* still copied from the skbuff (Linux) or mbuf (FreeBSD) into the shared buffer, so it's really more like "one-copy", but that's still one fewer copy, so that could reduce the CPU time required to receive captured packets.
In addition, on Linux, that means that, at least in theory, when the application wakes up as packets arrive, it might be able to receive more than one packet per wakeup - libpcap will take packets from the shared buffer as long as there are packets available. Processing more than one packet per wakeup can also speed up packet processing, so that the application might drop fewer packets. (With BPF - except on AIX - even *without* the zero-copy capture mechanism, more than one packet can be delivered per wakeup, so, whilst the zero-copy mechanism in FreeBSD 8.0 and later will avoid one copy, it shouldn't increase the number of packets delivered per wakeup. In addition, the capture mechanism WinPcap provides on Windows also delivers more than one packet per wakeup.)
Libpcap 1.0.0 and later also support Linux's (and FreeBSD 8.0 and later's) zero-copy capture mechanism, so if you were using libpcap 1.0.0 or later, rather than libpcap 0.9.8, you might drop fewer packets. (As per Dustin Spicuzza's e-mail, "later" is better than "1.0.0"; "later" currently means "top of Git tree".)
May we also know what is this ring buffer people keep talking about ?
There's the ring buffer provided by newer versions of the standard Linux kernel; that's what Phil Wood is referring to in the link you mention above. There's also Luca Deri's PF_RING: http://www.ntop.org/PF_RING.html which requires modifications to libpcap to use.
Does the standard libpcap distribution have a ring buffer (related to the question above)?
Versions of libpcap before 1.0.0 don't support the Linux zero-copy capture mechanism; libpcap 1.0.0 and later do.
And can the PCAP_MEMORY or PCAP_FRAMES environment variables help increase it (as in the link above and here: http://seclists.org/snort/2009/q1/209)?
Only Phil Wood's libpcap supports those environment variables. However, libpcap 1.0.0 and later have an API that lets an application set the buffer size, on platforms where the buffer size can be set; tcpdump 4.0.0 and later support that API with the "-B" flag. I don't know whether jnetpcap supports the new APIs yet, however.