tcpdump mailing list archives

Re: Libpcap on VMWare

From: Guy Harris <guy () alum mit edu>
Date: Tue, 12 Jan 2010 14:54:51 -0800


On Jan 12, 2010, at 1:38 AM, Vikram Roopchand wrote:

This is similar in nature to the
http://article.gmane.org/gmane.network.tcpdump.devel/4256 posting (which is
unfortunately unsolved). We are using jnetpcap, which is a wrapper over
libpcap. Mark Bednarczyk posted the original query (4256).

--------------------------------------

We are experiencing massive packet drops in libpcap while working with
non-Windows guests on VMWare ESXi Server. The same thing happens on VMplayer
(Host OS - Windows). We have tested on Ubuntu 8.04, FC11 and Debian; the
library seems to drop packets everywhere. The load it is subjected to is
not heavy but is constant (TCP packets of 1200 - 1500 bytes consistently).

The packet drops DO NOT occur on Windows Guest OSs (both via ESXi and
VMPlayer). They only happen when we are working with non-Windows guests.


Do they happen if you're running with Linux on bare hardware, rather than under VMware?

I.e., is there any reason to believe that this is a problem with libpcap on VMware, rather than, for example, libpcap 
on Linux?

Libpcap version from Ubuntu:-

Libpcap (by dpkg) : ii  libpcap0.8     0.9.8-2        System interface for
user-level packet capture.


That means you're using a version of libpcap based on the 0.9.8 release.
The package

As a temporary measure, we initially thought we might need to increase the
socket receive buffer size, as someone did here:
http://www.winpcap.org/pipermail/winpcap-users/2006-October/001521.html .
We tried the configuration given in the link and it reduced packet drops
substantially, to about 2% from over 20% earlier, but still not to zero.

Being new to Libpcap (and Linux), we are still struggling with some basic
understanding and would be grateful if someone could set us on track.

1. What we did with these commands

sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.rmem_default=4194304

was to increase the Linux socket size so that when libpcap opens a socket to
the BPF device


There are no BPF devices on Linux.  libpcap opens a PF_PACKET socket and later binds it to a *networking* device.

it uses this size (of 4M here). Is this understanding correct?

From a quick look at the Linux 2.6.29 kernel, rmem_default will be used as the default receive buffer size when any 
socket is created; this includes PF_PACKET sockets, as well as PF_INET sockets, and....
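
The effect of those two sysctls can be observed from user space with getsockopt() and setsockopt().  The following is a minimal sketch, not libpcap code: it uses a throwaway UDP socket because creating a PF_PACKET socket normally needs extra privileges, and the function names get_rcvbuf(), request_rcvbuf() and demo_rcvbuf() are made up for this illustration.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the current receive buffer size of fd in bytes, or -1 on error. */
int get_rcvbuf(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* Ask for a receive buffer of `bytes`; return what the kernel reports
 * afterwards, or -1 on error. */
int request_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return get_rcvbuf(fd);
}

/* Create a UDP socket, print its default receive buffer size (which comes
 * from rmem_default on Linux), then request a different size and print
 * the result.  Returns the final size, or -1 on error. */
int demo_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    int initial = get_rcvbuf(fd);
    int granted = request_rcvbuf(fd, requested);
    printf("default: %d bytes, requested: %d, now: %d\n",
           initial, requested, granted);

    close(fd);
    return granted;
}
```

Calling demo_rcvbuf(4194304) before and after changing the sysctls should show the difference.  On Linux the size reported after setsockopt() is normally twice the requested value (the kernel doubles it to allow for bookkeeping overhead), and an unprivileged request is capped at rmem_max.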

2. In the libpcap source file "pcap-bpf.c",


As I said, there are no BPF devices in Linux, so pcap-bpf.c is irrelevant to Linux.  The pcap-*.c file relevant to 
Linux is called...

...pcap-linux.c.

at line 1618 (from
http://github.com/mcr/libpcap/blob/117cb5eb2eb4fe212d3851f1205bb0b8f57873c6/pcap-bpf.c),
it says

"We don't have a zero copy BPF, set the buffer size". May I know what this
means?


"We don't have a zero copy BPF" means you're running on one of:

        FreeBSD prior to 8.0;
        NetBSD;
        OpenBSD;
        DragonFly BSD;
        Mac OS X;
        AIX;

or you're running on FreeBSD 8.0 or later but the attempt to turn on zero-copy mode failed.

Note that, as per the previous paragraph, that list does *NOT* include Linux.

The fact that that comment is there *at all* means that you're using libpcap 1.0.0 or later.  At least on Ubuntu, 
you're using libpcap 0.9.8, as per the "Libpcap version from Ubuntu".  Run "tcpdump -h" to find out what version of 
libpcap you're using on any particular machine.

What does this buffer size mentioned in the comment represent? Does
Libpcap have its own buffer other than the Socket buffer?


On the OSes in the list above, there is *NO* socket involved; instead, there's a BPF device.  The BPF device has its 
own buffer; the size of that buffer is the buffer size being set.

(Libpcap also may have its own buffer, into which packets are read from the kernel; that read would be done from:

        a PF_PACKET socket on Linux - unless you're capturing from, for example, a USB interface with libpcap 1.0.0 or 
later;

        a BPF device on the OSes listed above;

        a DLPI device in Solaris, HP-UX, or some other UN*Xes;

        etc..)

On Linux, however, there *is* a socket.

And on the
subsequent lines it says

/*
* No buffer size was explicitly specified.
*
* Try finding a good size for the buffer;
* DEFAULT_BUFSIZE may be too big, so keep
* cutting it in half until we find a size
* that works, or run out of sizes to try.
* If the default is larger, don't make it smaller.
*/

DEFAULT_BUFSIZE is 512K.



As indicated, that code is completely irrelevant to Linux, as it's not used on Linux.
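
For reference only, the "keep cutting it in half" strategy described in that comment can be sketched as below.  This is not the code from pcap-bpf.c: on BPF systems the size is set with the BIOCSBLEN ioctl on the BPF device, whereas here try_set_buffer() is a hypothetical stand-in that simply accepts any size up to a made-up limit, so that the loop can be run anywhere.  The "if the default is larger, don't make it smaller" step is left out.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel call that sets the buffer size.
 * It accepts any size up to `limit` and rejects anything larger. */
static int try_set_buffer(unsigned int size, unsigned int limit)
{
    return size <= limit ? 0 : -1;
}

/* Start at `start` and keep halving until a size is accepted.
 * Returns the accepted size, or 0 if we ran out of sizes to try. */
unsigned int find_buffer_size(unsigned int start, unsigned int limit)
{
    for (unsigned int size = start; size != 0; size >>= 1) {
        if (try_set_buffer(size, limit) == 0)
            return size;
    }
    return 0;
}

/* Print the result for one starting size and limit. */
void show_buffer_size(unsigned int start, unsigned int limit)
{
    printf("start %u, limit %u -> %u\n",
           start, limit, find_buffer_size(start, limit));
}
```

With a starting size of 512K (524288) and a limit of 100000 bytes, the loop tries 524288, 262144 and 131072 before settling on 65536.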

...  When we used these commands, sysctl -w
net.core.rmem_max=4194304 and sysctl -w net.core.rmem_default=4194304, what
is it that we did?


You set the default size for socket receive buffers on Linux (as well as the maximum socket receive buffer size).

Does libpcap have its own buffer where it copies packet
frames from Linux Socket ?


Yes, but...

If so, how do we configure it from outside so
that we can increase its size also?


...it's irrelevant to the problem you're having.  The problem is probably that libpcap, and your program, aren't 
reading packets fast enough, so, given that the socket buffer has a finite size, that buffer can eventually fill up, at 
which point any more packets that arrive will be dropped.  Making the socket buffer bigger will help there *IF* the 
program+libpcap is capable, on average, of reading and processing packets as fast as, or faster than, they arrive - the 
buffer only helps if the inability to process packets at full speed is temporary (program gets temporarily slowed down 
by, for example, having to write the packets to a file, or a short burst of packets arrives too fast) and the program 
can later catch up.
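
That point can be illustrated with a toy model, sketched here under simplifying assumptions: time is divided into ticks, a given number of packets arrives in each tick, and the reader then removes at most a fixed number of packets per tick.  Nothing in it comes from libpcap or the kernel; simulate_drops() and its parameters are invented for this illustration.

```c
#include <stddef.h>

/* Toy model of a fixed-size receive buffer.
 *   arrivals[i] - packets arriving during tick i
 *   capacity    - how many packets the buffer can hold
 *   drain       - how many packets the reader can remove per tick
 * Returns the number of packets dropped because the buffer was full. */
unsigned long simulate_drops(const unsigned int *arrivals, size_t ticks,
                             unsigned long capacity, unsigned int drain)
{
    unsigned long queued = 0;   /* never exceeds capacity */
    unsigned long dropped = 0;

    for (size_t i = 0; i < ticks; i++) {
        unsigned long room = capacity - queued;
        unsigned long in = arrivals[i];

        if (in > room) {            /* buffer full: the excess is lost */
            dropped += in - room;
            in = room;
        }
        queued += in;

        /* The reader takes what it can, up to `drain` packets. */
        queued -= (queued < drain) ? queued : drain;
    }
    return dropped;
}
```

With bursts of 10 packets every fourth tick and a reader that handles 3 packets per tick (so it keeps up on average), a 4-packet buffer drops packets in every burst while a 16-packet buffer drops none.  With a steady 5 packets per tick against the same reader, the 16-packet buffer fills after a few ticks and drops start anyway; a bigger buffer only delays that.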

The buffer in libpcap only has to be big enough for the chunk of packets libpcap reads - and, in versions of libpcap 
prior to 1.0.0, it does a recvfrom() on a PF_PACKET socket, and gets one packet at a time, so the buffer in libpcap 
only needs to be big enough for one packet.

We got this link
http://public.lanl.gov/cpw/README.ring.html which talks about various
environment variables (PCAP_FRAMES to be precise) that can be used to
configure libpcap but I am not sure if this gentleman compiled his own
libpcap version or this is applicable to standard distro as well.


It's his own version, so those environment variables don't apply to the standard version.

*HOWEVER*, the main thing that his version of libpcap does is support Linux's zero-copy (memory-mapped) capture 
mechanism.  Using that mechanism (or the zero-copy mechanism in FreeBSD 8.0 and later) means that there is a buffer 
that's in both the kernel's address space and the application's address space, so that data doesn't need to be copied 
from a kernel-mode buffer to a user-mode buffer.  Packets *are* still copied from the skbuff (Linux) or mbuf (FreeBSD) 
into the shared buffer, so it's really more like "one-copy", but that's still one fewer copy, so that could reduce the 
CPU time required to receive captured packets.

In addition, on Linux, that means that, at least in theory, when the application wakes up as packets arrive, it might 
be able to receive more than one packet per wakeup - libpcap will take packets from the shared buffer as long as there 
are packets available.  Processing more than one packet per wakeup can also speed up packet processing, so that the 
application might drop fewer packets.  (With BPF - except on AIX - even *without* the zero-copy capture mechanism, more 
than one packet can be delivered per wakeup, so, whilst the zero-copy mechanism in FreeBSD 8.0 and later will avoid one 
copy, it shouldn't increase the number of packets delivered per wakeup.  In addition, the capture mechanism WinPcap 
provides on Windows also delivers more than one packet per wakeup.)

Libpcap 1.0.0 and later also support Linux's (and FreeBSD 8.0 and later's) zero-copy capture mechanism, so if you were 
using libpcap 1.0.0 or later, rather than libpcap 0.9.8, you might drop fewer packets.  (As per Dustin Spicuzza's 
e-mail, "later" is better than "1.0.0"; "later" currently means "top of Git tree".)

May we also know what is this ring buffer people keep talking about ?


There's the ring buffer provided by newer versions of the standard Linux kernel; that's what Phil Wood is referring to 
in the link you mention above.

There's also Luca Deri's PF_RING:

        http://www.ntop.org/PF_RING.html

which requires modifications to libpcap to use.

Does
libpcap standard distro have a ring buffer (related to the question above) ?


Versions of libpcap before 1.0.0 don't support the Linux zero-copy capture mechanism; libpcap 1.0.0 and later do.

And can PCAP_MEMORY or PCAP_FRAMES environment variable help increase it (as
in the link above and here http://seclists.org/snort/2009/q1/209) ?


Only Phil Wood's libpcap supports those environment variables.

However, libpcap 1.0.0 and later have an API that lets an application set the buffer size, on platforms where the 
buffer size can be set; tcpdump 4.0.0 and later support that API with the "-B" flag.  I don't know whether jnetpcap 
supports the new APIs yet, however.