tcpdump mailing list archives

Re: DLT value for IP over IB (Infiniband)


From: Darren Reed <darren.reed () oracle com>
Date: Tue, 02 Aug 2011 09:15:26 -0700

Unfortunately, I don't yet have a version of editcap that works with IB files thus:
=========================
Your message to tcpdump-workers has been delayed, and requires the approval
of the moderators, for the following reason(s):

The message body is too long (43313 > 40000)

If you do not wish the message to be posted, or have other concerns,
please send a message to the list owners at the following address:
  tcpdump-workers-owner () lists tcpdump org
=========================


Re: [tcpdump-workers] DLT value for IP over IB (Infiniband).eml

Subject: Re: [tcpdump-workers] DLT value for IP over IB (Infiniband)
From: Darren Reed <darren.reed () oracle com>
Date: Tue, 02 Aug 2011 08:57:40 -0700
To: Guy Harris <guy () alum mit edu>
CC: tcpdump-workers () lists tcpdump org


On 07/29/11 09:49, Guy Harris wrote:

On Jul 27, 2011, at 3:02 AM, Darren Reed wrote:

With Solaris, the interfaces available from the driver and protocol stack prohibit access to actual packets at the link layer. I don't know if this is or will be possible with Linux, but if the link layer header for IPoIB on Linux is 12 bytes, then no, the data before the IP header that is exposed by Infiniband on Linux is not the link layer header. Furthermore, the comments that I've received suggest that this type of access to network packets is not possible with Infiniband.

For ARP packets, the influence of Infiniband is simply on the size of the address placed in the ARP packets.

The address used in ARP packets for Infiniband is the same across all implementations of IPoIB.

So whilst the pre-IP header is different on Solaris and Linux for Infiniband packets, the Infiniband address placed in the ARP packets is an Infiniband address and is not dependent on the implementation of IPoIB.

Thus mapping ARPHRD_INFINIBAND to 32 will be fine for both Linux and Solaris.

Presumably "for Solaris" means that, for libpcap on Solaris 11, you have a choice of using BPF (which returns DLT_ values), PF_PACKET sockets (which returns ARPHRD_ values), and DLPI (which returns DL_ values)? If it doesn't support using PF_PACKET sockets for capturing, libpcap-on-Solaris has no reason to care about ARPHRD_anything.

The ARPHRD_INFINIBAND value (32) is seen by tcpdump when decoding ARP headers in IPoIB
traffic on the relevant interfaces, as can be seen in this patch:

diff -uN tcpdump-4.1.1/print-arp.c tcpdump-4.1.1.new/print-arp.c
--- tcpdump-4.1.1/print-arp.c   2010-03-11 17:56:44.000000000 -0800
+++ tcpdump-4.1.1.new/print-arp.c       2011-07-14 09:01:08.965396346 -0700
@@ -62,6 +62,7 @@
         u_char  ar_hln;         /* length of hardware address */
         u_char  ar_pln;         /* length of protocol address */
         u_short ar_op;          /* one of: */
+#define ARPHRD_INFINIBAND 32    /* Infiniband RFC 4391 */
 #define ARPOP_REQUEST   1       /* request to resolve address */
 #define ARPOP_REPLY     2       /* response to previous request */
 #define ARPOP_REVREQUEST 3      /* request protocol address given hardware */
@@ -118,6 +119,7 @@
     { ARPHRD_STRIP, "Strip" },
     { ARPHRD_IEEE1394, "IEEE 1394" },
     { ARPHRD_ATM2225, "ATM" },
+    { ARPHRD_INFINIBAND, "Infiniband" },
     { 0, NULL }
 };


Here the symbol "ARPHRD_INFINIBAND" is defined only for use with printing
out ARP packets. Now that I think about it, the above patch isn't
really the best but it should give you an idea about what the problem
is here. Without the above patch, tcpdump prints that it has an address
for an unknown address type in the ARP messages. That message can be
confusing and is avoidable.
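
To illustrate the lookup that the second hunk extends, here is a minimal, self-contained sketch of a value-to-name table with a fallback for unknown hardware types. The "struct tok" layout mirrors the table in the patch; the helper name arphrd_name() and the fallback string are assumptions made for illustration and are not tcpdump's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define ARPHRD_ETHER       1   /* Ethernet */
#define ARPHRD_INFINIBAND  32  /* Infiniband, RFC 4391 */

/* Value/name pair, terminated by an entry with a NULL name. */
struct tok {
    int v;
    const char *s;
};

static const struct tok arphrd_values[] = {
    { ARPHRD_ETHER,      "Ethernet" },
    { ARPHRD_INFINIBAND, "Infiniband" },
    { 0, NULL }
};

/* Hypothetical helper: return the name for an ARP hardware type. */
const char *arphrd_name(int hrd)
{
    /* ar_hrd is a 16-bit field in the ARP header. */
    assert(hrd >= 0 && hrd <= 0xffff);

    for (const struct tok *t = arphrd_values; t->s != NULL; t++) {
        if (t->v == hrd)
            return t->s;
    }
    /* Assumed fallback text; without a table entry, type 32 ends up here. */
    return "Unknown Hardware";
}
```

With the ARPHRD_INFINIBAND entry present, arphrd_name(32) returns "Infiniband"; if that entry is removed, the same call falls through to the fallback, which is the confusing case described above.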

Yes, on Solaris, DL_IB is defined for use with DLPI and Infiniband.


For the DLT values, I'm going to use the names DLT_IPOIB and LINKTYPE_SOLARIS_IPOIB for Solaris 11. If a pair of numbers can be assigned in the next 24 or so hours, I'll use those, otherwise it'll be DLT_USER15 for both. If I understand correctly, the design is such that libpcap on Linux would then map DLT_IPOIB to LINKTYPE_LINUX_IPOIB.

No. There are APIs in libpcap that are expected to return DLT_ values for savefiles, and savefiles have LINKTYPE_ values in them (because there are some cases where different BSDs use different numerical values for the same DLT_ definitions - and, in at least some of those cases, BSD #1 uses a given numerical value for DLT_xxx and BSD #2 uses that numerical value for DLT_yyy and a different numerical value for DLT_xxx - so we need a single LINKTYPE_xxx numerical value to correspond to all of the different numerical values of DLT_xxx), so there would have to be different DLT_s for LINKTYPE_SOLARIS_IPOIB and LINKTYPE_LINUX_IPOIB.

Right, I'm with you on that.

So Linux would, presumably, when opening an Infiniband interface, map ARPHRD_INFINIBAND (32) to DLT_LINUX_IPOIB, just as Solaris BPF would just return DLT_SOLARIS_IPOIB and, if there's DLPI access to those interfaces, libpcap on Solaris's DLPI code would map DL_IPOIB or whatever to DLT_SOLARIS_IPOIB (if they have the same link-type header format). libpcap on *all* platforms, and WinPcap on Windows, would map LINKTYPE_SOLARIS_IPOIB in a capture file to DLT_SOLARIS_IPOIB and would map LINKTYPE_LINUX_IPOIB in a capture file to DLT_LINUX_IPOIB to be returned by pcap_datalink().
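
The savefile side of that mapping can be sketched roughly as below. This is only an illustration: no numbers had been assigned for any of these names at the time of this message, so every value here is a made-up placeholder (hence the EX_ prefix), and ex_linktype_to_dlt() is a hypothetical helper, not a libpcap function. Passing unmapped values through unchanged is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder numbers only; real values were not yet assigned. */
enum {
    EX_DLT_SOLARIS_IPOIB      = 9001,
    EX_DLT_LINUX_IPOIB        = 9002,
    EX_LINKTYPE_SOLARIS_IPOIB = 9101,
    EX_LINKTYPE_LINUX_IPOIB   = 9102
};

/* One row per link type: value stored in the file, value returned to callers. */
struct ex_linktype_map {
    int linktype;
    int dlt;
};

static const struct ex_linktype_map ex_map[] = {
    { EX_LINKTYPE_SOLARIS_IPOIB, EX_DLT_SOLARIS_IPOIB },
    { EX_LINKTYPE_LINUX_IPOIB,   EX_DLT_LINUX_IPOIB }
};

/* Hypothetical helper: translate a savefile LINKTYPE_ value to a DLT_ value. */
int ex_linktype_to_dlt(int linktype)
{
    /* Link-type values in a savefile header are non-negative. */
    assert(linktype >= 0);

    for (size_t i = 0; i < sizeof ex_map / sizeof ex_map[0]; i++) {
        if (ex_map[i].linktype == linktype)
            return ex_map[i].dlt;
    }
    /* Assumption of this sketch: unmapped values pass through unchanged. */
    return linktype;
}
```

The point of keeping two separate rows is that the Solaris and Linux pre-IP headers differ, so a reader on any platform needs a distinct DLT_ value for each in order to pick the right decoder.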

On Solaris, DLPI's DL_IB and BPF's DLT_SOLARIS_IPOIB both deliver the same header, the one found before the IP header, to applications.

With this email, I've attached a capture from an IB adapter on Solaris. The patch above is required to make "tcpdump -v" sensible with ARP messages inside IPoIB, for example:

08:11:08.610137 ARP, Infiniband (len 20), IPv4 (len 4), Request who-has 192.168.37.12 (00:ff:ff:ff:ff:10:40:1b:00:00:00:00:00:00:00:00:ff:ff:ff:ff) tell 192.168.37.1, length 56
08:11:08.610327 ARP, Infiniband (len 20), IPv4 (len 4), Reply 192.168.37.12 is-at 80:00:00:51:fe:80:00:00:00:00:00:00:00:21:28:00:01:a1:1d:45, length 56
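
For anyone reading the 20-byte hardware addresses in that output, the following is a minimal sketch of how they can be split up, assuming the layout I understand RFC 4391 to describe: one reserved/flags byte, a 24-bit queue pair number, then a 16-byte GID. The struct and function names are hypothetical and are not part of tcpdump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HWADDR_LEN 20  /* matches "len 20" in the ARP output */

/* Hypothetical decoded form of an IPoIB link-layer address. */
struct ipoib_hwaddr {
    uint8_t  flags;    /* first byte: reserved/flags */
    uint32_t qpn;      /* next 3 bytes: queue pair number, big-endian */
    uint8_t  gid[16];  /* last 16 bytes: port GID */
};

/* Returns 0 on success, -1 if the length is not 20 bytes. */
int ipoib_parse_hwaddr(const uint8_t *buf, size_t len, struct ipoib_hwaddr *out)
{
    assert(buf != NULL && out != NULL);

    if (len != IPOIB_HWADDR_LEN)
        return -1;

    out->flags = buf[0];
    out->qpn = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    memcpy(out->gid, buf + 4, sizeof out->gid);
    return 0;
}
```

Applied to the is-at address in the reply, this would give a flags byte of 0x80, a queue pair number of 0x51, and a GID beginning fe:80.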

Darren

-
This is the tcpdump-workers list.
Visit https://cod.sandelman.ca/ to unsubscribe.

