bpf(4) DG/UX R4.11MU05 bpf(4)
NAME
bpf - Berkeley Packet Filter
SYNOPSIS
bpf0
DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link
layers in a protocol-independent fashion. All packets on the
network, even those destined for other hosts, are accessible through
this interface.
The packet filter appears as a character special clonable device,
/dev/bpf0. After opening this file, the file descriptor must be
bound to a specific interface with the BIOCSETIF or BIOCSETIF2
ioctls. An interface can be bound to more than one file
descriptor, and the filter underlying each descriptor will see an
identical packet stream. If /dev/bpf0 does not exist, you can build
a new kernel with the bpf() entry in the system file, and reboot your
system.
A user-settable packet filter is associated with each open instance
of bpf0. Whenever an interface receives a packet, all file
descriptors listening on that interface apply their filter. Each
descriptor that accepts the packet receives its own copy.
Reads from these file descriptors return the next group of packets
that have matched the filter. To improve performance, the buffer
passed to read must be the same size as the buffers used internally
by bpf. A user application can get/set the size of this buffer with
the BIOCGBLEN/BIOCSBLEN ioctl.
The packet filter supports the following link level protocols:
Ethernet, SLIP, FDDI, and Token Ring. The packet filter also
supports attaching at the bottom and top of IP; that is, ip_bottom
and ip_top, respectively.
Since packet data is in network byte order, applications should use
the byteorder(3N) macros to extract multi-byte values.
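As a minimal sketch of the byte-order advice above, the helpers below copy a multi-byte field out of captured packet data and convert it with ntohs and ntohl. The names pkt_u16 and pkt_u32 are illustrative and not part of the BPF interface. The sketch assumes the conversion macros are declared in <arpa/inet.h>; some systems declare them in <netinet/in.h> instead.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs, ntohl; header location is an assumption */

/* Read a 16-bit network-order value at byte offset off in pkt. */
static uint16_t pkt_u16(const unsigned char *pkt, size_t off)
{
    uint16_t v;

    memcpy(&v, pkt + off, sizeof v);   /* copy first: the field may be unaligned */
    return ntohs(v);
}

/* Read a 32-bit network-order value at byte offset off in pkt. */
static uint32_t pkt_u32(const unsigned char *pkt, size_t off)
{
    uint32_t v;

    memcpy(&v, pkt + off, sizeof v);
    return ntohl(v);
}
```

The caller is responsible for checking that off plus the field size does not exceed the captured length.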
Ioctls
The ioctl command codes below are defined in <net/bpf.h>. All
commands require these includes:
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
Additionally, BIOCGETIF, BIOCSETIF, and BIOCSETIF2 require
<net/if.h>.
In addition to FIONREAD and SIOCGIFADDR, the following commands may
be applied to any open instance of bpf0. The third argument to the
ioctl should be a pointer to the type indicated.
BIOCGBLEN (u_int)
Returns the required buffer length for reads.
BIOCSBLEN (u_int)
Sets the buffer length (in bytes) for reads. If the
requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument. A
read call will result in EIO if it is passed a buffer that
is not this size. Note that an individual packet larger
than this size is necessarily truncated.
BIOCGDLT (u_int)
Returns the type of the data link layer underlying the
attached interface. EINVAL is returned if no interface has
been specified. The device types are defined in
<net/bpf.h>.
BIOCPROMISC
Forces the interface into promiscuous mode. All packets,
not just those destined for the local host, are processed.
Since more than one file can be listening on a given
interface, a listener that opened its interface non-
promiscuously may receive packets promiscuously. This
problem can be remedied with an appropriate filter.
The interface remains in promiscuous mode until all file
instances listening promiscuously are closed.
If the interface does not have a promiscuous mode, this
ioctl has no effect.
You must attach to an interface via the BIOCSETIF or
BIOCSETIF2 ioctl before issuing the BIOCPROMISC ioctl.
BIOCFLUSH Flushes the buffer of incoming packets and resets the
statistics that are returned by BIOCGSTATS.
BIOCGETIFLIST (struct bpf_if_list)
Returns a list of the interfaces which can be attached to
via the BIOCSETIF or BIOCSETIF2 ioctl. Upon entry, the
bifl_len field equals the size (in bytes) of the buffer
pointed to by the bifl_buf field. Upon return, the
bifl_len field equals the size (in bytes) of a buffer
required to fully accommodate the interface list; if the
interface list is larger than the buffer pointed to by the
bifl_buf field, only the number of elements which can fully
fit into the buffer are returned. If the bifl_version
field equals BPF_IF_VERSION1, each element of the interface
list is defined by struct bpf_if.
BIOCGETIF (struct ifreq)
Returns the name of the interface that was attached to by
the BIOCSETIF or BIOCSETIF2 ioctl. The name is returned in
the if_name field of ifreq. All other fields are
undefined.
BIOCSETIF (struct ifreq)
BIOCSETIF2 (dev_t)
Sets the interface associated with the file descriptor and
performs the actions of BIOCFLUSH. One of these ioctls
must be performed before any packets can be read. With
BIOCSETIF, indicate the device name in the if_name field of
ifreq; the device name is a simple file name, not the
complete path (e.g. cien0). With BIOCSETIF2, use the device
number to indicate the device; the device number of a
device is returned by the stat system call in the st_rdev
field of the struct stat structure.
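The device-number lookup described for BIOCSETIF2 might be sketched as follows. The helper name device_number is hypothetical, and the path of the device node to pass in depends on the system.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Look up the device number of the node at path.
 * Returns 0 and stores the number in *dev on success, -1 on failure.
 */
static int device_number(const char *path, dev_t *dev)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;          /* errno was set by stat */
    *dev = st.st_rdev;      /* meaningful only for device special files */
    return 0;
}
```

A pointer to the resulting dev_t would then be passed as the third argument of the BIOCSETIF2 ioctl.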
BIOCGRTIMEOUT (struct timeval)
BIOCSRTIMEOUT (struct timeval)
Gets or sets the read timeout parameter. The value of
timeval specifies the maximum length of time the kernel
will wait before delivering any buffered packets to a process
that is blocked in a read of a bpf file descriptor. This
parameter is initialized to zero by open(2), indicating no
timeout.
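Applications often hold timeouts in milliseconds. A small conversion to the struct timeval form expected by BIOCSRTIMEOUT could look like this; the helper name ms_to_timeval is illustrative only, and the sketch assumes a non-negative argument.

```c
#include <sys/time.h>

/* Convert a non-negative timeout in milliseconds to a struct timeval. */
static struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;              /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;    /* remainder in microseconds */
    return tv;
}
```

A result of zero seconds and zero microseconds corresponds to the initial setting, which indicates no timeout.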
BIOCGSTATS (struct bpf_stat)
Returns the following structure of packet statistics:
struct bpf_stat {
    u_int bs_recv;
    u_int bs_drop;
};
The fields are:
bs_recv the number of packets received by the
descriptor since opened or reset (including
any buffered since the last read call); this
includes packets which are rejected as well
as those which are accepted by the filter
program.
bs_drop the number of packets accepted by the filter
program but dropped by the kernel because of
buffer overflows (i.e., the application's
reads aren't keeping up with the packet
traffic).
BIOCIMMEDIATE (u_int)
Enables or disables "immediate mode," based on the truth
value of the argument. When immediate mode is enabled,
reads return immediately upon packet reception. This is
useful for programs that must respond to messages in real
time. Initially, an open instance of bpf0 has immediate
mode disabled, which means that reads block until either
the kernel buffer becomes full or the read timeout expires.
BIOCGMAXMEM (long)
BIOCSMAXMEM (long)
Gets or sets the maximum number of scratch memory locations
available for use by the filter program. Each location is
4 bytes.
The description of BIOCSETF explains how to set the filter
program.
BIOCGHOSTTBL (bpf_host_table_t)
Returns a copy of the host table, which is maintained from
packets seen on the associated interface. The host table
can currently be maintained only if the interface is
Ethernet; the filter program must also contain an
instruction that causes the host table statistics to be
kept (see BPF_MISC+BPF_ROUTINES).
The hdr.num_table_eles field must be set to the number of
table elements in the buffer pointed to by the table_ptr
field. Upon return, the hdr.num_table_eles field is set to
the number of host table elements. (The number actually
returned is the smaller of the current number of elements
and the number of elements in the buffer.) Also upon
return, the hdr.max_table_eles field is set to the maximum
number of elements that can be in the host table. A user
application can get/set this value with the
BIOCGHOSTTBLSIZE/BIOCSHOSTTBLSIZE ioctls.
BIOCGMATRIXTBL (bpf_matrix_table_t)
Returns a copy of the matrix table, which is maintained
from packets seen on the associated interface. The matrix
table can currently be maintained only if the interface is
Ethernet; the filter program must also contain an
instruction that causes the matrix table statistics to be
kept (see BPF_MISC+BPF_ROUTINES).
The hdr.num_table_eles field must be set to the number of
table elements in the buffer pointed to by the table_ptr
field. Upon return, the hdr.num_table_eles field is set to
the number of matrix table elements. (The number actually
returned is the smaller of the current number of elements
and the number of elements in the buffer.) Also upon
return, the hdr.max_table_eles field is set to the maximum
number of elements that can be in the matrix table. A user
application can get/set this value with the
BIOCGMATRIXTBLSIZE/BIOCSMATRIXTBLSIZE ioctls.
BIOCGHOSTTBLSIZE (unsigned int)
BIOCSHOSTTBLSIZE (unsigned int)
Gets or sets the maximum number of elements that the kernel
will store in this host table before it begins to drop
elements.
BIOCGMATRIXTBLSIZE (unsigned int)
BIOCSMATRIXTBLSIZE (unsigned int)
Gets or sets the maximum number of elements that the kernel
will store in this matrix table before it begins to drop
elements.
BIOCSETF (struct bpf_program)
Sets the filter program and performs the actions of
BIOCFLUSH. An array of instructions and its length is
passed in using the following structure:
struct bpf_program {
    int bf_len;
    struct bpf_insn *bf_insns;
};
The fields are:
bf_insns points to the filter program
bf_len is the length of the filter program, in units of
struct bpf_insn
The FILTER MACHINE section explains the filter language.
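Because bf_len counts instructions rather than bytes, it is usually derived from the instruction array itself. The sketch below uses stand-in types (demo_insn, demo_program) that mirror the layouts documented on this page so that it is self-contained; real code would use the definitions from <net/bpf.h>. DEMO_NELEMS and demo_make_program are hypothetical names.

```c
#include <stddef.h>

/* Stand-ins mirroring the documented layouts; not the system definitions. */
struct demo_insn {
    unsigned short code;
    unsigned char  jt;
    unsigned char  jf;
    long           k;
};

struct demo_program {
    int               bf_len;     /* number of instructions, not bytes */
    struct demo_insn *bf_insns;
};

/* Number of elements in a true array (do not use on a pointer). */
#define DEMO_NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Fill in a program descriptor for an array of n instructions. */
static struct demo_program demo_make_program(struct demo_insn *insns, size_t n)
{
    struct demo_program p;

    p.bf_len = (int)n;
    p.bf_insns = insns;
    return p;
}
```

With the real types, a pointer to the filled-in structure would be the third argument of the BIOCSETF ioctl.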
BPF Header
The following structure is prepended to each packet returned by
read(2):
struct bpf_hdr {
    struct timeval bh_tstamp;
    u_long bh_caplen;
    u_long bh_datalen;
    u_short bh_hdrlen;
};
The fields, whose values are stored in host order, are:
bh_tstamp The time the packet was processed by the packet
filter.
bh_caplen The length of the captured portion of the packet.
This is the minimum of the truncation amount specified
by the filter and the length of the packet.
bh_datalen The length of the packet off the wire. This value is
independent of the truncation amount specified by the
filter.
bh_hdrlen The length of the BPF header, which may not be equal
to sizeof(struct bpf_hdr).
The bh_hdrlen field accounts for padding between the bpf_hdr
structure and the lowest level protocol header. This provides proper
alignment of the packet data structures, which is required on
alignment-sensitive architectures and improves performance on many
other architectures.
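The relationship between bh_caplen, bh_datalen, and the truncation amount returned by the filter can be written as a one-line function. This is only a sketch of the rule stated above; demo_caplen is a hypothetical name.

```c
/*
 * Bytes of a packet that are captured, given the filter's return value
 * (the truncation amount) and the packet's length on the wire.
 * A return value of 0 means the packet is ignored.
 */
static unsigned long demo_caplen(unsigned long ret, unsigned long wire_len)
{
    return ret < wire_len ? ret : wire_len;
}
```

For a 60-byte packet, a filter returning 42 yields a captured length of 42, while a filter returning (u_int)-1 yields the full 60 bytes.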
Additionally, individual packets are padded so that each starts on a
word boundary. This requires an application to know how to get from
packet to packet. The macro BPF_WORDALIGN, defined in <net/bpf.h>,
rounds up its argument to the nearest word-aligned value (where a
word is BPF_ALIGNMENT bytes wide).
For example, if p points to the start of a packet, this expression
advances it to the next packet:
p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen)
For the alignment mechanisms to work properly, the buffer passed to
read(2) must itself be word aligned. malloc(3) always returns an
aligned buffer.
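The packet-to-packet stepping described above might be sketched as a loop over the bytes returned by a single read. To keep the sketch self-contained it uses stand-ins: demo_hdr mirrors the header layout shown on this page, and DEMO_WORDALIGN assumes an alignment of sizeof(long), which may differ from the real BPF_ALIGNMENT. Real code would use struct bpf_hdr and BPF_WORDALIGN from <net/bpf.h>.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Stand-ins for the <net/bpf.h> definitions; the alignment is assumed. */
#define DEMO_ALIGNMENT sizeof(long)
#define DEMO_WORDALIGN(x) \
    (((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

struct demo_hdr {
    struct timeval bh_tstamp;
    unsigned long  bh_caplen;
    unsigned long  bh_datalen;
    unsigned short bh_hdrlen;
};

/*
 * Visit each packet in the nread bytes returned by one read.
 * fn, if not NULL, receives the header and a pointer to the packet data.
 * Returns the number of packets visited.
 */
static int demo_walk(const char *buf, size_t nread,
                     void (*fn)(const struct demo_hdr *, const char *))
{
    const char *p = buf;
    size_t left = nread;
    int count = 0;

    while (left >= sizeof(struct demo_hdr)) {
        struct demo_hdr h;
        size_t total, step;

        memcpy(&h, p, sizeof h);            /* fields are in host order */
        total = h.bh_hdrlen + h.bh_caplen;  /* padded header plus data */
        if (h.bh_hdrlen == 0 || total > left)
            break;                          /* malformed or cut short */
        if (fn != NULL)
            fn(&h, p + h.bh_hdrlen);        /* data starts after bh_hdrlen */
        count++;
        step = DEMO_WORDALIGN(total);       /* next packet is word aligned */
        if (step >= left)
            break;
        p += step;
        left -= step;
    }
    return count;
}
```

The length check before each step guards against reading past the end of the buffer if the final packet is incomplete.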
Filter Machine
A filter program is an array of instructions, with all branches
forwardly directed, terminated by a return instruction. Each
instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory, and
implicit program counter.
The following structure defines the instruction format:
struct bpf_insn {
    u_short code;
    u_char jt;
    u_char jf;
    long k;
};
The k field is used in different ways by different instructions, and
the jt and jf fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion. There are
eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. Various other mode and
operator bits are or'd into the class to give the actual
instructions. The classes and modes are defined in <net/bpf.h>.
The semantics for each defined BPF instruction are given below. A is
the accumulator, X is the index register, P[] is packet data, and M[] is
scratch memory. P[i:n] gives the data at byte offset i in the
packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1). M[i] gives the i'th word in scratch memory,
which is addressed only in word units. Scratch memory is indexed
from 0 to one less than the number of scratch memory locations (see
BIOCGMAXMEM).
k, jt, and jf are the corresponding fields in the instruction
definition. len refers to the length of the packet.
BPF_LD These instructions copy a value into the accumulator. The
type of the source operand is specified by an "addressing
mode" and can be a constant (BPF_IMM), packet data at a
fixed offset (BPF_ABS), packet data at a variable offset
(BPF_IND), the packet length (BPF_LEN), or a word in
scratch memory (BPF_MEM). For BPF_IND and BPF_ABS, the
data size must be specified as a word (BPF_W), halfword
(BPF_H), or byte (BPF_B). The semantics of all the
recognized BPF_LD instructions follow.
BPF_LD+BPF_W+BPF_ABS A <- P[k:4]
BPF_LD+BPF_H+BPF_ABS A <- P[k:2]
BPF_LD+BPF_B+BPF_ABS A <- P[k:1]
BPF_LD+BPF_W+BPF_IND A <- P[X+k:4]
BPF_LD+BPF_H+BPF_IND A <- P[X+k:2]
BPF_LD+BPF_B+BPF_IND A <- P[X+k:1]
BPF_LD+BPF_W+BPF_LEN A <- len
BPF_LD+BPF_IMM A <- k
BPF_LD+BPF_MEM A <- M[k]
BPF_LDX These instructions load a value into the index register.
The addressing modes are more restricted than those of the
accumulator loads, but they include BPF_MSH, which
efficiently loads the IP header length.
BPF_LDX+BPF_W+BPF_IMM X <- k
BPF_LDX+BPF_W+BPF_MEM X <- M[k]
BPF_LDX+BPF_W+BPF_LEN X <- len
BPF_LDX+BPF_B+BPF_MSH X <- 4*(P[k:1]&0xf)
BPF_ST This instruction stores the accumulator into the scratch
memory. We do not need an addressing mode since there is
only one possibility for the destination.
BPF_ST M[k] <- A
BPF_STX This instruction stores the index register into the scratch
memory.
BPF_STX M[k] <- X
BPF_ALU The alu instructions perform operations between the
accumulator and index register or constant, and store the
result back in the accumulator. For binary operations, a
source mode is required (BPF_K or BPF_X).
BPF_ALU+BPF_ADD+BPF_K A <- A + k
BPF_ALU+BPF_SUB+BPF_K A <- A - k
BPF_ALU+BPF_MUL+BPF_K A <- A * k
BPF_ALU+BPF_DIV+BPF_K A <- A / k
BPF_ALU+BPF_AND+BPF_K A <- A & k
BPF_ALU+BPF_OR+BPF_K A <- A | k
BPF_ALU+BPF_LSH+BPF_K A <- A << k
BPF_ALU+BPF_RSH+BPF_K A <- A >> k
BPF_ALU+BPF_ADD+BPF_X A <- A + X
BPF_ALU+BPF_SUB+BPF_X A <- A - X
BPF_ALU+BPF_MUL+BPF_X A <- A * X
BPF_ALU+BPF_DIV+BPF_X A <- A / X
BPF_ALU+BPF_AND+BPF_X A <- A & X
BPF_ALU+BPF_OR+BPF_X A <- A | X
BPF_ALU+BPF_LSH+BPF_X A <- A << X
BPF_ALU+BPF_RSH+BPF_X A <- A >> X
BPF_ALU+BPF_NEG A <- -A
BPF_JMP The jump instructions alter flow of control. Conditional
jumps compare the accumulator against a constant (BPF_K) or
the index register (BPF_X). If the result is non-zero, the
true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits, so the longest jump is
256 instructions. However, the jump always (BPF_JA) opcode
uses the 32-bit k field as the offset, allowing arbitrarily
distant destinations. All conditionals use unsigned
comparison conventions.
BPF_JMP+BPF_JA pc += k
BPF_JMP+BPF_JGT+BPF_K pc += (A > k) ? jt : jf
BPF_JMP+BPF_JGE+BPF_K pc += (A >= k) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_K pc += (A == k) ? jt : jf
BPF_JMP+BPF_JSET+BPF_K pc += (A & k) ? jt : jf
BPF_JMP+BPF_JGT+BPF_X pc += (A > X) ? jt : jf
BPF_JMP+BPF_JGE+BPF_X pc += (A >= X) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_X pc += (A == X) ? jt : jf
BPF_JMP+BPF_JSET+BPF_X pc += (A & X) ? jt : jf
BPF_RET The return instructions terminate the filter program and
specify the amount of packet to accept (i.e., they return
the truncation amount). A return value of zero indicates
that the packet should be ignored. The return value is
either a constant (BPF_K) or the accumulator (BPF_A).
BPF_RET+BPF_A accept A bytes
BPF_RET+BPF_K accept k bytes
BPF_MISC The miscellaneous category includes instructions that don't
fit into the above classes and new instructions that need
to be added. Currently, these are the register transfer
instructions, which copy the index register to the
accumulator and vice versa, and an instruction that
contains an index into a table of routines to perform
specific kernel processing on each packet.
BPF_MISC+BPF_TAX X <- A
BPF_MISC+BPF_TXA A <- X
BPF_MISC+BPF_ROUTINES Call the k'th routine. k can
have two values:
BPF_HOST_TABLE_ROUTINE, which
calls the routine to keep
host table statistics;
BPF_MATRIX_TABLE_ROUTINE,
which calls the routine to
keep matrix table statistics.
The BPF interface provides the following macros to facilitate array
initializers:
BPF_STMT(opcode, operand)
BPF_JUMP(opcode, operand, true_offset, false_offset)
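To make the instruction semantics above concrete, here is a toy interpreter for a small subset of the machine: absolute halfword and byte loads, two conditional jumps against a constant, and a constant return. It is an illustration only, not the kernel's implementation. The toy_ opcodes are hypothetical and do not correspond to the values in <net/bpf.h>. Two behaviours are assumptions of the sketch: jump offsets are counted from the instruction after the jump, and a load that falls outside the packet rejects it.

```c
#include <stddef.h>

/* Hypothetical opcodes for a subset of the machine. */
enum toy_op { TOY_LDH_ABS, TOY_LDB_ABS, TOY_JEQ_K, TOY_JSET_K, TOY_RET_K };

struct toy_insn {
    enum toy_op   op;
    unsigned char jt, jf;
    unsigned long k;
};

/*
 * Run the n instructions at prog over the len bytes at pkt.
 * Returns the number of bytes to accept; 0 rejects the packet.
 */
static unsigned long toy_filter(const struct toy_insn *prog, size_t n,
                                const unsigned char *pkt, size_t len)
{
    unsigned long A = 0;    /* accumulator */
    size_t pc = 0;          /* implicit program counter */

    while (pc < n) {
        const struct toy_insn *i = &prog[pc++];   /* pc now points past i */

        switch (i->op) {
        case TOY_LDH_ABS:                         /* A <- P[k:2] */
            if (i->k + 2 > len)
                return 0;
            A = ((unsigned long)pkt[i->k] << 8) | pkt[i->k + 1];
            break;
        case TOY_LDB_ABS:                         /* A <- P[k:1] */
            if (i->k + 1 > len)
                return 0;
            A = pkt[i->k];
            break;
        case TOY_JEQ_K:                           /* pc += (A == k) ? jt : jf */
            pc += (A == i->k) ? i->jt : i->jf;
            break;
        case TOY_JSET_K:                          /* pc += (A & k) ? jt : jf */
            pc += (A & i->k) ? i->jt : i->jf;
            break;
        case TOY_RET_K:                           /* accept k bytes */
            return i->k;
        }
    }
    return 0;   /* ran off the end without a return instruction */
}
```

Because every branch is forward, pc only increases and the loop always terminates.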
EXAMPLES
This filter accepts only Reverse ARP requests.
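struct bpf_insn insns[] = {
    /* Load the Ethernet type field and require Reverse ARP. */
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
    /* Load the ARP operation field and require a request. */
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    /* Accept the Ethernet header plus the ARP packet. */
    BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),
    /* Reject everything else. */
    BPF_STMT(BPF_RET+BPF_K, 0),
};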
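The jt and jf operands in the filter above only line up if they are counted from the instruction that follows the jump, so that an offset of 0 falls through to the next instruction. Under that reading, which is inferred from the examples on this page, the destination of a conditional jump can be computed as below. The name demo_jump_target is hypothetical.

```c
/*
 * Index of the instruction executed after a conditional jump at index pc.
 * Assumes offsets are counted from the instruction following the jump.
 */
static unsigned int demo_jump_target(unsigned int pc, int cond,
                                     unsigned char jt, unsigned char jf)
{
    return pc + 1u + (cond ? jt : jf);
}
```

In the Reverse ARP filter, the first jump is instruction 1 with jt 0 and jf 3. A match continues at instruction 2, and a mismatch lands on instruction 5, the final return that rejects the packet.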
This filter accepts only IP packets between hosts 128.3.112.15 and
128.3.112.35.
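struct bpf_insn insns[] = {
    /* Require an IP packet. */
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
    /* The source address is a 32-bit word at offset 26. */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
    /* Source is .15, so the destination (offset 30) must be .35. */
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
    /* Otherwise the source must be .35 and the destination .15. */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (u_int)-1),   /* accept the whole packet */
    BPF_STMT(BPF_RET+BPF_K, 0),           /* reject */
};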
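The constants 0x8003700f and 0x80037023 in the filter above are the two host addresses packed into 32-bit values, most significant octet first. A small helper such as the following (demo_ipv4 is a hypothetical name) shows the arithmetic.

```c
#include <stdint.h>

/* Pack the dotted-quad address a.b.c.d into a 32-bit value. */
static uint32_t demo_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)(a & 0xff) << 24) |
           ((uint32_t)(b & 0xff) << 16) |
           ((uint32_t)(c & 0xff) << 8)  |
            (uint32_t)(d & 0xff);
}
```

For 128.3.112.15 the octets are 0x80, 0x03, 0x70, and 0x0f, which gives 0x8003700f.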
This filter returns only TCP finger packets. We must parse the IP
header to reach the TCP header. The BPF_JSET instruction checks that
the IP fragment offset is 0 so we are sure that we have a TCP header.
struct bpf_insn insns[] = {
    /* Require an IP packet carrying TCP. */
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
    BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
    /* Reject the packet if the fragment offset is non-zero. */
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
    BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
    /* X <- IP header length; then test the source port. */
    BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
    /* Source port is not finger, so test the destination port. */
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (u_int)-1),   /* accept the whole packet */
    BPF_STMT(BPF_RET+BPF_K, 0),           /* reject */
};
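The BPF_LDX+BPF_B+BPF_MSH instruction in the filter above loads X with 4*(P[14:1]&0xf). Byte 14 is the first byte of the IP header when it follows a 14-byte Ethernet header; its low four bits give the header length in 32-bit words. The same computation in C, as a sketch with a hypothetical name:

```c
/* X <- 4*(P[k:1]&0xf): IP header length in bytes from the byte at offset k. */
static unsigned int demo_msh(const unsigned char *pkt, unsigned int k)
{
    return 4u * (pkt[k] & 0x0fu);
}
```

A first byte of 0x45 (version 4, five words) gives 20, so the indexed loads at X+14 and X+16 read the TCP source and destination ports at offsets 34 and 36.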
FILES
/dev/bpf0
SEE ALSO
tcpdump(1).
Licensed material--property of copyright holder(s)