ND(4P) — SPECIAL FILES
NAME
nd − network disk driver
SYNOPSIS
pseudo-device nd
DESCRIPTION
The network disk device, /dev/nd∗, allows a client workstation to perform disk I/O operations on a server system over the network. To the client system, this device looks like any normal disk driver: it allows read/write operations at a given block number and byte count. Note that this provides a network disk block access service rather than a network file access service.
Typically the client system will have no disks at all. In this case /dev/nd0 contains the client’s root file system (including /usr files), and nd1 is used as a paging area. Client access to these devices is converted to net disk protocol requests and sent to the server system over the network. The server receives the request, performs the actual disk I/O, and sends a response back to the client.
The server contains a table which lists the net address of each of his clients and the server disk partition which corresponds to each client unit number (nd0,1,...). This table resides in the server kernel in a structure owned by the nd device. The table is initialized by running the program /etc/nd with text file /etc/nd.local as its input. /etc/nd then issues ioctl(2) functions to load the table into the kernel.
In addition to the read/write units /dev/nd∗, there are public read-only units which are named /dev/ndp∗. The correspondence to server partitions is specified by the /etc/nd.local text file, in a similar manner to the private partitions. The public units can be used to provide shared access to binaries or libraries (/bin, /usr/bin, /usr/ucb, /usr/lib) so that each diskless client does not have to consume space in his private partitions for these files. This is done by providing a public file system at the server (/dev/ndp0) which is mounted on /pub of each diskless client. The clients then use symbolic links to read the public files: /bin -> /pub/bin, /usr/ucb -> /pub/usr/ucb. One requirement in this case is that the server (who has read/write access to this file system) should not write to any public file system. This is because each client caches blocks locally, and its cache may get out of sync with the physical disk image. In certain cases, the client will detect an inconsistency and panic.
One last type of unit is provided for use by the server. These are called local units and are named /dev/ndl∗. The Sun physical disk sector 0 label provides only a limited number of partitions per physical disk (eight). Since this number is small and these partitions have somewhat fixed meanings, the nd driver itself has a subpartitioning capability built in. This allows a large server physical disk partition (e.g. /dev/xy0g) to be broken up into any number of diskless client partitions. On the client side these are referenced as /dev/nd0,1,..., but the server needs to reference these client partitions from time to time, for example to run mkfs(8) and fsck(8). The /dev/ndl∗ entries allow the server ‘local’ access to his subpartitions without causing any net activity. The actual local unit number to client unit number correspondence is again recorded in the /etc/nd.local text file.
The nd device driver is the same on both the client and server sides. There are no user level processes associated with either side, thus the latency and transfer rates are close to maximal.
The minor device and ioctl encoding used is given in file <sun/ndio.h>. The low six bits of the minor number are the unit number. The 0x40 bit indicates a public unit; the 0x80 bit indicates a local unit.
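The bit layout just described can be sketched in C as follows. This is only an illustration: the macro and function names below are hypothetical, and the authoritative definitions are the ones in <sun/ndio.h>.

```c
#include <assert.h>

/* Hypothetical names; the real encoding is defined in <sun/ndio.h>. */
#define ND_UNIT_MASK  0x3f   /* low six bits: unit number */
#define ND_PUBLIC     0x40   /* public read-only unit (/dev/ndp*) */
#define ND_LOCAL      0x80   /* server-local unit (/dev/ndl*) */

/* Extract the unit number from a minor device number. */
static int nd_unit(int minor_no)
{
    return minor_no & ND_UNIT_MASK;
}

/* Nonzero if the minor number names a public unit. */
static int nd_is_public(int minor_no)
{
    return (minor_no & ND_PUBLIC) != 0;
}

/* Nonzero if the minor number names a local unit. */
static int nd_is_local(int minor_no)
{
    return (minor_no & ND_LOCAL) != 0;
}

/* Build a minor number from a unit and a kind (0, ND_PUBLIC or ND_LOCAL). */
static int nd_make_minor(int unit, int kind)
{
    assert(unit >= 0 && unit <= ND_UNIT_MASK);
    assert(kind == 0 || kind == ND_PUBLIC || kind == ND_LOCAL);
    return kind | unit;
}
```

Under this reading, minor number 0x41 would be public unit 1 (/dev/ndp1) and 0x83 would be local unit 3 (/dev/ndl3).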
INITIALIZATION
No special initialization is required on the client side; he finds the server by broadcasting the initial request. Upon getting a response, he locks onto that server address.
At the server, the nd(8C) command initializes the network disk service by issuing ioctls to the kernel.
ERRORS
Generally, physical disk I/O errors detected at the server are returned to the client for action. If the server is down or inaccessible, the client will see the console message:
nd: file server not responding: still trying.
The client continues (forever) making his request until he gets positive acknowledgement from the server. This means the server can crash or power down and come back up without any special action required of the user at the client machine. It also means the process performing the I/O to nd will block, insensitive to signals, since the process is sleeping inside the kernel at PRIBIO.
PROTOCOL AND DRIVER INTERNALS
The protocol packet is defined in <sun/ndio.h> and also included below:
/*
 * 'nd' protocol packet format.
 */
struct ndpack {
	struct ip np_ip;	/* ip header, proto IPPROTO_ND */
	u_char	np_op;		/* operation code, see below */
	u_char	np_min;		/* minor device */
	char	np_error;	/* b_error */
	char	np_ver;		/* version number */
	long	np_seq;		/* sequence number */
	long	np_blkno;	/* b_blkno, disk block number */
	long	np_bcount;	/* b_bcount, byte count */
	long	np_resid;	/* b_resid, residual byte count */
	long	np_caddr;	/* current byte offset of this packet */
	long	np_ccount;	/* current byte count of this packet */
};				/* data follows */

/*
 * np_op operation codes.
 */
#define	NDOPREAD	1	/* read */
#define	NDOPWRITE	2	/* write */
#define	NDOPERROR	3	/* error */
#define	NDOPCODE	7	/* op code mask */
#define	NDOPWAIT	010	/* waiting for DONE or next request */
#define	NDOPDONE	020	/* operation done */

/*
 * misc protocol defines.
 */
#define	NDMAXDATA	1024	/* max data per packet */
#define	NDMAXIO		63*1024	/* max np_bcount */
IP datagrams were chosen instead of UDP datagrams because only the IP header is checksummed, not the entire packet as in UDP. Also the kernel level interface to the IP layer is simpler. The min, blkno, and bcount fields are copied directly from the client’s strategy request. The sequence number field seq is incremented on each new client request and is matched with incoming server responses. The server essentially echoes the request header in his responses, altering certain fields. The caddr and ccount fields show the current byte address and count of the data in this packet, or the data expected to be sent by the other side.
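As a rough illustration of how a client might pair a response with its outstanding request, the sketch below checks the sequence number and then sanity-checks caddr and ccount against the request's byte count. The struct is a simplified stand-in for the non-IP fields of struct ndpack, the function name is hypothetical, and the range check is an assumption about reasonable behavior rather than something the driver is documented to do.

```c
#include <assert.h>

#define ND_MAXDATA 1024   /* mirrors NDMAXDATA: max data bytes per packet */

/* Simplified stand-in for the non-IP header fields of struct ndpack. */
struct nd_hdr {
    unsigned char op;     /* operation code and flag bits */
    unsigned char min;    /* minor device */
    long seq;             /* sequence number */
    long blkno;           /* disk block number */
    long bcount;          /* total byte count of the request */
    long caddr;           /* byte offset of the data in this packet */
    long ccount;          /* byte count of the data in this packet */
};

/*
 * Return 1 if resp plausibly belongs to req, else 0.
 * Only the seq comparison comes from the text; the caddr/ccount
 * bounds check is an assumed extra safeguard.
 */
static int nd_resp_matches(const struct nd_hdr *req, const struct nd_hdr *resp)
{
    if (resp->seq != req->seq)
        return 0;                       /* answer to some other request */
    if (resp->caddr < 0 || resp->ccount < 0 || resp->ccount > ND_MAXDATA)
        return 0;                       /* malformed packet extent */
    if (resp->caddr > req->bcount - resp->ccount)
        return 0;                       /* extent runs past the request */
    return 1;
}
```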
The protocol is very simple and driven entirely from the client side. As soon as the client ndstrategy routine is called, the request is sent to the server; this allows disk sorting to occur at the server as soon as possible. Transactions which send data (client writes on the client side, client reads on the server side) can only send a set number of packets of NDMAXDATA bytes each, before waiting for an acknowledgement. The defaults are currently set at 6 packets of 1K bytes each; the NDIOCETHER ioctl allows setting this value on the server side. This allows the normal 4K byte case to occur with just one ‘transaction’. The NDOPWAIT bit is set in the op field by the sender to indicate he will send no more until acknowledged (or requested) by the other side. The NDOPDONE bit is set by the server side to indicate the request operation has completed; for both the read and write cases this means the requested disk I/O has actually occurred.
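The arithmetic behind the packet window, and the way the op byte combines an operation code with the WAIT and DONE flags, can be sketched as below. The constants are copied from the packet definition; the helper function names are hypothetical and are not part of the driver.

```c
#include <assert.h>

/* Values copied from the protocol definitions in <sun/ndio.h>. */
#define NDOPREAD   1
#define NDOPWRITE  2
#define NDOPERROR  3
#define NDOPCODE   7      /* op code mask */
#define NDOPWAIT   010    /* sender waits for DONE or next request */
#define NDOPDONE   020    /* operation done */
#define NDMAXDATA  1024   /* max data per packet */

/* Packets needed to carry bcount bytes at NDMAXDATA bytes each. */
static long nd_packets(long bcount)
{
    assert(bcount >= 0);
    return (bcount + NDMAXDATA - 1) / NDMAXDATA;
}

/* Bursts needed when at most `window` packets go out per acknowledgement. */
static long nd_bursts(long bcount, long window)
{
    long pkts;

    assert(window > 0);
    pkts = nd_packets(bcount);
    return (pkts + window - 1) / window;
}

/* Split an op byte into its operation code and flag bits. */
static int nd_opcode(int op)  { return op & NDOPCODE; }
static int nd_waiting(int op) { return (op & NDOPWAIT) != 0; }
static int nd_done(int op)    { return (op & NDOPDONE) != 0; }
```

With the default window of 6 packets, a 4K byte request needs 4 packets and therefore a single burst, while a maximum-size 63K byte request needs 63 packets sent in 11 bursts.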
Requests received by the server are entered on an active list which is timed out and discarded if not completed within NDXTIMER seconds. Requests received by the server allocate a bcount size buffer to minimize buffer copying. Contiguous DMA disk I/O thus occurs in the same size chunks it would if requested from a local physical disk.
BOOTSTRAP
The Sun workstation has PROM code to perform a net boot using this driver. Usually, the boot files are obtained from public device 0 (/dev/ndp0) on the server with which the client is registered; this allows multiple servers to exist on the same net (even running different releases of kernel and boot software). If the station you are booting is not registered on any of the servers, you will have to specify the hex Internet host number of the server in a boot command string like: ‘bec(0,5,0)vmunix’.
This booting performs exactly the same steps involved in a real disk boot:
1) User types ‘b’ to PROM monitor.
2) PROM loads blocks 1 thru 15 of /dev/ndp0 (bootnd).
3) bootnd loads /boot.
4) /boot loads /vmunix.
SEE ALSO
BUGS
The operations described in dkio(4) are not supported.
The local host’s disk buffer cache is not used by network disk access. This means that if either a local host or a remote host is writing, the changes will be visible at random based on the cache hit frequency on the local host. Use sync on the server to force the data out to disk. If both the local and remote hosts are writing to the same filesystem, one machine’s changes can be randomly lost, based again on cache hit and deferred write timings.
If an R/O remote file system is mounted R/W by mistake, it is impossible to umount it.
Sun Release 3.5 — Last change: 26 July 1985