7. Technical Information

Contents of this section

For those who want to play with the present drivers, or try to make up their own driver for a card that is presently unsupported, this information should be useful. If you do not fall into this category, then perhaps you will want to skip this section.

7.1 Probed Addresses

While trying to determine which ethernet card is present, the following addresses are autoprobed, assuming the type and specs of the card have not been set in the kernel. The file names below are in /usr/src/linux/drivers/net/


        3c501.c:        0x280, 0x300
        3c503.c:        0x300, 0x310, 0x330, 0x350, 0x250, 0x280, 0x2a0, 0x2e0
        3c505.c:        0x300, 0x280, 0x310
        3c507.c:        0x300, 0x320, 0x340, 0x280
        3c509.c:        Special ID Port probe
        apricot.c:      0x300
        at1700.c:       0x300, 0x280, 0x380, 0x320, 0x340, 0x260, 0x2a0, 0x240
        atp.c:          0x378, 0x278, 0x3bc
        depca.c:        0x300, 0x200
        de600.c:        0x378
        de620.c:        0x378
        eexpress.c:     0x300, 0x270, 0x320, 0x340
        hp.c:           0x300, 0x320, 0x340, 0x280, 0x2c0, 0x200, 0x240
        hp-plus.c:      0x200, 0x240, 0x280, 0x2c0, 0x300, 0x320, 0x340
        lance.c:        0x300, 0x320, 0x340, 0x360
        ne.c:           0x300, 0x280, 0x320, 0x340, 0x360
        ni52.c:         0x300, 0x280, 0x360, 0x320, 0x340
        ni65.c:         0x300, 0x320, 0x340, 0x360
        smc-ultra.c:    0x200, 0x220, 0x240, 0x280, 0x300, 0x340, 0x380
        wd.c:           0x300, 0x280, 0x380, 0x240

There are some NE2000 clone ethercards out there that are lurking black holes for autoprobe drivers. While many NE2000 clones are safe until they are enabled, some can't be reset to a safe mode. These dangerous ethercards will hang any I/O access to their `dataports'. The typical dangerous locations are:


        Ethercard jumpered base     Dangerous locations (base + 0x10 - 0x1f)
                0x300 *                         0x310-0x317
                0x320                           0x330-0x337
                0x340                           0x350-0x357
                0x360                           0x370-0x377

* The 0x300 location is the traditional place to put an ethercard, but it's also a popular place to put other devices (often SCSI controllers). The 0x320 location is often the next one chosen, but that's bad for the AHA1542 driver probe. The 0x360 location is bad because the card's I/O space then extends over the parallel port at 0x378. If you have two IDE controllers, or two floppy controllers, then 0x360 is also a bad choice, as a NE2000 card will clobber them as well.

Note that kernels newer than 1.1.7x keep a registry of who uses which I/O ports, and will not let a driver use I/O ports registered by an earlier driver. This may result in probes silently failing. You can see who is using which I/O ports by typing cat /proc/ioports if you have the proc filesystem enabled.

To avoid these lurking ethercards, tell the kernel the card's type and base address explicitly (for example with an `ether=' boot-time argument) so that the dangerous addresses are never probed.

7.2 Skeleton / prototype driver

OK. So you have decided that you want to write a driver for the Foobar Ethernet card, as you have the programming information, and it hasn't been done yet. (...these are the two main requirements ;-) You can use the skeleton network driver that is provided with the Linux kernel source tree. It can be found in the file /usr/src/linux/drivers/net/skeleton.c as of 0.99pl15, and later.

It's also very useful to look at the Crynwr (nee Clarkson) driver for your target ethercard, if it's available. Russ Nelson nelson@crynwr.com has been actively updating and writing these, and he has been very helpful with his code reviews of the current Linux drivers.

7.3 Driver interface to the kernel

Here are some notes that may help when trying to figure out what the code in the driver segments is doing, or perhaps what it is supposed to be doing.


        int ethif_init(struct device *dev)
        {
            ...
                dev->send_packet = &ei_send_packet;
                dev->open = &ei_open;
                dev->stop = &ei_close;
                dev->hard_start_xmit = &ei_start_xmit;
                ...
        }

        int ethif_init(struct device *dev)

This function is put into the device structure in Space.c. It is called only at boot time, and returns `0' iff the ethercard `dev' exists.


        static int ei_open(struct device *dev)
        static int ei_close(struct device *dev)

The first routine opens and initializes the board in response to a socket ioctl(), usually called by `ifconfig'. It is commonly stuffed into the `struct device' by ethif_init().

The inverse routine is ei_close(), which should shut down the ethercard, free the IRQs and DMA channels if the hardware permits, and turn off anything that will save power (like the transceiver).


        static int ei_start_xmit(struct sk_buff *skb, struct device *dev)
                dev->hard_start_xmit = &ei_start_xmit;

This routine puts packets to be transmitted into the hardware. It is usually stuffed into the `struct device' by ethif_init().

When the hardware can't accept additional packets it should set the dev->tbusy flag. When additional room is available, usually during a transmit-complete interrupt, dev->tbusy should be cleared and the higher levels informed with mark_bh(INET_BH).


            if (dev_rint(buffer, length, is_skb ? IN_SKBUFF : 0, dev))
                   stats->rx_dropped++;

A received packet is passed to the higher levels using dev_rint(). If the unadorned packet data is in a memory buffer, dev_rint will copy it into a `skbuff' for you. Otherwise a new skbuff should be kmalloc()ed, filled, and passed to dev_rint() with the IN_SKBUFF flag.


        int s=socket(AF_INET,SOCK_PACKET,htons(ETH_P_ALL));

Gives you a socket receiving every protocol type. Do recvfrom() calls on it and it will fill in the sockaddr with the device type in sa_family and the device name in the sa_data array. I don't know who originally invented SOCK_PACKET for Linux (it's been in for ages) but it's superb stuff. You can use it to send raw packets too (both only as root).

7.4 Interrupts and Linux

There are two kinds of interrupt handlers in Linux: fast ones and slow ones. You decide which kind you are installing by the flags you pass to irqaction(). The fast ones, such as the serial interrupt handler, run with _all_ interrupts disabled. The normal interrupt handlers, such as the one for ethercard drivers, run with other interrupts enabled.

There is a two-level interrupt structure. The `fast' part handles the device register, removes the packets, and perhaps sets a flag. After it is done, and interrupts are re-enabled, the slow part is run if the flag is set.

The flag between the two parts is set by:

mark_bh(INET_BH);

Usually this flag is set within dev_rint() during a received-packet interrupt, and set directly by the device driver during a transmit-complete interrupt.

You might wonder why all interrupt handlers cannot run in `normal mode' with other interrupts enabled. As Ross Biro has pointed out, a handler that is itself interrupted partway through can be left with its interrupt line masked for an unbounded time while the nested handler runs.

The `fast' interrupt structure solves this problem by allowing bounded-time interrupt handlers to run without the risk of leaving their interrupt lines masked by another interrupt request.

There is an additional distinction between fast and slow interrupt handlers -- the arguments passed to the handler. A `slow' handler is defined as



                static void
                handle_interrupt(int reg_ptr)
                {
                    int irq = -(((struct pt_regs *)reg_ptr)->orig_eax+2);
                    struct device *dev = irq2dev_map[irq];
                ...

While a fast handler gets the interrupt number directly



                static void
                handle_fast_interrupt(int irq)
                {
                ...

A final aspect of network performance is latency. The only board that really addresses this is the 3c509, which allows a predictive interrupt to be posted. It provides an interrupt response timer so that the driver can fine-tune how early an interrupt is generated.

Alan Cox has some advice for anyone wanting to write drivers that are to be used with 0.99pl14 kernels and newer. He says:

`Any driver intended for 0.99pl14 should use the new alloc_skb() and kfree_skbmem() functions rather than using kmalloc() to obtain a sk_buff. The new 0.99pl14 skeleton does this correctly. For drivers wishing to remain compatible with both sets the define `HAVE_ALLOC_SKB' indicates these functions must be used.

In essence replace

skb=(struct sk_buff *)kmalloc(size)

with

skb=alloc_skb(size)

and

kfree_s(skb,size)

with

kfree_skbmem(skb,size) /* Only sk_buff memory though */

Any questions should, I guess, be directed to me (Alan Cox) since I made the change. This is a change to allow tracking of sk_buffs and sanity checks on buffers and stack behaviour. If a driver produces the message `File: ??? Line: ??? passed a non skb!' then it is probable the driver is not using the new sk_buff allocators.'

7.5 Programmed I/O vs. Shared Memory vs. DMA

Ethernet is 10Mbs. (Don't be pedantic, 3Mbs and 100Mbs don't count.) If you can already send and receive back-to-back packets, you just can't put more bits over the wire. Every modern ethercard can receive back-to-back packets. The Linux DP8390 drivers come pretty close to sending back-to-back packets (depending on the current interrupt latency) and the 3c509 and AT1500 hardware has no problem at all automatically sending back-to-back packets.

The ISA bus can do 5.3MB/sec (42Mb/sec), which sounds like more than enough. You can use that bandwidth in several ways:

Programmed I/O

Pro: Doesn't use any constrained system resources, just a few I/O registers, and has no 16M limit.

Con: Usually the slowest transfer rate, the CPU is waiting the whole time, and interleaved packet access is usually difficult to impossible.

Shared memory

Pro: Simple, faster than programmed I/O, and allows random access to packets.

Con: Uses up memory space (a big one for DOS users, only a minor issue under Linux), and it still ties up the CPU.

Slave (normal) Direct Memory Access

Pro: Frees up the CPU during the actual data transfer.

Con: Checking boundary conditions, allocating contiguous buffers, and programming the DMA registers makes it the slowest of all techniques. It also uses up a scarce DMA channel, and requires aligned low memory buffers.

Master Direct Memory Access (bus-master)

Pro: Frees up the CPU during the data transfer, can string together buffers, can require little or no CPU time lost on the ISA bus.

Con: Requires low-memory buffers and a DMA channel. Any bus-master will have problems with other bus-masters that are bus-hogs, such as some primitive SCSI adaptors. A few badly-designed motherboard chipsets have problems with bus-masters. And a reason for not using any type of DMA device is using a Cyrix 486 processor designed for plug-in replacement of a 386: these processors must flush their cache with each DMA cycle. (This includes the Cx486DLC, Ti486DLC, Cx486SLC, Ti486SLC, etc.)

7.6 Programming the Intel chips (i82586 and i82593)

These chips are used on a number of cards, namely the 3c507 ('86), the Intel EtherExpress 16 ('86), Microdyne's exos205t ('86), the Z-Note ('93), and the Racal-Interlan ni5210 ('86).

Russ Nelson writes: `Most boards based on the 82586 can reuse quite a bit of their code. More, in fact, than the 8390-based adapters. There are only three differences between them:

The Intel EtherExpress 16 is an exception, as it I/O maps the 82586. Yes, I/O maps it. Fairly clunky, but it works.

Garrett Wollman did an AT&T driver for BSD that uses the BSD copyright. The latest version I have (Sep '92) only uses a single transmit buffer. You can and should do better than this if you've got the memory. The AT&T and 3c507 adapters do; the ni5210 doesn't.

The people at Intel gave me a very big clue on how you queue up multiple transmit packets. You set up a circular list of blocks -- NOP-> XMIT-> NOP-> XMIT-> NOP-> XMIT-> (back to the beginning) -- then you set the `next' pointer of all the NOP blocks to themselves. Now you start the command unit on this chain. It continually processes the first NOP block. To transmit a packet, you stuff it into the next transmit block, then point the NOP to it. To transmit the next packet, you stuff the next transmit block and point the previous NOP to it. In this way, you don't have to wait for the previous transmit to finish, you can queue up multiple packets without any ambiguity as to whether it got accepted, and you can avoid the command unit start-up delay.'

7.7 Technical information from 3Com

If you are interested in working on drivers for 3Com cards, you can get technical documentation from 3Com. Cameron has been kind enough to tell us how to go about it below:

3Com's Ethernet Adapters are documented for driver writers in our `Technical References' (TRs). These manuals describe the programmer interfaces to the boards but they don't talk about the diagnostics, installation programs, etc that end users can see.

The Network Adapter Division marketing department has the TRs to give away. To keep this program efficient, we centralized it in a thing called `CardFacts.' CardFacts is an automated phone system. You call it with a touch-tone phone and it faxes you stuff. To get a TR, call CardFacts at 408-727-7021. Ask it for Developer's Order Form, document number 9070. Have your fax number ready when you call. Fill out the order form and fax it to 408-764-5004. Manuals are shipped by Federal Express 2nd Day Service.

If you don't have a fax and nobody you know has a fax, really and truly, then send mail to Terry_Murphy@3Mail.3Com.com and tell her about your problem. PLEASE use the fax thing if you possibly can.

After you get a manual, if you still can't figure out how to program the board, try our `CardBoard' BBS at 1-800-876-3266, and if you can't do that, write Andy_Chan@3Mail.3com.com and ask him for alternatives. If you have a real stumper that nobody has figured out yet, the fellow who needs to know about it is Steve_Lebus@3Mail.3com.com.

There are people here who think we are too free with the manuals, and they are looking for evidence that the system is too expensive, or takes too much time and effort. That's why it's important to try to use CardFacts before you start calling and mailing the people I named here.

There are even people who think we should be like Diamond and Xircom, requiring tight `partnership' with driver writers to prevent poorly performing drivers from getting written. So far, 3Com customers have been really good about this, and there's no problem with the level of requests we've been getting. We need your continued cooperation and restraint to keep it that way.

Cameron Spitzer, 408-764-6339 3Com NAD Santa Clara work: camerons@nad.3com.com home: cls@truffula.sj.ca.us

7.8 Notes on AMD PCnet / LANCE Based cards

The AMD LANCE (Local Area Network Controller for Ethernet) was the original offering; it has since been replaced by the `PCnet-ISA' chip, otherwise known as the 79C960, which is the heart of many new cards being released at present. Note that the name `LANCE' has stuck, and some people will refer to the new chip by the old name. Dave Roberts of the Network Products Division of AMD was kind enough to contribute the following information regarding this chip:

`As for the architecture itself, AMD developed it originally and reduced it to a single chip -- the PCnet(tm)-ISA -- over a year ago. It's been selling like hotcakes ever since.

Functionally, it is equivalent to a NE1500. The register set is identical to the old LANCE with the 1500/2100 architecture additions. Older 1500/2100 drivers will work on the PCnet-ISA. The NE1500 and NE2100 architecture is basically the same. Initially Novell called it the 2100, but then tried to distinguish between coax and 10BASE-T cards. Anything that was 10BASE-T only was to be numbered in the 1500 range. That's the only difference.

Many companies offer PCnet-ISA based products, including HP, Racal-Datacom, Allied Telesis, Boca Research, Kingston Technology, etc. The cards are basically the same except that some manufacturers have added `jumperless' features that allow the card to be configured in software. Most have not. AMD offers a standard design package for a card that uses the PCnet-ISA and many manufacturers use our design without change. What this means is that anybody who wants to write drivers for most PCnet-ISA based cards can just get the data-sheet from AMD. Call our literature distribution center at (800)222-9323 and ask for the Am79C960, PCnet-ISA data sheet. It's free.

A quick way to understand whether the card is a `stock' card is to just look at it. If it's stock, it should just have one large chip on it, a crystal, a small IEEE address PROM, possibly a socket for a boot ROM, and a connector (1, 2, or 3, depending on the media options offered). Note that if it's a coax card, it will have some transceiver stuff built onto it as well, but that should be near the connector and away from the PCnet-ISA.'

There is also some info regarding the LANCE chip in the file lance.c which is included in the standard kernel.

A note to would-be card hackers is that different LANCE implementations do `restart' in different ways. Some pick up where they left off in the ring, and others start right from the beginning of the ring, as if just initialised. This is a concern when setting the multicast list.

7.9 Multicast and Promiscuous Mode

Another one of the things Donald has worked on is implementing multicast and promiscuous mode hooks. All of the released (i.e. not ALPHA) ISA drivers now support promiscuous mode. There was a minor problem with 8390 based cards with capturing multicast packets, in that the promiscuous mode setting in 8390.c around line 574 should be 0x18 and not 0x10. If you have an up to date kernel, this will already be fixed.

Donald writes: `At first I was planning to do it while implementing either the /dev/* or DDI interface, but that's not really the correct way to do it. We should only enable multicast or promiscuous modes when something wants to look at the packets, and shut it down when that application is finished, neither of which is strongly related to when the hardware is opened or released.

I'll start by discussing promiscuous mode, which is conceptually easy to implement. For most hardware you only have to set a register bit, and from then on you get every packet on the wire. Well, it's almost that easy; for some hardware you have to shut down the board (potentially dropping a few packets), reconfigure it, and then re-enable the ethercard. This is grungy and risky, but the alternative seems to be to have every application register before you open the ethercard at boot-time.

OK, so that's easy, so I'll move on to something that's not quite so obvious: Multicast. It can be done two ways:

  1. Use promiscuous mode, and a packet filter like the Berkeley packet filter (BPF). The BPF is a pattern matching stack language, where you write a program that picks out the addresses you are interested in. Its advantage is that it's very general and programmable. Its disadvantage is that there is no general way for the kernel to avoid turning on promiscuous mode and running every packet on the wire through every registered packet filter. See The Berkeley Packet Filter for more info.
  2. Using the built-in multicast filter that most etherchips have.

I guess I should list what a few ethercards/chips provide:

        
        Chip/card  Promiscuous  Multicast filter
        ----------------------------------------
        Seeq8001/3c501  Yes     Binary filter (1)
        3Com/3c509      Yes     Binary filter (1)
        8390            Yes     Autodin II six bit hash (2) (3)
        LANCE           Yes     Autodin II six bit hash (2) (3)
        i82586          Yes     Hidden Autodin II six bit hash (2) (4)
        

  1. These cards claim to have a filter, but it's a simple yes/no `accept all multicast packets', or `accept no multicast packets'.
  2. AUTODIN II is the standard ethernet CRC (checksum) polynomial. In this scheme multicast addresses are hashed and looked up in a hash table. If the corresponding bit is enabled, this packet is accepted. Ethernet packets are laid out so that the hardware to do this is trivial -- you just latch six (usually) bits from the CRC circuit (needed anyway for error checking) after the first six octets (the destination address), and use them as an index into the hash table (six bits -- a 64-bit table).
  3. These chips use the six bit hash, and must have the table computed and loaded by the host. This means the kernel must include the CRC code.
  4. The 82586 uses the six bit hash internally, but it computes the hash table itself from a list of multicast addresses to accept.

Note that none of these chips do perfect filtering, and we still need a middle-level module to do the final filtering. Also note that in every case we must keep a complete list of accepted multicast addresses to recompute the hash table when it changes.

My first pass at device-level support is detailed in the new outline driver skeleton.c

It looks like the following:


        #ifdef HAVE_MULTICAST
        static void set_multicast_list(struct device *dev, int num_addrs,
                         void *addrs);
        #endif
        .
        .
        
        ethercard_open() {
        ...
        #ifdef HAVE_MULTICAST
                dev->set_multicast_list = &set_multicast_list;
        #endif
        ...
        }
        
        #ifdef HAVE_MULTICAST
        /* Set or clear the multicast filter for this adaptor.
           num_addrs -- -1      Promiscuous mode, receive all packets
           num_addrs -- 0       Normal mode, clear multicast list
           num_addrs > 0        Multicast mode, receive normal and
                MC packets, and do best-effort filtering.
         */
        static void
        set_multicast_list(struct device *dev, int num_addrs, void *addrs)
        {
        ...

Any comments, criticism, etc. are welcome.'

7.10 The Berkeley Packet Filter (BPF)

The general idea of the developers is that the BPF functionality should not be provided by the kernel, but should be in a (hopefully little-used) compatibility library.

For those not in the know: BPF (the Berkeley Packet Filter) is a mechanism for specifying to the kernel networking layers what packets you are interested in. It's implemented as a specialized stack language interpreter built into a low level of the networking code. An application passes a program written in this language to the kernel, and the kernel runs the program on each incoming packet. If the kernel has multiple BPF applications, each program is run on each packet.

The problem is that it's difficult to deduce what kind of packets the application is really interested in from the packet filter program, so the general solution is to always run the filter. Imagine a program that registers a BPF program to pick up a low data-rate stream sent to a multicast address. Most ethernet cards have a hardware multicast address filter implemented as a 64-entry hash table that ignores most unwanted multicast packets, so the capability exists to make this a very inexpensive operation. But with BPF the kernel must switch the interface to promiscuous mode, receive _all_ packets, and run them through this filter. This is work, BTW, that's very difficult to account back to the process requesting the packets.

