A device driver (part of the kernel) controls devices by writing commands to the controller's command register.
The device controller uses the status register to communicate the state of the device to the operating system, for example whether the device is busy, idle, done, or in an error state.
The controller has one or more data registers used for passing data to or from the device.
Polled device controllers use the status register to indicate when the device needs attention. The operating system (or hardware) must periodically check this register.
Interrupt-driven device controllers raise interrupts when the device needs attention. The operating system's (device driver's) interrupt handler then reads the device status register and takes appropriate action.
Device controllers that use direct memory access can use the system's memory bus to transfer data directly between the system's memory and the device controller's data registers. The device controller sets its status register and raises an interrupt when the transfer is complete.
Many architectures had explicit i/o operations as part of the CPU instruction set:
input device_address ; write ``read'' command to command register. ouput device_address ; write ``write'' command to command register. test register, device_address ; copy status register. copy_in register, device_address, device_register ; read data register. copy_out register, device_address, device_register ; write to data register.
Modern architectures arrange for the device registers to occupy addresses in the machines memory space, so special i/o instructions can be replace with memory operations:
mv device_reg, cpu_reg ; write command or data to device. mv cpu_reg, device_reg ; read status or data from device.
Suppose we had a ``silicon compiler'' that would translate C code into a photomask for creating a chip. Following is the C code for the fetch-execute cycle of a CPU, with simple interrupt handling (from Nutt, Operating Systems, a Modern Perspective, p.85):
while (!halt) { IR = memory[PC]; /* Load instruction register w/ inst. at PC */ PC++; /* Increment program counter. */ execute(IR); if (interrupt_req) { /* Devices set interrupt_req to raise memory[0] = PC; an interrupt. */ PC = memory[1]; /* Jump to interrupt handler. */ } }
We could enhance this by allowing different handlers for different interrupts, by reserving a range of memory addresses to hold entry points to interrupt handlers; each device then sets the interrupt request flag to an integer identifying the device:
while (!halt) { IR = memory[PC]; PC++; execute(IR); if (interrupt_req) { memory[0] = PC; PC = int_vector[interrupt_req]; } }
In both cases, the interrupt handler must save any data necessary to resume execution of the interrupted process.
The interrupt handling mechanism can be further enhanced to do some of this:
void save_state() { memory[0] = PC; /* Program Counter. */ memory[1] = SP; /* Stack Pointer. */ memory[2] = BR; /* Memory image base register. */ memory[3] = ER; /* Memory image extent register. */ } while (!halt) { IR = memory[PC]; PC++; execute(IR); if (interrupt_req) { save_state(); PC = int_vector[interrupt_req]; } }
Unix divides devices (and device drivers) into two broad categories: character and block. Character devices generally do i/o as a stream of characters. Block devices require i/o a block at a time.
Each device has a major number, indicating the type of the device, and a minor number, indicating the instance. The kernel maintains a table of device structures that points to drivers.
Unix device drivers are divided into two parts:
Implements interface and maintains state of pending operations.
Handles interrupts.
In the Unix world, IO is based on a file model. So the device drivers provide an abstraction that is close to this model.
Prepare the device.
Clean up; the caller is done.
Request input from device (character devices).
Request input to device (character devices).
Checks to see if read/write will succeed (character devices).
Schedule a read or write (block devices).
Driver or device specific commands or data (character devices).
Stop stream output (character devices).
Dump memory image (block devices).
The Minix device driver interface is exported as a set of messages a driver can handle:
Ready (``open'') the device for reading/writing.
Close device (clean up and free for another process to use).
Read data from device.
Write data to device.
Read/write a list of blocks from block device.
Set device-specific attributes.
Every Minix device driver has the following structure:
/* kernel/driver.c */ /*===========================================================================* * driver_task * *===========================================================================*/ PUBLIC void driver_task(dp) struct driver *dp; /* Device dependent entry points. */ { /* Main program of any device driver task. */ int r, caller, proc_nr; message mess; init_buffer(); /* Get a DMA buffer. */ /* Here is the main loop of the disk task. It waits for a message, carries * it out, and sends a reply. */ while (TRUE) { /* First wait for a request to read or write a disk block. */ receive(ANY, &mess); caller = mess.m_source; proc_nr = mess.PROC_NR; switch (caller) { case HARDWARE: /* Leftover interrupt. */ continue; case FS_PROC_NR: /* The only legitimate caller. */ break; default: printf("%s: got message from %d\n", (*dp->dr_name)(), caller); continue; } /* Now carry out the work. */ switch(mess.m_type) { case DEV_OPEN: r = (*dp->dr_open)(dp, &mess); break; case DEV_CLOSE: r = (*dp->dr_close)(dp, &mess); break; case DEV_IOCTL: r = (*dp->dr_ioctl)(dp, &mess); break; case DEV_READ: case DEV_WRITE: r = do_rdwt(dp, &mess); break; case SCATTERED_IO: r = do_vrdwt(dp, &mess); break; default: r = EINVAL; break; } /* Clean up leftover state. */ (*dp->dr_cleanup)(); /* Finally, prepare and send the reply message. */ mess.m_type = TASK_REPLY; mess.REP_PROC_NR = proc_nr; mess.REP_STATUS = r; /* # of bytes transferred or error code */ send(caller, &mess); /* send reply to caller */ } }
Linux device drivers follow the Unix model, and thus export their interface as a set of functions:
struct file_operations { loff_t (*llseek) (struct file *, loff_t, int); ssize_t (*read) (struct file *, char *, size_t, loff_t *); ssize_t (*write) (struct file *, const char *, size_t, loff_t *); int (*readdir) (struct file *, void *, filldir_t); unsigned int (*poll) (struct file *, struct poll_table_struct *); int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); int (*open) (struct inode *, struct file *); int (*flush) (struct file *); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, struct dentry *); int (*fasync) (int, struct file *, int); int (*check_media_change) (kdev_t dev); int (*revalidate) (kdev_t dev); int (*lock) (struct file *, int, struct file_lock *); };
Each device installs an instance of this structure in the kernel's device table; the entries point to functions that implement the particular device's functionality:
/* linux/drivers/block/hd.c */ static struct file_operations hd_fops = { NULL, /* lseek - default */ block_read, /* read - general block-dev read */ block_write, /* write - general block-dev write */ NULL, /* readdir - bad */ NULL, /* poll */ hd_ioctl, /* ioctl */ NULL, /* mmap */ hd_open, /* open */ NULL, /* flush */ hd_release, /* release */ block_fsync /* fsync */ }; __initfunc(int hd_init(void)) { if (register_blkdev(MAJOR_NR,"hd",&hd_fops)) { printk("hd: unable to get major %d for hard disk\n",MAJOR_NR); return -1; } blk_dev[MAJOR_NR].request_fn = DEVICE_REQUEST; read_ahead[MAJOR_NR] = 8; /* 8 sector (4kB) read-ahead */ hd_gendisk.next = gendisk_head; gendisk_head = &hd_gendisk; timer_table[HD_TIMER].fn = hd_times_out; return 0; }
Block devices (hardware) have some specific properties:
The mechanical nature of block devices means that the device driver has to handle potential errors:
Error: unable to include 'disk-scheduling.sgml': No such file or directory
The preceding discussion assumes disk blocks have addresses consisting of (cylinder, head, sector) triples.
Logical or Linear Block Addressing (LBA) assigns each sector a number, in sequence, starting with sector (0, 0, 0) = 0. Hence, the disk is modeled as an array of blocks (sectors).
We can still employ the same scheduling algorithms, if the disk assigns adjacent sectors adjacent numbers.
Most modern disk hardware reserves a few ``spare'' sectors in each cylinder, to be used if a sector in that cylinder becomes undreadable. The addressing logic automatically maps a given LBA to one of these spares, if the LBA refers to a bad block. This does not hurt performance, since accessing a sector in the same cylinder incurs at most some minor head swithing delay and rotational latency.
However, if all of the spares in a cylinder are used, the bad address may have to be mapped onto a spare in a different cylinder, perhaps many tracks away. This could destroy any efficiencies a scheduling algorithm might provide.
Buffering offers another opportunity for performance improvment. When a process requests a block, it moves through a sequence of buffers:
If one of these buffers already has a sector in it, waiting to be copied to the next buffer, the sequence must stop until space is freed.
To remedy this situation, we can add another buffer at each stage. Call these buffers A and B. Then,
This scheme is called ``double buffering.'' One buffer is used for incoming data, the other for outgoing data; their roles reverse when a copy cycle is completed.
The scheme can be generalized to N buffers, as required.
Character devices differ from block devices:
Character device drivers have to deal with these characteristics with
Character devices deal in unstructured streams of bytes, but user processes often like to do input or output a line at a time. Each character device ``line'' has an associated line discipline that can break up the input stream into discrete lines.
A line discipline is like a high level device driver, a layer that sits above a specific hardware device driver to provide input semantics to the device's i/o stream. One can combine line disciplines with device drivers to provide uniform semantics regardless of the underlying serial device.
A Unix line discipline has the following entry points:
Initialize the line discipline.
Close the line.
Read (up to) a line.
Write (up to) a line.
Set characteristics of the line discipline, like what control characters do.
Recieve character from device driver interrupt handler.
Transmission control.
Modem carrier transition.
The line discipline is responsible for interpreting control characters and implementing the required behavior:
+-----+ +----------------------------------------+ |count| / +--+-----------------------------------v-----------------+ +-----+ / | *| Sometimes, a great| |first*- +-|+-----------------------------------------------------+ +-----+ | |last*|- +v-+-----------------------------------------------------+ +-----+ \ | |notion. | \ +--+-----------------------------------------------------+ \ ^ +-----------+