COEN 177: I/O

Operating Systems -- spring, 2007

Prof. John Noll

Santa Clara University

Sun May 20 17:58:04 2007

1. Devices

1.1 Fundamentals

1.2 Types

Polled

Polled device controllers use the status register to indicate when the device needs attention. The operating system (or hardware) must periodically check this register.

Interrupt-driven

Interrupt-driven device controllers raise interrupts when the device needs attention. The operating system's (device driver's) interrupt handler then reads the device status register and takes appropriate action.

Direct Memory Access

Device controllers that use direct memory access can use the system's memory bus to transfer data directly between the system's memory and the device controller's data registers. The device controller sets its status register and raises an interrupt when the transfer is complete.

1.3 Memory Mapped I/O

Many architectures had explicit i/o operations as part of the CPU instruction set:


input device_address ; write ``read'' command to command register.
ouput device_address ; write ``write'' command to command register.
test  register, device_address  ; copy status register.
copy_in register, device_address, device_register ; read data register.
copy_out register, device_address, device_register ; write to data register.

Modern architectures arrange for the device registers to occupy addresses in the machines memory space, so special i/o instructions can be replace with memory operations:


mv device_reg, cpu_reg ; write command or data to device.
mv cpu_reg, device_reg ; read status or data from device.

2. Interrupts

Suppose we had a ``silicon compiler'' that would translate C code into a photomask for creating a chip. Following is the C code for the fetch-execute cycle of a CPU, with simple interrupt handling (from Nutt, Operating Systems, a Modern Perspective, p.85):


while (!halt) {
      IR = memory[PC];  /* Load instruction register w/ inst. at PC */
      PC++;             /* Increment program counter. */
      execute(IR);
      if (interrupt_req) {      /* Devices set interrupt_req to raise
              memory[0] = PC;      an interrupt. */
              PC = memory[1];   /* Jump to interrupt handler. */
      }
}

We could enhance this by allowing different handlers for different interrupts, by reserving a range of memory addresses to hold entry points to interrupt handlers; each device then sets the interrupt request flag to an integer identifying the device:


while (!halt) {
      IR = memory[PC];
      PC++;
      execute(IR);
      if (interrupt_req) {
              memory[0] = PC;
              PC = int_vector[interrupt_req];
      }
}

In both cases, the interrupt handler must save any data necessary to resume execution of the interrupted process.

The interrupt handling mechanism can be further enhanced to do some of this:


void save_state()
{
              memory[0] = PC;   /* Program Counter. */  
              memory[1] = SP;   /* Stack Pointer. */
              memory[2] = BR;   /* Memory image base register. */
              memory[3] = ER;   /* Memory image extent register. */
}             

while (!halt) {
      IR = memory[PC];
      PC++;
      execute(IR);
      if (interrupt_req) {
              save_state();
              PC = int_vector[interrupt_req];
      }
}

3. Device Drivers

Unix divides devices (and device drivers) into two broad categories: character and block. Character devices generally do i/o as a stream of characters. Block devices require i/o a block at a time.

Each device has a major number, indicating the type of the device, and a minor number, indicating the instance. The kernel maintains a table of device structures that points to drivers.

3.1 Structure

Unix device drivers are divided into two parts:

Top Half

Implements interface and maintains state of pending operations.

Bottom Half

Handles interrupts.

3.2 Interface

In the Unix world, IO is based on a file model. So the device drivers provide an abstraction that is close to this model.

open

Prepare the device.

close

Clean up; the caller is done.

read

Request input from device (character devices).

write

Request input to device (character devices).

select

Checks to see if read/write will succeed (character devices).

strategy

Schedule a read or write (block devices).

ioctl

Driver or device specific commands or data (character devices).

stop

Stop stream output (character devices).

dump

Dump memory image (block devices).

Minix

The Minix device driver interface is exported as a set of messages a driver can handle:

DEV_OPEN

Ready (``open'') the device for reading/writing.

DEV_CLOSE

Close device (clean up and free for another process to use).

DEV_READ

Read data from device.

DEV_WRITE

Write data to device.

DEV_SCATTERED_IO

Read/write a list of blocks from block device.

DEV_IOCTL

Set device-specific attributes.

Every Minix device driver has the following structure:


/* kernel/driver.c */

/*===========================================================================*
 *                              driver_task                                  *
 *===========================================================================*/
PUBLIC void driver_task(dp)
struct driver *dp;      /* Device dependent entry points. */
{
/* Main program of any device driver task. */

  int r, caller, proc_nr;
  message mess;

  init_buffer();        /* Get a DMA buffer. */


  /* Here is the main loop of the disk task.  It waits for a message, carries
   * it out, and sends a reply.
   */

  while (TRUE) {
        /* First wait for a request to read or write a disk block. */
        receive(ANY, &mess);

        caller = mess.m_source;
        proc_nr = mess.PROC_NR;

        switch (caller) {
        case HARDWARE:
                /* Leftover interrupt. */
                continue;
        case FS_PROC_NR:
                /* The only legitimate caller. */
                break;
        default:
                printf("%s: got message from %d\n", (*dp->dr_name)(), caller);
                continue;
        }

        /* Now carry out the work. */
        switch(mess.m_type) {
            case DEV_OPEN:      r = (*dp->dr_open)(dp, &mess);      break;
            case DEV_CLOSE:     r = (*dp->dr_close)(dp, &mess);     break;
            case DEV_IOCTL:     r = (*dp->dr_ioctl)(dp, &mess);     break;

            case DEV_READ:
            case DEV_WRITE:     r = do_rdwt(dp, &mess);     break;

            case SCATTERED_IO:  r = do_vrdwt(dp, &mess);    break;
            default:            r = EINVAL;                     break;
        }

        /* Clean up leftover state. */
        (*dp->dr_cleanup)();

        /* Finally, prepare and send the reply message. */
        mess.m_type = TASK_REPLY;
        mess.REP_PROC_NR = proc_nr;

        mess.REP_STATUS = r;    /* # of bytes transferred or error code */
        send(caller, &mess);        /* send reply to caller */
  }
}

Linux

Linux device drivers follow the Unix model, and thus export their interface as a set of functions:


        
struct file_operations {
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *);
        int (*fasync) (int, struct file *, int);
        int (*check_media_change) (kdev_t dev);
        int (*revalidate) (kdev_t dev);
        int (*lock) (struct file *, int, struct file_lock *);
};

Each device installs an instance of this structure in the kernel's device table; the entries point to functions that implement the particular device's functionality:


/* linux/drivers/block/hd.c */

static struct file_operations hd_fops = {
        NULL,                   /* lseek - default */
        block_read,             /* read - general block-dev read */
        block_write,            /* write - general block-dev write */
        NULL,                   /* readdir - bad */
        NULL,                   /* poll */
        hd_ioctl,               /* ioctl */
        NULL,                   /* mmap */
        hd_open,                /* open */
        NULL,                   /* flush */
        hd_release,             /* release */
        block_fsync             /* fsync */
};

__initfunc(int hd_init(void))
{
        if (register_blkdev(MAJOR_NR,"hd",&hd_fops)) {
                printk("hd: unable to get major %d for hard disk\n",MAJOR_NR);
                return -1;
        }
        blk_dev[MAJOR_NR].request_fn = DEVICE_REQUEST;
        read_ahead[MAJOR_NR] = 8;               /* 8 sector (4kB) read-ahead */
        hd_gendisk.next = gendisk_head;
        gendisk_head = &hd_gendisk;
        timer_table[HD_TIMER].fn = hd_times_out;
        return 0;
}

4. Block Devices

Block devices (hardware) have some specific properties:

The mechanical nature of block devices means that the device driver has to handle potential errors:

4.1 Disk Access Scheduling

Error: unable to include 'disk-scheduling.sgml': No such file or directory

4.2 Logical/Linear Block Addressing

The preceding discussion assumes disk blocks have addresses consisting of (cylinder, head, sector) triples.

Logical or Linear Block Addressing (LBA) assigns each sector a number, in sequence, starting with sector (0, 0, 0) = 0. Hence, the disk is modeled as an array of blocks (sectors).

We can still employ the same scheduling algorithms, if the disk assigns adjacent sectors adjacent numbers.

Bad Blocks

Most modern disk hardware reserves a few ``spare'' sectors in each cylinder, to be used if a sector in that cylinder becomes undreadable. The addressing logic automatically maps a given LBA to one of these spares, if the LBA refers to a bad block. This does not hurt performance, since accessing a sector in the same cylinder incurs at most some minor head swithing delay and rotational latency.

However, if all of the spares in a cylinder are used, the bad address may have to be mapped onto a spare in a different cylinder, perhaps many tracks away. This could destroy any efficiencies a scheduling algorithm might provide.

4.3 Buffering

Buffering offers another opportunity for performance improvment. When a process requests a block, it moves through a sequence of buffers:

  1. The disk hardware copies the sector from the platter to the controller's internal buffer.
  2. If the controller supports DMA, the controller copies the sector from its buffer to the device driver's buffer in memory. If the controller does not support DMA, the device driver does the copying.
  3. The device driver copies the sector from its buffer to the caller's buffer.

If one of these buffers already has a sector in it, waiting to be copied to the next buffer, the sequence must stop until space is freed.

To remedy this situation, we can add another buffer at each stage. Call these buffers A and B. Then,

  1. The disk hardware copies a sector from the platter to the controller's internal buffer A.
  2. If the controller supports DMA, it can command the disk to read another sector into its buffer B, while the controller copies the first sector from its buffer A to the device driver's buffer B in memory.
  3. The device driver can be copying another sector from its buffer A to the caller's buffer, while the controller is copying data into the device driver's buffer B.

This scheme is called ``double buffering.'' One buffer is used for incoming data, the other for outgoing data; their roles reverse when a copy cycle is completed.

The scheme can be generalized to N buffers, as required.

5. Character Devices

Character devices differ from block devices:

Character device drivers have to deal with these characteristics with

5.1 Line Disciplines

Character devices deal in unstructured streams of bytes, but user processes often like to do input or output a line at a time. Each character device ``line'' has an associated line discipline that can break up the input stream into discrete lines.

A line discipline is like a high level device driver, a layer that sits above a specific hardware device driver to provide input semantics to the device's i/o stream. One can combine line disciplines with device drivers to provide uniform semantics regardless of the underlying serial device.

A Unix line discipline has the following entry points:

l_open()

Initialize the line discipline.

l_close()

Close the line.

l_read()

Read (up to) a line.

l_write()

Write (up to) a line.

l_ioctl()

Set characteristics of the line discipline, like what control characters do.

l_rint()

Recieve character from device driver interrupt handler.

l_start()

Transmission control.

l_modem()

Modem carrier transition.

Note the similarity to the device driver entry points: the device driver typically calls the corresponding line discipline entry point.

The line discipline is responsible for interpreting control characters and implementing the required behavior:

5.2 Character Input

Bottom Half

  1. User presses a key on keyboard. Keyboard raises an interrupt.
  2. Kernel interrupt handler (glue) calls device driver's interrupt handler d_rint().
  3. Device driver interrupt handler reads byte from device controller.
    1. Be sure controller's hardware parity check succeeded.
    2. If byte == BREAK call line discipline's l_rint() with Interrupt character; else
    3. call line discipline's l_rint() with input character.
  4. L_rint()
    1. Echos character to display if necessary.
    2. Append character to ``raw'' input queue.
    3. If character == line terminator, copy the raw input queue to the ``canonical'' input queue, making it available as an input line to a calling process.
    4. If a calling process is waiting for an input line (has made a read() system call), go to Top Half.
The kernel maintains a pool of i/o buffers called ``C-Blocks'' of 60 bytes; these blocks are chained together on ``C-lists'' to form the i/o queues.
+-----+   +----------------------------------------+
|count|  /  +--+-----------------------------------v-----------------+
+-----+ /   | *|                                   Sometimes, a great|
|first*-    +-|+-----------------------------------------------------+
+-----+       |
|last*|-     +v-+-----------------------------------------------------+
+-----+ \    |  |notion.                                              |
         \   +--+-----------------------------------------------------+
          \            ^
           +-----------+

Top Half

  1. User process makes read(fd, buf, size) system call.
  2. Kernel handles trap and calls device driver's d_read(device_num, uio struct).
  3. D_read() calls l_read() of line discipline associated with device_num.
  4. L_read()
    1. if data (lines) are available in the device line's canonical queue, copy data into caller's buffer (up to complete line), interpreting any special characters; return;
    2. else, put caller in blocked state waiting for input on device line.