CSC374 Nov01

Contents

  1. Memory Mapping
  2. Linux I/O
  3. fopen, fclose
  4. strcpy, gets, sprintf
  5. fgets, strncpy, snprintf
  6. Linux I/O: open
  7. FILE pointers and file descriptors
  8. Opening a File with Linux I/O open
  9. Creating a new file
  10. Appending
  11. The Mode Parameter
  12. The umask
  13. Example
  14. The Linux I/O read function
  15. The write Function
  16. Short reads
  17. Short reads Summary
  18. A readn wrapper function
  19. Read and Signals
  20. Buffering
  21. Getting the size of a File
  22. Kernel Data Structures for Files
  23. Descriptor table
  24. System File Table
  25. The dup2 Function
  26. 11.1
  27. 11.2
  28. 11.3
  29. 11.4
  30. 11.5
  31. Problem 1
  32. Problem 2
  33. Problem 3

Memory Mapping[1] [top]

The kernel has to initialize the virtual address space of a process, creating the .text, .bss, .data, heap, and stack segments.

If a process creates a thread, a new segment must be added to the virtual address space for a stack for the thread.

The mmap system call can create new mappings; that is, it can add a new segment at some virtual address in the address space of the process.


#include <unistd.h>
#include <sys/mman.h>

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *start, size_t length);

 prot: PROT_EXEC, PROT_READ, PROT_WRITE, PROT_NONE
 flags: MAP_PRIVATE, MAP_SHARED, MAP_ANON

MAP_PRIVATE | MAP_ANON could be used to map an area of virtual memory for a stack segment for a new thread.

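A MAP_PRIVATE mapping is copy-on-write: a write goes to a private copy of the page and is not seen by other processes. Copy-on-write is also used in the implementation of fork.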

Linux I/O[2] [top]

fopen, fclose[3] [top]

The C standard i/o library defines a type: FILE

No one ever pays much attention to exactly how this type is defined in terms of other C basic types. You don't need to know.

The fopen function is used to read in information about a named file and prepare it for reading and/or writing.

This function returns a value which is of type pointer to FILE.

The fprintf or fscanf functions for writing to or reading from the file just pass this pointer, rather than the file name.

        FILE *fp;
        fp = fopen("foo.txt", "r");
        if ( fp == NULL ) {
           printf("Unable to open input file foo.txt\n");
           exit(1);
        }
        const int N = 128;
        char buf[N];

        // Read a line from foo.txt
        fgets(buf, N, fp);

        // or read an integer from the next line of foo.txt
        int k;
        int numcnv;
        numcnv = fscanf(fp, "%d", &k);
        if ( numcnv != 1 ) {
           printf("Unable to read an integer from foo.txt\n");
        }

        fclose(fp); // fp is no longer associated with foo.txt
      

strcpy, gets, sprintf [4] [top]

Many standard library i/o functions write data into a string and make programs that use them susceptible to buffer overflow attacks:

        const int N = 5;
        char b[N];

        /**
        * 1. Not enough room in b for "Hello, World!"
        */

        strcpy(b, "Hello, World!");

        /**
        * 2. Reads an input line and stores it in b (discarding the
        *    newline, but adding a null byte). Not enough room
        *    in b for input lines longer than 4 characters.
        */

        gets(b);

        /**
        * 3. Converts time in ticks since January 1, 1970 (or the
        *    'epoch' starting date) to a string representation of
        *    the current date and time and stores it in b. E.g.,
        *    "Thu May 14 09:29:24 2009" copied to b. Not enough
        *    room in b for these 24 characters plus a null byte.
        */
        time_t ticks = time(0);  // current time in ticks since epoch
        sprintf(b, "%.24s", ctime(&ticks));

      

fgets, strncpy, snprintf[5] [top]

Each of these alternate versions of i/o functions has an extra parameter for the size of the array.

The function will not write more than this limit into the array.

Using these functions (correctly) prevents overflowing the character arrays.

        const int N = 5;
        char b[N];

        /**
        * 1. Not enough room in b for "Hello, World!"
        *    but only 4 characters will be copied and a null 
        *    byte added.
        */

        strncpy(b, "Hello, World!", N-1);
        b[N-1] = '\0';
        

        /**
        * 2. fgets reads at most the specified number of bytes minus 1
        *    from the input line and stores it in b and adds a null
        *    byte, but stops reading after a newline is read. The
        *    newline is stored if it is read. The third parameter
        *    is of type FILE *, pointer to FILE. stdin, stdout, and
        *    stderr are of this type. The fopen function is used to
        *    open a named file; it returns a FILE * for the opened file.
        */

        fgets(b, N, stdin);

        /**
        * 3. Converts time in ticks since January 1, 1970 (or the
        *    'epoch' starting date) to a string representation of
        *    the current date and time and stores it in b. E.g.,
        *    "Thu May 14 09:29:24 2009" copied to b, but copies 
        *    at most the specified number of characters - 1 and 
        *    then adds a null byte. So only "Thu " will be copied
        *    if N is 5.
        */
        time_t ticks = time(0);  // current time in ticks since epoch
        snprintf(b, N, "%.24s", ctime(&ticks));

      

Linux I/O: open[6] [top]

To read a file using Linux I/O functions, the file must first be opened.

Instead of fopen, the Linux open function is used.

The open function returns a value of type int, instead of the FILE * type returned by the C standard library fopen function.

The integer returned is called a file descriptor.

When a C program runs, 3 files are opened by default: standard input, standard output, and standard error.

FILE pointers and file descriptors[7] [top]

The C standard I/O library functions are implemented using the Linux I/O functions.

So each of the 3 default open files has both an associated FILE pointer and a file descriptor.

File              FILE *    file descriptor
standard input    stdin     0
standard output   stdout    1
standard error    stderr    2

That is, stdin, stdout, and stderr are defined values in the header files and are each of type FILE *.

When a program runs, these are associated with the specified files (or devices).
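The pairing in the table can be checked from a program. This is only a sketch: the function name print_std_descriptors is made up here, and it relies on fileno, a POSIX function declared in stdio.h that returns the file descriptor behind a FILE pointer.

```c
#include <stdio.h>

/* Hypothetical helper: print the descriptor behind each of the
 * three default streams. */
void print_std_descriptors(void)
{
    printf("stdin  -> %d\n", fileno(stdin));
    printf("stdout -> %d\n", fileno(stdout));
    printf("stderr -> %d\n", fileno(stderr));
}
```

When the program is started normally, the three values reported should be 0, 1, and 2, matching the table.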

Opening a File with Linux I/O open[8] [top]

Open a file for reading

        int fd;
        fd = open("foo.txt", O_RDONLY, 0);
      

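Open an existing file for writing. Adding O_TRUNC first truncates the current data to 0 bytes; with O_WRONLY alone the existing bytes stay in place and are overwritten starting at the beginning of the file. The third parameter (0666 here) gives permissions, but it only takes effect when open creates the file.

        int fd;
        fd = open("foo.txt", O_WRONLY | O_TRUNC, 0666);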
      

Creating a new file[9] [top]

To open a new file for writing use the O_WRONLY flag, but bitwise or it with O_CREAT (no E at the end).

Open foo.txt as a new file (if it doesn't already exist) with permission 0600 (read/write for owner only):

        int fd;
        fd = open("foo.txt", O_WRONLY | O_CREAT, 0600);
      

Appending[10] [top]

To open an existing file without destroying its contents, you can open it for appending additional bytes:

        int fd;
        fd = open("foo.txt", O_WRONLY | O_APPEND, 0);
      

The third parameter of 0 will not change the permissions of the existing file.

In fact, the third parameter can be omitted whenever O_CREAT is not used, as when reading or appending to existing files.
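The flags from the last two slides can be combined. The sketch below is an illustration, not code from the course: the helper name append_text is hypothetical, and it assumes the caller wants the file created with mode 0600 if it is missing.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: append `text` to the file at `path`, creating the
 * file with mode 0600 (before the umask is applied) if it is missing.
 * Returns 0 on success, -1 on error. */
int append_text(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(text);
    ssize_t nwr = write(fd, text, len);
    close(fd);
    return (nwr == (ssize_t) len) ? 0 : -1;
}
```

Calling append_text twice on the same path leaves both pieces of text in the file, since O_APPEND positions every write at the current end of the file.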

The Mode Parameter[11] [top]

The 3rd parameter, 0666, is also called the mode, but as noted above, it specifies the file access permissions.

The leading 0 means 0666 is interpreted as an octal number (base 8).

To represent one of the 8 digits in octal only 3 bits are needed. Converting octal to binary means each octal digit becomes 3 bits.

In binary 0666 would be

        6   6   6   (octal)
        110 110 110 (binary)
        rw- rw- rw- (permissions)
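The octal-to-permissions conversion above can be written as a loop over the 9 permission bits. This is a sketch with a made-up function name, mode_to_string; it only looks at the low 9 bits of the mode.

```c
/* Hypothetical helper: fill out[0..9] with the rwx string for the low
 * 9 permission bits of mode, e.g. 0666 -> "rw-rw-rw-". */
void mode_to_string(unsigned int mode, char out[10])
{
    const char letters[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++) {
        /* Bit 8 is owner-read, bit 0 is others-execute. */
        unsigned int bit = 1u << (8 - i);
        out[i] = (mode & bit) ? letters[i] : '-';
    }
    out[9] = '\0';
}
```

For example, passing 0600 fills the array with "rw-------".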
      

The umask[12] [top]

Although the open example specified 0666, the actual file permissions are calculated with a mask that may restrict access further.

The mask is called the umask. Each process has one, inherited from its parent, so in practice it is set once per user in the login shell.

The umask can be inspected and/or changed at the prompt:

        $ umask
          0022  (output example of the umask command)
      

Actual file permissions are calculated like this:

        actual_permissions = specified_permissions & ~umask
      

Example[13] [top]

For a umask of 0022 and requested_permissions of 0666 (rw-rw-rw-), actual_permissions are calculated as 0644 (rw-r--r--)

        actual_permissions = specified_permissions & ~umask
         umask                 = 0022 = 000 010 010

        ~umask                 = 0755 = 111 101 101
         requested_permissions = 0666 = 110 110 110
~umask & requested_permissions = 0644 = 110 100 100
      

So for umask = 0022, the actual permissions are 0644 (rw-r--r--), not 0666 (rw-rw-rw-). Group and Others will be able to read from, but not write to this file.
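The same calculation in C, as a small sketch. The function name effective_mode is made up for this example; it only does the bit arithmetic and does not read the real umask of the process.

```c
#include <sys/types.h>

/* Hypothetical helper: permissions a new file actually gets when open is
 * called with `requested` while the process umask is `mask`. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}
```

With the values from the example, effective_mode(0666, 0022) evaluates to 0644.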

The Linux I/O read function[14] [top]

The read Function:

      nrd = read(fd, buf, N)
      

The read function returns the number of bytes actually read. At end of file read returns 0, and on error it returns -1.

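    #include <fcntl.h>   // for open
    #include <stdio.h>   // for printf
    #include <stdlib.h>  // for exit
    #include <unistd.h>  // for read and close

    int main()
    {
      int fd;
      char ch;
      ssize_t nrd;  // signed: read returns -1 on error

      fd = open("foo.txt", O_RDONLY);
      if ( fd < 0 ) {
        printf("Could not open file foo.txt for input.\n");
        exit(1);
      }

      // Read one byte at a time until end of file (0) or error (-1)
      while( (nrd = read(fd, &ch, 1)) > 0 ) {
        printf("%c", ch);
      }
      close(fd);
      return 0;
    }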

The write Function[15] [top]

The write Function:

      write(fd, addr, N)
      

The write function returns the number of bytes written, or -1 on error. For a regular file this is normally equal to N, so the return value is often ignored.

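    #include <fcntl.h>   // for open
    #include <stdio.h>   // for printf
    #include <stdlib.h>  // for exit
    #include <unistd.h>  // for read, write, and close

    int main()
    {
      int fd;
      const int N = 8;
      char buf[N];
      ssize_t nrd;  // signed: read returns -1 on error

      fd = open("foo.txt", O_RDONLY);
      if ( fd < 0 ) {
        printf("Could not open file foo.txt for input.\n");
        exit(1);
      }

      // Copy the file to standard output (descriptor 1), N bytes at a time
      while( (nrd = read(fd, buf, N)) > 0 ) {
        write(1, buf, nrd);  // write only the bytes actually read
      }
      close(fd);
      return 0;
    }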
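On a pipe or a network connection, write can return after writing fewer bytes than requested. A minimal sketch of a wrapper that keeps writing until everything has been sent is shown below; the name writen is chosen for this illustration, and unlike the version in csapp.c it does not deal with signals.

```c
#include <unistd.h>

/* Hypothetical helper: keep calling write until all n bytes are
 * written. Returns n on success, -1 on error. */
ssize_t writen(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t nleft = n;

    while (nleft > 0) {
        ssize_t nwr = write(fd, p, nleft);
        if (nwr <= 0)
            return -1;          /* error: give up */
        nleft -= (size_t) nwr;
        p += nwr;
    }
    return (ssize_t) n;
}
```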

Short reads[16] [top]

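In the previous example with N = 8, read will read 8 bytes each time it is called

	read(fd, buf, N)
      

except near the end of the file: when fewer than N bytes remain, read returns only the bytes that are left, and once no bytes remain it returns 0.

So the return value of read will be N until possibly the next-to-last call, which may be short and return the smaller number of bytes actually read. The last call returns 0.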

The read and write functions can also be used to read/write data across a network.

Because of network delays, a read operation may return a short read (fewer than the requested number of bytes) even though more bytes will be/are sent.

In this case it is necessary to keep calling read to get more bytes until the requested number of bytes has been read or read returns 0.

However, for a network connection, 0 is returned ("end of file") only when the connection is closed at the end that is writing.

Short reads Summary[17] [top]

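Short reads occur when fewer than the requested number of bytes remain before end of file, and when reading from a network connection where not all of the bytes have arrived yet.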

A readn wrapper function[18] [top]

The function below will only return a short read when fewer than n bytes remain before end of file.

This function is slightly shorter than the one in the csapp.c file from the text (and from the text web site) as it doesn't deal with signals.

    #include <unistd.h>  // for read

    ssize_t rio_readn(int fd, void *usrbuf, size_t n)
    {
        size_t nleft = n;
        ssize_t nread;
        char *bufp = usrbuf;

        while (nleft > 0) {
            nread = read(fd, bufp, nleft);
            if (nread < 0) {
                return -1;          /* error, errno set by read */
            }
            if (nread == 0) {
                break;              /* EOF */
            }
            nleft -= nread;
            bufp += nread;
        }
        return (n - nleft);         /* return >= 0 */
    }

Read and Signals[19] [top]

If a signal is caught while a program is in the read function, on some systems read may return -1 and set errno to EINTR.

On other systems, read may restart after the signal handler returns.

To handle signals in both cases, the full version of rio_readn checks for a return value of -1 and, if

	errno == EINTR
      

it arranges to keep reading until the requested number of bytes have been read or EOF is reached.
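The retry described above can be sketched as follows. This is written from the description on this slide, not copied from csapp.c, and the name readn_restart is made up to keep it distinct from the textbook function.

```c
#include <errno.h>
#include <unistd.h>

/* Sketch only: like the readn wrapper, but a read interrupted by a
 * signal handler (errno == EINTR) is retried instead of reported. */
ssize_t readn_restart(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: call read again */
            return -1;          /* any other error */
        }
        if (nread == 0)
            break;              /* EOF */
        nleft -= (size_t) nread;
        bufp += nread;
    }
    return (ssize_t) (n - nleft);
}
```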

Buffering[20] [top]

The C standard i/o functions create buffers in your linked program so that if you write a program that reads a file 1 character at a time, it doesn't ask the Linux file system for 1 character, but rather it fills an internal buffer.

Then each request for a single character just retrieves the next character from the buffer in your code until the buffer is empty.

When the buffer is empty, another request is made behind the scenes to Linux I/O to fill the buffer.

The csapp.c file similarly creates buffers (a struct with a character array) for reading and provides a buffered version of rio_readn, called rio_readnb, with no short reads except possibly for the next-to-last read and the last EOF read.

      rio_t rb; // buffer struct
      int fd;   // for file descriptor
      ...
      const int N = 8;
      char buf[N];
      ssize_t nrd;

      rio_readinitb(&rb, fd);
      ...
      nrd = rio_readnb(&rb, buf, N);  
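To make the idea of an internal buffer concrete, here is a minimal sketch of a buffered single-character reader. The type and function names (bufreader_t, bufreader_init, bufreader_getc) and the buffer size are invented for this illustration; this is not the rio_t code from csapp.c.

```c
#include <unistd.h>

#define BUF_SIZE 4096

/* Hypothetical buffer struct; the names are illustrative only. */
typedef struct {
    int fd;                 /* descriptor being read */
    ssize_t cnt;            /* unread bytes left in buf */
    char *ptr;              /* next unread byte */
    char buf[BUF_SIZE];     /* internal buffer */
} bufreader_t;

void bufreader_init(bufreader_t *br, int fd)
{
    br->fd = fd;
    br->cnt = 0;
    br->ptr = br->buf;
}

/* Return the next byte (0..255), or -1 at EOF or on error. The read
 * system call is made only when the internal buffer is empty. */
int bufreader_getc(bufreader_t *br)
{
    if (br->cnt <= 0) {
        br->cnt = read(br->fd, br->buf, sizeof br->buf);
        if (br->cnt <= 0)
            return -1;
        br->ptr = br->buf;
    }
    br->cnt--;
    return (unsigned char) *br->ptr++;
}
```

A loop that calls bufreader_getc once per character makes one read call per 4096 bytes of input rather than one per character.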
    

Getting the size of a File[21] [top]

We already saw with the mmap function how to do this using the fstat function.

    1	#include <stdio.h>
    2	#include <sys/stat.h>
    3	
    4	int main()
    5	{
    6	   struct stat file_stat;
    7	
    8	   stat("foo.txt", &file_stat);
    9	
   10	   printf("File foo.txt has %lld bytes\n", (long long) file_stat.st_size);
   11	
   12	   return 0;
   13	}

The stat function returns -1 for error.

Instead of using the stat function with the file name, there is a function, fstat, that can be used if you have already opened the file and have a file descriptor, fd, for the file.


    8	   fstat(fd, &file_stat);
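A sketch that puts open and fstat together and checks the return values. The helper name file_size is hypothetical, and returning -1 for every failure is a choice made for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: size in bytes of the file at `path`, found by
 * opening it and calling fstat on the descriptor. Returns -1 on error. */
off_t file_size(const char *path)
{
    struct stat file_stat;
    off_t size = -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &file_stat) == 0)
        size = file_stat.st_size;
    close(fd);
    return size;
}
```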
    

Kernel Data Structures for Files[22] [top]

Descriptor table[23] [top]

Each process has its own descriptor table. File descriptors are just subscripts for this table.

Entries are either NULL or else a pointer into the System File table.

System File Table[24] [top]

An entry in the system file table is made for each file that is opened by some current process.

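An entry contains, among other things, the current file position and a reference count of the descriptor table entries that point to it.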

The reference count will be 2 if a parent has an open file and forks a child.

The dup2 Function[25] [top]

	int dup2(int fdA, int fdB);
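The call dup2(fdA, fdB) makes descriptor fdB refer to the same system file table entry as fdA, closing whatever fdB referred to before. The sketch below uses it to point an existing descriptor at a file for output; the helper name redirect_output and the 0600 mode are assumptions made for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: make descriptor `target` write to the file at
 * `path` (created or truncated). Returns 0 on success, -1 on error. */
int redirect_output(int target, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* After dup2, target and fd share one system file table entry. */
    if (dup2(fd, target) < 0) {
        close(fd);
        return -1;
    }
    if (fd != target)
        close(fd);          /* target keeps the file open */
    return 0;
}
```

After a successful call such as redirect_output(1, "out.txt"), bytes written to descriptor 1 go to out.txt instead of the terminal.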
      

11.1[26] [top]

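    #include <fcntl.h>   // for open
    #include <stdio.h>   // for printf
    #include <stdlib.h>  // for exit
    #include <unistd.h>  // for close

    int main()
    {
      int fd1, fd2;

      fd1 = open("foo.txt", O_RDONLY, 0);
      close(fd1);
      fd2 = open("baz.txt", O_RDONLY, 0);
      printf("fd2 = %d\n", fd2);
      exit(0);
    }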

11.2[27] [top]

foobar.txt has 6 characters: "foobar".

    #include <fcntl.h>   // for open
    #include <stdio.h>   // for printf
    #include <stdlib.h>  // for exit
    #include <unistd.h>  // for read

    int main()
    {
      int fd1, fd2;
      char c;

      fd1 = open("foobar.txt", O_RDONLY, 0);
      fd2 = open("foobar.txt", O_RDONLY, 0);
      read(fd1, &c, 1);
      read(fd2, &c, 1);
      printf("c = %c\n", c);
      exit(0);
    }

11.3[28] [top]

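foobar.txt has 6 characters: "foobar".

    #include <fcntl.h>     // for open
    #include <stdio.h>     // for printf
    #include <stdlib.h>    // for exit
    #include <sys/wait.h>  // for wait
    #include <unistd.h>    // for fork and read

    int main()
    {
      int fd;
      char c;

      fd = open("foobar.txt", O_RDONLY, 0);
      if ( fork() == 0 ) {
        read(fd, &c, 1);
        exit(0);
      }
      wait(0);
      read(fd, &c, 1);
      printf("c = %c\n", c);
      exit(0);
    }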

11.4[29] [top]

How would you use dup2 to redirect standard input to descriptor 5?

11.5[30] [top]

    #include <fcntl.h>   // for open
    #include <stdio.h>   // for printf
    #include <stdlib.h>  // for exit
    #include <unistd.h>  // for read and dup2

    int main()
    {
      int fd1, fd2;
      char c;

      fd1 = open("foobar.txt", O_RDONLY, 0);
      fd2 = open("foobar.txt", O_RDONLY, 0);
      read(fd2, &c, 1);
      dup2(fd2, fd1);
      read(fd1, &c, 1);
      printf("c = %c\n", c);
      exit(0);
    }

Problem 1[31] [top]

If a program opens just one file, what file descriptor will it have?

Problem 2[32] [top]

If a program

what file descriptor will the second file have?

Problem 3[33] [top]

How could you write code to create a child process, but arrange to redirect standard input for the child to come from a file named foo.txt?

That is, the bash shell has to do this if you type:

	$ prog < foo.txt