CSC 373: Computer Systems 1 Answers to Makeup Homework You can use this in place of a missed homework or substitute for one already done. In the latter case, I'll grade and substitute it for any homework with a lower grade. 1. When I execute the program #include int main() { int n; scanf("%d\n", n); return 0; } I get a "segmentation fault," which results from an attempt to access some inappropriate area of memory. The problem occurs in the scanf: the second argument should be &n rather than n. Explain why this mistake should cause the program to abort on the scanf execution. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Answer: In this example, scanf expects the second argument to be the address (in this case, a stack address because n is a local variable) of where to store the scanned input. So the statement should be scanf("%d\n", &n); As is, the (arbitrary) contents of n are taken as an address, which is likely invalid. This explains the abort: my program tries to access an invalid address. 2. Document the function mystery. The documentation should explain what each line does and also give a general overview of what mystery does, that is, how mystery computes the return value from the two arguments. #include int mystery(int n1, int n2) { unsigned char* ptr1 = (unsigned char*) &n1; unsigned char* ptr2 = (unsigned char*) &n2; int ret_val; unsigned char* ptr3 = (unsigned char*) &ret_val; ptr2 += sizeof(int) - 1; ptr3 += sizeof(int) - 1; int i; for (i = 0; i < sizeof(int); i++) { *ptr3 = *ptr2; ptr3--; ptr2--; } ptr3++; *ptr3 = *ptr1; return ret_val; } ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Answer: To visualize what's happening, I invoked mystery as follows int n1 = 0x11223344; int n2 = 0xaabbccdd; int n3 = mystery(n1, n2); and used show_bytes for output. Here's the output: n1: 44 33 22 11 n2: dd cc bb aa n3: 44 cc bb aa So mystery copies the leftmost byte of n1 into ret_val (which is stored in n3) after having copied the rightmost three bytes of n2 into ret_val. Here's the detail: int mystery(int n1, int n2) { unsigned char* ptr1 = (unsigned char*) &n1; unsigned char* ptr2 = (unsigned char*) &n2; int ret_val; unsigned char* ptr3 = (unsigned char*) &ret_val; /* At this point, ptr1 points to either the least significant byte (on an LE machine) or the most significant byte (on a BE machine) in n1; and ptr2 does the same with respect to n2 and ptr3 does the same with respect to local variable ret_val. The next two statements move ptr2 3 bytes into n2 and ptr3 3 bytes into ret_val. The bytes to which ptr1, ptr2, and ptr3 point will be called the "addressed byte." */ ptr2 += sizeof(int) - 1; ptr3 += sizeof(int) - 1; int i; /* The loop copies 3 bytes from n2 into retval. The byte that is not copied is the addressed byte in both, that is, either the LSB or the MSB depending on whether the platform is LE or BE. */ for (i = 0; i < sizeof(int); i++) { *ptr3 = *ptr2; ptr3--; ptr2--; } /* In the for loop above, ptr3 is decremented in each iteration. In the last iteration, the decrement causes ptr3 to point outside the storage for ret_val; hence, ptr3 is incremented by 1 byte so that it now points to the addressed byte in ret_val. The addressed byte in n1 is then copied into ret_val by using the corresponding pointers ptr1 and ptr3. */ ptr3++; *ptr3 = *ptr1; return ret_val; } 3. For the floating-point number 5.1, show the layout for a C float on the assumption that C follows the IEEE single-precision standard. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Answer: Here's 5.1 (decimal) in binary taken to nine places: 101.000110011 To get the implicit leading 1, we move the binary point two places left 1.01000110011 so that the exponent is 2. Since the bias is -127, the biased exponent is 129 (that is, 129 - 127 = 2). The fractional field is then 01000110011 padded with zeros. The sign is 0 for non-negative. Here's the layout: sign exponent significand \ \ \ 0 00000010 01000110011000000000000 4. Consider these variable declarations int x = 1; int y = 2; unsigned int p = (unsigned int) x; unsigned int q = (unsigned int) y; unsigned int ans = ((~x * y) + (p * q)) == -y; Is ans true regardless of the initial values for x and y? Explain. ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Answer: Yes, the expression is always true. What's tricky, of course, is that there are mixed types: x and y and signed ints, whereas p and q are unsigned ints. Two preliminary points may be useful. First, arithmetic operations such as + and * work the same at the bit level on signed and unsigned ints. For instance, consider the code segment signed int x = 0xffffffff; /* -1 */ unsigned int p = 0xffffffff; /* 4,294,967,295 */ printf("%u\t%u\n", x * x, p * p); /* 1 in both cases */ Accordingly, in the problem, we could replaced (p * q) with (x * y). The equality then becomes (~x * y) + (x * y) == -y In different terms, the contrast between the signed ints, x and y, and the unsigned ints, p and q, is just a distraction. In mixed arithmetic operations (e.g., (~x * y) + (p * q)), the compiler will promote the signed value (~x * y) to an unsigned value before doing the addition. The same holds for the comparison == with -y, which will be promoted to unsigned in order to compare it with the unsigned value (~x * y) + (p * q). Semantically, integer multiplication is just repeated addition. For instance, p * q is either p added to itself q times or q added to itself p times because multiplication "commutes": p * q == q * p. Let's take a simple, 4-bit example to illustrate what's going on, making everything unsigned for simplicity but without loss of generality because, as noted above, at the bit level the integer operations work the same for signed and unsigned values: x = 13 ;; 1101 in binary y = 3 ;; 0011 in binary ~x = 2 ;; 0010 ~x * y as repeated addition: 0011 + 0011 ---- 0110 ;; 6 in decimal x * y is 13 * 3 = 39 in decimal, which is 100111 in binary; but we're working in 4 bits so, in effect, there's a 2-bit overflow. So we truncate to the 4-bit string 0111. Now we add: (~x * y) == 0110 (x * y) == 0111 ---------------- 1101 Now we negate y (-y). Recall that -y is shorthand for ~y + 1: ~y == 1100 + 1 ---- 1101 Here's a printout for a slightly richer example: x == 9 y == 11 ~x == 4294967286 -y == 4294967285 (~x * y) == 4294967186 (x * y) == 99 (~x * y) + (x * y) == 4294967285 Note that ~x * y is exactly x * y "off" from -y, which in turn is ~y + 1. If you're bored, you can work these out at the bit level as well. Algebraically, we can go at it this way: In a signed it, ~x is -x - 1 (and we already know that p * q can be replaced with x * y). So ((-x - 1) * y) + (x * y) = -xy - y + xy = -y where -xy is shorthand for -x * y. 5. Here is a short code segment with three lines of output. Explain how the second and third lines of output are related. In other words, how do we get from 305419896.000000 to 286331144.000000. int x = 0x12345678; printf("%d\n%f\n%f\n", x, (float) x, (float) ((x << 4) - x)); /* output: 305419896 305419896.000000 286331144.000000 */ ;;;;;;;;;;;;;;;;;;;;;;; Answer: Let's work from the hex to the binary and then to the floating-point representation. I'll use BE for clarity. I'm assuming float f = (float) x; x: 12 34 56 78 f: 4d 91 a2 b4 12 34 56 78 x: 0001 0010 0011 0100 0101 0110 0111 1000 f: 4d 91 a2 b4 0100 1101 1001 0001 1010 0010 1011 0100 f's format IEEE 754 format is: sign exponent significand | | | 0 10011011 00100011010001010110100 The unbiased exponent is 155; hence, the effective (biased) exponent is 155 - 127 = 28. So the floating-point value is 1.00100011010001010110100 * 2**28 which is 305,419,896 as an integer. After the left shift and the subtraction, the floating-point value in hex is 12 34 56 78 x: 0001 0010 0011 0100 0101 0110 0111 1000 4e cb 97 53 f: 0100 1110 1100 1011 1001 0111 0101 0011 Here's the breakdown into IEEE 754 format: sign exponent significand | | | 0 10011101 10010111001011101010011 The unbiased exponent is 157; hence, the biased expondent is 157 - 127 = 30. So the floating-point valude is 1.10010111001011101010011 * 2**30 6. In modern cache design, a major debate is over the relative merits of direct-mapped and set-associative caches. As noted in class, the trend is towards set-associative design for L1 but direct-mapped for L2 and other off-chip caches. Present the pros and cons for each type of cache, being as specific as possible about their relative strengths and weaknesses. The key issue here involves the tradeoffs: there are good points and bad points for each. ;;;;; Answer: The major "pro" for a direct-mapped cache is that, with just one line per set, there is by definition just one tag per set. So the directory size is N * 1 entries, where N is the number of sets. This is the minimal number of tags per set, as every cache needs a tag as a lookup key for line; hence, every cache set has at least 1 line and, therefore, at least 1 tag. The "con" of a K-way set-associative cache, where K > 1, is directly related to this point: the directory size is now N * K, or K times as big as the directory for a direct-mapped cache. In effect, for K > 1, the overhead of a K-way set-associative is K times the overhead of a direct mapped cache with the same number of sets. It's not unusual nowadays to have K values of, say, 128 and up. So a K-way SA could have an overhead of 128 or more than a DM with the same number of sets. The serious "con" of a direct-mapped cache is the conflict miss. If two virtual addresses map to set S, then the two compete, in effect, for a single line in S and one displaces the other if the two addresses are repeatedly referenced. In K-way for K > 1, by contrast, K different lines can coexist at the same time in set S. The key issue, then, is whether the expected (and perhaps significant) reduction in conflicts that favors with K-way set associative over direct mapped is worth the K-fold increase in directory size that is the price. The current answer, for L1 caches, is "yes." 7. Here's a C program. Please document where indicated. In particular, explain how each pointer expression (e.g., p[i] is a pointer expression, as is *ptr++) works. #include #include /* for isalpha, which tests whether an character code is for an alphabetic character */ const int n = 60; /* document */ /* This version of comp causes any array of chars to be sorted in descending rather than ascending order. Parameter v1 points to the "first" character that qsort needs to compare against the "second" character to which v2 points. If *v2 > *v1, then comp returns a positive integer, which signals to qsort that v2 precedes v1. If *v2 < *v1, then comp returns a negative integer, which signals to qsort that *v1 precedes *v2. If *v2 == *v1, then comp returns 0, which signals to qsort that *v2 and *v1 are equal with respect to sorted order. */ int comp(const void* v1, const void* v2) { return (*(const unsigned char*) v2) - (*(const unsigned char*) v1); } /* document */ /* print out the characters in an array, separating with blanks. add a newline at the end. */ void dump(unsigned char* p, unsigned int n) { unsigned int i; for (i = 0; i < n; i++) printf("%c ", p[i]); printf("\n"); } int main() { /* 'A' is 65 in either ASCII or Unicode. The ASCII codes are, in fact, preserved in Unicode. For the uppercase letters, the codes are consecutive. */ unsigned char c = 'A'; /* n is the number of uppercase letters, 26 */ unsigned int n = 'Z' - 'A' + 1; /* these two statements are equivalent to a single calloc statement. From the documentation on calloc, figure out what the appropriate calloc statement would be and replace these two with the appropriate calloc statement. unsigned char* ptr = malloc(2 * n); bzero(ptr, n); */ unsigned char* ptr = calloc(2 * n, sizeof(unsigned char)); unsigned char* base = ptr; /* document from here until the end: give the output as well */ /* fill the array with 'A', 'B',...,'Z'. Halt on any non-alphabetic character. */ while (isalpha(c)) *ptr++ = c++; unsigned int k = n / 2; /* half the array size, 13 */ unsigned char* p1 = malloc(k); /* p1 points to 13 char cells */ unsigned char* p2 = malloc(k); /* p2 points to 13 char cells */ memcpy(p1, base, k); /* copy into p1 the first k chars in base. In effect, p1 now points to an array that holds 'A' thru 'M', in that order*/ memcpy(p2, base + k, k); /* p2 now points to an array that holds 'N' thru 'Z', in that order */ qsort(p1, k, 1, comp); /* sort the p1 array */ qsort(p2, k, 1, comp); /* sort the p2 array */ dump(p1, k); /* print the first array */ dump(p2, k); /* print the second */ /* the output is: M L K J I H G F E D C B A Z Y X W V U T S R Q P O N */ return 0; }