More hints for #6 on assignment 1
Start by pseudo-coding it:
Write out in words what you need to do to solve the problem.
What is the goal state? (An ordered list of the words showing
how many times each occurs, or just the top of that list.) What
is the starting point, or problem state? (The original file.)
How could you get from one to the other?
You may need to work backwards:
- Need a list of words and how many times each occurs: uniq
can do that (its -c option prefixes each line with a count).
- Need the words one per line, with all the occurrences of
each word together, to give to uniq. sort can put all the
occurrences on adjacent lines.
- Need to get one word on each line first. How could I do
that? If I could change everything that is not a letter (a-z
or A-Z) to a line break, each word would be on a separate
line.
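To see why the sort step has to come before uniq, here is a small sketch. The three-word input is made up for the demonstration; the only point is that uniq merges lines only when they are adjacent.

```shell
# uniq only merges ADJACENT identical lines, so unsorted input
# reports the same word more than once.
unsorted_counts=$(printf 'the\ncat\nthe\n' | uniq -c)

# Sorting first puts both copies of "the" next to each other,
# so uniq -c can count them as one group.
sorted_counts=$(printf 'the\ncat\nthe\n' | sort | uniq -c)

echo "without sort:"
printf '%s\n' "$unsorted_counts"   # three lines, "the" listed twice
echo "with sort:"
printf '%s\n' "$sorted_counts"     # two lines, "the" counted twice
```

The exact spacing in front of each count differs between systems, so do not rely on it.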
So:
- change all non-alphabetic characters to newline
characters (using tr). '\n' is how you specify a newline
character. Here is one way to do it:
    tr -sc '[:alpha:]' '\n'
(-c means "everything except these characters"; -s squeezes
each run of resulting newlines into one, so a run of spaces
and punctuation yields a single line break.)
In speaking to some of you I incorrectly said this would do
it:
    tr -s '[^a-zA-Z]' '\n'
I forgot that the tr command uses -c to mean "everything
except these" rather than the ^ that some other commands use.
Oops.
- sort the resulting list of words
- count how many times each word occurs (using uniq)
- sort the resulting list to find the one that occurred the
most
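The four steps above can be strung together with pipes. What follows is one possible sketch, not the only answer: the function name word_counts and the file sample.txt are made up for the demonstration, and the choice of sort -rn (numeric, largest first) for the last step is an assumption about how you want the list ordered.

```shell
# word_counts FILE: print "count word" lines, most frequent word first.
# (word_counts is a hypothetical helper name, not part of the assignment.)
word_counts() {
    tr -sc '[:alpha:]' '\n' < "$1" |   # non-letters become newlines, runs squeezed
        sort |                         # identical words end up on adjacent lines
        uniq -c |                      # each distinct word gets a count in front
        sort -rn                       # numeric sort, largest count first
}

# sample.txt is a made-up file created only for this demo.
printf 'the cat saw the dog. the dog ran!\n' > sample.txt

# Show only the top of the list: the line for "the", with a count of 3.
word_counts sample.txt | head -n 1
```

Dropping the head command shows the whole ordered list instead of just the most frequent word.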
There are probably several ways to accomplish each of these
steps, and the first step could be accomplished in one command
or by stringing two or more commands together.
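As an illustration of doing the first step with two commands instead of one, here is a sketch that leaves out the -s option and removes the resulting blank lines with grep. The function name split_words is made up, and this is just one of several possible combinations.

```shell
# split_words: read text on standard input, print one word per line.
# (split_words is a hypothetical helper name used only for this sketch.)
split_words() {
    tr -c '[:alpha:]' '\n' |   # every non-letter becomes a newline, no squeezing
        grep -v '^$'           # drop the empty lines left by runs of non-letters
}

# Prints the three words, each on its own line.
printf 'one, two;  three\n' | split_words
```

The output can be handed to sort and uniq exactly as in the one-command version.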