A crazy command line

Here is a piece of code from unix.stackexchange that counts how many times each word occurs in a text file.

sed -e 's/[^[:alpha:]]/ /g' vader.txt | tr '\n' " " | tr -s " " | tr " " '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl

It basically does the following:

  1. sed replaces every non-alphabetic character (digits and punctuation included) with a space.
  2. The first tr converts all line breaks to spaces.
  3. tr -s squeezes each run of spaces into a single space.
  4. The next tr converts every space to a line break, leaving one word per line.
  5. tr 'A-Z' 'a-z' converts everything to lower case, so that ‘Hello’ and ‘hello’ count as the same word.
  6. sort orders the words, so that identical words end up on adjacent lines.
  7. uniq -c collapses each run of identical lines into a single line, prefixed with the number of occurrences.
  8. sort -nr sorts numerically in descending order, so the most frequent words come first.
  9. nl adds a line number to each line, which gives the word's rank in the list.
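
The steps above can be sketched as one stage per line, wrapped in a shell function. The function name word_freq is my own choice, not part of the original answer; the stages are the same commands as in the one-liner. It reads the files given as arguments, or standard input if there are none.

```shell
# word_freq is a hypothetical helper name; the stages mirror the one-liner.
word_freq() {
  sed -e 's/[^[:alpha:]]/ /g' "$@" |  # 1. non-letters become spaces
    tr '\n' ' ' |                     # 2. line breaks become spaces
    tr -s ' ' |                       # 3. squeeze runs of spaces
    tr ' ' '\n' |                     # 4. one word per line
    tr 'A-Z' 'a-z' |                  # 5. lower-case everything
    sort |                            # 6. bring identical words together
    uniq -c |                         # 7. count each run of identical lines
    sort -nr |                        # 8. highest counts first
    nl                                # 9. number the lines (rank)
}
```

Calling word_freq vader.txt should behave like the original command. Note that nl and uniq -c pad their numbers with spaces or tabs, so the exact column layout on your system may differ from a hand-typed listing.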

Now what it really does is show the power of Unix command line utilities.

Here’s the input file vader.txt:

I’ve been waiting for you Obi-Wan. We meet again, at last.
The circle is now complete; when I left you, I was but the
learner, now I am the master. Only a master of evil, Darth.

and the output:

1 4 i
2 3 the
3 2 you
4 2 now
5 2 master
6 1 when
7 1 we
8 1 was
9 1 wan
10 1 waiting
11 1 ve
12 1 only
13 1 of
14 1 obi
15 1 meet
16 1 left
17 1 learner
18 1 last
19 1 is
20 1 for
21 1 evil
22 1 darth
23 1 complete
24 1 circle
25 1 but
26 1 been
27 1 at
28 1 am
29 1 again
30 1 a
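
As a cross-check, the first four stages can probably be folded into a single tr call. This is a different technique from the one in the original answer, not a transcription of it: tr -cs replaces every run of non-letters with one line break. The function name word_freq_short is hypothetical, and I am assuming a tr that pads the second set to the length of the first, as GNU and BSD tr do.

```shell
# word_freq_short is a hypothetical name for a more compact variant.
word_freq_short() {
  cat "$@" |
    tr -cs '[:alpha:]' '\n' |     # every run of non-letters -> one line break
    tr '[:upper:]' '[:lower:]' |  # lower-case everything
    sort | uniq -c | sort -nr | nl
}
```

Running word_freq_short vader.txt should give the same counts as the table above, provided the input starts with a letter.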
