Scholars,
Your assignment is to produce the top 10 words from decl.txt, meaning the 10 words that appear most often. The algorithm is this:
1. For each word in the file:
       if the word does not exist in the dictionary
           put the word into the dictionary with a value of 1
       else
           increment the value associated with the word
2. Invert the dictionary, so that instead of word->count mappings it contains count->[list,of,words,with,this,count].
3. Sort the keys of the inverted dictionary, and print out the words associated with the 10 highest keys (counts).
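Steps 2 and 3 could look something like the following in Python. This is a minimal sketch, not the required solution: it assumes step 1 has already produced a dictionary mapping each word to its count, and the function names `invert_counts` and `top_words` and the sample counts are made up for illustration.

```python
def invert_counts(counts):
    """Turn word->count mappings into count->[list of words with that count]."""
    inverted = {}
    for word, count in counts.items():
        if count not in inverted:
            inverted[count] = [word]
        else:
            inverted[count].append(word)
    return inverted


def top_words(counts, n=10):
    """Return (count, word) pairs for the words that go with the n highest counts."""
    inverted = invert_counts(counts)
    result = []
    # sort the keys (the counts) from highest to lowest and keep the top n
    for count in sorted(inverted.keys(), reverse=True)[:n]:
        # several words can share a count; list them alphabetically
        for word in sorted(inverted[count]):
            result.append((count, word))
    return result


# hypothetical counts, standing in for the output of step 1
counts = {"the": 5, "of": 5, "to": 3, "people": 1}
for count, word in top_words(counts, 2):
    print(count, word)
```

Because several words can share the same count, the words belonging to the 10 highest counts may number more than 10; this sketch returns all of them.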
NOTE: this is not the only way to do this in Python, but I want you to get practice doing the dictionary inversion, sorting the keys, etc.
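One such alternative is `collections.Counter` from the standard library, whose `most_common(n)` method returns the n most frequent (word, count) pairs. This is only a sketch for checking your answer against, not a substitute for the assignment; the word list below is made up.

```python
from collections import Counter

# hypothetical list of words; in the lab these would come from decl.txt
words = ["we", "hold", "these", "truths", "to", "be", "self", "evident", "to", "we"]
counts = Counter(words)
# most_common(n) returns the n highest (word, count) pairs, highest count first
print(counts.most_common(2))
```

Note that `most_common(10)` returns at most 10 pairs and breaks ties at the cutoff by order of first appearance, so it can list fewer words than the inversion approach when counts are tied.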
CODE FROM CLASS
1. The code I wrote in class…more or less, with comments
```python
# open() is built in, so no import is needed to read a file
# open a file for reading
fp = open("path/to/file/file.txt", "r")
# iterate through each line
for line in fp:
    line = line.replace("\n", "")
    line = line.replace(",", "")
    line = line.lower()
    # line.split returns a list, so you can put it right into a loop
    # (split() with no argument also skips runs of extra spaces)
    for word in line.split():
        print(word)
        # of course, you'll do more than print the word, you'll add it to a dictionary with the proper count, etc.
fp.close()
```
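Here is one way step 1 could be folded into that loop in place of the print. A minimal sketch, assuming the same cleanup as the class code (drop newlines and commas, lowercase); the function name `count_words` is hypothetical, and it accepts any iterable of lines, including an open file object.

```python
def count_words(lines):
    """Build a word->count dictionary from an iterable of lines (e.g. an open file)."""
    counts = {}
    for line in lines:
        line = line.replace("\n", "")
        line = line.replace(",", "")
        line = line.lower()
        for word in line.split():
            # step 1 of the algorithm: first sighting gets 1, later ones add 1
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1
    return counts


# usage with a real file (the path is a placeholder, as in the class code):
# with open("path/to/file/file.txt", "r") as fp:
#     counts = count_words(fp)
print(count_words(["We hold these truths,\n", "we the People\n"]))
```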
HINTS AND ALTERNATIVES:
There is a faster way to read the file. You can just read it all out into one big string. That way you don’t have to go through the song and dance of processing each line. Just do this…
```python
fp = open("path/to/file/file.txt", "r")
fullText = fp.read()
fp.close()
```
fullText is now a string of the whole file.
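A common variation wraps the read in a `with` block, which closes the file automatically when the block ends, even if an error occurs partway through. A sketch; the helper name `read_whole_file` is hypothetical.

```python
def read_whole_file(path):
    """Return the entire contents of the file at path as one string."""
    # the with block closes the file for us, so no fp.close() is needed
    with open(path, "r") as fp:
        return fp.read()


# fullText = read_whole_file("path/to/file/file.txt")
```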
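Also, here is a shortcut for stripping out everything except the words themselves (letters and apostrophes):
```python
import re
fullText = re.sub(r"[^a-z'\s]", "", fullText.lower())
```
NOTE: inside the square brackets, right after the apostrophe ('), is \s, which matches any whitespace character: a blank space, a tab, or a newline. What this says is: "for every character in fullText.lower(), if the character is NOT a through z, an apostrophe, or whitespace, then replace it with an empty string." Keeping the newlines matters; if they were deleted, the last word of each line would be glued onto the first word of the next. The r in front of the pattern makes it a raw string, so the backslash reaches the regular expression unchanged.
Finally, a list of all the words can be made in basically one line, thusly:
```python
import re
fp = open("path/to/file.txt", "r")
# split() with no argument splits on any run of whitespace, so newlines and double spaces don't produce empty "words"
allWords = re.sub(r"[^a-z'\s]", "", fp.read().lower()).split()
fp.close()
```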
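To see what the cleanup does to text that spans more than one line, here is a small sketch on a made-up two-line sample. It compares a pattern that keeps only the blank space, `[^a-z' ]`, with one that keeps all whitespace, `[^a-z'\s]`.

```python
import re

# a made-up two-line sample with punctuation and a line break
sample = "We hold these truths to be self-evident,\nthat all men are created equal."

# keep only a-z, apostrophes, and blank spaces: the newline gets deleted too
space_only = re.sub("[^a-z' ]", "", sample.lower())
# keep a-z, apostrophes, and any whitespace (\s also matches newlines and tabs)
any_space = re.sub(r"[^a-z'\s]", "", sample.lower())

print(space_only.split())
print(any_space.split())
```

In the first result "self-evident" and "that" fuse into one word across the line break; the second keeps them apart. In both cases the hyphen itself is deleted, giving "selfevident"; replacing the unwanted characters with a space instead of an empty string would split it into two words.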
Posted on May 19th, 2009 by Baker Franke
Filed under: Labs