I have a text file larger than 200 MB. I want to read it and select the 30 most frequently used words. When I run my code, it raises an error. The code is as follows:
import sys, string
import re
import codecs
from collections import Counter
import collections
import unicodedata

with open('E:\\Book\\1800.txt', "r", encoding='utf-8') as File_1800:
    for line in File_1800:
        sepFile_1800 = line.lower()
        words_1800 = re.findall(r'\w+', sepFile_1800)
for wrd_1800 in [words_1800]:
    # keep only words longer than 3 characters
    long_1800 = [w for w in wrd_1800 if len(w) > 3]
    common_words_1800 = dict(Counter(long_1800).most_common(30))
    print(common_words_1800)
Traceback (most recent call last):
  File "C:\Python34\CommonWords.py", line 14, in <module>
    for line in File_1800:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 3784: invalid start byte
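
For what it's worth, byte 0xa3 is the pound sign (£) in Latin-1 and cp1252, so one possibility is that the file is not actually UTF-8. That is only a guess; the real encoding of the file is not known from the traceback. Below is a minimal sketch under that assumption. The helper names `top_words` and `top_words_in_file` and the default `encoding="cp1252"` are hypothetical choices made for illustration. It also keeps one `Counter` for the whole file, so the counts cover every line and memory use stays small for a 200 MB input.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def top_words(lines, n=30, min_len=4):
    """Return the n most common words of at least min_len characters."""
    counts = Counter()
    for line in lines:
        # Update one running Counter so counts accumulate across all lines.
        counts.update(
            w for w in WORD_RE.findall(line.lower()) if len(w) >= min_len
        )
    return counts.most_common(n)


def top_words_in_file(path, encoding="cp1252", n=30):
    """Stream the file line by line instead of loading it all at once.

    encoding="cp1252" is an assumption, not a known fact about the file.
    errors="replace" substitutes U+FFFD for undecodable bytes instead of
    raising UnicodeDecodeError.
    """
    with open(path, "r", encoding=encoding, errors="replace") as fh:
        return top_words(fh, n=n)
```

It could then be called as `print(dict(top_words_in_file('E:\\Book\\1800.txt')))`. If the file turns out to be mostly UTF-8 with a few stray bytes, passing `encoding="utf-8"` with the same `errors="replace"` may be the better fit.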