I'm trying to use the solution code given in the following link: Unicode Tagging in Python NLTK

In the solution given by omerbp:

import nltk
from nltk.corpus import indian
from nltk.tag import tnt

train_data = indian.tagged_sents('hindi.pos')
tnt_pos_tagger = tnt.TnT()
tnt_pos_tagger.train(train_data)  # Train the TnT part-of-speech tagger on the Hindi data

# word_to_be_tagged holds the Hindi text to tag
print tnt_pos_tagger.tag(nltk.word_tokenize(word_to_be_tagged))

I'm getting the following error:

SyntaxError: Non-ASCII character '\xe0' in file q12.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

1 Answer

Add these two lines at the top of your file:

#!/usr/bin/python
# -*- coding: utf-8 -*-

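The second line is the PEP 263 encoding declaration: it tells the interpreter to read the source file as UTF-8 instead of ASCII, which is the Python 2 default. It must appear on the first or second line of the file, and the file itself must be saved as UTF-8. The first line is a shebang and only matters when the script is executed directly on a Unix-like system.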
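To see why the error mentions '\xe0', here is a minimal sketch (not part of the original answer) that checks a piece of source code for an encoding declaration, using the regular expression given in PEP 263. The helper name `declared_encoding` and the sample strings are hypothetical, and the check is simplified: it just looks at the first two lines.

```python
import re

# Pattern given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_bytes):
    """Return the encoding declared on line 1 or 2 of the source, or None.

    Hypothetical helper for illustration; the real interpreter logic has
    a few more rules (for example, BOM handling).
    """
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


# A Devanagari letter (U+0939), written as an escape so this snippet is pure ASCII.
hindi_letter = u"\u0939"
first_byte = hindi_letter.encode("utf-8")[:1]

# Two hypothetical source files: one without and one with the declaration.
without_decl = b"text = '" + hindi_letter.encode("utf-8") + b"'\n"
with_decl = b"#!/usr/bin/python\n# -*- coding: utf-8 -*-\n" + without_decl

print(repr(first_byte))                 # first UTF-8 byte of the Hindi letter
print(declared_encoding(without_decl))  # no declaration found
print(declared_encoding(with_decl))     # declaration found on line 2
```

Devanagari characters saved as UTF-8 start with the byte 0xE0, which is outside ASCII, so Python 2 rejects a file containing Hindi text unless the declaration is present.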
