Using Python 2.7, I'm fetching HTML from a website as byte strings and immediately decoding it to unicode. Because I need to know later where any decoding errors occurred, I thought it best to pass errors="replace" so that bytes which cannot be decoded do not raise exceptions:
linkname = curlinkname.decode("utf-8", errors="replace")
In most cases, this replaces the problem character with a placeholder (U+FFFD). However, when I run the code I still get an exception, apparently from this line, on one particular character (ū):
UnicodeEncodeError: 'charmap' codec can't encode character u'\u016b' in position 1: character maps to <undefined>
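To narrow this down, here is a minimal sketch that separates the decode step from a later encode step. The error above is a UnicodeEncodeError rather than a UnicodeDecodeError, so the sketch checks whether an encode to a charmap-based codec reproduces the same message. The sample bytes and the choice of cp1252 are assumptions for illustration; they are not taken from the real page or from my actual console settings.

```python
# Hypothetical sample: the UTF-8 bytes for u"a\u016bb" (not from the real page).
raw = b"a\xc5\xabb"

# Step 1: decoding. These bytes are valid UTF-8, so nothing is replaced
# and the result is a unicode object containing U+016B at position 1.
text = raw.decode("utf-8", errors="replace")

# Step 2: encoding to a charmap-based codec (cp1252 is assumed here).
# cp1252 has no mapping for U+016B, so a strict encode raises.
try:
    text.encode("cp1252")
    encode_failed = False
    message = ""
except UnicodeEncodeError as exc:
    encode_failed = True
    message = str(exc)

# Passing errors="replace" on the encode side substitutes "?" instead.
safe = text.encode("cp1252", errors="replace")
```

If the encode step is what fails, the errors="replace" argument on decode() would have no effect on it, since that argument only governs the bytes-to-unicode direction.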
What's going on?