The code is shown below. It runs under Python 2 on Debian 9.
# -*- coding: utf-8 -*-
import requests
import bs4
# repairing invalid HTML
s = requests.get('http://vstup.info/2017/i2017i483.html')
tmp = s.text.replace("</td></tr></td></tr><tr><td>", "</td></tr><tr><td>")
bs = bs4.BeautifulSoup(tmp, "html.parser")
content = bs.find("div", {"id": "okrArea"}).find("table", {"id": "about"}).findAll("tr")
typ = content[1].findAll("td")[1].get_text() #ZVO type
print typ
print [typ]
It outputs this:
ТеÑ
нÑкÑм (ÑÑилиÑе)
[u'\xd0\xa2\xd0\xb5\xd1\x85\xd0\xbd\xd1\x96\xd0\xba\xd1\x83\xd0\xbc (\xd1\x83\xd1\x87\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x89\xd0\xb5)']
- Why does the printed value of the variable differ from the same variable printed inside a list?
- How can I get the correct value from the web page, which should be:
Технікум (училище)
In the interactive Python interpreter, the correct text can be recovered from the backslash-escaped bytes like this:
>>> print '\xd0\xa2\xd0\xb5\xd1\x85\xd0\xbd\xd1\x96\xd0\xba\xd1\x83\xd0\xbc (\xd1\x83\xd1\x87\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x89\xd0\xb5)'.decode('utf8')
Технікум (училище)
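For reference, here is a minimal sketch of the same repair applied to the unicode value that the script actually holds (the one shown inside the list), rather than to a hand-typed byte string. It assumes the text was UTF-8 that got decoded as Latin-1 somewhere along the way, which is what the `u'\xd0\xa2...'` output suggests; `repair_mojibake` is a hypothetical helper name, not part of requests or bs4.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the string is UTF-8 text that was mis-decoded as Latin-1.

def repair_mojibake(text):
    """Hypothetical helper: undo a wrong Latin-1 decode of UTF-8 data."""
    # Latin-1 maps code points 0-255 to identical byte values,
    # so encoding with it recovers the original raw bytes.
    raw_bytes = text.encode('latin-1')
    # Those raw bytes are the real UTF-8 encoding of the page text.
    return raw_bytes.decode('utf-8')

# The value exactly as it appears in the list output above.
garbled = (u'\xd0\xa2\xd0\xb5\xd1\x85\xd0\xbd\xd1\x96\xd0\xba\xd1\x83\xd0\xbc '
           u'(\xd1\x83\xd1\x87\xd0\xb8\xd0\xbb\xd0\xb8\xd1\x89\xd0\xb5)')

fixed = repair_mojibake(garbled)
assert fixed == u'Технікум (училище)'
```

In the script this would correspond to something like `typ.encode('latin-1').decode('utf-8')`. If the root cause is that the server does not declare a charset (an assumption, not verified here), requests falls back to ISO-8859-1 for text responses, so setting `s.encoding = 'utf-8'` before reading `s.text` may avoid the problem at the source.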