I have this code:
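import re

# Select the <h1 class="state"> node and extract its full text content
site = hxs.select("//h1[@class='state']")
mydata = site.select("string()").extract()
# Try to collapse each run of whitespace down to its first character
cleaned_mydata = re.sub(ur'(\s)\s+', ur'\1', mydata[0], flags=re.MULTILINE | re.UNICODE)
log.msg(str(mydata), level=log.ERROR)
log.msg(str(cleaned_mydata), level=log.ERROR)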
The first output is:
ERROR: [u'\r\n 212\r\n jobs containing php in xxxx \r\n ']
The second output is:
jobs containing php in xxxxxx
The regex seems to be stripping the number 212 as well. How can I fix that?
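For reference, here is a minimal standalone sketch for checking whether the digits are really removed. It assumes the sample string copied from the first log line (the real page may contain more spaces), uses the illustrative names `raw`, `cleaned` and `normalized`, and is written with plain `r''` literals instead of `ur''` so that it also runs on Python 3. The last step is only one possible way to normalize the whitespace, not necessarily the right fix for this spider.

```python
import re

# Sample copied from the first log line; the real page may contain more spaces.
raw = u'\r\n 212\r\n jobs containing php in xxxx \r\n '

# Same pattern as above: each run of 2+ whitespace characters is replaced
# by its first character, which for this sample is often '\r'.
cleaned = re.sub(r'(\s)\s+', r'\1', raw, flags=re.MULTILINE | re.UNICODE)

# repr() makes control characters visible instead of letting a terminal act on them.
print(repr(cleaned))
print('212' in cleaned)

# One possible alternative (an assumption about the desired result):
# split on any whitespace and rejoin with single spaces.
normalized = u' '.join(raw.split())
print(repr(normalized))
```

If the membership check prints True, the digits are still in the string. In that case a bare `\r` left in the cleaned text may simply be moving the terminal cursor back to the start of the line, so that the later text overwrites `212` on screen.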