I am trying to write a small bit of code using the re module that will pull a portion of a URL out of a .csv file and return the selected chunk as output. If the URL's path starts with "go/" (that is, the URL begins with .com/go/), I would like it to return the content AFTER "go". Here's the code:
    import csv
    import re

    with open('rtdata.csv', 'rb') as fhand:
        reader = csv.reader(fhand)
        for row in reader:
            url = row[6].strip()
            section = re.findall("^http://www.xxxxxxxxx.com/(.*/)", url)
            if section == re.findall("^go.*", url):
                section = re.findall("^http://www.xxxxxxxxx.com/go/(.*/)", url)
            print url
            print section
And here's some sample input and output:
- Example 1
  - input: http://www.xxxxxxxxx.com/go/news/videos/
  - output: news/videos
- Example 2
  - input: http://www.xxxxxxxxx.com/new-cars/
  - output: new-cars
What am I missing here?
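
For reference, here is a minimal sketch of one way the behaviour described above could be obtained. It is not necessarily the only or best fix. Two observations about the code as posted: `re.findall("^go.*", url)` is anchored at the start of the full URL, which begins with `http`, so it returns an empty list and the `if` branch only runs when the first pattern also found nothing; and the capture group `(.*/)` includes the trailing slash. The sketch uses a single pattern with an optional, non-capturing `go/` prefix instead. The names `SECTION_RE` and `extract_section` are hypothetical, and the host is the placeholder from the question.

```python
import re

# Hypothetical names; the host is the placeholder used in the question.
# "(?:go/)?" optionally consumes a leading "go/" without capturing it,
# "(.*?)" captures the rest of the path, and "/?$" drops a trailing slash.
SECTION_RE = re.compile(r"^http://www\.xxxxxxxxx\.com/(?:go/)?(.*?)/?$")


def extract_section(url):
    """Return the path after the host (and after 'go/', if present), or None."""
    match = SECTION_RE.match(url.strip())
    return match.group(1) if match else None
```

Inside the loop from the question, this would presumably be called as `extract_section(row[6])` for each row, assuming the URL really is in the seventh column.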
