
I am trying to clean up a CSV file using regex. The first part already works: it extracts the regex pattern from the address field and writes it to the street_numb field. The part I need help with is removing the leftover part of that pattern from the street field, so that only the street name (i.e., STEINWAY ST, 31ST ST, 82ND RD, and 19TH ST) is stored there. In other words, the values -78, -45, -35, and -54 should be removed from the street field.

b    street_numb     street            address              zipcode
1    246             FIFTH AVE         246 FIFTH AVE        11215
2    30 -78          -78 STEINWAY ST   30 -78 STEINWAY ST   11016
3    25 -45          -45 31ST ST       25 -45 31ST ST       11102
4    123 -35         -35 82ND RD       123 -35 82ND RD      11415
5    22 -54          -54 19TH ST       22 -54 19TH ST       11105

Sample Data (above)

import csv
import re
path = '/Users/darchcruise/Desktop/bldg_zip_codes.csv'
with open(path, 'rU') as infile, open(path+'out.csv', 'w') as outfile:
   fieldnames = ['b', 'street_numb', 'street', 'address', 'zipcode']
   readablefile = csv.DictReader(infile)
   writablefile = csv.DictWriter(outfile, fieldnames=fieldnames)
   for row in readablefile:
       add = re.match(r'\d+\s*-\s*\d+', row['address'])
       if add:
            row['street_numb'] = add.group()
            # row['street'] = remove re.string (add.group()) from street field
            writablefile.writerow(row)
       else:
            writablefile.writerow(row)

What code in line 12 (# remove re.string from row['street']) could be used to resolve my issue (removing -78, -45, -35, -54 from the street field)?

    ...do you have a question? Commented Mar 8, 2016 at 14:48

1 Answer

You can use capturing groups with findall, like this:

re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][0]  # street number (keeps trailing whitespace)
re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][2]  # street name
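As a rough sketch of how the two groups line up with the sample rows, the same pattern can be wrapped in a small helper. The name `split_address` is hypothetical and not part of the original answer, Python 3 is assumed, and the results are stripped because group 1 keeps its trailing whitespace.

```python
import re

# Same pattern as in the answer, written as a raw string.
ADDRESS_RE = re.compile(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)")


def split_address(address):
    """Return (street_numb, street) for an address, or None if it does not match.

    Hypothetical helper, for illustration only.
    """
    m = ADDRESS_RE.match(address)
    if m is None:
        return None
    # Group 1 is the house number (with optional "-NN" part), group 3 the street name.
    return m.group(1).strip(), m.group(3).strip()


for address in ["246 FIFTH AVE", "30 -78 STEINWAY ST", "25 -45 31ST ST"]:
    print(split_address(address))
```

Using `re.match` instead of `findall` anchors the pattern at the start of the string and avoids indexing into a list of tuples.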

9 Comments

That would not work in line 12. It would have to be something that looks like this: row['street'] = code. That way the results would be written to the file in the line below (line 13).
Isn't row['street'] = [x for x in re.findall("(-\d+\s+)?((\w|\s)+)", "FIFTH ST")][0][1] working?
Part of that line worked, but it replaces every value in the street field with "FIFTH ST". I want to bring over the value from the address field (with the regex match removed).
It should not be this: 6 , 25 -40, FIFTH ST, 25 -40 14TH STREET, 11102. It should be this: 6 , 25 -40, 14TH STREET, 25 -40 14TH STREET, 11102
Can you give an example using the data from your question?
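The line suggested in the comments searches the literal string "FIFTH ST", which would explain why every row ended up with the same street. One possible alternative, offered only as a sketch, is to apply `re.sub` to each row's own street value. It assumes Python 3 and a comma-separated file with a header row (the question shows the data in aligned columns, so the real delimiter is a guess); `clean_street` is a hypothetical helper name, and `io.StringIO` stands in for the real input and output files.

```python
import csv
import io
import re

# Leading "-NN " fragment left over in the street field, e.g. "-78 " in "-78 STEINWAY ST".
LEADING_SUFFIX_RE = re.compile(r"^\s*-\s*\d+\s+")


def clean_street(street):
    """Strip a leading '-NN ' fragment; other values are returned unchanged."""
    return LEADING_SUFFIX_RE.sub("", street)


# In-memory stand-in for the CSV file, using rows from the question's sample data.
sample = io.StringIO(
    "b,street_numb,street,address,zipcode\n"
    "1,246,FIFTH AVE,246 FIFTH AVE,11215\n"
    "2,30 -78,-78 STEINWAY ST,30 -78 STEINWAY ST,11016\n"
)
out = io.StringIO()

reader = csv.DictReader(sample)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    match = re.match(r"\d+\s*-\s*\d+", row["address"])
    if match:
        row["street_numb"] = match.group()
        row["street"] = clean_street(row["street"])  # the line the question asks about
    writer.writerow(row)

print(out.getvalue())
```

With real files in Python 3, both would typically be opened with `newline=''`, as the `csv` module documentation recommends. Writing the row once after the `if` also replaces the duplicated `writerow` calls in the question's `if`/`else`.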