Remove regex pattern from string and store in csv

Question

I am trying to clean up a CSV file using regex. The first part already works: it extracts the regex pattern from the address field and writes it to the street_numb field. The part I need help with is removing that same pattern from the street field, so that only the street name (i.e., STEINWAY ST, 31ST ST, 82ND RD, and 19TH ST) is stored there. In other words, the values -78, -45, -35, and -54 should be removed from the street field.

b    street_numb     street            address              zipcode
1    246             FIFTH AVE         246 FIFTH AVE        11215
2    30 -78          -78 STEINWAY ST   30 -78 STEINWAY ST   11016
3    25 -45          -45 31ST ST       25 -45 31ST ST       11102
4    123 -35         -35 82ND RD       123 -35 82ND RD      11415
5    22 -54          -54 19TH ST       22 -54 19TH ST       11105

Sample Data (above)

import csv
import re
path = '/Users/darchcruise/Desktop/bldg_zip_codes.csv'
with open(path, 'rU') as infile, open(path + 'out.csv', 'w') as outfile:
    fieldnames = ['b', 'street_numb', 'street', 'address', 'zipcode']
    readablefile = csv.DictReader(infile)
    writablefile = csv.DictWriter(outfile, fieldnames=fieldnames)
    for row in readablefile:
        add = re.match(r'\d+\s*-\s*\d+', row['address'])
        if add:
            row['street_numb'] = add.group()
            # row['street'] = remove re.string (add.group()) from street field
            writablefile.writerow(row)
        else:
            writablefile.writerow(row)

What code could replace the comment on line 12 (# remove re.string from row['street']) so that -78, -45, -35, and -54 are removed from the street field?

...do you have a question?

– jonrsharpe, Commented Mar 8, 2016 at 14:48

rock321987 · Accepted Answer · 2016-03-08 16:27:19Z

You can use capturing groups with findall, like this:

re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][0]  # gives the street number
re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][2]  # gives the rest of the address (the street)
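To illustrate the split, here is a minimal sketch that applies the same pattern to a few of the sample addresses from the question. It uses re.match rather than findall and strips the trailing whitespace that group 1 captures; the helper name split_address is hypothetical and not part of the answer.

```python
import re

# Same pattern as in the answer, written as a raw string.
# Group 1: house number with an optional "-NN" part; group 3: the rest.
PATTERN = re.compile(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)")

def split_address(address):
    """Hypothetical helper: return (street_numb, street), or None if no match."""
    m = PATTERN.match(address)
    if m is None:
        return None
    return m.group(1).strip(), m.group(3).strip()

for address in ["246 FIFTH AVE", "30 -78 STEINWAY ST", "25 -45 31ST ST"]:
    print(split_address(address))
```

An address that does not start with a digit yields None here, so the caller can leave such rows unchanged.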

edited Mar 8, 2016 at 16:27

answered Mar 8, 2016 at 15:02


9 Comments

user3062459 Over a year ago

That would not work in line 12. It would have to be something that looks like this: row['street'] = code. That way the result would be written to the file in the line below (line 13).
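An assignment of that shape could be sketched with re.sub, which removes a leading "-NN" fragment from the street value directly. The function name strip_leading_number and the exact pattern are assumptions for illustration, not code from this thread.

```python
import re

def strip_leading_number(street):
    """Hypothetical helper: drop a leading '-NN ' fragment such as '-78 '."""
    return re.sub(r"^\s*-\s*\d+\s+", "", street)

row = {"street": "-78 STEINWAY ST"}
row["street"] = strip_leading_number(row["street"])
print(row["street"])  # STEINWAY ST
```

Streets without a leading "-NN" fragment, such as FIFTH AVE, pass through unchanged.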

rock321987 Over a year ago

Isn't row['street'] = [x for x in re.findall("(-\d+\s+)?((\w|\s)+)", "FIFTH ST")][0][1] working?

user3062459 Over a year ago

Part of that line worked, but it replaces every street field with the name "FIFTH ST". I want to bring over the value from the address field (with the regex match removed).

user3062459 Over a year ago

It should not be this: 6, 25 -40, FIFTH ST, 25 -40 14TH STREET, 11102. It should be this: 6, 25 -40, 14TH STREET, 25 -40 14TH STREET, 11102.

rock321987 Over a year ago

Can you give an example from the data in your question?
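Using sample rows from the question, the whole flow might be sketched as follows. This is an illustration under assumptions: it reads comma-separated data from an in-memory string rather than the file path in the question, derives street from each row's own address value rather than from a fixed string, and clean_rows is a hypothetical name.

```python
import csv
import io
import re

# Leading house number, optionally followed by a "-NN" part.
NUMBER = re.compile(r"\d+(\s*-\s*\d+)?")

def clean_rows(infile, outfile):
    """Hypothetical helper: fill street_numb and street from address."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        m = NUMBER.match(row["address"])
        if m:
            row["street_numb"] = m.group()
            # Everything after the matched number is the street.
            row["street"] = row["address"][m.end():].strip()
        writer.writerow(row)

# Assumed comma-separated layout of the sample data.
sample = io.StringIO(
    "b,street_numb,street,address,zipcode\n"
    "1,246,FIFTH AVE,246 FIFTH AVE,11215\n"
    "2,30 -78,-78 STEINWAY ST,30 -78 STEINWAY ST,11016\n"
    "3,25 -45,-45 31ST ST,25 -45 31ST ST,11102\n"
)
out = io.StringIO()
clean_rows(sample, out)
print(out.getvalue())
```

Slicing the address at the end of the match avoids running a second regex and keeps street consistent with whatever was stored in street_numb.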
