
I've got a pandas DataFrame column called 'Raw' whose format is inconsistent. The strings it contains look like this:

'(1T XXX, Europe)'
'(2T YYYY, Latin America)'
'(3T ZZ/ZZZZ, Europe)'
'(4T XXX XXX, Africa)'

The only consistent things about the strings in 'Raw' are that they are enclosed in parentheses, have a digit right after the opening parenthesis, and contain a comma followed by a space in the middle.

Now, I'd like to create two extra columns (Model and Region) in my dataframe:

  • 'Model' would contain the beginning of the string, i.e. everything between the first parenthesis and the comma
  • 'Region' would contain the end of the string, i.e. everything between the whitespace after the comma and the final parenthesis

How do I do that using regex?

7 Answers

5

Since there's only one comma and everything is enclosed in parentheses, in your case you can use .str.split() instead, after slicing off the parentheses:

model_region = df.Raw.str[1:-1].str.split(', ', expand = True)

But if you insist on a regex:

model_region = df.Raw.str.extract(r'\((.*), (.*)\)', expand = True)

Then

df['Model'] = model_region[0]
df['Region'] = model_region[1]
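A self-contained sketch of both approaches from this answer, assuming the column is named 'Raw' as in the question and using the four sample strings as the data. It builds the DataFrame, runs the slice-and-split version and the regex version, and assigns the new columns:

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'Raw': ['(1T XXX, Europe)',
                           '(2T YYYY, Latin America)',
                           '(3T ZZ/ZZZZ, Europe)',
                           '(4T XXX XXX, Africa)']})

# Approach 1: drop the outer parentheses, then split on ", "
by_split = df['Raw'].str[1:-1].str.split(', ', expand=True)

# Approach 2: capture the text before and after ", " inside the parentheses
by_regex = df['Raw'].str.extract(r'\((.*), (.*)\)', expand=True)

# Both return a two-column DataFrame with columns 0 and 1
df['Model'] = by_regex[0]
df['Region'] = by_regex[1]
print(df)
```

Both give the same values for this data; the split version relies on there being exactly one ", " per string.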

1

Try this: \(([^,]*), ([^)]*)\)

See: https://regex101.com/r/fCetWg/1
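The same pattern can be used with pandas' Series.str.extract. A sketch, assuming the column is named 'Raw': if the groups are named with the (?P<name>...) syntax, str.extract uses those names as column labels, so no renaming is needed. The negated classes [^,]* and [^)]* stop at the comma and at the closing parenthesis respectively.

```python
import pandas as pd

df = pd.DataFrame({'Raw': ['(1T XXX, Europe)', '(2T YYYY, Latin America)',
                           '(3T ZZ/ZZZZ, Europe)', '(4T XXX XXX, Africa)']})

# Named groups become the column names of the extracted DataFrame
parts = df['Raw'].str.extract(r'\((?P<Model>[^,]*), (?P<Region>[^)]*)\)')

# Attach the new columns next to the original one (indexes are aligned)
df = df.join(parts)
print(df)
```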

0
import re

s = '(3T ZZ/ZZZZ, Europe)'
m = re.search(r'\((.*), (.*)\)', s)
print(m.groups())  # ('3T ZZ/ZZZZ', 'Europe')
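re.search returns None when the pattern is not found, so calling .groups() on the result raises AttributeError for a malformed string. A minimal sketch that guards against that; the names PATTERN and parse_raw are hypothetical, and it uses re.fullmatch so that the whole string must have the "(model, region)" shape:

```python
import re

# Hypothetical helper pattern: whole string must be "(<model>, <region>)"
PATTERN = re.compile(r'\(([^,]+), ([^)]+)\)')

def parse_raw(s):
    """Return (model, region), or None if s does not have the expected shape."""
    m = PATTERN.fullmatch(s)
    return m.groups() if m else None

samples = ['(1T XXX, Europe)', '(2T YYYY, Latin America)',
           '(3T ZZ/ZZZZ, Europe)', '(4T XXX XXX, Africa)', 'no parentheses']
results = [parse_raw(s) for s in samples]
print(results)
```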

0
import re
s = '(3T ZZ/ZZZZ, Europe)'
Model = re.findall(r"(?<=\().+(?=,)", s)    # ['3T ZZ/ZZZZ']
Region = re.findall(r"(?<=, ).+(?=\))", s)  # ['Europe']

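The first regex uses a lookbehind for the opening parenthesis "(" and a lookahead for the comma, so it matches the model. The second uses a lookbehind for ", " and a lookahead for the closing ")", so it matches the region. Both calls return a list, so take element [0] if you need the string itself.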
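If a single string is wanted rather than a list, the same lookaround patterns can be used with re.search, whose match object's .group() returns the matched text. A small sketch on one of the sample strings; note that the greedy .+ would run up to the last comma if a string ever contained more than one:

```python
import re

s = '(4T XXX XXX, Africa)'

# findall returns a list of every match
model_list = re.findall(r"(?<=\().+(?=,)", s)

# search returns a match object; .group() gives the matched string itself
model = re.search(r"(?<=\().+(?=,)", s).group()
region = re.search(r"(?<=, ).+(?=\))", s).group()
print(model_list, model, region)
```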

0
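import pandas as pd

string_list = ['(1T XXX, Europe)',
               '(2T YYYY, Latin America)',
               '(3T ZZ/ZZZZ, Europe)',
               '(4T XXX XXX, Africa)']
df = pd.DataFrame(string_list)
# Two capture groups -> a two-column DataFrame (columns 0 and 1)
df = df[0].str.extract(r"\(([^,]*), ([^)]*)\)", expand=False)
df.columns = ['Model', 'Region']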
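The expand argument of str.extract only changes the return type when there is a single capture group: with several groups a DataFrame is returned either way. A quick check of that behavior on the same sample strings (the one-group pattern for the region is just for illustration):

```python
import pandas as pd

string_list = ['(1T XXX, Europe)', '(2T YYYY, Latin America)',
               '(3T ZZ/ZZZZ, Europe)', '(4T XXX XXX, Africa)']
s = pd.Series(string_list)

# Two capture groups: a DataFrame is returned even with expand=False
both = s.str.extract(r'\(([^,]*), ([^)]*)\)', expand=False)

# One capture group with expand=False: a Series is returned
region_only = s.str.extract(r', ([^)]*)\)', expand=False)

# One capture group with expand=True: a one-column DataFrame is returned
region_frame = s.str.extract(r', ([^)]*)\)', expand=True)
```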

0

If the comma is a reliable separator between the two parts of the string, you do not need a regex at all. If df is your DataFrame:

df['Model'] = [x.split(',')[0].replace('(', '') for x in df['Raw']]
# strip() removes the space that follows the comma
df['Region'] = [x.split(',')[1].replace(')', '').strip() for x in df['Raw']]
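The same idea can be expressed with pandas' own vectorized string methods instead of a list comprehension. A sketch, assuming the column is named 'Raw': str.strip('()') removes the parentheses at both ends, and str.partition splits at the first ", " only, so a comma inside the region text would stay in 'Region'.

```python
import pandas as pd

df = pd.DataFrame({'Raw': ['(1T XXX, Europe)', '(2T YYYY, Latin America)',
                           '(3T ZZ/ZZZZ, Europe)', '(4T XXX XXX, Africa)']})

# Remove the surrounding parentheses, then split at the first ", "
parts = df['Raw'].str.strip('()').str.partition(', ')

# partition returns three columns: text before, the separator, text after
df['Model'] = parts[0]
df['Region'] = parts[2]
print(df)
```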

If you want to use a regex, it would look something like this:

import re

s = '(1T XXX, Europe)'
# [^,]+ and [^)]+ also accept characters such as '/' in '3T ZZ/ZZZZ'
m = re.match(r'\(([^,]+), ([^)]+)\)', s)
model = m.group(1)
region = m.group(2)
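A pattern built on the character class [\w\s] would not cover every sample: it matches only letters, digits, underscores and whitespace, so the third string, which contains '/', does not match at all. A quick comparison with a pattern made of negated classes:

```python
import re

s = '(3T ZZ/ZZZZ, Europe)'

# [\w\s] covers word characters and whitespace, but not '/'
narrow = re.match(r'\(([\w\s]+),([\w\s]+)\)', s)

# [^,] and [^)] accept anything except the delimiter that ends each field
wide = re.match(r'\(([^,]+), ([^)]+)\)', s)
print(narrow, wide.groups())
```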

0

You can simply try the following:

Sample DataFrame:

df
                        raw
0          (1T XXX, Europe)
1  (2T YYYY, Latin America)
2      (3T ZZ/ZZZZ, Europe)
3      (4T XXX XXX, Africa)

Solution 1:

Using str.extract with a regex.

df = df.raw.str.extract(r'\((.*), (.*)\)').rename(columns={0:'Model', 1:'Region'})
print(df)
        Model         Region
0      1T XXX         Europe
1     2T YYYY  Latin America
2  3T ZZ/ZZZZ         Europe
3  4T XXX XXX         Africa

Solution 2:

Using str.replace() and str.split(), followed by rename().

df = df.raw.str.replace(r'[()]', '', regex=True).str.split(', ', expand=True).rename(columns={0:'Model', 1:'Region'})
print(df)
        Model         Region
0      1T XXX         Europe
1     2T YYYY  Latin America
2  3T ZZ/ZZZZ         Europe
3  4T XXX XXX         Africa
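Whether the pattern passed to Series.str.replace is treated as a regular expression depends on its regex argument, whose default changed from True to False in pandas 2.0. Passing it explicitly keeps the behavior the same across versions. A small check on one sample string:

```python
import pandas as pd

raw = pd.Series(['(1T XXX, Europe)'])

# regex=True: '[()]' is a character class matching either parenthesis
cleaned = raw.str.replace(r'[()]', '', regex=True)

# regex=False: the literal text "[()]" is searched for, and it is not present
unchanged = raw.str.replace(r'[()]', '', regex=False)
print(cleaned[0], unchanged[0])
```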

Note:

However, if you want to keep the original column as well, assign the result to new columns instead:

df[['Model', 'Region']] = df.raw.str.replace(r'[()]', '', regex=True).str.split(', ', expand=True)

print(df)
                        raw       Model         Region
0          (1T XXX, Europe)      1T XXX         Europe
1  (2T YYYY, Latin America)     2T YYYY  Latin America
2      (3T ZZ/ZZZZ, Europe)  3T ZZ/ZZZZ         Europe
3      (4T XXX XXX, Africa)  4T XXX XXX         Africa

OR

df[['Model', 'Region' ]] = df.raw.str.extract(r'\((.*), (.*)\)')
print(df)
                        raw       Model         Region
0          (1T XXX, Europe)      1T XXX         Europe
1  (2T YYYY, Latin America)     2T YYYY  Latin America
2      (3T ZZ/ZZZZ, Europe)  3T ZZ/ZZZZ         Europe
3      (4T XXX XXX, Africa)  4T XXX XXX         Africa
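One practical difference of the str.extract route: rows that do not match the pattern get NaN in every extracted column, which makes malformed entries easy to find. A sketch with one made-up malformed row ('malformed entry' is illustrative data, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'raw': ['(1T XXX, Europe)', 'malformed entry']})

# Rows that do not match the pattern get NaN in every extracted column
df[['Model', 'Region']] = df['raw'].str.extract(r'\((.*), (.*)\)')

# Select the rows whose format did not match
bad_rows = df[df['Model'].isna()]
print(bad_rows)
```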
