0

I have a column containing strings that are made up of different words but always have a similar structure. E.g.:

2cm off ORDER AGAIN (191 1141)

I want to extract the sub-string that starts after the second space and ends at the space before the opening bracket/parenthesis. So in this example I want to extract ORDER AGAIN.

Is this possible?

1
  • r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1] Commented May 21, 2021 at 10:49

4 Answers

1

You could use str.extract here:

df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')

Note that this answer is robust even if the string doesn't have a (...) term at the end.
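To check that claim outside pandas, the same pattern can be run through the standard `re` module. This is only a sketch: `extract_middle` is a hypothetical helper name, and the second sample string is made up to represent a value with no trailing (...) term.

```python
import re

# Same pattern as in the str.extract call above.
PATTERN = re.compile(r'^\w+ \w+ (.*?)(?: \(|$)')

def extract_middle(text):
    """Return the text after the second word, up to ' (' or the end of the string."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

samples = [
    "2cm off ORDER AGAIN (191 1141)",  # from the question
    "2cm off ORDER AGAIN",             # hypothetical: no trailing (...)
]
results = [extract_middle(s) for s in samples]
print(results)  # ['ORDER AGAIN', 'ORDER AGAIN']
```

Strings that do not start with two space-separated words do not match at all, so the helper returns None for them (and str.extract would give NaN in that row).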

Here is a demo showing that the regex logic is working.


1

You can try the following:

r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'


0

If the pattern of data is similar to what you have posted then I think the below code snippet should work for you:

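import re

data = "2cm off ORDER AGAIN (191 1141)"

# Skip the first two whitespace-separated tokens lazily, then capture
# everything up to the whitespace before the opening parenthesis.
extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))  # ORDER AGAIN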


0

You can try the following code:

s = '2cm off ORDER AGAIN (191 1141)'
second_space = s.find(' ', s.find(' ') + 1)
openparenthesis = s.find('(')
# Start one past the second space; strip() drops the space before '('.
substring = s[second_space + 1 : openparenthesis].strip()
print(substring) #ORDER AGAIN
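
One caveat with `str.find`: it returns -1 when the character is missing, which silently produces a wrong slice rather than an error. A guarded version might look like the sketch below, where `extract_between` is a hypothetical helper name and returning None for strings with fewer than two spaces is an assumption about the desired behaviour.

```python
def extract_between(s):
    """Text after the second space, up to the first '(' (or the end if there is none)."""
    first_space = s.find(' ')
    second_space = s.find(' ', first_space + 1) if first_space != -1 else -1
    if second_space == -1:
        return None  # fewer than two spaces: nothing to extract

    paren = s.find('(', second_space)
    end = paren if paren != -1 else len(s)  # no '(' means take the rest
    return s[second_space + 1:end].strip()

print(extract_between('2cm off ORDER AGAIN (191 1141)'))  # ORDER AGAIN
```

On a pandas column this could be applied element-wise, for example with `df["col"].map(extract_between)`, where `df` and `"col"` are placeholder names.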

