0

I have a dataframe like this:

name      link
apple    example1.com/dsa/es?id=2812168&width=1200/web/map&resize.html
banana.  example2.com/es?id=28132908&width=1220/web/map_resize.html
orange.  example3.com/es?id=3209908&width=1120/web&map_resize.html

Each name's ID is buried in the link, whose structure may vary. However, I know the pattern is always 'id=' + 'what I want' + '&'.

Is there a way to extract the ID from the link and put it back into the dataframe to get the following:

name      link
apple    2812168
banana.  28132908
orange.  3209908

I tried this:

df['name'] = df['name'].str.extract(r'id=\s*([^\.]*)\s*\\&', expand=False)

but it returns a column of all NaN.

Also, there may be more than one & in the link.

3 Answers

2

I think the IDs are always numeric, so this is somewhat cleaner:

df["link"] = df['link'].str.extract(r'id=(\d+)&', expand=False)
print(df)
#     name      link
#0   apple   2812168
#1  banana  28132908
#2  orange   3209908
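
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is built from the three sample rows in the question (the constructor call is my own, not the asker's). It also checks the pattern from the question: in a raw string, the trailing \\& means a literal backslash followed by &, which none of these links contain, so every row comes back NaN.

```python
import pandas as pd

# Sample data copied from the question (assumed construction).
df = pd.DataFrame({
    "name": ["apple", "banana", "orange"],
    "link": [
        "example1.com/dsa/es?id=2812168&width=1200/web/map&resize.html",
        "example2.com/es?id=28132908&width=1220/web/map_resize.html",
        "example3.com/es?id=3209908&width=1120/web&map_resize.html",
    ],
})

# One capture group plus expand=False returns a Series of the captured digits.
ids = df["link"].str.extract(r"id=(\d+)&", expand=False)
print(ids.tolist())

# The question's pattern requires a literal backslash before the &,
# so it matches nothing here and yields NaN for every row.
failed = df["link"].str.extract(r"id=\s*([^\.]*)\s*\\&", expand=False)
print(failed.isna().all())
```

Note that the extracted IDs are strings; convert with astype(int) if you need numbers.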

2 Comments

Thank you! Yes ID is always a number. This works perfectly in this context!
Glad I could help :)
2

Let's try split:

df['link'].str.split('id=').str[1].str.split('&').str[0]
0     2812168
1    28132908
2     3209908
Name: link, dtype: object
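
A slightly more defensive sketch of the same split idea, in case some links have no id= at all. The third row below is hypothetical, added only to show what happens in that case: .str[1] finds nothing at index 1, so the result for that row is NaN rather than an error.

```python
import pandas as pd

links = pd.Series([
    "example1.com/dsa/es?id=2812168&width=1200/web/map&resize.html",
    "example3.com/es?id=3209908&width=1120/web&map_resize.html",
    "example4.com/no-id-here.html",  # hypothetical row without id=
])

# Keep everything after the first "id=", then everything before the next "&".
after_id = links.str.split("id=", n=1).str[1]
ids = after_id.str.split("&", n=1).str[0]
print(ids.tolist())
```

Using n=1 limits each split to the first occurrence, so extra & characters later in the link are never touched.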

1 Comment

@Tian yw :-) happy coding ~
2

We can make use of positive lookbehind and positive lookahead:

df['link'] = df['link'].str.extract(r'(?<=id=)(.*?)(?=&)', expand=False)


      name      link
0    apple   2812168
1  banana.  28132908
2  orange.   3209908

Details:

  • (?<=id=): positive lookbehind on id=
  • (.*?): as few characters as possible (non-greedy)
  • (?=&): positive lookahead on the next &
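
To see why the quantifier matters, here is a small sketch comparing the greedy and non-greedy versions on the first link from the question, which contains two & characters. The variable names are mine and only for illustration.

```python
import pandas as pd

links = pd.Series([
    "example1.com/dsa/es?id=2812168&width=1200/web/map&resize.html",
])

# Greedy: .* runs as far as it can, so the lookahead lands on the LAST &.
greedy = links.str.extract(r"(?<=id=)(.*)(?=&)", expand=False)

# Non-greedy: .*? stops at the FIRST & after id=.
lazy = links.str.extract(r"(?<=id=)(.*?)(?=&)", expand=False)

print(greedy.iloc[0])
print(lazy.iloc[0])
```

The greedy version returns 2812168&width=1200/web/map, while the non-greedy one returns just 2812168.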

3 Comments

Thank you! I should have mentioned that it is not always &width after the id. It doesn't seem to work if I use df['link'] = df['link'].str.extract('(?<=id\=)(.*)(?=\&)') Is there a way to get around this?
Yes, by using a so-called "non-greedy" quantifier; notice the .*?, see the edit. Be aware that the accepted solution will not work if the ID ever contains non-digit characters.
Thank you!! This makes a lot of sense!
