
I have a string like this:

  ARAN22 SKY BYT and TRO_PAN

In the string above, the first letter can be A, S, T, or N, and the two characters after RAN can be any two digits. The rest is always the same, and the string always ends with _PAN.

So a few possibilities for the string are:

  SRAN22 SK BYT and TRO_PAN
  TRAN25 SK BYT and TRO_PAN
  NRAN25 SK BYT and TRO_PAN

So I was trying to match the whole string in Python using a regex, as follows:

import re

pattern =   "([ASTN])RAN" + "\w+\s+" +"_PAN"

pat_check = re.compile(pattern, flags=re.IGNORECASE)

sample_test_string = 'NRAN28 SK BYT and TRO_PAN'

re.match(pat_check, sample_test_string) 

Here the string can be any of the examples I gave above.

But it's not working: I am not getting back the string (the sample test string) that I should. I'm not sure what I am doing wrong. Any help would be very much appreciated.

  • Where does your regex attempt to handle the digits? Commented Oct 12, 2022 at 0:56
  • What does "I am not getting the string name " mean? Commented Oct 12, 2022 at 1:00
  • @Scott I was expecting that when I run re.match(pat_check, sample_test_string), I would get the sample_test_string back: 'NRAN28 SK BYT and TRO_PAN'. But instead, when I run the code, I get None. Commented Oct 12, 2022 at 1:02
  • pattern = "([ASTN])RAN" + "\w+\s+" +"_PAN" To pick just one example of how the pattern is wrong, it demands that _PAN is immediately preceded by a space, which your sample string clearly does not have. Commented Oct 12, 2022 at 1:05
  • Thanks @JohnGordon, I see. Yeah, I was trying "([ASTN])RAN" +"\w+" + "\s+" +"TRO_PAN" too, but it also fails, as expected from what you mentioned. So do I have to spell out the whole string, including the spaces, for the regex to work? Commented Oct 12, 2022 at 1:08

1 Answer

You are using \w+\s+, which matches one or more word characters (0-9A-Za-z_) followed by one or more whitespace characters. So it will match the two digits and the space after RAN, but then nothing more. Since the next characters are not _PAN, the match fails. You need to use [\w\s]+ instead, which matches any mix of word and whitespace characters:

pattern =   "([ASTN])RAN" + "[\w\s]+" +"_PAN"
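To see the difference in practice, here is a minimal sketch that runs both the original sub-pattern and the corrected one against the sample strings from the question. The helper name `extract` and the `XRAN…` negative case are made up for illustration, and the pattern is written as a raw string (an addition not in the answer) so that Python does not try to interpret `\w` and `\s` as string escapes.

```python
import re

sample = "NRAN28 SK BYT and TRO_PAN"

# The original sub-pattern \w+\s+ stops after the first word and its trailing space,
# so a following "_PAN" cannot match (the next characters are "SK").
old_prefix = re.match(r"([ASTN])RAN\w+\s+", sample, flags=re.IGNORECASE)
print(repr(old_prefix.group(0)))  # 'NRAN28 '

# Corrected pattern: [\w\s]+ accepts any run of word and whitespace characters.
pat_check = re.compile(r"([ASTN])RAN[\w\s]+_PAN", flags=re.IGNORECASE)

def extract(text):
    """Return the matched string, or None if text does not fit the pattern."""
    m = pat_check.match(text)
    return m.group(0) if m else None

samples = [
    "SRAN22 SK BYT and TRO_PAN",
    "TRAN25 SK BYT and TRO_PAN",
    "NRAN28 SK BYT and TRO_PAN",
    "XRAN28 SK BYT and TRO_PAN",  # hypothetical negative case: X is not an allowed first letter
]
for s in samples:
    print(s, "->", extract(s))
```

Note that `re.match` returns a match object rather than a string, so `.group(0)` is what yields the full matched text, and `.group(1)` yields the captured first letter.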

4 Comments

Oh, wonderful. That's sweet. Thank you @Nick! I learned something new!
Since you seem to be an expert in regex, one more quick question: if my string is like 'NRAN'28 SK BYT and TRO_PAN', with an apostrophe between RAN and the number, how can I extract that? I was trying "([ASTN])" + + "\'" + "RAN" + "[\w\s]+" +"_PAN", but that does not seem to work. Any ideas?
You have the apostrophe in the wrong place in your pattern, it should be after RAN i.e. pattern = "([ASTN])" + "RAN" + "'" + "[\w\s]+" +"_PAN" Note you don't need a backslash before the '.
Oh Nick, you are awesome! Thanks for pointing out my error. I was continuously scratching my head!
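The apostrophe variant discussed in these comments can be sketched the same way. The first pattern below follows the suggestion in the comment above (apostrophe placed after RAN, no backslash needed); the second, with `'?`, is an assumption beyond the thread, shown only in case both forms of the string need to match with one pattern.

```python
import re

# Apostrophe required directly after RAN, as suggested in the comment.
with_apostrophe = re.compile(r"([ASTN])RAN'[\w\s]+_PAN", flags=re.IGNORECASE)

# Assumption, not from the thread: '? makes the apostrophe optional.
optional_apostrophe = re.compile(r"([ASTN])RAN'?[\w\s]+_PAN", flags=re.IGNORECASE)

quoted = "NRAN'28 SK BYT and TRO_PAN"
plain = "NRAN28 SK BYT and TRO_PAN"

print(with_apostrophe.match(quoted) is not None)      # True
print(with_apostrophe.match(plain) is not None)       # False
print(optional_apostrophe.match(quoted) is not None)  # True
print(optional_apostrophe.match(plain) is not None)   # True
```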
