
The variable htmlStr may contain the id attribute written in different ways:

var htmlStr = '<div id="demo_div"></div>';

var htmlStr = "<div id='demo_div'></div>";

var htmlStr = '<div id=demo_div class="demo"></div>';

var htmlStr = "<div id=demo_div></div>";

How can I write this without so many nested try-catch blocks? Can I combine the patterns into one? It works, but it does not look pretty.

var idname;
try {
    idname = /(id="(.*?)(\"))/g.exec(htmlStr)[2]
} catch (e) {
    try {
        idname = /(id='(.*?)(\'))/g.exec(htmlStr)[2]
    } catch (e) {
        try {
            idname = /(id=(.*?)(\ ))/g.exec(htmlStr)[2]
        } catch (e) {
            try {
                idname = /(id=(.*?)(\>))/g.exec(htmlStr)[2]
            } catch (e) {
                console.log(e);
            }
        }
    }
}

console.log(idname);
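Since exec returns null when nothing matches (rather than throwing), the nested try-catch cascade can be flattened into a loop over the same four patterns. A minimal sketch; the names idPatterns and findId are hypothetical, and it keeps the question's first-match-wins order:

```javascript
// The question's four patterns, tried in the same order.
// Group 1 holds the id value in each of them.
const idPatterns = [/id="(.*?)"/, /id='(.*?)'/, /id=(.*?) /, /id=(.*?)>/];

function findId(htmlStr) {
  for (const pattern of idPatterns) {
    // exec returns null (it does not throw) when the pattern doesn't match
    const match = pattern.exec(htmlStr);
    if (match !== null) {
      return match[1];
    }
  }
  return undefined; // no pattern matched
}

console.log(findId('<div id=demo_div class="demo"></div>')); // demo_div
```

The g flag is left off because a global regex that is reused across calls keeps its lastIndex between exec calls, which can make a later call start searching partway through the string.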
  • exec doesn't throw an error if no match is found; it returns null. Commented Sep 26, 2018 at 10:30
  • Try (id=['"]?(.*?)["'> ]) Commented Sep 26, 2018 at 10:33
  • You need something like /id=(?:(["'])([^'"]*)\1|([^\s>]*))/g, loop over all matches by calling exec until it returns null, and take Group 3 if it matched, otherwise Group 2. But it is safer to use a DOM parser to parse HTML. Commented Sep 26, 2018 at 10:40
  • ...how about NOT using regex? Take the HTML string, turn it into an actual element, and check its ID. stackoverflow.com/questions/2522422/… Commented Sep 26, 2018 at 10:42
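The combined pattern suggested in the comments can be sketched like this. getIds is a hypothetical name; like any regex approach it will also pick up attributes that merely end in "id", such as data-id.

```javascript
function getIds(htmlStr) {
  // Group 1: opening quote (" or '), Group 2: quoted value, Group 3: unquoted value.
  // \1 requires the closing quote to be the same character as the opening one.
  const idPattern = /id=(?:(["'])([^'"]*)\1|([^\s>]*))/g;
  const ids = [];
  let match;
  // With the g flag, each exec call continues after the previous match;
  // it returns null (rather than throwing) once there are no more matches.
  while ((match = idPattern.exec(htmlStr)) !== null) {
    // Group 3 is undefined when the quoted alternative matched.
    ids.push(match[3] !== undefined ? match[3] : match[2]);
  }
  return ids;
}

// Returns ["demo_div"]
getIds('<div id=demo_div class="demo"></div>');
```

The regex is created inside the function so every call starts with a fresh lastIndex.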

2 Answers


You can do this without using regex by simply parsing the HTML.

const htmlStrings = [
  '<div id="demo_div"></div>',
  "<div id='demo_div'></div>",
  "<div id=demo_div class='demo'></div>",
  '<div data-id="not_a_real_id"></div>', //note: doesn't have an ID 
  "<div data-id=not_an_id ID= demo_div></div>", 
  "<div id= demo_div><span id=inner_id></span></div>"
];

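function getId(html) {
  // Let the browser's HTML parser build real elements from the string
  const parser = document.createElement('div');
  parser.innerHTML = html;

  // firstElementChild skips leading whitespace/text nodes; id is "" if the element has none
  return parser.firstElementChild.id;
}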

htmlStrings.forEach(x => console.log(getId(x)));

As you can see, you can create an element, put the HTML in it, then grab the first child and check its ID. This works even if the element has other attributes, such as a custom attribute called data-id, if the id attribute is written with any capitalisation, or if the div contains nested elements.

This simple version only returns the ID of the first element, so it won't help if you need the IDs of several elements, and invalid HTML may not be parsed the way you expect; it is only meant as a demonstration. Once the string is parsed into proper elements, you can traverse the hierarchy as you see fit and perform whatever extraction you need.
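As one example of traversing the parsed hierarchy, here is a sketch that walks an element tree and collects every non-empty id in document order. The helper collectIds is hypothetical; it only relies on the standard id and children properties of DOM elements, so you could pass it a wrapper div filled via innerHTML, as in getId. In a browser, Array.from(wrapper.querySelectorAll('[id]'), el => el.id) is a shorter alternative.

```javascript
// Depth-first walk over an element tree, collecting every non-empty id
// in document order. Works with any object that has `id` and `children`
// (e.g. a DOM Element whose innerHTML was set from an HTML string).
function collectIds(root) {
  const ids = [];
  // Stack of elements still to visit; children are pushed in reverse
  // so that the first child is popped (visited) first.
  const stack = Array.from(root.children).reverse();
  while (stack.length > 0) {
    const el = stack.pop();
    if (el.id) {
      ids.push(el.id);
    }
    for (let i = el.children.length - 1; i >= 0; i--) {
      stack.push(el.children[i]);
    }
  }
  return ids;
}

// Browser usage:
// const wrapper = document.createElement('div');
// wrapper.innerHTML = "<div id= demo_div><span id=inner_id></span></div>";
// collectIds(wrapper); // ["demo_div", "inner_id"]
```

The wrapper itself is not checked, only its descendants, so the helper element's own (empty) id never shows up in the result.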


  • Thank you, that's the safest option for me and it fits, because I need valid HTML anyway.
/id=["']?([^\s"'>]+)/g

This will match all four examples.
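For instance, applying that pattern with exec and checking for null avoids try-catch entirely. A sketch with a hypothetical getIdByRegex helper, written without the g flag since only the first match is needed:

```javascript
// The answer's pattern, without the g flag (only the first match is needed).
const idRegex = /id=["']?([^\s"'>]+)/;

function getIdByRegex(htmlStr) {
  const match = idRegex.exec(htmlStr);
  // exec returns null when there is no match, so no try-catch is required
  return match !== null ? match[1] : null;
}

[
  '<div id="demo_div"></div>',
  "<div id='demo_div'></div>",
  '<div id=demo_div class="demo"></div>',
  "<div id=demo_div></div>",
].forEach((html) => console.log(getIdByRegex(html))); // demo_div (four times)
```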


  • With /\sid=["']?([^\s"'>]+)/g the other cases work too, because the leading \s keeps it from matching inside attributes like data-id. I'll still use it inside a try-catch, though, because there may be no ID at all and I want to catch that as an error.
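If a missing ID should be an error, it can be thrown explicitly rather than relying on the TypeError that comes from indexing null. A sketch using the pattern from the comment; requireId and its error message are hypothetical:

```javascript
// \s before "id=" means the attribute name must start right after whitespace,
// so attributes like data-id="..." are not matched.
const idAttrPattern = /\sid=["']?([^\s"'>]+)/;

function requireId(htmlStr) {
  const match = idAttrPattern.exec(htmlStr);
  if (match === null) {
    // exec returned null: there is no id attribute, so fail loudly
    throw new Error("No id attribute found in: " + htmlStr);
  }
  return match[1];
}

try {
  console.log(requireId('<div data-id="x" id="demo_div"></div>')); // demo_div
  requireId('<div data-id="not_an_id"></div>'); // throws
} catch (e) {
  console.log(e.message);
}
```

Throwing a dedicated Error keeps the try-catch the asker wants, but the catch block now sees a descriptive message instead of a generic "Cannot read properties of null".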
