2

I have the following string:

var originalStr = "Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end."

What's the best way to identify all tags, along with each tag's name and content? This is the kind of result I'm looking for:


var tagsFound = 
    [ { "tagName": "firstTag",  "value": "text inside first tag" } 
    , { "tagName": "secondTag", "value": "50" } 
    ] 
  • There are dozens of questions covering this topic on the site; I suggest looking through them. But I've also included some information below (many of those questions are fairly specific and also quite old). Commented Jan 14, 2021 at 18:32
  • bun.sh/docs/bundler/macros#make-fetch-requests-at-bundle-time Thank you, everyone. I found a built-in that does just what I wanted! Commented Mar 16, 2024 at 18:31

2 Answers

2

HTML is very complicated to parse, so the best approach is to use a parser that already exists.

If you're doing this in a browser, you can use the one built into the browser: DOMParser.

If you're doing this in Node.js, there are several libraries to do it, such as jsdom. It provides an API almost identical to the one in web browsers.

Here's a jsdom example:

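const { JSDOM } = require("jsdom");

// Parse the string as an HTML document; the tags become children of <body>
const dom = new JSDOM("<!doctype html>" + originalStr);
const doc = dom.window.document;
for (const childElement of doc.body.children) {
    console.log(`${childElement.tagName} - ${childElement.textContent}`);
}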

With your string, that would output:

FIRSTTAG - text inside first tag
SECONDTAG - 50

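You'd write code using the DOM methods provided to create the output you're looking for. (Note the tag name normalization above: the HTML parser does not keep the original capitalization. If it matters to what you're doing, you may have to use jsdom's nodeLocation, which requires the includeNodeLocations option, to find each element's position in the source string and read the name from there.)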
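As a minimal sketch of that last step, the loop above can be turned into a function that builds the array from the question. The helper name collectTags is hypothetical, and it assumes it is handed a Document produced either by jsdom or by the browser's DOMParser. It uses localName, so the names come back lowercased rather than in their original capitalization.

```javascript
// Hypothetical helper: collect { tagName, value } for each top-level element.
// `doc` is assumed to be a Document from jsdom or from the browser's DOMParser.
function collectTags(doc) {
    const tagsFound = [];
    for (const element of doc.body.children) {
        // localName is the lowercased name for HTML elements ("firsttag"),
        // so the original capitalization is not preserved here.
        tagsFound.push({ tagName: element.localName, value: element.textContent });
    }
    return tagsFound;
}

// Browser-only usage; DOMParser is not defined in plain Node.js.
if (typeof DOMParser !== "undefined") {
    const originalStr = "Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.";
    const doc = new DOMParser().parseFromString(originalStr, "text/html");
    console.log(collectTags(doc));
}
```

With jsdom, the same function could be called as collectTags(dom.window.document). Only direct children of the body are visited, so nested elements would need a recursive walk or querySelectorAll("*").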

0

Depending on the complexity of the strings you're dealing with, a simple regex solution might work (it works nicely for your string):

var str = 'Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.';

var tagsFound = [];
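// replace() is used here only to run the callback once per match
str.replace(/<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g, function(m,m1,m2){
    // m is the whole match, m1 the tag name (group 1), m2 the content (group 2)
    tagsFound.push({
        "tagName": m1,
        "value": m2
    })
    // return the match unchanged so the string is not modified
    return m;
});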

// Displaying the results
for(var i=0;i<tagsFound.length; i++){
    console.log(tagsFound[i]);
}
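In the callback above, m1 and m2 are the regex's two capture groups: the first parenthesized part, ([a-zA-Z][a-zA-Z0-9_-]*), captures the tag name, and the second, (.*?), captures everything up to the matching closing tag (\1 refers back to the captured name). The same pattern can be used without the replace() side effect. This is a sketch using String.prototype.matchAll; the function name findTags is made up for illustration.

```javascript
// Hypothetical helper: same pattern as above, iterated with matchAll.
function findTags(str) {
    const tagPattern = /<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g;
    const tagsFound = [];
    for (const match of str.matchAll(tagPattern)) {
        // match[0] is the whole match, match[1] is group 1 (the tag name),
        // match[2] is group 2 (the text between the tags).
        const [, tagName, value] = match;
        tagsFound.push({ tagName, value });
    }
    return tagsFound;
}

const example = 'Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.';
console.log(findTags(example));
// [ { tagName: 'firstTag', value: 'text inside first tag' },
//   { tagName: 'secondTag', value: '50' } ]
```

Unlike the DOM-based approach, this keeps the original capitalization of the tag names, because the names are read straight from the source string.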

There will be a problem when self-closing tags or tags containing other tags are taken into account, like <selfClosedTag/> or <tag><tag>something</tag>else</tag>.
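To make that limitation concrete, here is a small sketch that runs the same pattern over both problem cases. The function name findTagsWithRegex and the input string are made up for illustration.

```javascript
// Hypothetical helper wrapping the same regex, used to show where it breaks.
function findTagsWithRegex(str) {
    const tagPattern = /<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g;
    return Array.from(str.matchAll(tagPattern), (match) => ({
        tagName: match[1],
        value: match[2]
    }));
}

const tricky = 'a <selfClosedTag/> b <tag><tag>something</tag>else</tag>';
console.log(findTagsWithRegex(tricky));
// [ { tagName: 'tag', value: '<tag>something' } ]
```

The self-closing tag is skipped because it has no closing tag for \1 to match, and the outer <tag> is paired with the first </tag> it finds, so the inner element is not reported separately and "else" is lost. Inputs like these are where a real parser is the safer choice.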

1 Comment

Can you explain how you got the m1 and m2 values with the regex?
