regular expression to unlink html code with javascript

Question

I'm sorry, I can't believe this question hasn't already been answered on Stack Overflow, but I've searched a lot and can't find a solution.

I want to change HTML code with regular expressions in this way:

testing <a href="url">anchor</a>

to

testing anchor

I only want to unlink the text, without using DOM functions. The code is in a string, not in the document, and I don't want to remove any tags other than the a tags.

Also, just because the HTML is not in the DOM doesn't mean you couldn't parse it.
– JJJ, Commented May 24, 2013 at 11:18
The HTML code is in a string; it's not visible in the window. I just want to parse it with a regular expression.
– Oscardrbcn, Commented May 24, 2013 at 11:25
Right. Having the HTML in a string doesn't prevent you from using DOM methods.
– JJJ, Commented May 24, 2013 at 12:20

Denys Séguret · Accepted Answer · 2013-05-24 11:41:26Z

5

If you really don't want to use DOM functions (why?), you might do

str = str.replace(/<[^>]*>/g, '')

You can use it if you're fairly confident the HTML is simple, but it will fail in many cases, for example with some nested tags or with a > inside an attribute value. You might fix some of these problems with more complex regular expressions, but they aren't the right tool for this job in the general case.
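To make that failure mode concrete, here is a small sketch of what the strip-everything pattern does when a > appears inside an attribute value. The sample strings and variable names are made up for illustration.

```javascript
// The strip-all-tags pattern from the answer.
var stripTags = /<[^>]*>/g;

// Simple markup: every tag is removed as intended.
var simple = 'testing <a href="url">anchor</a>'.replace(stripTags, '');

// A ">" inside an attribute value ends the match too early,
// so the rest of the opening tag leaks into the output.
var broken = '<a title="a > b" href="u">x</a>'.replace(stripTags, '');
```

Here simple is 'testing anchor', while broken comes out as ' b" href="u">x' because the first match stops at the > inside the title attribute.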

If you only want to remove the a tags and keep all others, do this:

str = str.replace(/<\/?a( [^>]*)?>/g, '')

This changes

<a>testing</a> <a href="url"><b>a</b>nchor</a><div>test</div><aaa>E</aaa>

to

testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
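The same replacement can be wrapped in a small helper and checked against the sample above. This is only a sketch: the name stripAnchors is hypothetical, and it assumes that no > appears inside an attribute value of the links.

```javascript
// Remove only <a ...> and </a> tags, keeping their contents and every other tag.
// Assumption: no ">" appears inside an attribute value of the links.
function stripAnchors(str) {
  return str.replace(/<\/?a( [^>]*)?>/g, '');
}

var input = '<a>testing</a> <a href="url"><b>a</b>nchor</a><div>test</div><aaa>E</aaa>';
var output = stripAnchors(input);

// A link nested inside another element keeps its wrapper.
var nested = stripAnchors('<div><a href="url">anchor</a></div>');
```

With these inputs, output is 'testing <b>a</b>nchor<div>test</div><aaa>E</aaa>' and nested is '<div>anchor</div>'. The pattern is case-sensitive and expects a literal space after the a, so markup such as <A HREF="..."> or a line break right after <a would be left untouched unless the pattern is adjusted (for example with the i flag and \s).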

edited May 24, 2013 at 11:41

answered May 24, 2013 at 11:18

Denys Séguret

8 Comments

Benjamin Gruenbaum Over a year ago

+1 Works beautifully for OP's simple use case, I think this is the simplest regex solution. OP, if you're doing anything more complicated avoid this.

Oscardrbcn Over a year ago

Thank you very much, that's all I need. I definitely have to study a regular expressions tutorial; I don't know anything about them. It's enough, even though it fails with nested tags. I can't use DOM functions (I suppose) because the code is in a string and isn't shown in the document object.

Denys Séguret Over a year ago

@user1901219 Is this regex clear, or do you want me to explain it?

Oscardrbcn Over a year ago

On second thought, it doesn't work, because I only want to remove the link tags: if I have <div><a href="url">anchor</a></div>, I want the result <div>anchor</div>

andy magoon Over a year ago

You can create a DOM object from the string and use DOM methods to parse it, without ever appending that DOM object to the document.

Community · Accepted Answer · 2020-06-20 09:12:55Z

4

I know you only want a regex, but for future viewers, here is a trivial solution using DOM methods.

var a = document.createElement("div");
a.innerHTML = 'testing <a href="url">anchor</a>';
var wordsOnly = a.textContent || a.innerText;

This will not fail on complicated input, handles nested tags, and makes it perfectly clear what's happening:

Hey browser! Create an element
Put that HTML in it
Give me back just the text, that's what I want now.

NOTE:

The element we're creating will not be added to the actual DOM since we're not adding it anywhere, it'll stay invisible. Here is a fiddle to illustrate how this works.
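If only the links should go while every other tag stays, the same detached-element idea can be extended. The sketch below is one way to do it, not the answer's own code: the function name unlinkHtml is hypothetical, and it assumes a browser-like environment that provides document.

```javascript
// Sketch: remove <a> wrappers but keep their contents and all other tags.
// Assumes a browser-like environment that provides `document`.
function unlinkHtml(html) {
  // A detached container: it is never attached to the page.
  var container = document.createElement("div");
  container.innerHTML = html;

  // querySelectorAll returns a static list, so it is safe to modify the tree while looping.
  var links = container.querySelectorAll("a");
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    // Move each child of the link in front of it, then drop the now-empty link.
    while (link.firstChild) {
      link.parentNode.insertBefore(link.firstChild, link);
    }
    link.parentNode.removeChild(link);
  }
  return container.innerHTML;
}
```

Called in a browser, unlinkHtml('<div><a href="url">anchor</a></div>') returns '<div>anchor</div>', and unlinkHtml('foo <b>bar <a href="google.com">baz</a> blep</b> awoo') returns 'foo <b>bar baz blep</b> awoo'. One caveat: assigning untrusted markup to innerHTML can still trigger resource loads and inline event handlers such as onerror, even on a detached element, so this sketch is meant for markup you trust.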

edited Jun 20, 2020 at 9:12

CommunityBot

answered May 24, 2013 at 11:22

Benjamin Gruenbaum

6 Comments

Benjamin Gruenbaum Over a year ago

Note to future readers: this is also possible if you're using Node.js or another JavaScript environment. No need to reinvent wheels most of the time.

Denys Séguret Over a year ago

+1 because even though it isn't what the OP asked for, it's generally a better solution. Shouldn't that be a little more complex for compatibility with IE8, like a.textContent||a.innerText?

anon Over a year ago

What if I want to keep bold things, but just remove links? i.e. turn foo <b>bar <a href="google.com">baz</a> blep</b> awoo into foo <b>bar baz blep</b> awoo? This gets rid of all HTML in it, giving back foo bar baz blep awoo. I wouldn't call that a complicated use case, and I wouldn't say that it "allows nested tags".

Benjamin Gruenbaum Over a year ago

@QPaysTaxes still easy, you would just do a querySelectorAll('a') on it and then call .remove on the elements.

anon Over a year ago

@BenjaminGruenbaum That... also doesn't work, as far as I can tell. (I assume you mean something like [].forEach(a.querySelectorAll('a'), function(l) { a.remove(l); }; if that's incorrect, please clarify.)

HBP · Accepted Answer · 2013-05-24 11:39:38Z

0

As has been mentioned, you cannot parse HTML with regular expressions. The principal reason is that HTML elements nest and regular expressions cannot handle that.

That said, with a few restrictions which I will mention, you can do the following:

string.replace (/(\b\w+\s*)<a\s+href="([^"]*)">(.*)<\/a>/g, '$1 $3')

This requires a word before the tag (spacing between the word and the tag is optional), no attributes other than the href in the <a> tag, and that you accept anything between the <a> and the </a>.
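A quick way to see these restrictions in practice is to run the pattern on a couple of sample strings. The inputs below are made up, and lazyPattern is a hypothetical variant that is not part of the answer; it is shown only to contrast greedy and lazy matching.

```javascript
// The pattern from the answer, unchanged.
var pattern = /(\b\w+\s*)<a\s+href="([^"]*)">(.*)<\/a>/g;

// Single link: $1 already holds the word's trailing space,
// and the replacement string adds another one.
var single = 'testing <a href="url">anchor</a>'.replace(pattern, '$1 $3');

// Two links on one line: the greedy (.*) runs to the last </a>.
var twoLinks = 'see <a href="u1">one</a> and <a href="u2">two</a>';
var greedy = twoLinks.replace(pattern, '$1 $3');

// Hypothetical variant (not from the answer): lazy (.*?) and no extra space.
var lazyPattern = /(\b\w+\s*)<a\s+href="[^"]*">(.*?)<\/a>/g;
var lazy = twoLinks.replace(lazyPattern, '$1$2');
```

Here single is 'testing  anchor' with two spaces, greedy is 'see  one</a> and <a href="u2">two' because the first match swallows everything up to the final closing tag, and lazy is 'see one and two'.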

edited May 24, 2013 at 11:39

answered May 24, 2013 at 11:29

HBP

2 Comments

Benjamin Gruenbaum Over a year ago

This gives me "testing url" and not "testing anchor" like OP asked for

Oscardrbcn Over a year ago

It didn't work for my simple code. I don't know if I understood "This requires there to be a word before the tag" correctly; I tried it with a word before. But anyway, the expression from @dystroy is enough for me. Thank you!

andy magoon · Accepted Answer · 2013-05-24 12:07:33Z

0

You can create a DOM object from the string and use DOM methods to parse it, without ever appending that DOM object to the document.

answered May 24, 2013 at 12:07

andy magoon

2 Comments

Benjamin Gruenbaum Over a year ago

Hey andy, did you mean to post it as a comment and not an answer perhaps?

Oscardrbcn Over a year ago

Yes, it's true, but I thought it was quicker and more elegant to do it with regular expressions. Now I've seen the Mat answer link, and maybe I was wrong.
