
I'm sorry, I can't believe this question hasn't been solved on Stack Overflow, but I've been searching a lot and I can't find any solution.

I want to change HTML code with regular expressions in this way:

testing <a href="url">anchor</a>

to

testing anchor

I only want to unlink the text without using DOM functions. The code is in a string, not in the document, and I don't want to remove any tags other than the <a> ones.

  • There's a reason why it's not solved... Commented May 24, 2013 at 11:17
  • Also, just because the HTML is not in the DOM doesn't mean you couldn't parse it. Commented May 24, 2013 at 11:18
  • The HTML code is in a string; it's not visible in the window. I just want to parse it with a regular expression. Commented May 24, 2013 at 11:25
  • Right. Having the HTML in a string doesn't prevent you from using DOM methods. Commented May 24, 2013 at 12:20

4 Answers


If you really don't want to use DOM functions (why?), you might do

str = str.replace(/<[^>]*>/g, '')

You can use it if you're fairly confident your HTML isn't more complex, but it will fail in many cases, for example some nested tags, or a > inside an attribute. You might fix some of the problems with more complex regular expressions, but they aren't the right tool for this job in the general case.
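To see those failure modes concretely, here is a small runnable sketch of the pattern above. stripAllTags is a hypothetical name used only for illustration. A > inside a quoted attribute value, or a literal < ... > in plain text, is treated as a tag boundary:

```javascript
// Hypothetical helper wrapping the answer's pattern: delete every "<...>" run.
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// The question's input works as expected.
console.log(stripAllTags('testing <a href="url">anchor</a>')); // testing anchor

// A ">" inside a quoted attribute ends the match early,
// so the rest of the opening tag leaks into the output.
console.log(JSON.stringify(stripAllTags('<a title="1 > 0">x</a>'))); // " 0\">x"

// Plain-text comparisons look like a tag and are deleted too.
console.log(stripAllTags("if 1 < 2 and 3 > 2")); // if 1  2
```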

If you don't want to remove tags other than a, do this:

str = str.replace(/<\/?a( [^>]*)?>/g, '')

This changes

<a>testing</a> <a href="url"><b>a</b>nchor</a><div>test</div><aaa>E</aaa>

to

testing <b>a</b>nchor<div>test</div><aaa>E</aaa>
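A slightly hardened variant, as a sketch under my own assumptions rather than the answer's exact code: \s instead of a literal space lets a tab or newline follow the a, and the i flag also catches upper-case <A HREF=...>. The function name unlinkAnchors is hypothetical. It also covers the <div><a href="url">anchor</a></div> case raised in the comments:

```javascript
// Hypothetical helper: remove only <a ...> and </a> tags, keeping their inner
// text and every other tag. Assumed tweaks to the answer's regex:
//   \s instead of " " -> also accepts a tab or newline after "a"
//   i flag            -> also matches <A ...> and </A>
function unlinkAnchors(html) {
  return html.replace(/<\/?a(\s[^>]*)?>/gi, "");
}

console.log(unlinkAnchors('testing <a href="url">anchor</a>'));
// testing anchor
console.log(unlinkAnchors('<div><a href="url">anchor</a></div>'));
// <div>anchor</div>

// <abbr> is untouched: after "a" the regex needs whitespace or ">".
console.log(unlinkAnchors('<A\nHREF="url">x</A><abbr>y</abbr>'));
// x<abbr>y</abbr>
```

It is still a regex, so it shares the weaknesses described above: a > inside an attribute value, or an <a inside an HTML comment or a script string, will still be mishandled.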

8 Comments

+1 Works beautifully for the OP's simple use case; I think this is the simplest regex solution. OP, if you're doing anything more complicated, avoid this.
Thank you very much, it's all I need. I definitely have to study a regular expressions tutorial; I don't know anything about them. It's enough, although it fails with nested tags. I can't use DOM functions (I suppose) because the code is in a string; it's not shown in the document object.
@user1901219 Is this regex clear, or do you want me to explain it?
Now I think it doesn't work, because I only want to remove the link tags: if I have <div><a href="url">anchor</a></div>, I want the result <div>anchor</div>.
You can create a DOM object from the string, use DOM methods to parse, without having had appended said DOM object to the document.

I know you only want a regex, but for future viewers, here is a trivial solution using DOM methods.

var a = document.createElement("div");
a.innerHTML = 'testing <a href="url">anchor</a>';
var wordsOnly = a.textContent || a.innerText; 

This will not fail on complicated use cases, allows nested tags and it's perfectly clear what's happening:

  • Hey browser! Create an element
  • Put that HTML in it
  • Give me back just the text, that's what I want now.

NOTE:

The element we're creating will not be added to the actual DOM since we're not adding it anywhere, it'll stay invisible. Here is a fiddle to illustrate how this works.

6 Comments

Note to future readers: this is also possible if you're using Node.js or another JavaScript framework. No need to reinvent the wheel most of the time.
+1 because even while it wasn't what OP asks, it's generally a better solution. Shouldn't that be a little more complex for compatibility with IE8, like a.textContent||a.innerText ?
What if I want to keep bold things, but just remove links? i.e. turn foo <b>bar <a href="google.com">baz</a> blep</b> awoo into foo <b>bar baz blep</b> awoo? This gets rid of all HTML in it, giving back foo bar baz blep awoo. I wouldn't call that a complicated use case, and I wouldn't say that it "allows nested tags".
@QPaysTaxes still easy, you would just do a querySelectorAll('a') on it and then call .remove on the elements.
@BenjaminGruenbaum That... also doesn't work, as far as I can tell. (I assume you mean something like [].forEach(a.querySelectorAll('a'), function(l) { a.remove(l); }; if that's incorrect, please clarify.)

As has been mentioned, you cannot parse HTML with regular expressions. The principal reason is that HTML elements nest and regular expressions cannot handle that.
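A quick illustration of the nesting problem, using <div> since anchors can't legally nest; firstDivContents is a hypothetical helper. Whether the quantifier is lazy or greedy, the regex pairs an opening tag with whichever closing tag comes first or last, not with its real partner:

```javascript
// Hypothetical helper: return what the given regex captures for the first <div>.
function firstDivContents(html, pattern) {
  const match = html.match(pattern);
  return match ? match[1] : null;
}

const lazy = /<div>(.*?)<\/div>/;  // stops at the first </div>
const greedy = /<div>(.*)<\/div>/; // runs to the last </div>

// Nested divs: the lazy pattern pairs the outer <div> with the inner </div>.
console.log(firstDivContents("<div><div>x</div>y</div>", lazy)); // <div>x

// Sibling divs: the greedy pattern swallows both elements as one.
console.log(firstDivContents("<div>a</div><div>b</div>", greedy)); // a</div><div>b
```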

That said, with a few restrictions, which I will mention, you can do the following:

string.replace(/(\b\w+\s*)<a\s+href="([^"]*)">(.*?)<\/a>/g, '$1$3')

This requires a word before the tag (spacing between the word and the tag is optional) and no attributes other than href in the <a> tag, and it accepts anything between the <a> and the </a>.
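A runnable sketch of this approach; unlinkAfterWord is a hypothetical name. It uses a lazy (.*?) so that two links on one line are not merged into a single match, and $1$3 as the replacement because $1 already ends with the whitespace captured by \s*. Inputs that break the stated restrictions come back unchanged:

```javascript
// Word before the link, then an <a> whose only attribute is href.
// $1 = the word plus any whitespace, $2 = the URL (unused), $3 = the link text.
const linkAfterWord = /(\b\w+\s*)<a\s+href="([^"]*)">(.*?)<\/a>/g;

// Hypothetical helper: keep the word and the link text, drop the <a> tags.
function unlinkAfterWord(html) {
  return html.replace(linkAfterWord, "$1$3");
}

console.log(unlinkAfterWord('testing <a href="url">anchor</a>'));
// testing anchor
console.log(unlinkAfterWord('see <a href="u1">one</a> and <a href="u2">two</a>'));
// see one and two

// No word before the tag, or an extra attribute: no match, input returned as-is.
console.log(unlinkAfterWord('<a href="url">anchor</a>'));
console.log(unlinkAfterWord('go <a class="x" href="url">here</a>'));
```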

2 Comments

This gives me "testing url" and not "testing anchor" like OP asked for
It didn't work for my simple code. I don't know if I understood "This requires there to be a word before the tag" correctly; I tried with a word before it. But anyway, @dystroy's expression is enough for me. Thank you!

You can create a DOM object from the string and use DOM methods to parse it, without having appended that DOM object to the document.

2 Comments

Hey andy, did you mean to post it as a comment and not an answer perhaps?
Yes, it's true, but I thought it was quicker and more elegant to do it with regular expressions. Now that I've seen the link in Mat's answer, maybe I was wrong.
