I have some HTML stored in a String variable in Java. The string contains something like this:

<span style="text-decoration: underline;">test</span>

And I want to turn it into this:

<u>test</u>

Or if I have this:

<span style="color: #2873ee; text-decoration: underline;">test</span>

I want this:

<font color="#2873ee"><u>test</u></font>

Using regex I can do this:

affectedString = affectedString.replaceAll("<span style=\"text-decoration: underline;\">(.*?)</span>", "<u>$1</u>");

and

affectedString = affectedString.replaceAll("<span style=\"color:\\s*?(#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})); text-decoration: underline;\">(.*?)</span>", "<font color=\"$1\"><u>$2</u></font>");

Easy, right? But I don't like this code, and it has a few problems. First, why I don't like it: I need to handle three CSS styles (underline, color and line-through), and writing a separate pattern for every combination and ordering of them is not good code. For example:

affectedString = affectedString.replaceAll("<span style=\"color:\\s*?(#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})); text-decoration: underline;\">(.*?)</span>", "<font color=\"$1\"><u>$2</u></font>");
affectedString = affectedString.replaceAll("<span style=\"text-decoration: underline; color:\\s*?(#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}));\">(.*?)</span>", "<font color=\"$1\"><u>$2</u></font>");

The real problem is that this doesn't work with nested spans like:

<span style="text-decoration: underline;">test <span style="text-decoration: line-through;">two</span></span>

In that case the lazy group (.*?) matches only up to the first </span>, so after replacing text-decoration: underline the result is:

<u>test <span style="text-decoration: line-through;">two</u></span>

When I then replace text-decoration: line-through, the result is:

<u>test <strike>two</u></strike>

The expected result is:

<u>test <strike>two</strike></u>

My questions are: what regex can I use to solve this kind of issue? And is there a better way to "transform" this simple CSS into HTML tags?

Thanks

  • You should not use regex to parse XML/HTML. Commented Mar 3, 2016 at 12:45
  • Any regex you or some other answerer offers for such a task will be complicated, bug-prone and ugly. You definitely need some kind of HTML analyzer/parser for this. Commented Mar 3, 2016 at 12:46

1 Answer

I'd advise against using regex here. It isn't easy to debug or extend, and it gets nasty pretty fast. You could use a library like jsoup to parse the HTML, traverse the DOM and use CSS selectors to get Elements. E.g. to get all divs with a class attribute you'd use:

Elements divs = doc.select("div[class]");
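
If adding a dependency is not an option, the nesting problem from the question can also be avoided with only the standard library. The following is a rough sketch, not a jsoup example and not a real HTML parser: it uses a regex only to find the individual span tags and keeps a stack so that each </span> is replaced by the closing tags of the span it actually belongs to. The class name SpanStyleConverter and the method convert are hypothetical names made up for this illustration. It assumes the spans are properly nested, that style is written with double quotes, and that dropping any other attributes or style declarations on a converted span is acceptable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; assumes properly nested <span> tags.
class SpanStyleConverter {

    // Group 1 is non-null for an opening <span ...> (its attribute text), null for </span>.
    private static final Pattern SPAN_TAG = Pattern.compile("<span\\b([^>]*)>|</span\\s*>");

    // Extracts the value of a double-quoted style attribute.
    private static final Pattern STYLE_ATTR = Pattern.compile("style\\s*=\\s*\"([^\"]*)\"");

    // A "color:" declaration (not "background-color:") with a 3- or 6-digit hex value.
    private static final Pattern COLOR =
            Pattern.compile("(?<![\\w-])color:\\s*(#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}))\\b");

    static String convert(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> closers = new ArrayDeque<>(); // closing text owed by each open span
        Matcher tag = SPAN_TAG.matcher(html);
        int last = 0;
        while (tag.find()) {
            out.append(html, last, tag.start()); // copy the text between tags unchanged
            last = tag.end();
            if (tag.group(1) != null) {          // opening <span ...>
                StringBuilder open = new StringBuilder();
                StringBuilder close = new StringBuilder();
                Matcher style = STYLE_ATTR.matcher(tag.group(1));
                if (style.find()) {
                    String css = style.group(1);
                    Matcher color = COLOR.matcher(css);
                    if (color.find()) {
                        open.append("<font color=\"").append(color.group(1)).append("\">");
                        close.insert(0, "</font>");
                    }
                    if (css.contains("underline")) {
                        open.append("<u>");
                        close.insert(0, "</u>");
                    }
                    if (css.contains("line-through")) {
                        open.append("<strike>");
                        close.insert(0, "</strike>");
                    }
                }
                if (open.length() == 0) {        // nothing recognised: keep the span as it is
                    out.append(tag.group());
                    closers.push("</span>");
                } else {
                    out.append(open);
                    closers.push(close.toString());
                }
            } else {                             // closing </span>
                out.append(closers.isEmpty() ? tag.group() : closers.pop());
            }
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String nested = "<span style=\"text-decoration: underline;\">test "
                + "<span style=\"text-decoration: line-through;\">two</span></span>";
        System.out.println(convert(nested)); // prints <u>test <strike>two</strike></u>
    }
}
```

Because the closing tags are built in reverse order of the opening ones and popped from a stack, the order of the declarations inside style no longer matters and nested spans close in the right place. For anything beyond this simple subset of HTML, a real parser such as jsoup is still the safer choice.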
