Java - regular expression search a string

Question

I am reading a string from a file that looks like this:

<div style="Z-INDEX: 654; BORDER-BOTTOM: 0px; POSITION: absolute; BORDER-LEFT: 0px; WIDTH: 80px; HEIGHT: 22px; BORDER-TOP: 0px; TOP: 64px; CURSOR: auto; BORDER-RIGHT: 0px; LEFT: 240px" id="textboxElt11286249556014dIi15v" lineid="lineid" pos_rel="false" x1="240" x2="320" y1="64" y2="86"><input style="WIDTH: 80px; HEIGHT: 20px" id="textboxElt11286249556014dIi15v_textbox" title="Enter Registration Number Here" tabindex="1" value=" " maxlength="15" size="10" name="scheduled_tribe_registration_number_text"></input></div>

There will be multiple lines like this, and the data is not fixed. I want to fetch the value of the style attribute, and I want to do it with regular expressions. The child elements can have style attributes too, and I want to fetch all of them.

I think you will get tons of "don't use regex for HTML parsing" comments. Is there any special reason you can't use an HTML parser for this?
– Hoàng Long, commented Dec 20, 2010 at 7:01
I want all occurrences of the style attributes; I was unable to do it with a SAX parser.
– Varun, commented Dec 20, 2010 at 7:11
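For what it's worth, SAX can collect every style attribute, including those on child elements, because the handler's startElement callback fires once per opening tag at any depth. The sketch below uses only the JDK (javax.xml.parsers and org.xml.sax). It assumes each line is well-formed XML, which the sample line happens to be (all attribute values are quoted and the input element has an explicit closing tag). A file with several such lines is not one XML document, so you would either parse line by line or wrap the whole file in a dummy root element. The class name SaxStyleCollector is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: assumes the input is well-formed XML (not arbitrary HTML).
public class SaxStyleCollector {

    /** Returns the value of every style attribute, in document order. */
    public static List<String> collectStyles(String xml) {
        List<String> styles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per opening tag, so nested child elements are visited too.
                String style = attrs.getValue("style");
                if (style != null) {
                    styles.add(style);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Typically a SAXException when the input is not well-formed XML.
            throw new IllegalArgumentException("Could not parse input", e);
        }
        return styles;
    }

    public static void main(String[] args) {
        String line = "<div style=\"Z-INDEX: 654; POSITION: absolute\" id=\"outer\">"
                + "<input style=\"WIDTH: 80px; HEIGHT: 20px\" id=\"inner\"></input></div>";
        for (String style : collectStyles(line)) {
            System.out.println(style);
        }
    }
}
```

Running main prints "Z-INDEX: 654; POSITION: absolute" followed by "WIDTH: 80px; HEIGHT: 20px". Real-world HTML (unclosed tags, unquoted attributes) will make the SAX parser throw, which is where an HTML-aware parser is the better tool.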

user452425 · Accepted Answer · 2010-12-20 08:53:32Z

2

There are many good HTML parser libraries for Java; HtmlCleaner is one of them.

Here is a better way to get the style attribute:

import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public class Test {

    public static void main(String[] args) throws Throwable {
        HtmlCleaner cleaner = new HtmlCleaner();
        String html = "<div style=\"Z-INDEX: 654; BORDER-BOTTOM: 0px; POSITION: absolute; BORDER-LEFT: 0px; WIDTH: 80px; HEIGHT: 22px; BORDER-TOP: 0px; TOP: 64px; CURSOR: auto; BORDER-RIGHT: 0px; LEFT: 240px\" id=\"textboxElt11286249556014dIi15v\" lineid=\"lineid\" pos_rel=\"false\" x1=\"240\" x2=\"320\" y1=\"64\" y2=\"86\"><input style=\"WIDTH: 80px; HEIGHT: 20px\" id=\"textboxElt11286249556014dIi15v_textbox\" title=\"Enter Registration Number Here\" tabindex=\"1\" value=\" \" maxlength=\"15\" size=\"10\" name=\"scheduled_tribe_registration_number_text\"></input></div>";
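        // clean() parses the markup into a tree of TagNode objects
        TagNode node = cleaner.clean(html);
        // search recursively for the first <div> element
        TagNode div = node.findElementByName("div", true);
        // print the raw value of its style attribute
        System.out.println(div.getAttributeByName("style"));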
    }
}

If you are familiar with jQuery, you should also check out jsoup.
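The HtmlCleaner example reads only the first div's style. If you would rather avoid a third-party library and your input is well-formed XML (the sample line is), the same tree-based idea works with the JDK's built-in DOM parser, and it can visit every element rather than just one. This is a swapped-in technique, not HtmlCleaner's API; the class name DomStyleCollector is hypothetical, and the parse will fail on HTML that is not well-formed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: assumes well-formed XML input, parsed with the JDK's DOM API.
public class DomStyleCollector {

    /** Builds a DOM tree and returns every style attribute value in document order. */
    public static List<String> collectStyles(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
        // "*" matches every element, including the root, in pre-order (document order).
        NodeList all = doc.getElementsByTagName("*");
        List<String> styles = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            // hasAttribute distinguishes a missing style from style="".
            if (el.hasAttribute("style")) {
                styles.add(el.getAttribute("style"));
            }
        }
        return styles;
    }

    public static void main(String[] args) {
        String line = "<div style=\"Z-INDEX: 654\" id=\"outer\">"
                + "<input style=\"WIDTH: 80px\" id=\"inner\"></input></div>";
        System.out.println(collectStyles(line));
    }
}
```

Unlike SAX, DOM keeps the whole tree in memory, which is fine for short lines like these and makes it easy to ask follow-up questions (for example, which element each style belongs to).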

answered Dec 20, 2010 at 8:53

user452425


1 Comment

Axel Over a year ago

Don't be misled by the name (HtmlCleaner). It does what you want and it's easy to use.

Andreas Dolk · Accepted Answer · 2010-12-20 07:23:06Z

0

Don't use regex to parse HTML. Note that this one uses a regular expression too (String.split takes a regex argument):

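String line = getNextLineFromInput(); // placeholder: read one line from your file
String[] parts = line.split("\"");    // split on double quotes
String style = "";
// stop one short of the end so parts[i + 1] always exists
for (int i = 0; i < parts.length - 1; i++) {
  // a part ending in style= is immediately followed by the attribute value
  if (parts[i].endsWith("style=")) {
    style = parts[i + 1];
    break;
  }
}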

Note: this will fail for real-world HTML files, but you mentioned input whose lines look just like your example line; this is a very specialised solution for exactly that type of input.
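The loop above stops at the first match, but the question asks for every style attribute, including the one on the child input element. Dropping the break and collecting into a list is enough for that. The sketch below keeps the same assumptions as the answer (attribute values are double-quoted and contain no quotes); it will also pick up any attribute whose name merely ends in style, such as a hypothetical mystyle. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: same split-on-quotes trick, but collects every style value.
public class SplitStyles {

    /** Returns each value that follows a quote-delimited part ending in style=. */
    public static List<String> stylesOf(String line) {
        // Even-indexed parts are the text between values; odd-indexed parts are the values.
        String[] parts = line.split("\"");
        List<String> styles = new ArrayList<>();
        for (int i = 0; i < parts.length - 1; i++) {
            if (parts[i].endsWith("style=")) {
                styles.add(parts[i + 1]);
            }
        }
        return styles;
    }

    public static void main(String[] args) {
        String line = "<div style=\"Z-INDEX: 654\" id=\"a\"><input style=\"WIDTH: 80px\"></input></div>";
        System.out.println(stylesOf(line));
    }
}
```

For a file with many such lines, call stylesOf once per line.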

answered Dec 20, 2010 at 7:23

Andreas Dolk

115k · 20 gold badges · 185 silver badges · 275 bronze badges


fastcodejava · Accepted Answer · 2010-12-20 07:27:26Z

0

Don't use regex to parse HTML. That being said, you can use something like:

<div\s+style="([^"]*)"
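Because the pattern above is anchored to <div, it only sees the div's own style and misses the one on the child input element. If you do go the regex route, a minimal sketch that finds every double-quoted style attribute on a line calls Matcher.find() in a loop, since each call resumes after the previous match. It assumes values are double-quoted and contain no quote characters; single-quoted or unquoted values are missed, and a literal style="..." inside another attribute's value would be matched by mistake. The class name RegexStyles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex-based, so only safe for simple, predictable lines.
public class RegexStyles {

    // "style", optional whitespace around '=', then a double-quoted value in group 1.
    // [^"]* cannot cross a quote, so each match ends at its own closing quote.
    private static final Pattern STYLE =
            Pattern.compile("\\bstyle\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns every double-quoted style value in the line, in order of appearance. */
    public static List<String> findStyles(String line) {
        List<String> styles = new ArrayList<>();
        Matcher m = STYLE.matcher(line);
        while (m.find()) {
            styles.add(m.group(1));
        }
        return styles;
    }

    public static void main(String[] args) {
        String line = "<div style=\"Z-INDEX: 654; LEFT: 240px\" id=\"a\">"
                + "<input style=\"WIDTH: 80px; HEIGHT: 20px\"></input></div>";
        for (String s : findStyles(line)) {
            System.out.println(s);
        }
    }
}
```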

edited Dec 20, 2010 at 7:27

answered Dec 20, 2010 at 7:13

fastcodejava

41.3k · 31 gold badges · 142 silver badges · 191 bronze badges

2 Comments

Andreas Dolk Over a year ago

... assuming style is always the first attribute of a div element. But, as you mentioned: don't use regex to parse HTML.

dheerosaur Over a year ago

I am not familiar with Java regex. Won't this be greedy and consume everything up to the last closing double quote?
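On the greediness question: the answer's pattern captures with a character class that cannot match a double quote, so even though the * quantifier is greedy it cannot run past the closing quote. The overrun only happens with a dot, as in (.*). The small demo below compares a greedy dot, a reluctant dot (.*?), and a negated class on the same input. GreedyDemo and firstGroup are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of greedy vs. reluctant quantifiers in java.util.regex.
public class GreedyDemo {

    /** Returns group 1 of the first match, or null if there is none. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div style=\"A: 1\" id=\"x\"><input style=\"B: 2\"></input></div>";
        // Greedy: .* runs to the end of the input, then backtracks to the LAST quote.
        System.out.println(firstGroup("style=\"(.*)\"", html));
        // Reluctant: .*? expands one character at a time and stops at the first quote.
        System.out.println(firstGroup("style=\"(.*?)\"", html));
        // Negated class: [^"]* can never consume a quote, so it also stops at the first one.
        System.out.println(firstGroup("style=\"([^\"]*)\"", html));
    }
}
```

The first line printed is A: 1" id="x"><input style="B: 2 (the greedy overrun the comment describes); the other two both print A: 1.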
