1

I am reading strings from a file. Each line looks like this:

<div style="Z-INDEX: 654; BORDER-BOTTOM: 0px; POSITION: absolute; BORDER-LEFT: 0px; WIDTH: 80px; HEIGHT: 22px; BORDER-TOP: 0px; TOP: 64px; CURSOR: auto; BORDER-RIGHT: 0px; LEFT: 240px" id="textboxElt11286249556014dIi15v" lineid="lineid" pos_rel="false" x1="240" x2="320" y1="64" y2="86"><input style="WIDTH: 80px; HEIGHT: 20px" id="textboxElt11286249556014dIi15v_textbox" title="Enter Registration Number Here" tabindex="1" value=" " maxlength="15" size="10" name="scheduled_tribe_registration_number_text"></input></div>

There will be multiple lines of this sort, and the data is not fixed. I want to fetch the value of the style attribute. I want to do it with regular expressions, because the child elements can have style attributes too and I want to fetch all of them.

2
  • I think you will get tons of "don't use regex for HTML parsing" comments. Is there any special reason you can't use an HTML parser for this? Commented Dec 20, 2010 at 7:01
  • I want all occurrences of the style attribute. I was unable to do that with a SAX parser. Commented Dec 20, 2010 at 7:11

3 Answers

2

There are many good HTML parser libraries for Java; HtmlCleaner is one of them.

Here is a cleaner way to read the style attribute of the div element:

import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public class Test {

    public static void main(String[] args) throws Throwable {
        HtmlCleaner cleaner = new HtmlCleaner();
        String html = "<div style=\"Z-INDEX: 654; BORDER-BOTTOM: 0px; POSITION: absolute; BORDER-LEFT: 0px; WIDTH: 80px; HEIGHT: 22px; BORDER-TOP: 0px; TOP: 64px; CURSOR: auto; BORDER-RIGHT: 0px; LEFT: 240px\" id=\"textboxElt11286249556014dIi15v\" lineid=\"lineid\" pos_rel=\"false\" x1=\"240\" x2=\"320\" y1=\"64\" y2=\"86\"><input style=\"WIDTH: 80px; HEIGHT: 20px\" id=\"textboxElt11286249556014dIi15v_textbox\" title=\"Enter Registration Number Here\" tabindex=\"1\" value=\" \" maxlength=\"15\" size=\"10\" name=\"scheduled_tribe_registration_number_text\"></input></div>";
        TagNode node = cleaner.clean(html);
        TagNode div = node.findElementByName("div", true);
        System.out.println(div.getAttributeByName("style"));
    }
}

If you are familiar with jQuery, you should also check out jsoup.
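If adding a library is not an option, and every line is well-formed XML (the sample line happens to be, since it closes its input element explicitly), the JDK's built-in DOM parser and XPath can collect every style attribute, including those on child elements. The following is only a sketch under that assumption; the class and method names are made up for illustration, and real-world HTML will usually not parse this way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of HtmlCleaner or jsoup.
class StyleAttributeXPath {

    // Assumption: the line is well-formed XML with a single root element.
    static List<String> findStyles(String wellFormedLine) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new InputSource(new StringReader(wellFormedLine)));

            // "//@style" selects the style attribute of every element.
            // XML names are case-sensitive, so "STYLE" would not be matched.
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//@style", doc, XPathConstants.NODESET);

            List<String> styles = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                styles.add(attrs.item(i).getNodeValue());
            }
            return styles;
        } catch (Exception e) {
            // Parsing fails as soon as the input is not well-formed XML.
            throw new IllegalArgumentException("Line is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String line = "<div style=\"WIDTH: 80px; HEIGHT: 22px\" id=\"box\">"
                + "<input style=\"WIDTH: 80px; HEIGHT: 20px\" value=\" \"></input></div>";
        for (String style : findStyles(line)) {
            System.out.println(style);
        }
    }
}
```

A dedicated HTML parser remains the more robust choice, because it tolerates unclosed tags and unquoted attributes that make this approach throw.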

1 Comment

Don't be misled by the name (HtmlCleaner). It does what you want and it's easy to use.

0

Don't use regex to parse HTML. That said, this approach uses a regular expression too, because String.split takes one:

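String line = getNextLineFromInput(); // placeholder for however you read the file
// Splitting on double quotes puts each attribute value right after its name= part.
String[] parts = line.split("\"");
String style = "";
for (int i = 0; i < parts.length - 1; i++) {
  if (parts[i].endsWith("style=")) {
    style = parts[i + 1]; // value of the first style attribute on the line
    break;
  }
}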

Note: this will fail on real-world HTML files, but you mentioned input whose lines all look like your example line; this is a very specialised solution for exactly that type of input.

0

Don't use regex to parse HTML. That being said, you can use something like:

<div \s*style="([A-Z0-9-;: ]*)"\s*>
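As written, the pattern above would not match the sample line: the style values contain lowercase letters (for example "0px"), and more attributes follow the style attribute before the closing ">". A looser variant, sketched below, matches any double-quoted style attribute anywhere in the line and collects all of them. Because the character class [^"] excludes the double quote, the match stops at the first closing quote even though the quantifier is greedy. The class and method names are illustrative, and the sketch assumes values are always double-quoted and never contain an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for lines shaped like the sample in the question.
class StyleAttributeRegex {

    // Whitespace, then "style", optional spaces around "=", then a
    // double-quoted value captured in group 1. Case-insensitive so that
    // STYLE="..." is matched as well.
    private static final Pattern STYLE = Pattern.compile(
            "\\sstyle\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static List<String> findStyles(String line) {
        List<String> styles = new ArrayList<>();
        Matcher matcher = STYLE.matcher(line);
        // find() resumes after the previous match, so child elements
        // on the same line are picked up too.
        while (matcher.find()) {
            styles.add(matcher.group(1));
        }
        return styles;
    }

    public static void main(String[] args) {
        String line = "<div style=\"WIDTH: 80px; HEIGHT: 22px\" id=\"box\">"
                + "<input style=\"WIDTH: 80px; HEIGHT: 20px\" value=\" \"></input></div>";
        for (String style : findStyles(line)) {
            System.out.println(style);
        }
    }
}
```

This still cannot tell a real attribute from text that merely looks like one (for instance inside another attribute's value or a comment), which is why the parser-based answers are safer for anything beyond this fixed input.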

2 Comments

... assuming style is always the first attribute of a div element. But as you mentioned: don't use regex to parse HTML.
I am not familiar with Java regex. Won't this be greedy and consume everything up to the last closing double quote?
