
Is there a library or existing code that removes CSS attributes from HTML?

The requirement is that the Java code parses the input HTML document, removes the CSS attributes (class and style), and produces the output HTML document.

For example, if the input HTML document contains this element:

      <p class="abc" style="xyz" > some text </p>

the output should be:

      <p > some text </p>

2 Answers


Use jsoup with a NodeTraversor to remove the class and style attributes from every element:

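import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

// input is the HTML source as a String
Document doc = Jsoup.parse(input);

// This constructor form matches older jsoup releases; newer releases
// offer a static NodeTraversor.traverse(visitor, root) instead.
NodeTraversor traversor = new NodeTraversor(new NodeVisitor() {

  @Override
  public void tail(Node node, int depth) {
    // Called when leaving a node: strip the CSS attributes from elements only.
    if (node instanceof Element) {
        Element e = (Element) node;
        e.removeAttr("class");
        e.removeAttr("style");
    }
  }

  @Override
  public void head(Node node, int depth) {
    // Nothing to do when entering a node.
  }
});

// Only the body is traversed; elements inside <head> are left unchanged.
traversor.traverse(doc.body());
String modifiedHtml = doc.toString();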
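
If adding a dependency is not an option and the input is known to be well-formed XHTML, the same idea can be sketched with the DOM parser that ships with the JDK. This swaps in a different technique from the jsoup approach: a strict XML parser rejects ordinary tag-soup HTML, so it is no replacement for a lenient HTML parser. The class name XhtmlCssStripper and the method stripCssAttributes are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XhtmlCssStripper {

    // Hypothetical helper. Assumption: the input is well-formed XML (XHTML).
    static String stripCssAttributes(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // "*" matches every element in the document, including the root.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                // removeAttribute has no effect when the attribute is absent.
                e.removeAttribute("class");
                e.removeAttribute("style");
            }

            // Serialize the modified tree back to a string.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process input as XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            stripCssAttributes("<p class=\"abc\" style=\"xyz\"> some text </p>"));
    }
}
```

Attributes other than class and style are left in place, so an id or href on the same element survives the pass.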

You could use Cyberneko to parse the document and add a simple filter that looks something like this:

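import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.html.filters.DefaultFilter;

public class RemoveStyleFilter
    extends DefaultFilter
{
  @Override
  public void startElement(QName element, XMLAttributes attributes, Augmentations augs)
    throws XNIException
  {
    // Drop each forbidden attribute if the element carries it.
    for (String forbidden : new String[] {"class", "style"})
    {
      int index = attributes.getIndex(forbidden);
      if (index >= 0)
      {
        attributes.removeAttributeAt(index);
      }
    }
    // Pass the element, minus those attributes, to the next filter in the chain.
    super.startElement(element, attributes, augs);
  }
}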
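
The filter pattern itself is not specific to Cyberneko. As a rough sketch, assuming the input is well-formed XHTML, the same attribute-stripping step can be written against the JDK's own SAX API by extending XMLFilterImpl and serializing through an identity Transformer. This is a JDK-only analogue for illustration, not the Cyberneko API: the names SaxCssFilterDemo, RemoveStyleSaxFilter and stripCssAttributes are hypothetical, and a strict XML parser will reject the loosely written HTML that Cyberneko is designed to handle.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

final class SaxCssFilterDemo {

    // Hypothetical JDK-only counterpart of the filter shown above.
    static final class RemoveStyleSaxFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            // Copy first: the Attributes interface offers no way to remove entries.
            AttributesImpl kept = new AttributesImpl(atts);
            for (String forbidden : new String[] {"class", "style"}) {
                int index = kept.getIndex(forbidden);
                if (index >= 0) {
                    kept.removeAttribute(index);
                }
            }
            super.startElement(uri, localName, qName, kept);
        }
    }

    // Assumption: the input is well-formed XML (XHTML).
    static String stripCssAttributes(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // Chain: parser -> filter -> serializer.
            RemoveStyleSaxFilter styleFilter = new RemoveStyleSaxFilter();
            styleFilter.setParent(reader);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                    new SAXSource(styleFilter, new InputSource(new StringReader(xhtml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process input as XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            stripCssAttributes("<p class=\"abc\" style=\"xyz\"> some text </p>"));
    }
}
```

Because the filter works on the event stream, the document is never held in memory as a tree, which is the main reason to prefer this shape over a DOM traversal for large inputs.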
