I'm looking for a regular expression to find all instances of a CSS class name in HTML markup. So far I have this, assuming row is the class name that I'm looking for:

class=\"[a-zA-Z0-9\-_\s]*row[a-zA-Z0-9\-_\s]*\"

It correctly matches all of the following:

class="foo_bar bar row test"
class="row"
class="hello foo bar  row"
class=" foo bar  row test "

And correctly doesn't match this:

class="hello"  row

Unfortunately it incorrectly matches these (false positives):

class="narrow"
class="rowdy"

What regex will find a specific CSS class name in HTML?

Update: There are lots of comments saying I shouldn't parse the DOM with regex. My use case is a 'find all' across a large project with thousands of HTML files, to find where specific CSS classes are being used. I'm not operating inside a browser and don't have access to a DOM.

  • Just to be sure: do you have to use regex as opposed to a DOM parser here? If you have to, I'd say adding \b (word boundary) before and after row should do it, though I didn't really think this through so might be better ways. Commented Mar 26, 2019 at 22:09
  • Try class="(?:row|[^"]* row)(?![^" ])[^"]*" if _row_ is not allowed too. See live demo here regex101.com/r/Xq4sT9/1 Commented Mar 26, 2019 at 22:14
  • Also what about "hello a-row"? Commented Mar 26, 2019 at 22:19
  • Oh yeah a word boundary isn't enough because of dashes (at least). Commented Mar 26, 2019 at 22:22
  • You forgot that class = " (notice the spaces) is also a legit syntax. And that a text class="row is also a legit text. Stop using regex to parse DOM. Use what browsers already use. A DOMParser. Tony the Pony he comes... Commented Mar 26, 2019 at 22:31

2 Answers

You need boundaries around row, but \b isn't enough: it also matches at the position between - and r in a-row, which is how \b is defined but not what you want here. To restrict the boundary to a space, or to the position right after the opening " or right before the closing " of the class attribute, you need a pattern with two branches:

class="(?:row|[^"]* row)(?![^" ])[^"]*"

The above could be shortened to the following (though the longer form is preferred):

class="(?:[^"]* )?row(?![^" ])[^"]*"

A shorter version that performs the same as the longer one:

class="(?:[^"]* )??row(?: [^"]*)?"
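As a quick sanity check, the three variants above can be run against the examples from the question plus the a-row case from the comments. This is only a sketch in Python's re module; the labels in VARIANTS and the helper name matches are made up for illustration.

```python
import re

# The three patterns from this answer, copied verbatim.
VARIANTS = {
    "two-branch": r'class="(?:row|[^"]* row)(?![^" ])[^"]*"',
    "optional prefix": r'class="(?:[^"]* )?row(?![^" ])[^"]*"',
    "lazy prefix": r'class="(?:[^"]* )??row(?: [^"]*)?"',
}

# Sample text -> whether it should count as using the class "row".
SAMPLES = {
    'class="foo_bar bar row test"': True,
    'class="row"': True,
    'class="hello foo bar  row"': True,
    'class=" foo bar  row test "': True,
    'class="hello"  row': False,
    'class="narrow"': False,
    'class="rowdy"': False,
    'class="hello a-row"': False,
}


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Every variant should agree with the expected result for every sample.
for label, pattern in VARIANTS.items():
    for text, expected in SAMPLES.items():
        assert matches(pattern, text) == expected, (label, text)
```

All three agree on these samples; that is not a proof of equivalence, just a check on the cases discussed here.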

Regex breakdown:

  • class=" Match class=" literally
  • (?: Start of non-capturing group
    • row Match row
    • | Or
    • [^"]* row Match row preceded by a space character
  • ) End of non-capturing group
  • (?![^" ]) The next immediate character should be space or "
  • [^"]*" Match up to and including "
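The breakdown above can also be written directly into the pattern using verbose mode. The following is a sketch in Python (re.VERBOSE), not part of the original pattern; NAIVE is a hypothetical \b-based pattern included only to show the a-row problem, and has_row is an illustrative helper name.

```python
import re

# Verbose-mode spelling of the two-branch pattern. Under re.VERBOSE, bare
# whitespace is ignored, so the literal space is written as [ ].
PATTERN = re.compile(r'''
    class="            # literal start of the attribute
    (?:
        row            # branch 1: row is the first name in the list
      |
        [^"]*[ ]row    # branch 2: row follows a space later in the list
    )
    (?![^" ])          # what follows must be a space or the closing quote
    [^"]*"             # rest of the value, up to and including the quote
''', re.VERBOSE)

# Word-boundary version, for comparison only.
NAIVE = re.compile(r'class="[^"]*\brow\b[^"]*"')


def has_row(markup):
    """Return True if the markup has a class attribute containing row."""
    return PATTERN.search(markup) is not None


# \b sees a boundary between "-" and "r", so the naive pattern accepts a-row.
assert NAIVE.search('class="hello a-row"') is not None
assert not has_row('class="hello a-row"')

# Both patterns reject narrow and rowdy; the difference is only the hyphen case.
assert not has_row('class="narrow"') and not has_row('class="rowdy"')
assert has_row('class="foo_bar bar row test"')
```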

See live demo here
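For the 'find all' use case described in the question, the pattern can be wrapped in a small script. This is a minimal sketch in Python under several assumptions: files end in .html and are UTF-8, attributes use double quotes with no whitespace around =, and each class attribute sits on one line. The function names build_pattern and find_class_usages are hypothetical.

```python
import re
from pathlib import Path


def build_pattern(class_name):
    """Build the two-branch pattern for an arbitrary class name."""
    name = re.escape(class_name)  # e.g. keeps "-" in "flex-mt90" literal
    return re.compile(rf'class="(?:{name}|[^"]* {name})(?![^" ])[^"]*"')


def find_class_usages(root, class_name):
    """Return (path, line number, stripped line) for each line using the class."""
    pattern = build_pattern(class_name)
    hits = []
    for path in sorted(Path(root).rglob("*.html")):
        # errors="replace" keeps the scan going on badly encoded files.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, line_no, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the current directory for the class "row" and print grep-style output.
    for path, line_no, line in find_class_usages(".", "row"):
        print(f"{path}:{line_no}: {line}")
```

Markup that breaks those assumptions (single quotes, class = "...", attributes split across lines) would be missed, so the output is best treated as a starting list rather than a complete one.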

2 Comments

This is great and has saved me lots of time cleaning up CSS/HTML in a large project.
Glad to hear. I just looked at the regex and realized it could be written shorter without affecting performance. So I added it.

Try the regex below:

(class\s?=\s?)\"([\d\w\s-]*)(\brow\b)([\d\w\s]*)\"

I tested it against all the cases you mentioned:

https://regex101.com

1 Comment

Thanks, that's pretty good. It fails this test though: class="flex-mt90 foo bar row" row, but I realize that I didn't have it in my list of examples. regex101.com/r/jeos4r/2
