Currently I have an editable div and I want to add very basic syntax highlighting. Essentially I want text between * to turn a different color and text in quotes to turn a different color. For example:

input: "hello" *world*

output: <span class='a'>"hello"</span> <span class='b'>*world*</span>

I'm using the Rangy.js library to save and restore the caret position, so there are no issues there. However, I'm really struggling to turn the input into the output. The big problem I have is ignoring any " and * that are already highlighted.

If anyone could point me in the direction of a basic algorithm or regular expression or something it would be much appreciated.

2 Answers

function highlight(text) {
    var result = [];
    for (var i = 0; i < text.length; i++) {
        var ch = text[i];
        // Position of the closing delimiter, or -1 if there is none.
        var stop = -1;
        if (ch === '"' || ch === '*') {
            stop = text.indexOf(ch, i + 1);
        }
        else if (ch === '<') {
            // Simple HTML tags are copied through unchanged.
            stop = text.indexOf('>', i + 1);
        }

        if (stop === -1) {
            // Ordinary character, or a delimiter that is never closed:
            // copy it through as plain text. Without this guard, an
            // unclosed delimiter would reset i to -1 and loop forever.
            result.push(ch);
        }
        else if (ch === '<') {
            result.push(text.substring(i, stop + 1));
            i = stop;
        }
        else {
            result.push('<span class="' + (ch === '"' ? 'a' : 'b') + '">');
            result.push(text.substring(i, stop + 1));
            result.push('</span>');
            i = stop;
        }
    }
    return result.join('');
}

Example:

>>> highlight('foo *bar"baz"qux* "foobar" qux')
"foo <span class="b">*bar"baz"qux*</span> <span class="a">"foobar"</span> qux"

Or with regular expressions:

function highlight2(text) {
    return text.replace(/([*"]).*?\1|<[^<>]*>/g, function (match, ch) {
        // 'match' contains the whole match
        // 'ch' contains the first capture-group
        if (ch === '"') {
            return '<span class="a">' + match + '</span>';
        }
        else if (ch === '*') {
            return '<span class="b">' + match + '</span>';
        }
        else {
            return match;
        }
    });
}

The regular expression ([*"]).*?\1|<[^<>]*> is made up of the following parts:

  • [*"] matches * or ". (They don't need to be escaped inside [ ]).
  • ( ) captures the matched string into capture-group 1.
  • .*? matches as few characters as possible (. does not match line breaks), up until the first...
  • \1 matches the same string as was captured into capture-group 1.
  • | is "Or". It tries to match the left side, and if that fails, it tries to match the right side.
  • <[^<>]*> matches simple HTML tags. It cannot handle attributes that contain a literal < or >, such as <a href="info.php?tag=<i>"> (that is bad HTML anyway, but some browsers will accept it).

In the case when it matches an HTML tag, the ch parameter will be undefined, and the else-branch will be picked.
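
To see what the callback actually receives, here is a small sketch that records each match together with its first capture group. The name traceMatches is made up for this illustration; the pattern is the one from highlight2.

```javascript
// Hypothetical helper: list every [match, ch] pair the callback would see.
function traceMatches(text) {
    var seen = [];
    text.replace(/([*"]).*?\1|<[^<>]*>/g, function (match, ch) {
        // ch is the delimiter for quoted/starred runs,
        // and undefined when the HTML-tag alternative matched.
        seen.push([match, ch]);
        return match;
    });
    return seen;
}

var pairs = traceMatches('<b>"hi"</b> *there*');
// pairs is:
// [ ['<b>', undefined], ['"hi"', '"'], ['</b>', undefined], ['*there*', '*'] ]
```

The space between the tag and *there* never reaches the callback, because replace only calls it for text that the pattern matched.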

If you want to add more delimiter characters, just put them inside the [ ] and add an if-statement to handle them. You can use any character except ^, -, \ and ] without escaping it. To add one of those characters, put a \ in front of it.
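
As a rough sketch of that extension, the if-chain can be replaced with a lookup table so that adding a delimiter only means touching the character class and the table. The underscore delimiter, the class name c, and the names DELIMITER_CLASSES and highlight3 are assumptions made up for this example.

```javascript
// Hypothetical mapping from delimiter character to CSS class.
// '_' and class 'c' are invented here to show a third delimiter.
var DELIMITER_CLASSES = { '"': 'a', '*': 'b', '_': 'c' };

function highlight3(text) {
    // Same pattern as highlight2, with '_' added to the character class.
    return text.replace(/([*"_]).*?\1|<[^<>]*>/g, function (match, ch) {
        // ch is undefined for HTML tags, so className is undefined too.
        var className = ch && DELIMITER_CLASSES[ch];
        return className
            ? '<span class="' + className + '">' + match + '</span>'
            : match;
    });
}

var html = highlight3('"x" *y* _z_ <i>');
// html is:
// <span class="a">"x"</span> <span class="b">*y*</span> <span class="c">_z_</span> <i>
```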

2 Comments

This fails using this test string: highlight('look <span class=\"a\">"test"</span>') gives the following: "look <span class=<span class=\"a\">\"a</span>><span class=\"a\">\"test</span></span>"
It is not supposed to be applied multiple times. I could update it so it tries to ignore HTML-tags, if you wish.
0

A basic algorithm is to split the input into tokens and wrap each styled token in a span:

function highlight(myInput) {
  // Split the string into tokens.
  // "[^"]*"     matches a minimal run surrounded by quotes
  // \*[^*]*\*   matches a minimal run surrounded by asterisks
  // ["*][^"*]*$ matches an unmatched quote or asterisk and the tail of the string
  // [^"*]+      matches a maximal un-styled run
  // ["*]        matches a stray quote or asterisk that nothing above consumed,
  //             so it is not silently dropped from the output
  // match() returns null when nothing matches (e.g. an empty string), hence || [].
  var tokens =
      myInput.match(/"[^"]*"|\*[^*]*\*|["*][^"*]*$|[^"*]+|["*]/g) || [];

  // Walk over the list of tokens and turn them into styled HTML
  var htmlOut = [];
  for (var i = 0, n = tokens.length; i < n; ++i) {
    var token = tokens[i];
    // Choose a style.
    var className =
        token.charAt(0) == '"' ? "a" : token.charAt(0) == '*' ? "b" : null;
    // Surround in a span if we have a style.
    if (className) { htmlOut.push("<span class='", className, "'>"); }
    // HTML escape the token content.
    htmlOut.push(token.replace(/&/g, "&amp;").replace(/</g, "&lt;"));
    if (className) { htmlOut.push("</span>"); }
  }
  // Join the output tokens.
  return htmlOut.join('');
}


alert(highlight('"hello" *world*'));
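
Because this version HTML-escapes its input, it expects plain text rather than markup that has already been highlighted. One possible way to avoid highlighting the same text twice is to rebuild the markup from the element's plain text on every update. The sketch below assumes single-line content (textContent does not preserve line breaks created by <br> or nested block elements) and leaves caret saving and restoring to whatever you already use. The name rehighlight and the element id in the usage comment are made up for this example.

```javascript
// Hypothetical helper: re-highlight an element from its plain text.
// highlightFn is any function that maps plain text to escaped HTML,
// such as the highlight() function above.
function rehighlight(element, highlightFn) {
  // textContent discards all existing markup, including old highlight
  // spans, so the highlighter never sees quotes inside class attributes.
  var plain = element.textContent;
  element.innerHTML = highlightFn(plain);
}

// Possible usage in a page (the id 'editor' is an assumption):
//   rehighlight(document.getElementById('editor'), highlight);
```

Starting from plain text each time means the highlighter never has to recognize or skip its own output.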
