How to escape XML entities in JavaScript?

Question

In JavaScript (server side NodeJS) I'm writing a program which generates XML as output.

I am building the XML by concatenating a string:

str += '<' + key + '>';
str += value;
str += '</' + key + '>';

The problem is: what if value contains characters like '&', '>' or '<'? What's the best way to escape those characters?

or is there any JavaScript library around which can escape XML entities?

The question could also be tagged NodeJS, since it is partly related to that.
– V. Bozz, Commented Jan 9, 2024 at 7:49

Klesun · Accepted Answer · 2023-11-22 21:22:00Z

146

This might be a bit more efficient with the same outcome:

function escapeXml(unsafe) {
    return unsafe.replace(/[<>&'"]/g, function (c) {
        switch (c) {
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            case '\'': return '&apos;';
            case '"': return '&quot;';
        }
    });
}
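As a usage sketch, here is how the function could slot into the string concatenation from the question. The helper name `buildElement` and the sample values are made up for illustration, and `key` is assumed to be a trusted, valid element name:

```javascript
// Same single-pass approach as above: one regex, one callback per match.
function escapeXml(unsafe) {
    return unsafe.replace(/[<>&'"]/g, function (c) {
        switch (c) {
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            case '\'': return '&apos;';
            case '"': return '&quot;';
        }
    });
}

// Hypothetical helper mirroring the question's concatenation.
// Assumes `key` is a trusted, valid element name; only `value` is escaped.
function buildElement(key, value) {
    return '<' + key + '>' + escapeXml(String(value)) + '</' + key + '>';
}

console.log(buildElement('title', 'Tom & Jerry <3'));
// <title>Tom &amp; Jerry &lt;3</title>
```

The `String(value)` call is only there so that numbers and other non-string values do not throw on `.replace`.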

edited Nov 22, 2023 at 21:22

Klesun

answered Jan 16, 2015 at 8:32

hgoebl

9 Comments

Sebastian Over a year ago

@VictorGrazi: you're right, it's the faster solution in 49 of 50 tests. Maybe it's because it's nearly 5 years younger than the accepted answer.

Victor Grazi Over a year ago

@Sebastian ahh, that would explain it, thanks. Look here folks ^ ^ ^ ^ this is the solution you want!!!

Jamie Birch Over a year ago

This strikes me as a better solution than the accepted answer, which traverses the whole string five times (serially, reducing the scope for JS engine optimisation) looking for a match against a single character; hgoebl's solution traverses the input string only once, trying to match each character to one of five conditions. The question is what is more costly: 1) traversing the string; or: 2) matching each character against 5 possible characters. My intuition is that 1) would be the more costly.

hgoebl Over a year ago

The problem with accepted answer: it creates ~5 copies of the string. When the string is long, it's a lot of work to allocate memory and later garbage collect the interim strings not really used anywhere. (Note: JavaScript strings are immutable.)

hgoebl Over a year ago

@RanLottem decoding is much more complicated if input is HTML, see Wikipedia. It's better to use a parser (XML or document).

zzzzBov · Accepted Answer · 2013-11-19 20:31:09Z

137

HTML encoding is simply replacing the &, ", ', < and > characters with their entity equivalents. Order matters: if you don't replace the & characters first, you'll double encode some of the entities:

if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    return this.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
               .replace(/"/g, '&quot;')
               .replace(/'/g, '&apos;');
  };
}
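To see why the order matters, here is a minimal sketch that compares both orders on two of the five characters. The function names `encodeAmpFirst` and `encodeAmpLast` are made up for this illustration:

```javascript
// Correct order: ampersand first, then the characters whose entities contain '&'.
function encodeAmpFirst(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

// Wrong order (for illustration only): '<' becomes '&lt;', and then the '&'
// of that freshly created entity gets escaped a second time.
function encodeAmpLast(s) {
    return s.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(encodeAmpFirst('a < b & c')); // a &lt; b &amp; c
console.log(encodeAmpLast('a < b & c'));  // a &amp;lt; b &amp; c
```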

(As @Johan B.W. de Vries pointed out, this will have issues with the tag names. I would like to clarify that I made the assumption that this was being used for the value only.)

Conversely, if you want to decode HTML entities¹, make sure you decode &amp; to & after everything else so that you don't double decode any entities:

if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    return this.replace(/&apos;/g, "'")
               .replace(/&quot;/g, '"')
               .replace(/&gt;/g, '>')
               .replace(/&lt;/g, '<')
               .replace(/&amp;/g, '&');
  };
}
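The same ordering issue can be shown for decoding. This is a small sketch with made-up function names, using the literal text "&lt;" (which is encoded as "&amp;lt;") as input:

```javascript
// Correct order: '&amp;' is decoded last.
function decodeAmpLast(s) {
    return s.replace(/&lt;/g, '<').replace(/&amp;/g, '&');
}

// Wrong order (for illustration only): decoding '&amp;' first creates a new
// '&lt;' sequence, which the next replace then decodes a second time.
function decodeAmpFirst(s) {
    return s.replace(/&amp;/g, '&').replace(/&lt;/g, '<');
}

console.log(decodeAmpLast('&amp;lt;'));  // &lt;
console.log(decodeAmpFirst('&amp;lt;')); // <
```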

¹ Just the basics, not including &copy; to © or other such things.

As far as libraries are concerned, Underscore.js (or Lodash if you prefer) provides an _.escape method to perform this functionality.

edited Nov 19, 2013 at 20:31

answered Oct 27, 2011 at 16:09

zzzzBov

9 Comments

Ryan Over a year ago

This almost covers the 5 XML entities. Just need &apos;

Jonny Over a year ago

This looks like it is replacing the same string over and over again which could be performance heavy when handling lots of data. Any faster alternative?

zzzzBov Over a year ago

@Jonny, The regular expression is going to provide worse performance than the multiple calls to .replace(). In either case, you'd have to have a seriously huge amount of data to notice any significant issues. A faster alternative would be to benchmark your app and find the actual choke point (usually nested loops), rather than worry about something as negligible as this.

Jonny Over a year ago

I had 100-200 lines of data in a Google Spreadsheet. I was converting that to plists (xml) and had to replace those xml entities. I wrote a custom javascript function using the above code for that. It worked, but was very slow. The spreadsheet kind of choked at times but as it is just a "do once" step the speed didn't matter in the end.

austin_ce Over a year ago

I know this answer is old, but just to make clear for newcomers to JS: attaching random functions, that are not polyfills for some standardized proposal, to global prototypes is a bad idea.

lambshaanxy · Accepted Answer · 2012-02-23 00:04:55Z

26

If you have jQuery, here's a simple solution:

  String.prototype.htmlEscape = function() {
    return $('<div/>').text(this.toString()).html();
  };

Use it like this:

"<foo&bar>".htmlEscape(); -> "&lt;foo&amp;bar&gt;"

answered Feb 23, 2012 at 0:04

lambshaanxy

3 Comments

avernet Over a year ago

I like this technique, for its "let the browser do it" attitude. Are there any downsides, maybe other than poorer performance, as this is going through the DOM API?

M J Over a year ago

Single and double quotes are not escaped with this technique: $('<div/>').text('<&\'>"').html() -> "&lt;&amp;'&gt;""

lambshaanxy Over a year ago

Single and double quotes generally don't need to be escaped.

sudhansu63 · Accepted Answer · 2015-09-22 08:26:50Z

8

You can use the method below. I have added it to the prototype for easier access. I have also used a negative look-ahead so it won't mess things up if you call the method twice or more.

Usage:

 var original = "Hi&there";
 var escaped = original.EncodeXMLEscapeChars();  //Hi&amp;there

Decoding is handled automatically by the XML parser.

Method :

//String extension to format a string for XML content.
//Replaces XML escape characters with their equivalent HTML notation.
String.prototype.EncodeXMLEscapeChars = function () {
    var OutPut = this;
    if ($.trim(OutPut) != "") {
        OutPut = OutPut.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
        OutPut = OutPut.replace(/&(?!(amp;)|(lt;)|(gt;)|(quot;)|(#39;)|(apos;))/g, "&amp;");
        OutPut = OutPut.replace(/([^\\])((\\\\)*)\\(?![\\/{])/g, "$1\\\\$2");  //replaces odd backslash(\\) with even.
    }
    else {
        OutPut = "";
    }
    return OutPut;
};

answered Sep 22, 2015 at 8:26

sudhansu63

4 Comments

Steve Westbrook Over a year ago

Underappreciated excellent solution. Ensuring you won't wind up with the infamous &amp;amp; string in your output is beautiful.

Lukas Liesis Over a year ago

With this code, you just edited all instances of String in the whole application, e.g. let a = 'foo' will be affected by this code. Better to create a helper function instead of extending the prototype.

slikts Over a year ago

Please do not mutate builtin objects because it leads to conflicts and so is a very poor practice.

Timothy C. Quinn Dec 9, 2024 at 19:40

Regarding manipulating JS builtin objects. I agree completely that you should NEVER manipulate Object or Array but I have been writing very complex SPA's in JavaScript for 22 years and have never had issues with manipulating the String object. Eg. I have added a trim() and format() function for over 20 years with no regressions.
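Following the suggestion in the comments, the same look-ahead idea could be written as a standalone helper. This is only a sketch: the name `escapeXmlOnce` is made up, the jQuery `$.trim` check is dropped, and the backslash-doubling step from the answer is left out because it is not part of XML escaping.

```javascript
// Hypothetical standalone helper: no String.prototype changes and no jQuery.
// An '&' that already starts one of the listed entities is left alone,
// so calling the function twice gives the same result as calling it once.
function escapeXmlOnce(value) {
    return String(value)
        .replace(/&(?!(?:amp|lt|gt|quot|apos|#39);)/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

const once = escapeXmlOnce('Hi&there <b>');
const twice = escapeXmlOnce(once);
console.log(once);           // Hi&amp;there &lt;b&gt;
console.log(once === twice); // true
```

The trade-off is that text which is literally meant to read "&amp;" cannot be represented this way, since it is always treated as already escaped.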

Jim Holmes · Accepted Answer · 2022-06-29 18:04:29Z

4

It feels like time for an update, now that we have string interpolation and a few other modernisations. This version also uses an object lookup, because it really should.

const escapeXml = (unsafe) =>
    unsafe.replace(/[<>&'"]/g, (c) => `&${({
        '<': 'lt',
        '>': 'gt',
        '&': 'amp',
        '\'': 'apos',
        '"': 'quot'
    })[c]};`);

answered Jun 29, 2022 at 18:04

Jim Holmes

1 Comment

Lewis Nakao Over a year ago

I really appreciate this code-golfed solution. Very simple and concise.

jordancpaul · Accepted Answer · 2017-11-09 01:35:23Z

2

I originally used the accepted answer in production code and found that it was actually really slow when used heavily. Here is a much faster solution (runs at over twice the speed):

var escapeXml = (function () {
    // Browser-only: relies on the DOM (document, XMLSerializer).
    var doc = document.implementation.createDocument("", "", null);
    var el = doc.createElement("temp");
    el.textContent = "temp";
    el = el.firstChild; // reuse a single text node for every call
    var ser = new XMLSerializer();
    return function (text) {
        el.nodeValue = text;
        return ser.serializeToString(el);
    };
})();

console.log(escapeXml("<>&")); //&lt;&gt;&amp;

answered Nov 9, 2017 at 1:35

jordancpaul

2 Comments

Lukas Liesis Over a year ago

This assumes that you have document object. I don't have it.

Zhe Over a year ago

Not possible in Node.js without using libraries like jsdom.

crown · Accepted Answer · 2018-05-11 07:23:00Z

2

Maybe you can try this:

function encodeXML(s) {
  const dom = document.createElement('div')
  dom.textContent = s
  return dom.innerHTML
}

reference

answered May 11, 2018 at 7:23

crown

Stefan Steiger · Accepted Answer · 2020-01-29 14:46:39Z

2

Caution, all the regexing isn't good if you have XML inside XML.
Instead loop over the string once, and substitute all escape characters.
That way, you can't run over the same character twice.

function _xmlAttributeEscape(inputString)
{
    var output = [];

    for (var i = 0; i < inputString.length; ++i)
    {
        switch (inputString[i])
        {
            case '&':
                output.push("&amp;");
                break;
            case '"':
                output.push("&quot;");
                break;
            case "<":
                output.push("&lt;");
                break;
            case ">":
                output.push("&gt;");
                break;
            default:
                output.push(inputString[i]);
        }


    }

    return output.join("");
}
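As a usage sketch for attribute values: the block below uses a compact equivalent of the loop above (same four characters, still a single pass) together with a made-up `attribute` helper. It assumes the attribute is always wrapped in double quotes, which is why leaving the apostrophe unescaped is safe here.

```javascript
// Compact equivalent of the single-pass loop above (same four characters).
function xmlAttributeEscape(inputString) {
    var map = { '&': '&amp;', '"': '&quot;', '<': '&lt;', '>': '&gt;' };
    var output = [];
    for (var i = 0; i < inputString.length; ++i) {
        var ch = inputString[i];
        output.push(Object.prototype.hasOwnProperty.call(map, ch) ? map[ch] : ch);
    }
    return output.join('');
}

// Hypothetical helper: the value is always wrapped in double quotes.
function attribute(name, value) {
    return name + '="' + xmlAttributeEscape(String(value)) + '"';
}

console.log('<item ' + attribute('label', 'say "hi" & <wave>') + '/>');
// <item label="say &quot;hi&quot; &amp; &lt;wave&gt;"/>
```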

answered Jan 29, 2020 at 14:46

Stefan Steiger

1 Comment

Bigue Nique Over a year ago

Your observation about XML inside XML seems right to me. To be rigorous, you would want to re-escape the ampersands of existing entities (e.g. &amp;) if you don't want them to break when decoded.
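To make the XML-inside-XML case concrete, here is a small sketch. The `escapeXml` and `unescapeXml` helpers are illustrative stand-ins written for this example, not code from the answer. An inner document that already contains &amp; is embedded as text in an outer element, so its ampersand has to be escaped a second time:

```javascript
// Illustrative single-pass escape and unescape for the five XML entities.
function escapeXml(s) {
    return s.replace(/[<>&'"]/g, (c) => ({
        '<': '&lt;', '>': '&gt;', '&': '&amp;', "'": '&apos;', '"': '&quot;'
    })[c]);
}

function unescapeXml(s) {
    return s.replace(/&(lt|gt|amp|apos|quot);/g, (m, name) => ({
        lt: '<', gt: '>', amp: '&', apos: "'", quot: '"'
    })[name]);
}

const inner = '<note>' + escapeXml('R&D') + '</note>'; // <note>R&amp;D</note>
const outer = '<payload>' + escapeXml(inner) + '</payload>';
console.log(outer);
// <payload>&lt;note&gt;R&amp;amp;D&lt;/note&gt;</payload>

// Decoding the payload text once gives back the inner document unchanged.
console.log(unescapeXml(escapeXml(inner)) === inner); // true
```

An escape function that skips ampersands of existing entities would emit R&amp;D inside the payload, and the inner document would come back as the malformed <note>R&D</note> after decoding.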

Justin · Accepted Answer · 2022-04-23 23:52:45Z

1

Adding on to zzzzBov's answer, I find this a bit cleaner and easier to read:

const encodeXML = (str) =>
    str
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');

Additionally, all five characters can be found here for example: https://www.sitemaps.org/protocol.html

Note that this only encodes values (as others have stated).

answered Apr 23, 2022 at 23:52

Justin

V. Bozz · Accepted Answer · 2024-01-20 14:25:15Z

1

If some of the text may already be escaped, you could try this, since unlike many of the other answers it will not double escape:

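function escape(text) {
    // Try to match an existing entity first (e.g. &amp; or &#39;) so it can be
    // left alone; otherwise match one of the five special characters.
    return String(text).replace(/&(?:#\d+|#x[0-9a-fA-F]+|\w+);|['"<>&]/g, (match) => {
        switch(match) {
            case '\'': return '&apos;';
            case '"': return '&quot;';
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            default: return match; // already an entity: do not escape it again
        }
    });
}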

edited Jan 20, 2024 at 14:25

V. Bozz

answered Sep 18, 2019 at 12:12

Lostfields

2 Comments

tbehunin Over a year ago

@ValerioBozz the case condition and the return value pair are mismatched. &quot; is for double quotes (") and &apos; is for single quotes (').

V. Bozz Over a year ago

Ah good catch, nice. I've proposed an edit.

Johan B.W. de Vries · Accepted Answer · 2011-10-27 16:10:27Z

0

Technically, &, < and > aren't valid XML entity name characters. If you can't trust the key variable, you should filter them out.
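One way to handle an untrusted key is to reject anything that is not a plain name before building the tag. This is a sketch with a deliberately conservative, ASCII-only pattern (the XML specification allows many more characters), and the names `SIMPLE_XML_NAME` and `checkElementName` are made up:

```javascript
// Conservative check: a letter or underscore first, then letters, digits,
// underscores, dots or hyphens. Colons are excluded to avoid namespace issues.
const SIMPLE_XML_NAME = /^[A-Za-z_][A-Za-z0-9_.-]*$/;

// Returns the key unchanged if it is acceptable, otherwise throws.
function checkElementName(key) {
    if (typeof key !== 'string' || !SIMPLE_XML_NAME.test(key)) {
        throw new Error('Not a simple XML element name: ' + JSON.stringify(key));
    }
    return key;
}

console.log(checkElementName('item')); // item
// checkElementName('a>b') throws an Error
```

Throwing instead of silently stripping characters avoids two different keys collapsing into the same element name.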

If you want them escaped as HTML entities, you could use something like http://www.strictly-software.com/htmlencode .

answered Oct 27, 2011 at 16:10

Johan B.W. de Vries

Jean-Xavier Bardant · Accepted Answer · 2025-06-17 15:29:47Z

0

Faced with a similar problem, but on the client side for HTML, I found it easier to use the DOM instead of manipulating raw HTML.

let node=document.createTextNode(myTextToEscape);

The createTextNode function also exists for XML DOM API (e.g. xmldom package for node.js).

To get the “XML source code” of the node, you can add it to an element and call innerHTML on the element or alternatively you can use an XMLSerializer object.

let myXmlCode=new XMLSerializer().serializeToString(node);

answered Jun 17 at 15:29

Jean-Xavier Bardant

Remi Guan · Accepted Answer · 2016-02-02 01:03:36Z

-2

This is simple:

sText = ("" + sText).split("&").join("&amp;").split("<").join("&lt;").split(">").join("&gt;").split('"').join("&#34;").split("'").join("&#39;");

edited Feb 2, 2016 at 1:03

Remi Guan

answered Feb 1, 2016 at 20:01

Per Ghosh

5 Comments

Sam Holder Over a year ago

in what world is this 'simple' compared to the replace methods above?

Per Ghosh Over a year ago

Simple to write, I didn't say that it was simpler compared to those above. It is just different

brettwhiteman Over a year ago

I'm stumped trying to think of a worse solution

Per Ghosh Over a year ago

@developerbmw if you don't want to add a method and don't use jquery, this is one of the best solutions

Per Ghosh Over a year ago

@developerbmw for readability, sometimes it's better to write code in a way that lets you read the functionality from top to bottom. Dynamic languages are very easy to make unreadable fast. It depends on the situation, and also on whether you are using components that are used in different scenarios and need their own logic. A small component may not need functions if the logic is only used in one method. I am a C++ developer and have been coding a lot of C. C is very easy to read, and if you know why, then you know my argument on this.
