The Unicode character 𠮵, at code point 134069, has the HTML escape &#134069; (equivalently &#x20BB5; in hexadecimal).

Is there a (preferably native) way to get the HTML escape for a character in JavaScript?

2 Answers

You can get both the code point and its hex value for the character like this:

var codePoint = '𠮵'.codePointAt(0); // codePoint = 134069
var hexValue = codePoint.toString(16); // hexValue = '20bb5'
var htmlEscape = '&#x' + hexValue + ';'; // htmlEscape = '&#x20bb5;'

Here is a working example:

$('#doIt').click(function() {
  // Hex code point of the first character typed into the textarea
  var hex = $('#inputText').val().codePointAt(0).toString(16);
  var htmlEscape = '&#x' + hex + ';';
  $('#outputHex').text(hex);
  // .text() shows the escape literally; .html() lets the browser render the character
  $('#outputString').text(htmlEscape);
  $('#outputChar').html(htmlEscape);
});
code {
  display: block;
  padding: 4px;
  background-color: #EFEFEF;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="inputText"></textarea>
<button id="doIt">do it</button>

<h3>result</h3>
<code id="outputHex"></code>
<code id="outputString"></code>
<code id="outputChar"></code>

One more thing: codePointAt is an ES6 method and isn't supported in older browsers. In case the browser blocks the code from running here: JSFiddle Example
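
For browsers without codePointAt, the same value can be computed from charCodeAt by combining a surrogate pair by hand. The following is only a sketch: the helper name codePointAtCompat is hypothetical and not part of the answer above, and it assumes the string is well-formed UTF-16.

```javascript
// Hypothetical ES5-only stand-in for String.prototype.codePointAt.
function codePointAtCompat(str, i) {
  var hi = str.charCodeAt(i);
  // A high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF)
  // together encode one code point above U+FFFF.
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  // Anything else is a single code unit and is its own code point.
  return hi;
}

var ch = '\uD842\uDFB5'; // U+20BB5 written as its surrogate pair
var htmlEscape = '&#x' + codePointAtCompat(ch, 0).toString(16) + ';';
// htmlEscape = '&#x20bb5;'
```

The surrogate-pair literal is used instead of '\u{20BB5}' because that escape syntax is itself ES6.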

2 Comments

This is a good idea, but I don't think it works for all HTML entities (for example, isn't & more complicated?).
No, this should work for any character. Some special characters, like &, have named shortcuts (i.e. &amp;), but the numeric form works for them too. I will update my answer with a JSFiddle example.

Here is a function that converts all non-ASCII characters, plus <, > and &, to numeric HTML entities:

function htmlEntities(s) {
    return Array.from(s).map(function (c) {
        // Keep plain ASCII as-is, except the markup characters <, & and >
        return c.codePointAt(0) < 128 && '<&>'.indexOf(c) === -1
            ? c
            : '&#x' + c.codePointAt(0).toString(16) + ';';
    }).join('');
}

var s = 'This is \u{20BB5}, a special character & encoded in HTML.';
document.body.innerHTML = htmlEntities(s);

Be aware that in JavaScript strings, characters outside the Basic Multilingual Plane (code points above U+FFFF) are stored as surrogate pairs and count as two units, for example in length. ES6 constructs such as Array.from(s) and [...s] iterate by code point, so you get the right chunks.
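
A small sketch of that difference, using the character from the question. The variable names are only illustrative:

```javascript
var ch = '\u{20BB5}';           // one character, stored as two UTF-16 code units

var byUnit = ch.split('');      // splits between the surrogates: 2 elements
var byPoint = Array.from(ch);   // iterates by code point: 1 element

console.log(ch.length);                      // 2
console.log(byUnit.length);                  // 2
console.log(byPoint.length);                 // 1
console.log(ch.charCodeAt(0).toString(16));  // 'd842' (high surrogate only)
console.log(ch.codePointAt(0).toString(16)); // '20bb5' (the full code point)
```

This is why htmlEntities above maps over Array.from(s) rather than s.split(''): splitting by code unit would emit two escapes for lone surrogates instead of one for the character.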

2 Comments

Wow, this will needlessly bloat HTML for many languages. 99.9% of non-ASCII characters do not need to be encoded as entities. For example, this would make Chinese text take at least 5× more HTML bytes. To anyone reading this, be sure that this is really the solution you need. It probably isn't.
@AlanH., it seemed to be what the OP was asking for: they gave 𠮵 as an example, which indeed does not need to be encoded as an entity, yet the OP asked for it. It would be more useful to address such a comment to the OP...
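
For readers in the situation the first comment describes, here is a minimal sketch that escapes only the characters with special meaning in HTML markup and leaves everything else untouched. The name escapeMarkupOnly is hypothetical, and the sketch assumes the page is served in an encoding such as UTF-8 that can represent the remaining characters directly.

```javascript
// Hypothetical alternative: escape only markup-significant characters.
function escapeMarkupOnly(s) {
  var map = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return s.replace(/[&<>"']/g, function (c) {
    return map[c];
  });
}

var s = 'This is \u{20BB5}, a special character & encoded in HTML.';
var escaped = escapeMarkupOnly(s);
// escaped = 'This is 𠮵, a special character &amp; encoded in HTML.'
```

Compared with htmlEntities, the output stays the same size for non-ASCII text, at the cost of depending on the document's character encoding being declared correctly.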
