
I have read a bunch of different StackOverflow answers and similar questions, but none of them have helped.

I am using JavaScript to make an AJAX request that fetches some data as JSON.

I am receiving JSON data such as the following:

\u0093title\u0094

Now, I believe JSON is delivered in UTF-8 by default. However, I believe these characters, \u0093 and \u0094, are Latin-1 control characters meant to represent opening and closing speech marks.

The issue is that when I make the GET request with JavaScript, the response ends up being something like:

“title”

I have tried encodeURIComponent(data.body) and it produces the same result.

This is extremely annoying. Has anyone else encountered this issue before?

EDIT:

Imagine the following raw JSON data; this is what I am going to retrieve:

\u0093title\u0094

So, for example, I run the following piece of jQuery/JavaScript to get the above JSON data:

    $.ajax({
      type: "GET",
      url: "myurl",
      success: function (data) {
        console.log(data.body);
      }
    });

The following is printed to the console (which looks fine, except it is omitting the control characters):

title

And then I encode and decode it, which should cancel out and change nothing:

console.log(decodeURIComponent(encodeURIComponent( data.body )))

Except this ends up printing the following:

“title”

Where it has picked up those extra Â characters as well as the “ and ”, despite these not showing up in the console before the encode/decode step.
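For comparison, here is the same round trip run on a string literal that stands in for `data.body` (a sketch; the literal is an assumption about what the parsed JSON contains). It also shows the bytes `encodeURIComponent` works with: each code point is percent-encoded as its UTF-8 bytes, so U+0093 becomes C2 93.

```javascript
// Stand-in for data.body: U+0093, "title", U+0094 (assumed parsed JSON value).
const body = "\u0093title\u0094";

// encodeURIComponent percent-encodes each code point as its UTF-8 bytes:
// U+0093 -> C2 93, U+0094 -> C2 94. Plain ASCII letters are left alone.
const encoded = encodeURIComponent(body);
console.log(encoded); // prints "%C2%93title%C2%94"

// decodeURIComponent reverses the encoding exactly for well-formed strings.
const roundTripped = decodeURIComponent(encoded);
console.log(roundTripped === body); // prints true
```

Because the two values compare equal, this suggests the extra characters come from how the output is decoded or displayed, not from the round trip itself.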

  • You print “title” in your HTML, yes? Commented Aug 7, 2015 at 23:25
  • Essentially yes, or if I print it to the debug console it also comes out like that Commented Aug 7, 2015 at 23:30
  • Have you set the charset to utf8? <meta charset="utf-8"> Commented Aug 7, 2015 at 23:37
  • Won't help; I am injecting the scripts after the page has loaded and rendered Commented Aug 7, 2015 at 23:40
  • U+0093 and U+0094 are non-printable characters. Are you sure you don't mean curved quotes? Commented Aug 8, 2015 at 0:35

1 Answer


First of all, code points U+0093 and U+0094 are not curved quotes; they are C1 control characters for something else (to be quite honest, I have no idea what). The curved-quote code points are U+201C for “ and U+201D for ”. But you still have another problem:

This looks very much like text decoded with the wrong encoding. The program decoding the characters saw the bytes C2 93, which is the UTF-8 encoding of code point U+0093. It is not assuming UTF-8, or it would have decoded them back to U+0093. Instead, it is using Windows code page 1252, which turns C2 into Â, 93 into “ and 94 into ”.
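The mis-decoding can be reproduced with the standard `TextEncoder`/`TextDecoder` APIs. This is a sketch that assumes a runtime whose `TextDecoder` supports the `"windows-1252"` label (browsers do; Node.js does in its default full-ICU builds). It encodes the string as UTF-8, then decodes the same bytes once as UTF-8 and once as Windows-1252.

```javascript
const original = "\u0093title\u0094";

// Step 1: the server sends the string as UTF-8 bytes.
const utf8Bytes = new TextEncoder().encode(original);
console.log(Array.from(utf8Bytes, (b) => b.toString(16).padStart(2, "0")).join(" "));
// prints "c2 93 74 69 74 6c 65 c2 94"

// Step 2 (correct): decoding the bytes as UTF-8 gives back the original string.
const asUtf8 = new TextDecoder("utf-8").decode(utf8Bytes);
console.log(asUtf8 === original); // prints true

// Step 3 (wrong): decoding the same bytes as Windows-1252 maps
// C2 -> Â (U+00C2), 93 -> “ (U+201C), 94 -> ” (U+201D).
const asCp1252 = new TextDecoder("windows-1252").decode(utf8Bytes);
console.log(asCp1252 === "\u00C2\u201Ctitle\u00C2\u201D"); // prints true
```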

I can only think of two reasons why it is doing that, and both involve your browser. It is not really a problem with JavaScript not using UTF-8, because this works:

document.getElementById('result').innerHTML = '\u201CHello\u201D';
<pre id="result"></pre>

The problem could be the HTTP response: your browser may be reading it as Windows code page 1252. The other possibility is that your browser is displaying the data incorrectly (which, now that I think of it, doesn't make much sense).

Try setting the Content-Type of your HTTP response by sending this HTTP header:

Content-Type: application/json; charset=utf-8
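As an illustration only, assuming the server is Node.js (the question does not say what it runs), a response helper could set that header explicitly. `sendJson` is a hypothetical name, and `res` is expected to be an `http.ServerResponse`.

```javascript
// Hypothetical helper for a Node.js server; res is an http.ServerResponse.
function sendJson(res, value) {
  const body = JSON.stringify(value);
  res.writeHead(200, {
    // Declare the charset so the browser does not have to guess it.
    "Content-Type": "application/json; charset=utf-8",
    // Byte length, not string length: U+0093 takes two bytes in UTF-8.
    "Content-Length": Buffer.byteLength(body, "utf8"),
  });
  res.end(body, "utf8");
}

// Usage (Node.js):
// require("http")
//   .createServer((req, res) => sendJson(res, { body: "\u0093title\u0094" }))
//   .listen(8080);
```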

And I insist that you add this to your document:

<meta charset="utf-8">
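If the server's headers cannot be changed, one client-side workaround (an assumption, not part of the answer above) is to read the raw response bytes and decode them as UTF-8 yourself, so the browser's charset guess never applies. `parseUtf8Json` and `loadBody` are hypothetical names, and `"myurl"` is the placeholder URL from the question.

```javascript
// Decode raw bytes as UTF-8 explicitly, then parse the JSON.
function parseUtf8Json(bytes) {
  // fatal: true throws on invalid UTF-8 instead of inserting U+FFFD.
  const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  return JSON.parse(text);
}

// Usage with fetch: arrayBuffer() returns the body bytes undecoded.
async function loadBody() {
  const response = await fetch("myurl");
  const data = parseUtf8Json(await response.arrayBuffer());
  console.log(data.body);
}
```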


1 Comment

Thank you for the long reply. I have updated my question to better reflect what is happening, since I have this issue without inserting anything into any HTML element.
