48

I have a string (from a CDATA element) that contains escaped XML. I need to decode this string into a new string that displays the characters correctly, using C#.

Existing String:

&lt;?xml version="1.0" encoding="UTF-8" standalone="yes"?&gt;&lt;myreport xmlns="http://test.com/rules/client"&gt;&lt;admin&gt;&lt;ordernumber&gt;123&lt;/ordernumber&gt;&lt;state&gt;NY&lt;/state&gt;&lt;/report&gt;&lt;/myreport&gt;

String Wanted:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<myreport xmlns="http://test.com/rules/client">
<admin><ordernumber>123</ordernumber><state>NY</state></report></myreport>
  • but your existing string is invalid Commented Jul 20, 2011 at 4:46
  • @naveen: Surely that's just the contents of the string... Commented Jul 20, 2011 at 4:59
  • @Jeff Mercado: I was using LINQPad to write an answer when I saw that the string is invalid and HtmlDecode won't work, as it accepts a string. I was merely pointing out that the OP needs to escape " too. Commented Jul 20, 2011 at 5:19

7 Answers

52
  1. HttpUtility.HtmlDecode from System.Web
  2. WebUtility.HtmlDecode from System.Net
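
As a minimal sketch of the second option, the wrapper below simply forwards to WebUtility.HtmlDecode. The class name DecodeExample and the method name Decode are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper class, named for illustration only.
public static class DecodeExample
{
    // Decodes character references such as &lt;, &gt;, &quot; and &amp;.
    public static string Decode(string escaped)
    {
        return WebUtility.HtmlDecode(escaped);
    }
}
```

Calling DecodeExample.Decode("&lt;state&gt;NY&lt;/state&gt;") should give back <state>NY</state>.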

47

You can use System.Net.WebUtility.HtmlDecode instead of HttpUtility.HtmlDecode.

This is useful if you don't want a System.Web reference and prefer System.Net instead.

2 Comments

Thanks! This is really handy, as I want to target the .NET 4.0 Client Profile, but referencing System.Web would require me to target the full .NET 4.0 profile.
Much better answer IMO.
6

As Kirill and msarchet said, you can use HttpUtility.HtmlDecode from System.Web. It unescapes pretty much anything correctly.

If you don't want to reference System.Web, you can use a trick that supports all XML escaping but not HTML-specific entities such as &eacute;:

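// Requires: using System.Xml;
public static string XmlDecode(string value) {
    // Wrap the escaped text in a dummy element and let the XML parser unescape it.
    var xmlDoc = new XmlDocument();
    xmlDoc.LoadXml("<root>" + value + "</root>");
    return xmlDoc.InnerText;
}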

You could also use a regex or a simple string.Replace, but that would only support basic XML escaping. Numeric references like &#x410; or named entities like &eacute; would be harder to support.
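
To illustrate why the regex route takes more work, here is a rough sketch that handles only numeric character references (decimal and hexadecimal). The class name NumericReferenceDecoder is hypothetical, and named entities such as &eacute; or &lt; are deliberately left untouched, so this is not a replacement for HtmlDecode.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class NumericReferenceDecoder
{
    // Matches decimal (&#1040;) and hexadecimal (&#x410;) character references.
    private static readonly Regex NumericReference =
        new Regex(@"&#(?:[xX](?<hex>[0-9a-fA-F]+)|(?<dec>[0-9]+));");

    public static string Decode(string value)
    {
        return NumericReference.Replace(value, match =>
        {
            Group hex = match.Groups["hex"];
            int codePoint;
            bool parsed = hex.Success
                ? int.TryParse(hex.Value, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(match.Groups["dec"].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);

            // Leave the reference as-is if it is not a valid Unicode scalar value.
            bool valid = parsed && codePoint >= 0 && codePoint <= 0x10FFFF
                         && (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : match.Value;
        });
    }
}
```

With this sketch, NumericReferenceDecoder.Decode("&#x410;") yields the Cyrillic letter А (U+0410), while "&eacute;" passes through unchanged.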

2 Comments

I wouldn't parse XML/HTML with RegEx: stackoverflow.com/questions/1732348/…
(I liked the XmlDocument-method though) +1
1

HttpUtility.HtmlDecode(xmlString) will solve this issue


0

You can use Html.Raw (in an ASP.NET MVC Razor view). That way the markup is not encoded.

1 Comment

Can you provide some sample code to better explain your answer?
-1

You just need to replace the escaped characters with their originals.

string stringWanted = existingString.Replace("&lt;", "<")
                                    .Replace("&gt;", ">")
                                    .Replace("&quot;", "\"")
                                    .Replace("&apos;", "'")
                                    // Replace &amp; last so that "&amp;gt;" becomes "&gt;", not ">".
                                    .Replace("&amp;", "&");

1 Comment

Well that is very strange. I've just produced an example that I was expecting to demonstrate the problem, and it works precisely as desired. What makes it strange is that I know this exact situation is responsible for an XML parsing error in a codebase I maintain that I fixed yesterday. At least, I think it's exactly the same. I'll cancel the downvote and remove my original comment until I get a chance to check.
-2

You might also consider the static XDocument.Parse method. I'm not sure how it compares to the others mentioned here, but it seems to parse these strings well.

Once you get the resulting XDocument, you could turn around with ToString to get the string back:

string parsedString = XDocument.Parse("<My XML />").ToString();
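
A hedged sketch of how this could be combined with decoding: XDocument.Parse expects well-formed XML, so the example below assumes the escaped string is first decoded with WebUtility.HtmlDecode. The class name EscapedXmlParser and the method name DecodeAndParse are hypothetical.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper class, named for illustration only.
public static class EscapedXmlParser
{
    // Decodes the escaped markup, then parses it.
    // XDocument.Parse throws System.Xml.XmlException if the result is not well-formed.
    public static XDocument DecodeAndParse(string escapedXml)
    {
        string xml = WebUtility.HtmlDecode(escapedXml);
        return XDocument.Parse(xml);
    }
}
```

For example, DecodeAndParse("&lt;admin&gt;&lt;state&gt;NY&lt;/state&gt;&lt;/admin&gt;").Root.Element("state").Value should be "NY". Note that the sample string in the question has a mismatched </report> end tag, so parsing it this way would throw.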
