16

How can I determine programmatically whether a string has been HTML-encoded in C#?

Take this string, for example:

&lt;p&gt;test&lt;/p&gt;

I would like my logic to recognize that this value has been encoded. Any ideas? Thanks.

0

8 Answers

62

You can use HttpUtility.HtmlDecode() to decode the string, then compare the result with the original string. If they're different, the original string was probably encoded (at least, the routine found something to decode inside):

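// Requires: using System.Web; (for HttpUtility)
public bool IsHtmlEncoded(string text)
{
    // Decoding changes the string only if it contains at least one entity.
    return HttpUtility.HtmlDecode(text) != text;
}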
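
To see how this comparison behaves on a few inputs, here is a minimal sketch. It assumes `System.Net.WebUtility.HtmlDecode` in place of `HttpUtility.HtmlDecode` so that it compiles without a reference to System.Web; the class name `HtmlEncodingCheck` is made up for illustration.

```csharp
using System.Net;

// Hypothetical wrapper class, not part of the original answer.
public static class HtmlEncodingCheck
{
    // True when decoding changes the text, i.e. when the text
    // contains at least one entity that the decoder recognizes.
    public static bool IsHtmlEncoded(string text)
    {
        return WebUtility.HtmlDecode(text) != text;
    }
}
```

With this sketch, `IsHtmlEncoded("&lt;p&gt;test&lt;/p&gt;")` returns true and `IsHtmlEncoded("<p>test</p>")` returns false. A mixed string such as `<b>&amp;</b>` also returns true, because a single entity is enough to make the decoded string differ. A true result therefore only means "contains at least one entity", not "safe to output as is".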

10 Comments

Good solution, very elegant... but what if I do not want to compare two strings and just want to know whether the string is encoded? I appreciate your help on this, thanks!
AFAIK, that's what he said: IsHtmlEncoded("&lt;p&gt;test&lt;/p&gt;") should give true, so it's encoded.
@GIbboK: - I want to drink a cup of coffee. - Use a cup and coffee! - But what about if I do not want to use cup, coffee and drink it, but just want to have it in my stomach?
Very bad if you're trying to test user-entered values. If I enter "><script>alert('XSS');</script>&amp; then it'll decode &amp;, the strings won't match, and it'll be counted as encoded.
Downvoted this because it simply is not effective in many cases. Tarka pointed out one scenario and I can confirm it.
10

Strictly speaking that's not possible. What the string contains might actually be the intended text, and the encoded version of that would be &amp;lt;p&amp;gt;test&amp;lt;/p&amp;gt;.

You could look for HTML entities in the string and decode it until there are none left, but it's risky to decode data that way, as it assumes things that might not be true.
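
Looking for entities could be sketched with a regular expression. This is only an illustration: `EntityScan` and `ContainsEntity` are hypothetical names, and the pattern checks only the shape of a character reference (named, decimal, or hex), not whether the name is a real HTML entity.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class EntityScan
{
    // Matches things shaped like &name; or &#123; or &#x1F;.
    private static readonly Regex EntityPattern =
        new Regex(@"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);");

    // True if the text contains at least one entity-shaped token.
    public static bool ContainsEntity(string text)
    {
        return EntityPattern.IsMatch(text);
    }
}
```

Note that `&amp;lt;p&amp;gt;` still contains entities after one round of decoding (it becomes `&lt;p&gt;`), and nothing in the string says whether that second layer was meant as literal text. That is the guess that makes repeated decoding risky.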


5

This is my take on it. If the user passes in partially encoded text, this will catch it.

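// Returns true if the text survives a decode/encode round trip unchanged.
// Requires: using System; using System.Web;
private bool EncodeText(string val)
{
     string decodedText = HttpUtility.HtmlDecode(val);
     string encodedText = HttpUtility.HtmlEncode(decodedText);

     return encodedText.Equals(val, StringComparison.OrdinalIgnoreCase);
}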
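
A few inputs show what the round-trip check reports. The sketch below is an assumption-laden variant, not the answer's exact code: it uses `System.Net.WebUtility` instead of `HttpUtility`, compares with `StringComparison.Ordinal`, and the names `RoundTripCheck` and `SurvivesRoundTrip` are invented for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class RoundTripCheck
{
    // True when decoding and then re-encoding reproduces the input exactly.
    public static bool SurvivesRoundTrip(string val)
    {
        string decoded = WebUtility.HtmlDecode(val);
        string reencoded = WebUtility.HtmlEncode(decoded);
        return reencoded.Equals(val, StringComparison.Ordinal);
    }
}
```

Here `&lt;p&gt;test&lt;/p&gt;` survives the round trip, while the partially encoded `&lt;p>test` does not. Two caveats follow from the same logic: plain text such as `test` also survives (there is nothing to encode), and `&#60;p&#62;` does not, because the encoder writes named entities rather than the numeric ones it was given.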

1 Comment

This is the best trick!
2

I use the NeedsEncoding() method below to determine whether a string needs encoding.

Results 
-----------------------------------------------------
b               -->      NeedsEncoding = True
&lt;b>          -->      NeedsEncoding = True
<b>             -->      NeedsEncoding = True
&lt;b&lt;       -->      NeedsEncoding = False
&quot;          -->      NeedsEncoding = False

Here are the helper methods; I split the logic into two methods for clarity. As Guffa says, it is risky and hard to produce a bulletproof method.

public static bool IsEncoded(string text)
{
    // Reject strings that still contain raw markup characters; this fixes
    // false positives on mixed input such as "&lt;<>".
    // You could add a complete blacklist,
    // but these are the ones that cause HTML injection issues.
    if (text.Contains("<")) return false;
    if (text.Contains(">")) return false;
    if (text.Contains("\"")) return false;
    if (text.Contains("'")) return false;
    if (text.Contains("script")) return false;

    // If decoding changes the string, it contained entities, so it is already encoded.
    return (System.Web.HttpUtility.HtmlDecode(text) != text);
}

public static bool NeedsEncoding(string text) 
{
    return ! IsEncoded(text);
}


0

A simple way of detecting this would be to check for characters that are not allowed in an encoded string, such as < and >.

1 Comment

But only if there is a guarantee that this would be in an unencoded string.
0

All I can suggest is that you replace known encoded sections with the decoded string.

text = text.Replace("&lt;", "<");


0

I'm doing .NET Core 2.0 development and I'm using System.Net.WebUtility.HtmlDecode, but I have a situation where strings being processed in a microservice might have an indeterminate number of encodings performed on some strings. So I put together a little recursive method to handle this:

public string HtmlDecodeText(string value, int decodingCount = 0)
{
  // Treat a negative count as a fresh start.
  if (decodingCount < 0)
  {
    decodingCount = 0;
  }

  // Don't go past 4 levels of decoding to prevent possible stack overflow,
  // and because we don't have a valid use case for that level of
  // multi-decoding.
  if (decodingCount >= 4)
  {
    return value;
  }

  var decodedText = WebUtility.HtmlDecode(value);

  // If decoded text equals the original text, then we know decoding is done.
  if (decodedText.Equals(value, StringComparison.OrdinalIgnoreCase))
  {
    return value;
  }

  return HtmlDecodeText(decodedText, decodingCount + 1);
}

And here I called the method on each item in a list where strings were encoded:

result.FavoritesData.folderMap.ToList().ForEach(x => x.Name = HtmlDecodeText(x.Name));
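
The same idea can also be written as a loop, which avoids recursion entirely. This is a minimal sketch under stated assumptions: `RepeatedDecoder`, `DecodeRepeatedly`, and the `maxPasses` parameter are hypothetical names, and the default of 4 simply mirrors the limit used above.

```csharp
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class RepeatedDecoder
{
    // Decodes until the text stops changing or maxPasses is reached.
    public static string DecodeRepeatedly(string value, int maxPasses = 4)
    {
        for (int pass = 0; pass < maxPasses; pass++)
        {
            string decoded = WebUtility.HtmlDecode(value);
            if (decoded == value)
            {
                return value; // nothing left to decode
            }
            value = decoded;
        }
        return value; // pass limit reached; may still contain entities
    }
}
```

For example, `DecodeRepeatedly("&amp;lt;p&amp;gt;")` yields `<p>` after two passes. As noted in another answer, this also strips a layer that may have been intended as literal text, so it fits only when over-encoding is known to be the problem.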


-1

Try this answer: Determine a string's encoding in C#

Another CodeProject article might be of help: http://www.codeproject.com/KB/recipes/DetectEncoding.aspx

You could also use regex to match on the string content...
