4

I have a table where a particular string field often contains Unicode escape sequences for single and double quotes: \u0027 and \u0022 respectively. It turns out I need them escaped one level further, with an extra \ in front of each.

For example, I need to change \u0027Hello, world\u0027 to \\u0027Hello, world\\u0027

What SQL would perform this update on the table for all records?

4
  • FWIW I'm using PostgreSQL. Bonus points if you can also show me how to force all later inserts and updates to perform the same modification, without ending up with three backslashes. Commented Jun 2, 2011 at 22:47
  • 2
    You're better off storing the values in the database as Unicode and performing the escaping as and when needed. It'll be less complex at the end of the day. Commented Jun 2, 2011 at 23:03
  • 1
    This sounds really suspect to me. Why do you “need” to have encoded data in the database? This usually implies the code putting the data in or getting it back out has some serious problems. Data should normally be kept in raw unescaped text format. Commented Jun 2, 2011 at 23:05
  • @will I'm allowed to mess with the db, but not with any of the software that retrieves and handles the data. I'm aware it has serious problems, but it's sadly not in my power to fix those problems. Commented Jun 2, 2011 at 23:37

2 Answers

10

If you really need this, you can use a regular expression like the following:

UPDATE table SET c = regexp_replace(c, '[^\\]\\(u\d{4})', '\\\\\1', 'g');

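Make sure that standard_conforming_strings is enabled. On PostgreSQL versions before 8.4, regex_flavor must also be set to advanced; later versions removed that setting and always use advanced regular expressions.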

SHOW standard_conforming_strings;
 standard_conforming_strings 
-----------------------------
 on
(1 row)

The replacement string '\\\\\1' produces two literal backslashes (each \\ pair yields one backslash), followed by \1, which stands for the first (reporting) parenthesized subexpression, that is, 'u' concatenated with the four digits matched by the pattern.
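A variant of the same idea can be sketched as below. It is only a sketch under assumptions: the table name my_table and column c are hypothetical placeholders, and the negative lookbehind (?<!\\) requires PostgreSQL 9.6 or later. The lookbehind checks that the backslash is not already preceded by another backslash without consuming that preceding character, so it also matches at the start of the string and leaves already double-escaped sequences alone. The character class [0-9a-fA-F] is used instead of \d so that escapes containing hex letters are covered too. Dollar quoting keeps the backslash counts independent of standard_conforming_strings.

```sql
-- Preview the transformation on a few sample values before touching real data.
SELECT s AS before,
       regexp_replace(s,
                      $re$(?<!\\)\\(u[0-9a-fA-F]{4})$re$,  -- single backslash + uXXXX
                      $rep$\\\\\1$rep$,                    -- two backslashes + uXXXX
                      'g') AS after
FROM (VALUES ($$\u0027Hello, world\u0027$$),    -- single-escaped: gets doubled
             ($$\\u0027Hello, world\\u0027$$),  -- already doubled: left unchanged
             ($$a\u0022\u0027b$$)               -- adjacent escapes: both doubled
     ) AS t(s);

-- Apply to the (hypothetical) table, touching only rows that still need it.
UPDATE my_table
SET c = regexp_replace(c,
                       $re$(?<!\\)\\(u[0-9a-fA-F]{4})$re$,
                       $rep$\\\\\1$rep$,
                       'g')
WHERE c ~ $re$(?<!\\)\\u[0-9a-fA-F]{4}$re$;
```

Because a double-escaped sequence no longer matches the pattern, running the UPDATE a second time should change nothing.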


2 Comments

Thanks. I made two variants of this: one to handle a string beginning with a unicode escape sequence, and another that will preserve the character preceding the \u (the version you provide consumes it). I also had to double the number of backslashes since our db doesn't use standard_conforming_strings. Icky stuff, but it worked.
@Dan: You can also use dollar quoting ($$pattern$$) to avoid the doubled backslashes: postgresql.org/docs/9.0/static/…
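For the follow-up request in the question's comments (applying the same change to every later insert and update without ending up with three backslashes), one possible approach is a BEFORE trigger that uses dollar quoting throughout. This is a rough sketch, not a tested production setup: my_table, c, double_escape_unicode and double_escape_unicode_trg are hypothetical names, and the lookbehind in the pattern assumes PostgreSQL 9.6 or later.

```sql
-- Trigger function: double-escape any \uXXXX that is not already
-- preceded by a backslash, so repeated writes do not pile up backslashes.
CREATE OR REPLACE FUNCTION double_escape_unicode() RETURNS trigger AS $body$
BEGIN
    NEW.c := regexp_replace(NEW.c,
                            $re$(?<!\\)\\(u[0-9a-fA-F]{4})$re$,
                            $rep$\\\\\1$rep$,
                            'g');
    RETURN NEW;  -- the modified row is what gets stored
END;
$body$ LANGUAGE plpgsql;

-- Run the function for every inserted or updated row of the table.
CREATE TRIGGER double_escape_unicode_trg
BEFORE INSERT OR UPDATE ON my_table
FOR EACH ROW EXECUTE PROCEDURE double_escape_unicode();
```

Since the pattern skips sequences that already have two backslashes, an UPDATE that writes back an already escaped value should leave it as is rather than adding a third backslash.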
1

An UPDATE statement with SET yourcolumn = REPLACE(yourcolumn, '\u0027', '\\u0027') ought to do it. Try the query below first to check that it works before doing a mass update.

SELECT REPLACE('\u0027', '\u0027', '\\u0027')
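A plain REPLACE applied twice keeps adding backslashes. One way to make it repeatable, sketched here under assumptions, is to first collapse any already double-escaped sequence back to the single-escaped form and then escape everything. The sample values are made up for illustration, only \u0027 is handled (the same nesting would be needed for \u0022), and strings that already contain three or more backslashes are not normalized by this approach. Dollar quoting is used so the literals mean the same thing regardless of standard_conforming_strings.

```sql
-- Inner replace: \\u0027 -> \u0027 (undo an existing double escape).
-- Outer replace: \u0027 -> \\u0027 (apply the double escape everywhere).
SELECT s AS before,
       replace(replace(s, $$\\u0027$$, $$\u0027$$),
               $$\u0027$$, $$\\u0027$$) AS after
FROM (VALUES ($$\u0027Hello, world\u0027$$),   -- single-escaped input
             ($$\\u0027Hello, world\\u0027$$)  -- already double-escaped input
     ) AS t(s);
```

Both sample rows should come out as \\u0027Hello, world\\u0027, so the expression can be run more than once on the same data.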

2 Comments

+1 This does look useful, but it appears it is incapable of detecting whether or not the double-escape is already performed.
...and therein lies the problem, Dan - please take a moment to seriously consider storing the data in a more appropriate format.
