
I have a field, message, containing strings like: <pika> [SOME_TEXT_WITH|ACTION] And other stuff...

I want to capture what's inside the brackets. I'm using the following query:

SELECT 
  substring(message FROM '%> \[#"[A-Z_\|]+#"\] %' FOR '#') AS my_info 
FROM my_table;

But it always fails with the same annoying error message: «Invalid regular expression: parentheses () not balanced». What am I doing wrong?

2 Answers


Personally, I'd use regexp_matches with a modern, Perl-style regular expression instead of the horrid SQL-standard SIMILAR TO syntax:

regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '\[(.*?)\]'))[1];
    regexp_matches     
-----------------------
 SOME_TEXT_WITH|ACTION
(1 row)
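As a rough sketch of the same idea applied to a table, the query below uses a VALUES list as a stand-in for the question's my_table (the sample rows are made up). It assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), so the backslashes reach the regex engine unchanged. The two-argument substring(... FROM ...) with a POSIX regex returns the text matched by the first parenthesized group, or NULL if nothing matches; regexp_match (PostgreSQL 10 or later) does the same but returns an array, hence the [1].

```sql
-- Stand-in for my_table from the question; the rows are sample data only.
WITH my_table(message) AS (
  VALUES ('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...'),
         ('a row without any brackets')
)
SELECT message,
       -- POSIX regex form of substring: first capture group, or NULL
       substring(message FROM '\[(.*?)\]')      AS via_substring,
       -- PostgreSQL 10+: text[] of capture groups, or NULL
       (regexp_match(message, '\[(.*?)\]'))[1]  AS via_regexp_match
FROM my_table;
```

Both columns should hold SOME_TEXT_WITH|ACTION for the first row and NULL for the second. By contrast, (regexp_matches(...))[1] in the select list is set-returning, so a row whose message doesn't match produces no output row at all rather than a NULL.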

If you want to use the SIMILAR TO syntax of substring(... FROM ... FOR ...), you have to use the same escape character consistently, not \ in some places and # in others. For example:

regress=> SELECT substring(
            '<pika> [SOME_TEXT_WITH|ACTION] And other stuff...' 
            FROM '%#"#[%#]#"%' FOR '#'
          );
        substring        
-------------------------
 [SOME_TEXT_WITH|ACTION]
(1 row)

The docs don't make it very clear that the capture delimiter is actually <ESCAPECHAR>", i.e. whatever escape character you pass with FOR followed by a double quote, not #" specifically. This is equally valid, using a backslash as the escape character:

regress=> SELECT substring(
              '<pika> [SOME_TEXT_WITH|ACTION] And other stuff...' 
              FROM '%\"\[%\]\"%' FOR '\'
          );
        substring        
-------------------------
 [SOME_TEXT_WITH|ACTION]
(1 row)
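The examples above return the brackets as well. If, as in the question, only the text between them is wanted, one option (a sketch, using # as the escape character throughout) is to escape the brackets as #[ and #] and put the #" delimiters inside them. Because a SIMILAR TO pattern must match the whole string, the leading and trailing % are still needed.

```sql
-- Every special character is escaped with the declared escape character '#':
--   #[ and #]  match literal square brackets
--   #" ... #"  delimit the part of the match that substring() returns
SELECT substring(
         '<pika> [SOME_TEXT_WITH|ACTION] And other stuff...'
         FROM '%#[#"%#"#]%' FOR '#'
       ) AS inside_brackets;
-- expected result: SOME_TEXT_WITH|ACTION
```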

The cause of the odd error is that PostgreSQL translates the SQL-standard SIMILAR TO expression into a POSIX regular expression under the covers. Your mixed-escapes pattern:

'%> \[#"[A-Z_\|]+#"\] %' FOR '#'

is being turned into something like:

'.*> \\[([A-Z_\\|]+)\\] .*'

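Since # is the escape character, each \ is just a literal backslash, so \[ no longer escapes the bracket: the [ opens a bracket expression that swallows one of the capture delimiters, leaving the other one unmatched. Running the translated regex directly gives the same error: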

regress=> SELECT (regexp_matches('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...', '.*> \\[([A-Z_\\|]+)\\] .*'))[1];
ERROR:  invalid regular expression: parentheses () not balanced
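Applying the same rule to the pattern from the question, a minimal sketch is to write \[ and \] as #[ and #], and drop the \ before | (inside a bracket expression, | is already literal). A VALUES list again stands in for my_table, with a single made-up row:

```sql
-- Stand-in for my_table; the row is sample data only.
WITH my_table(message) AS (
  VALUES ('<pika> [SOME_TEXT_WITH|ACTION] And other stuff...')
)
-- The question's pattern with every escape written as '#':
--   #[ and #]   literal brackets
--   [A-Z_|]+    bracket expression; '_' and '|' are literal inside it
--   #" ... #"   the part returned by substring()
SELECT substring(message FROM '%> #[#"[A-Z_|]+#"#] %' FOR '#') AS my_info
FROM my_table;
-- expected result: SOME_TEXT_WITH|ACTION
```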

5 Comments

Doesn't regexp_matches return an array?
@greg Yes, that's why I've written (regexp_matches(...))[1]
Yuuu ... I did not know we could do that ... awesome! Thx
@greg I also showed how to do it with the icky SIMILAR TO syntax, i.e. what was wrong.
Thank you so much for the escape character trick. The documentation just presents it as a delimiter, with no mention that the given character becomes the escape character.

I think the following does what you want:

SELECT substring(cast(message AS varchar(1000)) FROM '.*\[([A-Z_\|]*)\].*')
FROM my_table;

2 Comments

Your solution works as well. I don't really understand why the CAST makes the regular expression work: substring(CAST(message AS varchar) FROM ' <#"[ 0-9a-zA-Z,:]+#"> %' FOR '#') AS my_info works. Edit: I tested with the wrong line. Craig is right about the escape character.
@greg . . . I needed it when I tested this in SQLFiddle.
