1

I am looking for a regex to validate whether a string could be a valid SQL column name.

I would like to use PCRE syntax.

So far I have found this:

[\w-]+

But I think this is not enough. I have also seen / used in column names (in SAP).

AFAIK the spec is closed source (you need to pay for it).

From the docs (Python re):

\w Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.

What does a regex that validates SQL column names look like?

The string should be usable like this: my_column.

AFAIK reserved words are valid, since you can use them like this:

select * from my_table where "where" = 'here'

"where" is the name of a column. The regex does not need to care for reserved words.

  • What do you mean "validate SQL column names"? Almost any string can be a column name if it is escaped. Commented Oct 23, 2018 at 10:28
  • And many "valid" strings need escaping anyway if they are reserved words. Commented Oct 23, 2018 at 10:36
  • @ErwinBrandstetter I updated the question (concerning reserved words). Commented Oct 23, 2018 at 10:59
  • Do you mean valid in standard SQL or valid in PostgreSQL? Commented Oct 23, 2018 at 11:12
  • @ErwinBrandstetter: yes of course. Commented Oct 23, 2018 at 12:42

2 Answers

3

If we follow the PostgreSQL documentation:

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard [...]

we could write a regular expression for identifiers like this:

^([[:alpha:]_][[:alnum:]_]*|("[^"]*")+)$

The second branch of the regular expression takes care of quoted identifiers.
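
Python's re module does not support POSIX bracket classes such as [[:alpha:]], so the pattern above needs adapting before it can be tried there. Here is a minimal sketch, assuming that Unicode-aware \w is an acceptable stand-in for the POSIX classes. The function name looks_like_identifier is made up for illustration, and the translation is an approximation rather than PostgreSQL's exact lexer rule:

```python
import re

# Rough Python translation of the pattern above (an assumption, not an
# exact equivalent): [^\W\d] stands in for [[:alpha:]_], i.e. a word
# character that is not a digit, and \w stands in for [[:alnum:]_].
# The second branch accepts one or more adjacent "..." groups, which
# covers doubled quotes inside a quoted identifier.
IDENTIFIER_RE = re.compile(r'(?:[^\W\d]\w*|(?:"[^"]*")+)')


def looks_like_identifier(name: str) -> bool:
    """Return True if name matches the unquoted or the quoted branch."""
    # fullmatch replaces the ^...$ anchors and also rejects a trailing newline.
    return IDENTIFIER_RE.fullmatch(name) is not None


print(looks_like_identifier("my_column"))  # True
print(looks_like_identifier('"where"'))    # True
print(looks_like_identifier("0of"))        # False
```

Like the pattern it is derived from, this sketch does not accept dollar signs and ignores both reserved words and the identifier length limit.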


3

The manual clarifies:

SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard.

The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.

And:

There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double-quotes ("). [...]

Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.) This allows constructing table or column names that would otherwise not be possible, such as ones containing spaces or ampersands. The length limitation still applies.
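
The length limit is counted in bytes, not characters, so a client-side check has to encode the name first. This is a minimal sketch, assuming the default NAMEDATALEN of 64 and a UTF-8 server encoding. The function name fits_identifier_limit is hypothetical, and the name passed in is the identifier itself, without surrounding quotes:

```python
# Default limit quoted from the manual: NAMEDATALEN - 1 = 63 bytes.
# Assumption: the server was built with the default NAMEDATALEN.
MAX_IDENTIFIER_BYTES = 63


def fits_identifier_limit(name: str, encoding: str = "utf-8") -> bool:
    """Return True if name would not be truncated under the default limit."""
    # Assumption: the database encoding is UTF-8 unless stated otherwise.
    return len(name.encode(encoding)) <= MAX_IDENTIFIER_BYTES


print(fits_identifier_limit("a" * 63))   # True: 63 bytes
print(fits_identifier_limit("é" * 32))   # False: 64 bytes in UTF-8
```

Note that Postgres truncates over-long names rather than rejecting them, so a False result here means "would be truncated", not "would raise an error".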

There is more: you can even use escaped Unicode characters, like U&"d\0061t\+000061". Read the whole chapter.

So any character except the character with code zero is allowed in a valid identifier, once the name is double-quoted. And without double quotes, even simple strings like 'select' may be invalid if they happen to be reserved words. (The concept of reserved words is an unfortunate one, set by the SQL standard and hard to change now.)
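
The quoting rule described in the manual can be sketched on the client side as follows. This is an illustration only: the function name quote_identifier is hypothetical, and unlike a server-side helper it always adds quotes. The empty-name check is an assumption not covered by the quoted text (PostgreSQL rejects zero-length delimited identifiers):

```python
def quote_identifier(name: str) -> str:
    """Wrap name in double quotes, doubling any embedded double quote."""
    # Assumption beyond the quoted manual text: empty names are rejected,
    # since PostgreSQL does not accept zero-length delimited identifiers.
    if not name:
        raise ValueError("identifier must not be empty")
    # The manual excludes the character with code zero.
    if "\x00" in name:
        raise ValueError("identifier must not contain the character with code zero")
    return '"' + name.replace('"', '""') + '"'


print(quote_identifier("where"))  # "where"
print(quote_identifier('a"b'))    # "a""b"
```

In real code, the identifier-quoting facility of the database driver is usually the safer choice; this sketch only shows the rule itself.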

You might just let Postgres do the work, using quote_ident():

SELECT quote_ident('0of') = '0of';

Quotes are added only if necessary.

The expression returns true for identifiers that are valid without quoting. Or just use the result of quote_ident('$identifier') to get a legal name in either case (quoted if necessary).
