Pythonically check if a variable name is valid

Question

tldr; see the final line; the rest is just preamble.

I am developing a test harness, which parses user scripts and generates a Python script which it then runs. The idea is for non-techie folks to be able to write high-level test scripts.

I have introduced the idea of variables, so a user can use the LET keyword in his script. E.g. LET X = 42, which I simply expand to X = 42. They can then use X later in their scripts - RELEASE CONNECTION X

But what if someone writes LET 2 = 3? That's going to generate invalid Python.

If I have that X in a variable variableName, then how can I check whether variableName is a valid Python variable?

On the side: Why do you think "LET X = 42" is easier for "non-techie folks" than "X = 42"?
– timgeb, Commented Mar 31, 2016 at 10:36
One option is to use a regex. See Regular expression to confirm whether a string is a valid identifier in Python
– PM 2Ring, Commented Mar 31, 2016 at 10:41
@PM2Ring - Note that that's for Python 2. It's less simple for Python 3 (also see here and here).
– TigerhawkT3, Commented Mar 31, 2016 at 10:44

mhawke · Accepted Answer · 2016-03-31 11:07:09Z

65

In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.

>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True

The last example shows that you should also check whether the variable name clashes with a Python keyword:

>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True

So you could put that together in a function:

from keyword import iskeyword

def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
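
As a rough sketch of how this check might slot into the harness from the question: the `LET` pattern and the `expand_let` helper below are hypothetical names invented for illustration, and the right-hand side is passed through without any validation.

```python
import re
from keyword import iskeyword

# Hypothetical pattern for a harness statement of the form: LET <name> = <value>
_LET_PATTERN = re.compile(r'^\s*LET\s+(\S+)\s*=\s*(.+?)\s*$')


def is_valid_variable_name(name):
    # Same check as above: a syntactically valid identifier that is not a keyword.
    return name.isidentifier() and not iskeyword(name)


def expand_let(line):
    """Expand 'LET X = 42' to 'X = 42', rejecting invalid names (illustrative helper)."""
    match = _LET_PATTERN.match(line)
    if match is None:
        raise ValueError('not a LET statement: {!r}'.format(line))
    name, value = match.groups()
    if not is_valid_variable_name(name):
        raise ValueError('invalid variable name: {!r}'.format(name))
    # The value is copied verbatim; validating it is a separate problem.
    return '{} = {}'.format(name, value)
```

With this, expand_let('LET X = 42') gives 'X = 42', while 'LET 2 = 3' and 'LET while = 1' raise ValueError at generation time rather than when the generated script runs.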

Another option, which works in Python 2 and 3, is to use the ast module:

from ast import parse

def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        return False

>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False

This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
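
Parsing alone also accepts strings that are valid assignment targets without being plain names, such as 'a = b' or '[]'. A possible stricter variant, offered only as a sketch, inspects the parsed tree and requires a single assignment to a single name that matches the input exactly. The function name is made up for this example, and it assumes Python 3.

```python
import ast


def is_valid_variable_name_strict(name):
    """Illustrative stricter check: parse, then inspect the resulting tree."""
    if not isinstance(name, str):
        return False
    try:
        tree = ast.parse('{} = None'.format(name))
    except (SyntaxError, ValueError):
        return False
    # Exactly one statement, and it must be a plain assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    targets = tree.body[0].targets
    # Exactly one target, a bare name, spelled exactly like the input.
    return (
        len(targets) == 1
        and isinstance(targets[0], ast.Name)
        and targets[0].id == name
    )
```

Comparing the parsed name with the input also rejects inputs with stray whitespace or a trailing comment character, since the parser would otherwise silently drop those.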

edited Mar 31, 2016 at 11:07

answered Mar 31, 2016 at 10:49

mhawke


10 Comments

Padraic Cunningham Over a year ago

compile('{} = None'.format(name), "<string>", "exec") and return True after should be enough

mhawke Over a year ago

@PadraicCunningham: Thanks. Either works and ast.parse() calls compile() anyway. I think that it's a little cleaner with ast.parse() because there are fewer arguments, although it does require an import.

vaultah Over a year ago

For what it's worth, is_valid_variable_name('a = b'), is_valid_variable_name('[]'), is_valid_variable_name('*a') will all return True.

mhawke Over a year ago

@vaultah: it's worth a great deal and thanks for finding the flaw in my solution. Worth also noting that this problem only affects the ast.parse() solution. AFAIK the first solution still works as expected.

vaultah Over a year ago

@mhawke: yes, the first solution should work fine. Sorry, I should have mentioned that


mtraceur · Accepted Answer · 2022-07-18 18:58:24Z

3

EDIT: this is wrong and implementation dependent - see comments.

Just have Python do its own check by making a dictionary with the variable holding the name as the key and splatting it as keyword arguments:

def _dummy_function(**kwargs):
    pass

def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False

Notably, TypeError is consistently raised whenever a dict splats into keyword arguments but has a key which isn't a valid function argument, and whenever a dict literal is being constructed with an invalid key, so this will work correctly on anything you pass to it.
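
For anyone wanting to reproduce the problem flagged in the edit note, here is a small demonstration. It assumes CPython 3 (the comments below report 3.10); other implementations may behave differently.

```python
def _dummy_function(**kwargs):
    pass


def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False


# Any string key is accepted as a keyword argument when splatted,
# so strings that are not identifiers still pass.
accepts_brackets = is_valid_variable_name('[]')
# Non-string keys are the only inputs this rejects.
accepts_int = is_valid_variable_name(42)
```

On CPython 3, accepts_brackets ends up True and accepts_int ends up False, which is why this approach cannot be used to validate names.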

edited Jul 18, 2022 at 18:58

answered Jan 29, 2018 at 18:47

mtraceur


3 Comments

Christoph Boeddeker Over a year ago

**kwargs can contain non-valid variable names. E.g. is_valid_variable_name('[]') returned True. I was not able to find any string, where this function returns False. Might be different in python 2.

mtraceur Over a year ago

@ChristophBöddeker wild. I have strong memories of this working to reject arguments. But at least on 3.10 it behaves as you describe.

mtraceur Over a year ago

I found a Python mailing list discussion which indicates this was always the case. So I guess I was just wrong and didn't properly test it with strings. (It does reject non-strings, but that's not what this question is about.)

ptrk · Accepted Answer · 2016-03-31 11:00:48Z

2

I don't think you need exactly the same naming syntax as Python itself. I would rather go for a simple regex like:

\w+

to make sure it's something alphanumeric, and then add a prefix to keep clear of Python's own syntax. So the non-techie user's declaration:

LET return = 12

should probably become, after your parsing:

userspace_return = 12
or
userspace['return'] = 12
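
A minimal sketch of the dictionary variant follows. The helper name emit_assignment is hypothetical, and the value is assumed to be Python source that is already known to be safe.

```python
import re

# Whole-string match against \w+, as suggested above.
_NAME_PATTERN = re.compile(r'\w+')


def emit_assignment(name, value_source):
    """Generate a dict-based assignment line for a user variable (illustrative)."""
    if _NAME_PATTERN.fullmatch(name) is None:
        raise ValueError('invalid user variable name: {!r}'.format(name))
    # repr() quotes the name, so keywords such as 'return' are harmless as keys.
    return 'userspace[{!r}] = {}'.format(name, value_source)


line = emit_assignment('return', '12')

# Run the generated line against a fresh namespace to show that it is valid Python.
userspace = {}
exec(line, {'userspace': userspace})
```

Because the user's name only ever appears as a string key, it never has to be a valid Python identifier, and a name like "return" or "2" causes no trouble.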

answered Mar 31, 2016 at 11:00

ptrk


Artyer · Accepted Answer · 2018-01-02 19:57:23Z

In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.

The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.

The docs say that an identifier is defined by this grammar:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)

And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).

So:

import keyword

if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match

    del _fallback_pattern


def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'

snakecharmerb · Accepted Answer · 2016-03-31 10:34:44Z

1

You could try a test assignment and see if it raises a SyntaxError:

>>> 2fg = 5
  File "<stdin>", line 1
    2fg = 5
      ^
SyntaxError: invalid syntax

answered Mar 31, 2016 at 10:34

snakecharmerb


1 Comment

mtraceur Over a year ago

This assumes you are able to evaluate the name in the Python interpreter, and is unsafe for programmatic checking in the general case (import os; os.replace(malicious_file, important_file); foo can have = 5 appended to it and still execute just fine).

xiº · Accepted Answer · 2016-03-31 10:36:11Z

1

You could use exception handling and catch NameError and SyntaxError. Test the name inside a try/except block and inform the user if the input is invalid.
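
One way to do this without running anything is to compile a test assignment from a string; the SyntaxError is then raised at the compile() call and can be caught like any other exception. This is only a sketch: the function name and the '<user-script>' filename label are placeholders, and the check still accepts inputs such as 'a = b' that happen to form a valid statement.

```python
def compiles_as_assignment(name):
    """Return True if '<name> = None' compiles; nothing is executed."""
    try:
        compile('{} = None'.format(name), '<user-script>', 'exec')
    except SyntaxError:
        return False
    return True
```

Since compile() only builds a code object, user input is never evaluated here, but for the same reason the result says nothing about whether the name is the only thing in the string.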

edited Mar 31, 2016 at 10:36

answered Mar 31, 2016 at 10:33

xiº


4 Comments

Mawg Over a year ago

The thing is that I want to validate when I am generating the code. Not when it is run, which could be much later.

Mawg Over a year ago

Aha, I could test it before generating the code! Well done! Thanks.

PM 2Ring Over a year ago

A SyntaxError will prevent the script from running. The only way (AFAIK) to actually catch a SyntaxError is if you import a Python file that contains a syntax error.

PM 2Ring Over a year ago

@timgeb: Ah, of course. I wasn't thinking about eval or exec, since I tend to avoid using them, especially on arbitrary input from "non-techie folks". :)
