31

tldr; see the final line; the rest is just preamble.


I am developing a test harness, which parses user scripts and generates a Python script which it then runs. The idea is for non-techie folks to be able to write high-level test scripts.

I have introduced the idea of variables, so a user can use the LET keyword in his script. E.g. LET X = 42, which I simply expand to X = 42. They can then use X later in their scripts - RELEASE CONNECTION X

But what if someone writes LET 2 = 3? That's going to generate invalid Python.

If I have that X in a variable variableName, then how can I check whether variableName is a valid Python variable?

4
  • 9
    On the side: Why do you think "LET X = 42" is easier for "non-techie folks" than "X = 42"? Commented Mar 31, 2016 at 10:36
  • One option is to use a regex. See Regular expression to confirm whether a string is a valid identifier in Python Commented Mar 31, 2016 at 10:41
  • 1
    @PM2Ring - Note that that's for Python 2. It's less simple for Python 3 (also see here and here). Commented Mar 31, 2016 at 10:44
  • @timgeb the answer to that is quite Basic :-) Commented Jan 3, 2018 at 11:14

6 Answers

65

In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.

>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True

The last example shows that you should also check whether the variable name clashes with a Python keyword:

>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True

So you could put that together in a function:

from keyword import iskeyword

def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
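
As a rough sketch of how this check might slot into the harness from the question: the LET pattern and the expand_let helper below are hypothetical, since the real script syntax isn't shown, and the right-hand side is passed through unchecked.

```python
import re
from keyword import iskeyword

# Hypothetical pattern for a "LET <name> = <value>" line; the real harness
# syntax may differ.
_LET_PATTERN = re.compile(r'LET\s+(\S+)\s*=\s*(.+)')


def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)


def expand_let(line):
    """Turn 'LET X = 42' into 'X = 42', rejecting invalid names."""
    match = _LET_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError('not a LET statement: {!r}'.format(line))
    name, value = match.groups()
    if not is_valid_variable_name(name):
        raise ValueError('invalid variable name: {!r}'.format(name))
    # The value is copied as-is; validating it is a separate problem.
    return '{} = {}'.format(name, value)
```

With this, expand_let('LET X = 42') gives 'X = 42', while 'LET 2 = 3' and 'LET while = 1' raise ValueError at generation time rather than when the generated script runs.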

Another option, which works in Python 2 and 3, is to use the ast module:

from ast import parse

def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        return False

>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False

This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
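
Note that a parse-only check accepts any string that forms a valid assignment target, so inputs such as 'a = b' or '[]' can slip through. One possible way to tighten it, sketched here for Python 3 only, is to inspect the parsed tree and require exactly one assignment to exactly one plain name. The function name is_single_name_target is made up for this example.

```python
import ast


def is_single_name_target(name):
    """Return True only if '<name> = None' parses as one plain-name assignment."""
    if not isinstance(name, str):
        return False
    try:
        tree = ast.parse('{} = None'.format(name))
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as embedded null bytes on some versions.
        return False
    if len(tree.body) != 1:
        return False
    statement = tree.body[0]
    return (
        isinstance(statement, ast.Assign)
        and len(statement.targets) == 1
        and isinstance(statement.targets[0], ast.Name)
        # Reject inputs like 'X ' or '(X)' that parse to the name X
        # but are not themselves the identifier.
        and statement.targets[0].id == name
    )
```

The final comparison is deliberately strict: it also rejects identifiers that Python would normalize to a different spelling, which seems acceptable for a test harness but is a design choice rather than a requirement.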


10 Comments

compile('{} = None'.format(name), "<string>", "exec") and return True after should be enough
@PadraicCunningham: Thanks. Either works and ast.parse() calls compile() anyway. I think that it's a little cleaner with ast.parse() because there are fewer arguments, although it does require an import.
For what it's worth, is_valid_variable_name('a = b'), is_valid_variable_name('[]'), is_valid_variable_name('*a') will all return True.
@vaultah: it's worth a great deal and thanks for finding the flaw in my solution. Worth also noting that this problem only affects the ast.parse() solution. AFAIK the first solution still works as expected.
@mhawke: yes, the first solution should work fine. Sorry, I should have mentioned that
3

EDIT: this is wrong and implementation dependent - see comments.

Just have Python do its own check by making a dictionary with the variable holding the name as the key and splatting it as keyword arguments:

def _dummy_function(**kwargs):
    pass

def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False

Notably, TypeError is consistently raised whenever a dict splats into keyword arguments but has a key which isn't a valid function argument, and whenever a dict literal is being constructed with an invalid key, so this will work correctly on anything you pass to it.

3 Comments

**kwargs can contain invalid variable names. E.g. is_valid_variable_name('[]') returned True. I was not able to find any string for which this function returns False. It might be different in Python 2.
@ChristophBöddeker wild. I have strong memories of this working to reject arguments. But at least on 3.10 it behaves as you describe.
I found a Python mailing list discussion which indicates this was always the case. So I guess I was just wrong and didn't properly test it with strings. (It does reject non-strings, but that's not what this question is about.)
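
A small demonstration of the behaviour described in these comments, as observed on CPython 3: a string key that is not a valid identifier is accepted by **kwargs, while a non-string key raises TypeError. The variable names are only for illustration.

```python
def _dummy_function(**kwargs):
    return kwargs


# A string key is accepted even though '[]' is not a valid identifier.
accepted = _dummy_function(**{'[]': None})

# A non-string key is rejected with TypeError.
try:
    _dummy_function(**{42: None})
except TypeError:
    rejected_non_string = True
else:
    rejected_non_string = False
```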
2

I don't think you need the exact same naming syntax as Python itself. I would rather go for a simple regexp like:

\w+

to make sure it's something alphanumeric, and then add a prefix to keep away from python's own syntax. So the non-techie user's declaration:

LET return = 12

should probably become after your parsing:

userspace_return = 12
or
userspace['return'] = 12
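
A minimal sketch of the dictionary variant, assuming the harness has already split the LET line into a name and a value. The helper let_to_python is hypothetical, and the value is emitted unchecked.

```python
import re

# \w+ as suggested above; fullmatch ensures the whole name matches it.
_NAME = re.compile(r'\w+')


def let_to_python(name, value_source):
    """Emit an assignment into a 'userspace' dict instead of a bare variable."""
    if _NAME.fullmatch(name) is None:
        raise ValueError('invalid user variable name: {!r}'.format(name))
    # repr() quotes the name, so keywords such as 'return' are harmless here.
    return 'userspace[{!r}] = {}'.format(name, value_source)
```

For example, let_to_python('return', '12') produces userspace['return'] = 12. The generated script then needs a userspace = {} line before the first assignment.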

Comments

2

In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.

The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.

The docs say that an identifier is defined by this grammar:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)

And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).

So:

import keyword

if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match

    del _fallback_pattern


def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'
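
One caveat with the regex fallback: in Python's re module, $ also matches just before a trailing newline, so a pattern ending in $ accepts 'X\n'. If that matters, \Z anchors at the very end of the string instead. A short illustration, with made-up variable names:

```python
import re

_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'

# '$' matches at the end of the string or just before a trailing newline.
with_dollar = re.compile(_pattern + '$')
# '\Z' matches only at the very end of the string.
with_end_anchor = re.compile(_pattern + r'\Z')

dollar_accepts_newline = bool(with_dollar.match('X\n'))
end_anchor_accepts_newline = bool(with_end_anchor.match('X\n'))
```

Here dollar_accepts_newline is True and end_anchor_accepts_newline is False; both patterns accept a plain 'X'.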

Comments

1

You could try a test assignment and see if it raises a SyntaxError:

>>> 2fg = 5
  File "<stdin>", line 1
    2fg = 5
      ^
SyntaxError: invalid syntax

1 Comment

This assumes you are able to evaluate the name in the Python interpreter, and is unsafe for programmatic checking in the general case (import os; os.replace(malicious_file, important_file); foo can have = 5 appended to it and still execute just fine).
1

You could use exception handling and catch NameError and SyntaxError. Test the name inside a try/except block and inform the user if the input is invalid.

4 Comments

The thing is that I want to validate when I am generating the code, not when it is run, which could be much later.
Aha, I could test it before generating the code! Well done! Thanks.
A SyntaxError will prevent the script from running. The only way (AFAIK) to actually catch a SyntaxError is if you import a Python file that contains a syntax error.
@timgeb: Ah, of course. I wasn't thinking about eval or exec, since I tend to avoid using them, especially on arbitrary input from "non-techie folks". :)
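
To illustrate the point about catching SyntaxError without running anything: the built-in compile() raises SyntaxError for invalid source, and that exception can be caught like any other. The sketch below checks a whole generated script before it is saved or run. The function name check_generated_source and the '<generated>' filename are assumptions for this example.

```python
def check_generated_source(source, filename='<generated>'):
    """Return None if the source compiles, else a short error description."""
    try:
        # 'exec' mode compiles a whole module; nothing is executed.
        compile(source, filename, 'exec')
    except SyntaxError as error:
        return 'line {}: {}'.format(error.lineno, error.msg)
    return None
```

This only reports syntax problems. It says nothing about whether the generated script is safe or does what the user intended.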
