81

I know I can count the leading spaces in a string with this:

>>> a = "   foo bar baz qua   \n"
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 3
>>>

But is there a more pythonic way?

3
  • 7
    Looks pretty pythonic to me already. Commented Nov 30, 2012 at 16:10
  • Unpleasant -- but different -- way: a.count(" ", 0, a.index(a.split(None, 1)[0])) Commented Nov 30, 2012 at 16:17
  • 3
    Bear in mind that lstrip will remove tabs and other whitespace characters as well as spaces. Commented Nov 30, 2012 at 16:50

8 Answers

121

Your way is Pythonic but incorrect: lstrip() with no argument also strips other whitespace characters. To count only spaces, be explicit and use a.lstrip(' '). Compare:

a = "   \r\t\n\tfoo bar baz qua   \n"
print("Leading spaces", len(a) - len(a.lstrip()))
>>> Leading spaces 7

and

print("Leading spaces", len(a) - len(a.lstrip(' ')))
>>> Leading spaces 3
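
The distinction above can be wrapped in a small helper. This is only a sketch: the name `count_leading` and its `chars` parameter are hypothetical, introduced for illustration, and the function relies on nothing beyond `str.lstrip`.

```python
def count_leading(s, chars=None):
    """Return how many leading characters str.lstrip would remove.

    chars=None strips any whitespace; chars=" " strips spaces only.
    """
    return len(s) - len(s.lstrip(chars))


a = "   \r\t\n\tfoo bar baz qua   \n"
print(count_leading(a))       # counts any leading whitespace
print(count_leading(a, " "))  # counts leading spaces only
```

Making the stripped character set an explicit argument keeps the "spaces or all whitespace?" decision visible at the call site.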

26

You could use itertools.takewhile

sum( 1 for _ in itertools.takewhile(str.isspace,a) )

And demonstrating that it gives the same result as your code:

>>> import itertools
>>> a = "    leading spaces"
>>> print sum( 1 for _ in itertools.takewhile(str.isspace,a) )
4
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 4

I'm not sure whether this code is actually better than your original solution. It has the advantage that it doesn't create more temporary strings, but that's pretty minor (unless the strings are really big). I don't find either version to be immediately clear about what that line of code does, so I would definitely wrap it in a nicely named function if you plan on using it more than once (with appropriate comments in either case).
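
As a sketch of that suggestion, the one-liner could be wrapped like this. The function name `count_leading_whitespace` is hypothetical; the body is the same `itertools.takewhile` expression shown above.

```python
import itertools


def count_leading_whitespace(s):
    """Count leading whitespace characters without building a stripped copy."""
    # takewhile stops at the first character for which str.isspace is False.
    return sum(1 for _ in itertools.takewhile(str.isspace, s))


print(count_leading_whitespace("    leading spaces"))
```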

7 Comments

I was trying to figure out exactly this, only without itertools. I really need to learn itertools...
On my system, (Python 2.7.10 32 bit running on Windows), lstrip() is 3.5x as fast as itertools.
@ChaimG -- I bet that we could construct some strings for which that isn't the case (e.g. if the string is really long and only has one or two leading spaces). For many common cases however, I agree that lstrip will be much faster.
@mgilson -- Correct. With the string: a = ' ' + 'a'*100000000, itertools is 67k times faster. I wonder why? Is it because lstrip() creates a copy of the string?
@ChaimG -- that's exactly why :-). At one point, I assumed that lstrip() wouldn't create a new string -- Immutability should make that possible. However, I made that statement on a google mailing list once and was corrected by Alex Martelli IIRC :-). I'm not sure why they don't re-use the old string, but it might be because in a lot of cases that would prevent a large string from getting deallocated.
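
A rough way to reproduce the comparison discussed in these comments is sketched below with `timeit`. The helper names and the two sample strings are assumptions chosen for illustration (the long string is much shorter than the one quoted above so the script finishes quickly), and the timings will vary by machine and Python version.

```python
import itertools
import timeit


def via_lstrip(s):
    # Builds a stripped copy of s, then compares lengths.
    return len(s) - len(s.lstrip())


def via_takewhile(s):
    # Walks characters one at a time and stops at the first non-whitespace.
    return sum(1 for _ in itertools.takewhile(str.isspace, s))


# Hypothetical inputs: one with a long indent, one long string with a single leading space.
samples = {
    "many leading spaces": " " * 1000 + "x",
    "long string, one leading space": " " + "a" * 1_000_000,
}

for label, s in samples.items():
    assert via_lstrip(s) == via_takewhile(s)
    t_lstrip = timeit.timeit(lambda: via_lstrip(s), number=100)
    t_take = timeit.timeit(lambda: via_takewhile(s), number=100)
    print(f"{label}: lstrip {t_lstrip:.4f}s, takewhile {t_take:.4f}s")
```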
18

Just for variety, you could theoretically use regex. It's a little shorter, and looks nicer than the double call to len().

>>> import re
>>> a = "   foo bar baz qua   \n"
>>> re.search(r'\S', a).start() # index of the first non-whitespace char
3

Or alternatively:

>>> re.search('[^ ]', a).start() # index of the first non-space char
3

But I don't recommend this; according to a quick test I did, it's much less efficient than len(a) - len(a.lstrip()).
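
One caveat with the regex snippets above: `re.search` returns `None` when the string is empty or consists entirely of whitespace, so calling `.start()` on the result would raise `AttributeError`. A guarded sketch follows; the function name `first_non_space_index` is hypothetical, and returning `len(s)` in that case is a design choice made to match what the `lstrip` approach would report.

```python
import re


def first_non_space_index(s):
    """Index of the first non-whitespace character, or len(s) if there is none."""
    match = re.search(r"\S", s)
    # re.search returns None when the string is empty or all whitespace.
    return match.start() if match else len(s)


print(first_non_space_index("   foo bar baz qua   \n"))
```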

6

I recently had a similar task of counting indents, because of which I wanted to count tab as four spaces:

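def indent(string: str) -> int:
    # Slice by the prefix length; a negative slice breaks on all-whitespace input.
    prefix = string[:len(string) - len(string.lstrip())]
    return sum(4 if char == '\t' else 1 for char in prefix)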
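
The function above counts every tab as exactly four columns. If tabs should instead advance to the next tab stop, as editors typically render them, `str.expandtabs` can do the arithmetic. The following is a sketch under that assumption; the name `indent_width` and the default `tabsize=4` are hypothetical, and only spaces and tabs are treated as indentation.

```python
def indent_width(line, tabsize=4):
    """Width of the leading indentation after expanding tabs to tab stops."""
    # Isolate the leading run of spaces and tabs.
    prefix = line[:len(line) - len(line.lstrip(" \t"))]
    # expandtabs pads each tab up to the next multiple of tabsize.
    return len(prefix.expandtabs(tabsize))


print(indent_width("\tfoo"))    # a single tab
print(indent_width("  \tfoo"))  # two spaces, then a tab to the next stop
```

With this variant, two spaces followed by a tab measure as one tab stop rather than six columns, which may or may not be what a given indentation check wants.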

4

Using next and enumerate:

next((i for i, c in enumerate(a) if c != ' '), len(a))

For any whitespace:

next((i for i, c in enumerate(a) if not c.isspace()), len(a))

2

That looks... great to me. Usually I answer "Is X Pythonic?" questions with some functional magic, but I don't feel that approach is appropriate for string manipulation.

If there were a built-in that returned only the leading spaces, so that you could then take the len() of that, I'd say go for it, but AFAIK there isn't, and re and other solutions are absolutely overkill.

1 Comment

I agree it's overkill: len(re.split(r"\S", a, maxsplit=1)[0])
2

You can use a regular expression:

import re

def count_leading_space(s):
    match = re.search(r"^\s*", s)
    return 0 if not match else match.end()

In [17]: count_leading_space("    asd fjk gl")                                  
Out[17]: 4

In [18]: count_leading_space(" asd fjk gl")                                     
Out[18]: 1

In [19]: count_leading_space("asd fjk gl")                                      
Out[19]: 0

1 Comment

This counts other whitespace chars (like tabs) as well.
0

Yet another way to do it, for the sake of completeness. It is probably not useful, as it is unlikely to be faster or shorter than the other answers.

import re
a = "   foo bar baz qua   \n"
print(len(re.split(r"\S", a, maxsplit=1)[0]))

A good property of that syntax is that it literally gives you the prefix.
