What is the pythonic way to count the leading spaces in a string?

Question

I know I can count the leading spaces in a string with this:

>>> a = "   foo bar baz qua   \n"
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 3
>>>

But is there a more pythonic way?

Unpleasant -- but different -- way: a.count(" ", 0, a.index(a.split(None, 1)[0]))
– Katriel, Commented Nov 30, 2012 at 16:17
Bear in mind that lstrip will remove tabs and other whitespace characters as well as spaces.
– Steve Mayne, Commented Nov 30, 2012 at 16:50

Nico Schlömer · Accepted Answer · 2023-02-26 19:47:22Z

121

Your way is Pythonic but incorrect: lstrip() with no argument also strips other whitespace characters, so they get counted too. To count only spaces, be explicit and use a.lstrip(' '). Compare

a = "   \r\t\n\tfoo bar baz qua   \n"
print("Leading spaces", len(a) - len(a.lstrip()))

>>> Leading spaces 7

and

print("Leading spaces", len(a) - len(a.lstrip(' ')))

>>> Leading spaces 3

edited Feb 26, 2023 at 19:47

Nico Schlömer


answered Nov 30, 2012 at 16:21

zenpoy


yori · Accepted Answer · 2014-10-08 20:43:36Z

26

You could use itertools.takewhile

sum( 1 for _ in itertools.takewhile(str.isspace,a) )

And demonstrating that it gives the same result as your code:

>>> import itertools
>>> a = "    leading spaces"
>>> print sum( 1 for _ in itertools.takewhile(str.isspace,a) )
4
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 4

I'm not sure whether this code is actually better than your original solution. It has the advantage that it doesn't create more temporary strings, but that's pretty minor (unless the strings are really big). I don't find either version to be immediately clear about what that line of code does, so I would definitely wrap it in a nicely named function if you plan on using it more than once (with appropriate comments in either case).
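A minimal sketch of such a wrapper, built on the lstrip approach from the question. The name count_leading and the chars parameter are assumptions made for illustration, not something from the answers here:

```python
def count_leading(s, chars=None):
    """Return the number of leading characters that s.lstrip(chars) would remove.

    With chars=None every leading whitespace character is counted
    (spaces, tabs, newlines, ...). Pass chars=" " to count spaces only.
    """
    return len(s) - len(s.lstrip(chars))


print(count_leading("    leading spaces"))      # any whitespace
print(count_leading("  \tmixed indent", " "))   # plain spaces only
```

Keeping the character set as a parameter makes the "spaces only" versus "any whitespace" choice visible at the call site.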

edited Oct 8, 2014 at 20:43

yori


answered Nov 30, 2012 at 16:16

mgilson


7 Comments

Silas Ray Over a year ago

I was trying to figure out exactly this, only without itertools. I really need to learn itertools...

ChaimG Over a year ago

On my system, (Python 2.7.10 32 bit running on Windows), lstrip() is 3.5x as fast as itertools.

mgilson Over a year ago

@ChaimG -- I bet that we could construct some strings for which that isn't the case (e.g. if the string is really long and only has one or two leading spaces). For many common cases however, I agree that lstrip will be much faster.

ChaimG Over a year ago

@mgilson -- Correct. With the string: a = ' ' + 'a'*100000000, itertools is 67k times faster. I wonder why? Is it because lstrip() creates a copy of the string?

mgilson Over a year ago

@ChaimG -- that's exactly why :-). At one point, I assumed that lstrip() wouldn't create a new string -- Immutability should make that possible. However, I made that statement on a google mailing list once and was corrected by Alex Martelli IIRC :-). I'm not sure why they don't re-use the old string, but it might be because in a lot of cases that would prevent a large string from getting deallocated.
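The comparison discussed in these comments can be reproduced with a sketch along the following lines. The function names, the string sizes and the repeat count are assumptions chosen for illustration, and the timings will vary with the machine and the Python version:

```python
import itertools
import timeit


def count_with_lstrip(s):
    # Builds a stripped copy of the string, then compares lengths.
    return len(s) - len(s.lstrip())


def count_with_takewhile(s):
    # Walks the string character by character and stops at the first non-whitespace one.
    return sum(1 for _ in itertools.takewhile(str.isspace, s))


short = "    leading spaces"
long_tail = " " + "a" * 1_000_000  # one leading space, very long remainder

for name, text in (("short", short), ("long tail", long_tail)):
    for func in (count_with_lstrip, count_with_takewhile):
        seconds = timeit.timeit(lambda: func(text), number=100)
        print(f"{name:10} {func.__name__:22} {seconds:.6f}s")
```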


Junuxx · Accepted Answer · 2012-11-30 17:08:43Z

18

Just for variety, you could theoretically use regex. It's a little shorter, and looks nicer than the double call to len().

>>> import re
>>> a = "   foo bar baz qua   \n"
>>> re.search(r'\S', a).start() # index of the first non-whitespace char
3

Or alternatively:

>>> re.search('[^ ]', a).start() # index of the first non-space char
3

But I don't recommend this; according to a quick test I did, it's much less efficient than len(a) - len(a.lstrip()).

edited Nov 30, 2012 at 17:08

answered Nov 30, 2012 at 16:26

Junuxx


Stephen Rauch · Accepted Answer · 2020-12-14 21:08:56Z

6

I recently had a similar task of counting indents, where I wanted a tab to count as four spaces:

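def indent(string: str) -> int:
    # Take the leading whitespace (also correct when the string is all whitespace).
    prefix = string[:len(string) - len(string.lstrip())]
    return sum(4 if char == '\t' else 1 for char in prefix)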
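If a tab should instead advance to the next multiple of four columns, the way most editors render it, a sketch using str.expandtabs might look like this. The name indent_width and the tabsize default are assumptions, not part of the answer above:

```python
def indent_width(line, tabsize=4):
    """Return the visual width of the leading spaces and tabs in line."""
    # Keep only the leading run of spaces and tabs.
    prefix = line[:len(line) - len(line.lstrip(" \t"))]
    # expandtabs pads each tab up to the next tab stop.
    return len(prefix.expandtabs(tabsize))


print(indent_width("\tfoo"))    # 4
print(indent_width("  \tfoo"))  # 4: two spaces, then the tab fills up to column 4
```

The second call is where the two approaches differ: counting every tab as a flat four characters gives 6 for the same input.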

edited Dec 14, 2020 at 21:08

Stephen Rauch♦


answered Oct 12, 2019 at 23:01

jedi5218


ecatmur · Accepted Answer · 2012-11-30 16:26:22Z

4

Using next and enumerate:

next((i for i, c in enumerate(a) if c != ' '), len(a))

For any whitespace:

next((i for i, c in enumerate(a) if not c.isspace()), len(a))
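A quick check of both expressions, including the all-spaces case where the len(a) default is what gets returned. The helper names are assumptions added for illustration:

```python
def leading_spaces(a):
    # Index of the first character that is not a plain space, or len(a) if there is none.
    return next((i for i, c in enumerate(a) if c != ' '), len(a))


def leading_whitespace(a):
    # Same idea, but any whitespace character counts.
    return next((i for i, c in enumerate(a) if not c.isspace()), len(a))


print(leading_spaces("   foo"))       # 3
print(leading_spaces("  \tfoo"))      # 2
print(leading_whitespace("  \tfoo"))  # 3
print(leading_spaces("     "))        # 5
print(leading_spaces(""))             # 0
```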

answered Nov 30, 2012 at 16:26

ecatmur


Matt Luongo · Accepted Answer · 2012-11-30 16:13:29Z

2

That looks... great to me. Usually I answer "Is X Pythonic?" questions with some functional magic, but I don't feel that approach is appropriate for string manipulation.

If there were a built-in that returned only the leading spaces, so that you could then take the len() of that, I'd say go for it, but AFAIK there isn't, and re and other solutions are absolutely overkill.

answered Nov 30, 2012 at 16:13

Matt Luongo


1 Comment

kriss Over a year ago

I agree it's overkill: len(re.split(r"\S", a, 1)[0])

user12421304 · Accepted Answer · 2021-05-19 08:48:42Z

2

You can use a regular expression:

import re

def count_leading_space(s):
    match = re.search(r"^\s*", s)
    return 0 if not match else match.end()

In [17]: count_leading_space("    asd fjk gl")
Out[17]: 4

In [18]: count_leading_space(" asd fjk gl")
Out[18]: 1

In [19]: count_leading_space("asd fjk gl")
Out[19]: 0

answered May 19, 2021 at 8:48

user12421304

1 Comment

Claas Bontus Over a year ago

This counts other whitespace chars (like tabs) as well.

kriss · Accepted Answer · 2023-06-22 13:49:42Z

0

Yet another way to do it, for the sake of completeness. It is probably of little use, since it is unlikely to be faster or shorter than the other answers.

import re
a = "   foo bar baz qua   \n"
print(len(re.split(r"\S", a, maxsplit=1)[0]))

A good property of that syntax is that it literally gives you the prefix.

answered Jun 22, 2023 at 13:49

kriss

