11

How many substrings can you make out of a string like abcd?

How can I get all of its substrings:

['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']
2
  • 3
    You are asking two different questions: "How many combinations can you make…?" and, "How can I get it to look like…". The answers to those two questions are very different. Commented Oct 17, 2012 at 23:39
  • @MarceloCantos I don't see how that's true. One is just the length of the other. You can make sum(1...n), i.e. n*n(-1)/2 substrings of a string of length n. Commented Dec 17, 2016 at 22:09

5 Answers 5

12

Try this:

def consecutive_groups(iterable):
    s = tuple(iterable)
    for size in range(1, len(s)+1):
        for index in range(len(s)+1-size):
            yield iterable[index:index+size]

>>> print list(consecutive_groups('abcd'))
['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']

And the number of combinations is simply equal to the sum from 1 to the length of the string, which is equivalent to n * (n + 1) / 2.

By the way, if you want to avoid duplicates, you can simply use a locally-defined set in the generator function, like so:

def consecutive_groups(iterable):
    s = tuple(iterable)
    seen = set()
    for size in range(1, len(s)+1):
        for index in range(len(s)+1-size):
            slc = iterable[index:index+size]
            if slc not in seen:
                seen.add(slc)
                yield slc

That code is a little more unwieldy and could probably be optimized for indentation, but it will do for a proof of concept.

Sign up to request clarification or add additional context in comments.

6 Comments

I'm not your downvoter, but it's not quite right. Try map(partial(reduce, operator.add), powerset('abcd')).
generators make hulk happy! +1 O.o
I just noticed this is a fine approach as long as the OP doesn't care for duplicated substrings. (e.g. what happens with 'abcdab')
either n or n + 1 are even, so a division by 2 will always yield an integer result. What do you mean with the comment about the true division?
@hochl Nothing, brain lapse. (I probably was mistakenly thinking that the division would be evaluated first, before the multiplication.) Editing.
|
11

Would this do?

import itertools
def substrings(x):
    for i, j in itertools.combinations(xrange(len(x)+1), 2):
        yield x[i:j]

or as generator expression:

(x[i:j] for i, j in itertools.combinations(xrange(len(x)+1), 2))

The expanded result for your example looks like this:

['a', 'ab', 'abc', 'abcd', 'b', 'bc', 'bcd', 'c', 'cd', 'd']

To order by length, use sort key=len.

Comments

2

This is what you want:

In [260]: S = 'abcd'

In [261]: list(itertools.chain.from_iterable([list(itertools.combinations(S,i)) for i in range(1,len(S))]))
Out[261]: 
[('a',),
 ('b',),
 ('c',),
 ('d',),
 ('a', 'b'),
 ('a', 'c'),
 ('a', 'd'),
 ('b', 'c'),
 ('b', 'd'),
 ('c', 'd'),
 ('a', 'b', 'c'),
 ('a', 'b', 'd'),
 ('a', 'c', 'd'),
 ('b', 'c', 'd')]

Or if you really want them all as strings, you could do:

In [262]: combos  = list(itertools.chain.from_iterable([list(itertools.combinations(S,i)) for i in range(1,len(S))]))

In [263]: [''.join(c) for c in combos]
Out[263]: 
['a',
 'b',
 'c',
 'd',
 'ab',
 'ac',
 'ad',
 'bc',
 'bd',
 'cd',
 'abc',
 'abd',
 'acd',
 'bcd']

EDIT To get only substrings of S:

In [270]: list(itertools.chain.from_iterable([[S[i:i+k] for i in range(len(S)-k)] for k in range(1,len(S)+1)])) + [S]
Out[270]: ['a', 'b', 'c', 'ab', 'bc', 'abc', 'abcd']

4 Comments

I think you want itertools.combinations(S, i+1), to get the unchanged string too. Also the list and square brackets around your inner generator are unnecessary. chain.from_iterable is perfectly happy taking generators from a generator.
is the empty string part of a solution? If yes you are missing it too. I think ac is not a valid solution either since b is missing in between.
for the second one (EDIT To get only substrings of S:) how would I find only substrings of length x?
@Mathime: You should be able to filter them after the fact: answer = [s for s in all_substrings if len(s) == x]
2

There are two questions there.

The first, How many substrings can you make out of a string like “abcd”? is a combinations like this:

import itertools
s='abcd'
com=[list(itertools.combinations(s,x)) for x in range(1,len(s)+1)]

print [''.join(e) for e in sum(com,[])]

prints:

['a', 'b', 'c', 'd', 'ab', 'ac', 'ad', 'bc', 'bd', 'cd', 'abc', 'abd', 'acd', 'bcd', 'abcd']

The second question is how to replicate your example (which is not a 'combination'). You can do that with this code:

>>> [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']

6 Comments

I did not downvote but most probably because ac is not part of the solution.
actually ac still is not a substring because a and c are not neighbours to each other.
@hochl: Please look at the edit. It produces his example exactly. The problem is he is asking two questions.
if ac is a substring then quite possibly ca is a substring too since a and c are part of the string. I think he means a substring consists of consecutive ranges of characters.
@hochl: Please look at the second list. It addresses your comments. ac is not part of ['a', 'b', 'c', 'd', 'ab', 'bc', 'cd', 'abc', 'bcd', 'abcd'] It replicates the example exactly.
|
2

I think this works too and while is not the most efficient, it has the attractive of using less complex features.

S = "abcd"
substrings = [S[i:j] for i in range(len(S)) for j in range(i+1,len(S)+1)]
substrings.sort(key=len)

Note however that this approach does not remove identical substrings that might appear. For example if the original substring was "abcdab", a, b and ab would appear twice.

1 Comment

In order to not include duplicate substrings, change the list comprehension [...] to a set comprehension {...}.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.