`set_includes` (string subsequence-containment) in Python 2

Question

I've implemented C++'s std::includes algorithm in Python, so that I can use it to efficiently implement a Scrabble "can I make this word" function:

def word_can_be_made_from_rack(word, rack):
    return set_includes(sorted(rack), sorted(word))

Here's the implementation, with some test cases:

def set_includes(haystack, needle):
    j = 0
    hn = len(haystack)
    for c in needle:
        while j != hn and haystack[j] < c:
            j += 1
        if j == hn:
            return False
        if haystack[j] > c:
            return False
        j += 1
    return True

assert set_includes('abcdef', 'af')
assert set_includes('abcdef', 'bce')
assert set_includes('abcdef', 'abcdef')
assert set_includes('aaaaa', 'a')
assert set_includes('aaaaa', 'aa')
assert set_includes('aaaaax', 'ax')
assert set_includes('abbbcxx', 'abc')

This is similar to Find if one list is a subsequence of another except that it assumes (and requires) that the two input strings are sorted.

The manual management of index j in this code doesn't feel very Pythonic. Am I missing an easier way to write this algorithm?

itertools one-liners will be accepted as answers, especially if they're more performant. :)

vnp · Accepted Answer · 2019-07-18 15:17:51Z

1

AFNP. The loop condition j != hn is more idiomatically expressed as an exception:

try:
    for c in needle:
        while haystack[j] < c:
            ....
except IndexError:
    return False

No naked loops. Factor the while haystack[j] < c: into a function. Among other benefits, it'd allow
```
    j = search_character(haystack[j:], c)
```
The binary search for c seems more performant than linear. See bisect module.

answered Jul 18, 2019 at 15:17

vnp

58.7k4 gold badges55 silver badges144 bronze badges

\$\begingroup\$ AFNP? Do you mean EAFP? \$\endgroup\$

301_Moved_Permanently
– 301_Moved_Permanently

2019-07-18 15:57:18 +00:00
Commented Jul 18, 2019 at 15:57
\$\begingroup\$ @MathiasEttinger Same (Ask Forgiveness, Not Permission). \$\endgroup\$

vnp
– vnp

2019-07-18 16:12:46 +00:00
Commented Jul 18, 2019 at 16:12

Add a comment |

AlexV · Accepted Answer · 2019-07-18 19:09:21Z

I had an idea to just try to make the word by removing all the needed characters from the rack/haystack and see if it works. The idea also follows the "Easier to ask for forgiveness than permission" approach.

def set_includes(haystack, needle):
    haystack = list(haystack)
    try:
        for char in needle:
            haystack.remove(char)
        return True
    except ValueError:
        return False

Obviously, this will scale badly for cases where haystack == needle for larger string lengths (noticeable starting at about n >= 500), but it does not need sorting. So you will have to check whether or not this is more efficient for your use case.

Depending on how often the check would return false because needle does contain one or more letters that are not on the rack, sets can maybe help you to take a shortcut:

if set(haystack).issuperset(needle):
    # check if there are enough letters to build needle from haystack
    ...
else:
    # haystack does not contain all the letters needed to build needle
    return False

Just for fun: We here in Python have iterators, too :-D

def set_includes(haystack, needle):
    it1 = iter(haystack)
    it2 = iter(needle)
    char2 = next(it2)
    while True:
        try:
            char1 = next(it1)
        except StopIteration:
            return False
        if char2 < char1:
            return False

        if not char1 < char2:
            try:
                char2 = next(it2)
            except StopIteration:
                return True

If you move the last try: ... catch ...: a few levels outwards, you can get quite close to the structure of the possible implementation given on cppreference. Don't take this to serious though.

We can do a little bit better:

def set_includes(haystack, needle):
    it2 = iter(needle)
    char2 = next(it2)
    for char1 in haystack:
        if char2 < char1:
            return False

        if not char1 < char2:
            try:
                char2 = next(it2)
            except StopIteration:
                return True

    return False

Here, at least one of the try: ... catch ...:s is transformed into a proper loop.

RootTwo · Accepted Answer · 2019-07-19 01:55:13Z

Use str.index()

The lines:

    while j != hn and haystack[j] < c:
        j += 1

are basically trying to find the index of c in haystack. So just use str.index():

def set_includes(haystack, needle):
    try:
        i = -1
        for c in needle:
            i = haystack.index(c, i+1)

    except ValueError:
        return False

    return True

Or, if you prefer use str.find():

def set_includes(haystack, needle):
    i = -1
    for c in needle:
        i = haystack.find(c, i+1)
        if i<0:
            return False
    return True

itertools one-liner

Almost forgot, here's your one-liner:

from itertools import groupby

set_includes = lambda h, n:not any(h.find(''.join(g))<0 for _,g in groupby(n))

Stack Exchange Network

`set_includes` (string subsequence-containment) in Python 2

3 Answers 3

Use str.index()

itertools one-liner

You must log in to answer this question.

Linked

Hot Network Questions

`set_includes` (string subsequence-containment) in Python 2

3 Answers 3

Use str.index()

itertools one-liner

You must log in to answer this question.

Linked

Related

Hot Network Questions