18

I am looking for an efficient way to sort a list of strings according to a custom alphabet.

For example, my alphabet is the string "bafmxpzv", and I have a list of strings composed only of characters from that alphabet.

I would like a way to sort that list similarly to other common sorts, but using this custom alphabet. How can I do that?

3
  • 1
    possible duplicate of Custom sort python Commented Oct 27, 2014 at 0:12
  • Wow, in the linked question I found this piece of code, which I do not fully understand but which seems to work: new_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]]). Why word[0]? I can't understand that. Commented Oct 27, 2014 at 0:45
  • clearly, just "word" should be there Commented Oct 27, 2014 at 11:48

3 Answers

34

Let's create an alphabet and a list of words:

In [32]: alphabet = "bafmxpzv"

In [33]: a = ['af', 'ax', 'am', 'ab', 'zvpmf']

Now let's sort them according to where the letters appear in alphabet:

In [34]: sorted(a, key=lambda word: [alphabet.index(c) for c in word])
Out[34]: ['ab', 'af', 'am', 'ax', 'zvpmf']

The above sorts in the correct order.

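sorted supports a wide range of custom sorting. In Python 2, the sorted function has three optional arguments: cmp, key, and reverse. Python 3 removed cmp; there, an old-style comparison function can still be used by wrapping it with functools.cmp_to_key and passing the result as key: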

  • cmp is good for complex sorting tasks. If specified, cmp should be a function that takes two arguments. It should return a negative, zero, or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument. For this case, cmp is overkill.

  • key, if specified, should be a function that takes one argument and returns something that Python natively knows how to sort. In this case, key returns a list containing, for each character of the word, that character's index in alphabet. Python compares such lists element by element, which gives the desired ordering.

  • reverse, if true, reverses the sort-order.
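The three options described above can be tried side by side. The following is only a minimal sketch: the names by_alphabet and compare are made up for illustration, and the comparison function is routed through functools.cmp_to_key so that it also runs on Python 3, where cmp is no longer accepted.

```python
from functools import cmp_to_key

alphabet = "bafmxpzv"
words = ['af', 'ax', 'am', 'ab', 'zvpmf']

def by_alphabet(word):
    """Key function (illustrative name): positions of each character in alphabet."""
    return [alphabet.index(c) for c in word]

def compare(x, y):
    """Old-style comparison (illustrative name): negative, zero, or positive."""
    kx, ky = by_alphabet(x), by_alphabet(y)
    return (kx > ky) - (kx < ky)

ascending = sorted(words, key=by_alphabet)
descending = sorted(words, key=by_alphabet, reverse=True)
via_cmp = sorted(words, key=cmp_to_key(compare))

print(ascending)   # ['ab', 'af', 'am', 'ax', 'zvpmf']
print(descending)  # ['zvpmf', 'ax', 'am', 'af', 'ab']
print(via_cmp)     # ['ab', 'af', 'am', 'ax', 'zvpmf']
```

The key version is shorter and computes each word's key only once, which is why it is usually preferred over a comparison function for a task like this.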

A nonworking alternative

From the comments, this alternative form was mentioned:

In [35]: sorted(a, key=lambda word: [alphabet.index(c) for c in word[0]])
Out[35]: ['af', 'ax', 'am', 'ab', 'zvpmf']

Note that this does not sort in the correct order. That is because the key function here only considers the first letter of each word. This can be demonstrated by testing key:

In [2]: key=lambda word: [alphabet.index(c) for c in word[0]]

In [3]: key('af')
Out[3]: [1]

In [4]: key('ax')
Out[4]: [1]

Observe that key returns the same value for two different strings, af and ax. The value returned reflects only the first character of each word. Because of this, sorted has no way of determining that af belongs before ax.
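To make the difference concrete, here is a small sketch that prints the keys produced by both variants for the whole list. The function names first_letter_key and full_key are invented for this illustration. Because Python's sort is stable, words whose keys are equal keep their input order, which is why the word[0] version returns the list unchanged.

```python
alphabet = "bafmxpzv"
words = ['af', 'ax', 'am', 'ab', 'zvpmf']

def first_letter_key(word):
    # word[0] is a one-character string, so the list has a single entry
    return [alphabet.index(c) for c in word[0]]

def full_key(word):
    # one entry per character of the word
    return [alphabet.index(c) for c in word]

first_keys = [first_letter_key(w) for w in words]
full_keys = [full_key(w) for w in words]

print(first_keys)  # [[1], [1], [1], [1], [6]]
print(full_keys)   # [[1, 2], [1, 4], [1, 3], [1, 0], [6, 7, 5, 3, 2]]

# The first four keys tie, so the stable sort leaves those words in input order.
unchanged = sorted(words, key=first_letter_key)
print(unchanged)   # ['af', 'ax', 'am', 'ab', 'zvpmf']
```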


4 Comments

Great solution! Minor correction: "In this case, key returns the index of a letter in alphabet." should be: "In this case, key returns a list with an entry for each character in word reflecting the character's index in alphabet."
@Jpsy Thanks and I reworded that sentence!
How could this be applied to sort based on a sub-string within the string? For example, if a = ['01af', '02ax', '03am', '04ab', '05zvpmf'], how can the list be sorted by its alphabetic characters only?
@IMLD It isn't clear to me exactly what you want but try: sorted(a, key=lambda word: [alphabet.index(c) for c in word if c.isalpha()])
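A possible variation on the suggestion in the last comment, offered only as a sketch: instead of c.isalpha(), it skips every character that is not in the custom alphabet, on the assumption that such characters should simply be ignored. The name key_ignoring_other_chars is hypothetical.

```python
alphabet = "bafmxpzv"
order = {ch: i for i, ch in enumerate(alphabet)}

a = ['01af', '02ax', '03am', '04ab', '05zvpmf']

def key_ignoring_other_chars(word):
    # Keep only characters that belong to the custom alphabet;
    # digits and anything else are skipped (an assumption about the intent).
    return [order[c] for c in word if c in order]

result = sorted(a, key=key_ignoring_other_chars)
print(result)  # ['04ab', '01af', '03am', '02ax', '05zvpmf']
```

Filtering on membership avoids the ValueError that alphabet.index(c) would raise for a letter that passes isalpha() but is missing from the alphabet.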
2

Update: I misread your question. You have a list of strings, not a single string. The idea is the same: sort using a custom comparison function.

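from functools import cmp_to_key

alphabet = "bafmxpzv"

def acmp(a, b):
    # Compare two words character by character, using positions in alphabet.
    la = len(a)
    lb = len(b)
    lm = min(la, lb)
    p = 0
    while p < lm:
        pa = alphabet.index(a[p])
        pb = alphabet.index(b[p])
        if pa > pb:
            return 1
        if pb > pa:
            return -1
        p = p + 1

    # All shared positions are equal, so the shorter word sorts first.
    if la > lb:
        return 1
    if lb > la:
        return -1
    return 0

mylist = ['baf', 'bam', 'pxm']
# Python 3 removed the cmp argument, so wrap the comparator with cmp_to_key.
# On Python 2, mylist.sort(cmp=acmp) also works.
mylist.sort(key=cmp_to_key(acmp))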
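As a sanity check, a comparison function of this kind should agree with the list-based key from the accepted answer, including the case where one word is a prefix of another. The sketch below uses a compact rewrite of acmp (not the code above verbatim) and a test list chosen for illustration.

```python
from functools import cmp_to_key

alphabet = "bafmxpzv"

def acmp(a, b):
    # Compact rewrite of the comparator: compare position by position.
    for ca, cb in zip(a, b):
        pa, pb = alphabet.index(ca), alphabet.index(cb)
        if pa != pb:
            return 1 if pa > pb else -1
    # One word is a prefix of the other (or they are equal): shorter first.
    return (len(a) > len(b)) - (len(a) < len(b))

words = ['baf', 'ba', 'pxm', 'bam', 'b']

by_cmp = sorted(words, key=cmp_to_key(acmp))
by_key = sorted(words, key=lambda w: [alphabet.index(c) for c in w])

print(by_cmp)  # ['b', 'ba', 'baf', 'bam', 'pxm']
print(by_key)  # ['b', 'ba', 'baf', 'bam', 'pxm']
```

Both orderings match because Python compares lists element by element and treats a shorter list as smaller when it is a prefix of the longer one.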

3 Comments

how can you call sort on a string?
Oops, it doesn't have sort; use sorted then. Thanks for the correction. I've edited the answer.
On second thought, I think I misunderstood the original question. It is a list of strings, not just a string. I'll post another answer.
2

Instead of using index(), which performs a linear search through the alphabet for every character, a better alternative is to build a hash map (a dict) once and use it during sorting to look up each character's index directly.
Example:

>>> alphabet = "bafmxpzv"
>>> a = ['af', 'ax', 'am', 'ab', 'zvpmf']
>>> order = dict(zip(alphabet, range(len(alphabet))))
>>> sorted(a, key=lambda word: [order[c] for c in word])
['ab', 'af', 'am', 'ax', 'zvpmf']

1 Comment

I would replace line 3 with order = { ch : i for i, ch in enumerate(alphabet) }; though I suppose that's up to personal preference.
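Putting the dict lookup and the enumerate suggestion together, one way to package this is a small factory that builds the mapping once and returns a key function. This is only a sketch; make_alphabet_key is a hypothetical helper name, not something from the answers above.

```python
def make_alphabet_key(alphabet):
    """Return a key function that orders words by the given alphabet."""
    # Build the character -> position mapping once, up front.
    order = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # A character missing from the alphabet raises KeyError here.
        return [order[c] for c in word]

    return key

key = make_alphabet_key("bafmxpzv")
a = ['af', 'ax', 'am', 'ab', 'zvpmf']
a.sort(key=key)  # list.sort accepts the same key argument as sorted

print(a)  # ['ab', 'af', 'am', 'ax', 'zvpmf']
```

The returned function can be reused across many sorts without rebuilding the dict each time.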
