5

I've looked at several other SO questions (and googled tons) that are 'similar'-ish to this, but none of them quite fit my question.

I am trying to generate a unique text string of non-fixed length that contains only characters from a string I specify, e.g. uppercase and lowercase a-zA-Z characters. (For this example I use only lowercase a, b, and c.)

Something like this (incomplete code below):

def next(index, validCharacters = 'abc'):
    return uniqueShortAsPossibleString

The index argument would be an integer that maps to a text string, for instance:

next(1)  == 'a'
next(2)  == 'b'
next(3)  == 'c'

next(4)  == 'aa'
next(5)  == 'ab'
next(6)  == 'ac'

next(7)  == 'ba'
next(8)  == 'bb'
next(9)  == 'bc'

next(10) == 'ca'
next(11) == 'cb'
next(12) == 'cc'

And so forth. The string:

  1. Must be unique (I'll be using it as an identifier) and may contain only a-zA-Z characters
  2. Must be as short as possible, with lower indices giving shorter strings (see the examples above)
  3. Must contain only the characters given in the validCharacters argument

In short, how could I write the next() function to map an integer index to a unique, short string made of the specified characters?

P.S. I'm new to SO. This site has helped me tons over the years, and while I've never made an account or asked a question (till now), I really hope I've done an okay job explaining what I'm trying to accomplish.

1
  • Beware the iterative answers. While they may work, you have to store the state if you want to return where you left off without recomputing all previous values. Commented Oct 25, 2012 at 6:51

6 Answers

3

What you are trying to do is write the parameter of the next function in another base.

Suppose validCharacters contains k characters. Then the job of the next function is to write its parameter p in base k, using the characters in validCharacters as the digits.

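In your example, you can write the numbers in base 3 and then associate each digit with one letter. This has to be bijective base 3, whose digits run from 1 to 3 instead of 0 to 2 (1 -> 'a', 2 -> 'b', 3 -> 'c'); in ordinary base 3 the number 3 is written 10 and would not map to 'c':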

next(1) -> 1 -> 'a'
next(2) -> 2 -> 'b'

next(4) -> 11 -> 'aa'
next(7) -> 21 -> 'ba'

And so forth.

With this method, you can call next(x) without knowing or computing any next(x-i), which you can't do with iterative methods.
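A minimal sketch of this idea in Python, assuming the 1-based mapping from the question (1 -> 'a', ..., 12 -> 'cc') and a non-empty character string. Subtracting 1 before each divmod is what turns ordinary base-k conversion into the bijective version with digits 1 to k. The name nth_string is hypothetical, chosen so as not to shadow Python's built-in next.

```python
def nth_string(index, valid_characters='abc'):
    """Return the index-th string (1-based) over valid_characters, using
    bijective base-k numeration where k = len(valid_characters)."""
    if index < 1:
        raise ValueError("index must be 1 or greater")
    k = len(valid_characters)
    digits = []
    while index > 0:
        index -= 1                          # digit values 1..k become 0..k-1
        index, remainder = divmod(index, k)
        digits.append(valid_characters[remainder])  # least-significant first
    return ''.join(reversed(digits))


print([nth_string(i) for i in range(1, 13)])
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

Each call is independent, so nth_string(1000) needs no earlier results.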


1 Comment

+1 for idea, I needed to see some sort of base implementation or pseudo code to understand it though.
1

You're trying to convert a number to a number in another base, but using arbitrary characters for the digits of that base.

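import string

# ascii_* avoids Python 2's locale-dependent string.lowercase/uppercase
chars = string.ascii_lowercase + string.ascii_uppercase

def identifier(x, chars):
    output = []
    base = len(chars)
    while x:
        output.append(chars[x % base])  # least-significant digit first
        x //= base                      # integer division in Python 2 and 3
    return ''.join(reversed(output))

print(identifier(1, chars))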

This lets you jump to any position. Because you're counting, the identifiers are guaranteed to be unique; any character set of two or more characters works; and lower numbers give shorter identifiers.

10 Comments

The reversed isn't even necessary here as you're worried only about length, not order.
Beware though: identifier(123456789, chars) returns þƒžå
Accepted (and +1'd?) this is exactly what I needed, and I appreciate the implementation. I knew it was something like this, just was unable to put a finger on it!
@JonClements What version of Python? Works fine for me on 2.7
agf: you're aware that your function never returns a string more than one character long starting with 'a', right? You can do a bit better, but you have to use a non-standard base system.
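The comment above can be checked with a self-contained restatement of this answer's loop, using 'abc' instead of the 52-letter chars for readability (plain_base is a hypothetical name). Because chars[0] acts as the digit zero, it never appears as a leading digit, and 0 maps to the empty string:

```python
def plain_base(x, chars):
    """Ordinary base-len(chars) conversion, as in the answer above."""
    output = []
    base = len(chars)
    while x:
        output.append(chars[x % base])  # chars[0] plays the role of digit 0
        x //= base
    return ''.join(reversed(output))


results = [plain_base(x, 'abc') for x in range(13)]
print(results)
# ['', 'b', 'c', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'baa', 'bab', 'bac', 'bba']
```

The bijective variant (subtracting 1 before each division) avoids both gaps.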
1

itertools can always give you obfuscated one-liner iterators:

from itertools import combinations_with_replacement, chain

chars = 'abc'
a = chain(*(combinations_with_replacement(chars, i) for i in range(1, len(chars) + 1)))

Basically, this code creates an iterator that chains together all combinations (with replacement) of chars of lengths 1, 2, ..., len(chars).

The output of for x in a: print(x) is:

('a',)
('b',)
('c',)
('a', 'a')
('a', 'b')
('a', 'c')
('b', 'b')
('b', 'c')
('c', 'c')
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')

1 Comment

+1 for making me feel like a noob. I solved this problem the long way round a few months ago, it was a friggin mission :)
1

You can't really "associate" the index with anything, but the following generator yields the output you're asking for:

from itertools import combinations_with_replacement

def uniquenames(chars):
    for i in range(1, len(chars)):
        for j in combinations_with_replacement(chars, i):
            yield ''.join(j)

print(list(uniquenames('abc')))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'bb', 'bc', 'cc']

1 Comment

@Blender Thanks, just realised it's missing some though... need to fix that
1

As far as I understand, we shouldn't specify a maximum length for the output string, so range is not enough:

>>> from itertools import combinations_with_replacement, count
>>> def u(chars):
...     for i in count(1):
...         for k in combinations_with_replacement(chars, i):
...             yield "".join(k)
... 
>>> g = u("abc")
>>> next(g)
'a'
>>> next(g)
'b'
>>> next(g)
'c'
>>> next(g)
'aa'
>>> next(g)
'ab'
>>> next(g)
'ac'
>>> next(g)
'bb'
>>> next(g)
'bc'

1 Comment

@Blender I don't think in this case the iterative answers are solving the right problem.
0

So it seems like you are trying to enumerate all the strings over the alphabet {'a','b','c'}. This can be done using finite state automata (though you don't want to do that). One simple way to enumerate them is to start with a list containing all the strings of length 1, in order (so a, then b, then c). Then build the strings of length n by appending each letter of the alphabet to each string of length n-1. This keeps the list in order as long as you append all the letters of the alphabet to a given string before moving on to the lexicographically next string.
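The append-each-letter procedure described above amounts to a breadth-first traversal, which can be sketched with a queue (collections.deque). This is a minimal sketch assuming an unbounded stream is wanted; enumerate_strings is a hypothetical name, and note that the queue keeps growing as the generator advances.

```python
from collections import deque
from itertools import islice

def enumerate_strings(alphabet='abc'):
    """Yield every non-empty string over alphabet, shortest first and,
    within a length, in the alphabet's order (breadth-first)."""
    queue = deque(alphabet)            # start with all strings of length 1
    while True:
        s = queue.popleft()
        yield s
        for letter in alphabet:        # extend s by every letter before moving on
            queue.append(s + letter)


print(list(islice(enumerate_strings('abc'), 13)))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa']
```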

1 Comment

This problem is much, much simpler than that.
