Say you have 32000 records with 89 bytes/record stored in a TSV file.

You load this into a Python dictionary or Ruby hash, indexed by a 9-byte key that is itself a field in each record. In other words, you have a dictionary with 32000 key-value pairs, where each key is 9 bytes and each value is 89 bytes. On a modern computer such as a 2.4 GHz MacBook Pro, what is a rough estimate of the average time it takes to retrieve a record, and what is the worst case in theta notation? Is the implementation in Ruby slower than in Python?

  • How about you benchmark it? Commented Dec 11, 2010 at 1:25
  • Why don't you just run the test? Frankly, it depends on so many things that you would have to try to know. Commented Dec 11, 2010 at 1:26
  • Computers laugh at numbers like 32000. Hashing 32000 keys takes 3 ms on my 2-year-old laptop ... Commented Dec 11, 2010 at 1:47

1 Answer

A dictionary can typically retrieve the value for a key in constant time on average, that is O(1) regardless of how many entries it holds, so the answer to your question is "very fast".
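As the comments suggest, the simplest way to get a concrete number is to measure it. The following is a minimal Python sketch, not a rigorous benchmark: the records are synthetic random strings (an assumption, since the real TSV data is not shown), and the helper names `make_records` and `average_lookup_seconds` are made up for illustration.

```python
import random
import string
import timeit

NUM_RECORDS = 32_000   # sizes taken from the question
KEY_LEN = 9
RECORD_LEN = 89


def make_records(n=NUM_RECORDS, seed=0):
    """Build n synthetic 89-character records keyed by their first 9 characters."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    records = {}
    while len(records) < n:  # loop until n distinct keys exist
        key = "".join(rng.choices(alphabet, k=KEY_LEN))
        payload = "".join(rng.choices(alphabet, k=RECORD_LEN - KEY_LEN))
        records[key] = key + payload
    return records


def average_lookup_seconds(records, lookups=100_000, seed=1):
    """Return the mean wall-clock time of one lookup, loop overhead included."""
    rng = random.Random(seed)
    keys = list(records)
    probes = [rng.choice(keys) for _ in range(lookups)]

    def run():
        for key in probes:
            records[key]  # the lookup being timed

    return timeit.timeit(run, number=1) / lookups


if __name__ == "__main__":
    records = make_records()
    nanoseconds = average_lookup_seconds(records) * 1e9
    print(f"{len(records)} records, roughly {nanoseconds:.0f} ns per lookup")
```

The figure it prints depends on the machine and the interpreter version, so treat it as an order-of-magnitude estimate only.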

The only way it would be slow is if many of your keys collided, in which case a lookup degrades to Θ(n) in the worst case. You can avoid this by using a good hash function, and the default hash function will probably be fine.
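To see why collisions matter, here is a small Python sketch that counts key comparisons instead of measuring time. The `Key` and `CollidingKey` classes and the `comparisons_per_lookup` helper are hypothetical, written only for this illustration; `CollidingKey` deliberately returns the same hash for every instance to force the worst case.

```python
class Key:
    """Hypothetical key wrapper that counts how often __eq__ is called."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Key.comparisons += 1
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        return hash(self.value)  # well-distributed hash


class CollidingKey(Key):
    """Same key, but every instance hashes to 0, so all keys collide."""

    def __hash__(self):
        return 0


def comparisons_per_lookup(key_type, n=1_000):
    """Average number of __eq__ calls needed to look up each of n keys once."""
    table = {key_type(i): i for i in range(n)}
    # Fresh key objects, so the dict cannot match them by identity alone.
    probes = [key_type(i) for i in range(n)]
    Key.comparisons = 0  # ignore comparisons made while building the table
    for probe in probes:
        assert table[probe] == probe.value
    return Key.comparisons / n


if __name__ == "__main__":
    print("good hash:     ", comparisons_per_lookup(Key))
    print("constant hash: ", comparisons_per_lookup(CollidingKey))
```

With the well-distributed hash the number of comparisons per lookup stays small however large the table is, while with the constant hash it grows roughly in proportion to the number of entries, which is the Θ(n) worst case.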

Is the implementation in Ruby slower than in Python?

Ruby is typically slower than Python in performance benchmarks by a small factor. I'd expect that to be true here too.

The Computer Language Benchmarks Game - Ruby vs Python

3 Comments

At what point does it make sense to stop having this run in-memory?
@mbm: When your memory fills up.
Boy do I love being a noob. Thanks people!