Not a Python guy--trying to implement this sort faster. Currently I have a hash that includes objects, and I'm sorting them on a call to a method on those objects. I'm not certain how sorted() is functioning--is this making multiple method calls per comparison? Would I be better off perhaps storing the method call on the hash itself and sorting on that?

sorted(hash_object.items(), key=lambda x:x[1].method_call_here())

Currently taking ~100-400ms which is a fairly slow sort. Thoughts?

Responding to what the method call is here; I'm skeptical that it's the method. It's a direct port of my Ruby implementation which runs at 0.2ms, but maybe it's slower in Python for some reason. Really simple method though. It's calling the track quality method below:

class Track:

  def __init__(self, title, play_count, track_number):
    self.title = title
    self.play_count = play_count
    self.track_number = track_number

  def predicted_listens(self):
    return 1/self.track_number

  def track_quality(self):
    return self.play_count/self.predicted_listens()

For reference, it would seem like it's implementing something identical to the Ruby source:

self.sort_by { |track| track.quality }

My guess is I'm wrong about what's happening under the hood.

  • Doesn't sound right. No way the Ruby sort should be 500 times faster. Roughly how many items are we talking about? Commented Jul 30, 2013 at 7:34
  • Is this Python 2 or Python 3? 1/self.track_number will be truncated if self.track_number is an int in Python 2. Commented Jul 30, 2013 at 7:37
  • On this old computer, I get about 148ms to sort 100000 items. Is that in the right ball park? Commented Jul 30, 2013 at 7:44
  • Are you sure your comparison is fair? It seems like you're comparing different things: your Ruby code sorts a list of values by a property of each value, while your Python code sorts a list of tuples by a property of one tuple element. The Python code includes an additional conversion of the dict to a list of tuples and an additional tuple lookup in the sort key. Commented Jul 30, 2013 at 7:55
  • Also, if using Python2, change hash_object.items() to hash_object.iteritems(). Commented Jul 30, 2013 at 8:07
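
The difference raised in the comments above can be made concrete with a small sketch. It reuses the Track class from the question (with 1.0 in the division so the result does not depend on the Python version); the three tracks and their dict keys are made-up sample data, not from the question.

```python
class Track:
    def __init__(self, title, play_count, track_number):
        self.title = title
        self.play_count = play_count
        self.track_number = track_number

    def predicted_listens(self):
        # 1.0 forces float division, so Python 2 does not truncate to 0
        return 1.0 / self.track_number

    def track_quality(self):
        return self.play_count / self.predicted_listens()


# Made-up sample data, keyed by an arbitrary id
hash_object = {
    1: Track("First", 10, 1),
    2: Track("Second", 10, 2),
    3: Track("Third", 3, 3),
}

# What the question does: sort (key, Track) tuples, indexing into each tuple
pairs = sorted(hash_object.items(), key=lambda x: x[1].track_quality())

# Closer to the Ruby sort_by: sort the Track objects directly
tracks_only = sorted(hash_object.values(), key=lambda t: t.track_quality())

print([t.title for t in tracks_only])
```

Both calls produce the same ordering of tracks; the second one simply skips building the tuples and the x[1] lookup. Whether that accounts for a meaningful share of the time here is something only a measurement on the real data can tell.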

2 Answers

No, sorted() calls the key function just once per item. The deprecated cmp= argument (removed in Python 3) would get called for each comparison.

You could try profiling it, but most likely the method call is the CPU hog rather than the sort itself.

Perhaps you can post the code of method_call_here to see if it can be improved.
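
One way to check the once-per-item behaviour yourself is to count the calls. This is only a minimal sketch: the Item class, its score method and the random data are hypothetical stand-ins for your objects and method_call_here.

```python
import random


class Item:
    calls = 0  # class-level counter of score() invocations

    def __init__(self, value):
        self.value = value

    def score(self):
        Item.calls += 1
        return self.value


random.seed(0)
# Hypothetical data: 1000 objects stored in a dict, as in the question
hash_object = {i: Item(random.random()) for i in range(1000)}

ordered = sorted(hash_object.items(), key=lambda x: x[1].score())

print(Item.calls)  # 1000: one key call per item, not per comparison
```

If the count matches the number of items, then any remaining slowness comes from the cost of each call (or from something outside the sort), not from repeated calls.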

1 Comment

edited with the method call; appreciate the quick feedback :)

If track_number is not going to change, it might be worth making predicted_listens an instance attribute and setting its value in __init__ (and anywhere else track_number changes), so that it is not recalculated while sorting. This may improve the sort performance.

  def __init__(self, title, play_count, track_number):
    self.title = title
    self.play_count = play_count
    self.track_number = track_number
    self.predicted_listens = 1.0 / self.track_number  # float division; 1/int truncates in Python 2

1 Comment

For that matter, track_quality need not be a method either, but can also be computed in __init__.
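
Putting both suggestions together might look like the sketch below. It assumes play_count and track_number do not change after construction; the sample tracks and the tracks dict are made up for illustration. operator.attrgetter is used as the key so no Python-level method is called during the sort.

```python
from operator import attrgetter


class Track:
    def __init__(self, title, play_count, track_number):
        self.title = title
        self.play_count = play_count
        self.track_number = track_number
        # Computed once here; recompute if play_count or track_number change
        self.predicted_listens = 1.0 / track_number
        self.track_quality = play_count / self.predicted_listens


# Made-up sample data
tracks = {
    "a": Track("First", 10, 1),
    "b": Track("Second", 10, 2),
    "c": Track("Third", 3, 3),
}

# Sort on the stored attribute instead of calling a method per item
by_quality = sorted(tracks.values(), key=attrgetter("track_quality"))

print([t.title for t in by_quality])  # ['Third', 'First', 'Second']
```

The trade-off is that the stored values go stale if the inputs are modified later, so this only fits if the tracks are effectively read-only once built.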
