
I have a list of 3D arrays that are all different shapes, but I need them to all be the same shape. Also, that shape needs to be the smallest shape in the list.

For example, if my_list holds three arrays with the shapes (115,115,3), (111,111,3), and (113,113,3), then they all need to become (111,111,3). They are all square color images, so each one has shape (x,x,3).

So I have two main problems:

  • How do I find the smallest shape array without looping or keeping a variable while creating the list?
  • How do I efficiently set all arrays in a list to the smallest shape?

Currently I am keeping a variable for smallest shape while creating my_list so I can do this:

for idx, img in enumerate(my_list):
    img = img[:smallest_shape,:smallest_shape]
    my_list[idx] = img

I just feel like this is not the most efficient way, and I do realize I'm losing values by slicing, but I expect that.

  • 1. You can't find the smallest value without looping over the whole list unless you know it is sorted, and sorting costs more than a single pass to find the minimum. 2. Slicing is already very efficient, since NumPy internally creates a view object instead of copying data. Commented Nov 26, 2016 at 16:35
  • Is resizing the images on loading an option for you? This would ensure all images are the same shape. Commented Nov 26, 2016 at 16:41
  • Do you want them to be center focused, or do you just want to trim off the extra rows and columns from the right and bottom, or from the left and top? Commented Nov 26, 2016 at 16:44
  • @OddNorg When creating the list, I am going through multiple images, finding my desired image via cv2.cascade.detectMultiScale and then adding that to my_list. The detection doesn't always return the same shape, so I can't load them differently. Commented Nov 26, 2016 at 17:00
  • @NaN Adding to the previous comment: I believe trimming them is probably good enough as what I'm detecting shouldn't change much in shape. Commented Nov 26, 2016 at 17:00

1 Answer

I constructed a sample list with

In [513]: alist=[np.ones((512,512,3)) for _ in range(100)]

and did some timings.

Collecting shapes is fast:

In [515]: timeit [a.shape for a in alist]
10000 loops, best of 3: 31.2 µs per loop

Taking the min takes more time:

In [516]: np.min([a.shape for a in alist],axis=0)
Out[516]: array([512, 512,   3])
In [517]: timeit np.min([a.shape for a in alist],axis=0)
1000 loops, best of 3: 344 µs per loop

Slicing is faster:

In [518]: timeit [a[:500,:500,:] for a in alist]
10000 loops, best of 3: 133 µs per loop
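A likely reason slicing is this cheap is that basic slicing returns a view rather than a copy, as the first comment on the question points out. A minimal sketch to check that, assuming the same 512x512x3 test array; the variable names are only illustrative:

```python
import numpy as np

a = np.ones((512, 512, 3))
cropped = a[:500, :500, :]   # basic slicing: no pixel data is copied

# True when both arrays are backed by the same memory buffer.
is_view = np.shares_memory(a, cropped)

# False for a view, because the buffer belongs to the original array.
owns_data = cropped.flags['OWNDATA']

# A view keeps the full original array alive; take a copy if that matters.
independent = cropped.copy()
```

The trade-off is that each cropped view still references its full-size parent, so the memory of the larger arrays is not released until the views are copied or dropped.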

Now try to isolate the min step:

In [519]: shapes=[a.shape for a in alist]
In [520]: timeit np.min(shapes, axis=0)
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 136 µs per loop

When you have lists of objects, iteration is the only way to deal with all elements. Look at the code for np.hstack and np.vstack (and others). They do one or more list comprehensions to massage all the input arrays into the correct shape. Then they do np.concatenate which iterates too, but in compiled code.
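Putting the pieces together, here is a minimal sketch, assuming every array is square with shape (x, x, 3) as in the question. crop_to_smallest is a hypothetical helper name, not a NumPy function. It uses Python's built-in min over a generator instead of np.min, which avoids building an intermediate array of shapes; I have not timed this variant.

```python
import numpy as np

def crop_to_smallest(arrays):
    """Crop every (x, x, 3) array to the smallest x found in the list."""
    # One pass over the list; shape[0] is enough because the images are square.
    smallest = min(a.shape[0] for a in arrays)
    # Each slice is a view on the original array, so no pixel data is copied.
    return [a[:smallest, :smallest] for a in arrays]

# Example with the shapes from the question.
my_list = [np.ones((n, n, 3)) for n in (115, 111, 113)]
cropped = crop_to_smallest(my_list)
shapes = [a.shape for a in cropped]
```

Because the cropped arrays all share one shape, np.stack(cropped) can combine them into a single 4D array if that is more convenient downstream. This trims rows and columns from the bottom and right only; a centered crop would need different slice offsets.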

1 Comment

Thanks, I didn't think to look into source code like that. I'd upvote your response but I don't have enough reputation yet as I've only ever viewed SO for the past decade.
