Delete certain elements of a numpy array

Question

I have two numpy arrays a and b. I have a definition that construct an array c whose elements are all the possible sums of different elements of a.

import numpy as np

def Sumarray(a):
    n = len(a)

    sumarray = np.array([0]) # Add a default zero element
    for k in range(2,n+1):
        full = np.mgrid[k*(slice(n),)]
        nd_triu_idx = full[:,(np.diff(full,axis=0)>0).all(axis=0)]
        sumarray = np.append(sumarray, a[nd_triu_idx].sum(axis=0))

    return sumarray

a = np.array([1,2,6,8])
c = Sumarray(a)
print(d)

I then perform a subsetsum between an element of c and b: isSubsetSum returns the elements of b that when summed gives c[1]. Let's say that I get

c[0] = b[2] + b[3]

Then I want to remove:

the elements b[2], b[3] (easy bit), and
the elements of a that when summed gave c[0]

As you can see from the definition, Sumarray, the order of sums of different elements of a are preserved, so I need to realise some mapping.

The function isSubsetSum is given by

def _isSubsetSum(numbers, n, x, indices):
    if (x == 0):
        return True
    if (n == 0 and x != 0):
        return False
    # If last element is greater than x, then ignore it
    if (numbers[n - 1] > x):
        return _isSubsetSum(numbers, n - 1, x, indices)
    # else, check if x can be obtained by any of the following
    found = _isSubsetSum(numbers, n - 1, x, indices)
    if found: return True
    indices.insert(0, n - 1)
    found = _isSubsetSum(numbers, n - 1, x - numbers[n - 1], indices)
    if not found: indices.pop(0)
    return found

def isSubsetSum(numbers, x):
    indices = []
    found = _isSubsetSum(numbers, len(numbers), x, indices)
    return indices if found else None

This would actually be much easier using the order from my last comment here stackoverflow.com/q/58404635/7207392 then np.unpackbits(<the_index>.view('u1'),count=len(a),bitorder='little') would give you a mask that is 1 at the positions in a that went into the sum. — Paul Panzer
– Paul Panzer, Commented Oct 16, 2019 at 21:53
@PaulPanzer, could you explain that a little more. I don't still get the mapping — Sid
– Sid, Commented Oct 16, 2019 at 21:55
@PaulPanzer, Could you write it as a solution, since i've been trying to piece together your suggestions from your other post and getting this. — Sid
– Sid, Commented Oct 16, 2019 at 22:12

Paul Panzer · Accepted Answer · 2019-10-17 23:26:48Z

1

As you are iterating over all possible numbers of terms, you could as well directly generate all possible subsets.

These can be conveniently encoded as numbers 0,1,2,... by means of their binary representations: O means no terms at all, 1 means only the first term, 2 means only the second, 3 means the first and the second and so on.

Using this scheme it becomes very easy to recover the terms from the sum index because all we need to do is obtain the binary representation:

UPDATE: we can suppress 1-term-sums with a small amount of extra code:

import numpy as np

def find_all_subsums(a,drop_singletons=False):
    n = len(a)
    assert n<=32 # this gives 4G subsets, and we have to cut somewhere
    # compute the smallest integer type with enough bits
    dt = f"<u{1<<((n-1)>>3).bit_length()}"
    # the numbers 0 to 2^n encode all possible subsets of an n
    # element set by means of their binary representation
    # each bit corresponds to one element number k represents the 
    # subset consisting of all elements whose bit is set in k
    rng = np.arange(1<<n,dtype=dt)
    if drop_singletons:
        # one element subsets correspond to powers of two
        rng = np.delete(rng,1<<np.arange(n))
    # np.unpackbits transforms bytes to their binary representation
    # given the a bitvector b we can compute the corresponding subsum
    # as b dot a, to do it in bulk we can mutliply the matrix of 
    # binary rows with a
    return np.unpackbits(rng[...,None].view('u1'),
                         axis=1,count=n,bitorder='little') @ a

def show_terms(a,idx,drop_singletons=False):
    n = len(a)
    if drop_singletons:
        # we must undo the dropping of powers of two to get an index
        # that is easy to translate. One can check that the following
        # formula does the trick
        idx += (idx+idx.bit_length()).bit_length()
        # now we can simply use the binary representation
    return a[np.unpackbits(np.asarray(idx,dtype='<u8')[None].view('u1'),
                           count=n,bitorder='little').view('?')]


example = np.logspace(1,7,7,base=3)
ss = find_all_subsums(example,True)
# check every single sum
for i,s in enumerate(ss):
    assert show_terms(example,i,True).sum() == s
# print one example
idx = 77
print(ss[idx],"="," + ".join(show_terms(example.astype('U'),idx,True)))

Sample run:

2457.0 = 27.0 + 243.0 + 2187.0

edited Oct 17, 2019 at 23:26

answered Oct 16, 2019 at 22:54

Paul Panzer

53.3k3 gold badges59 silver badges103 bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

Paul Panzer Over a year ago

@SId Ah, sorry, I missed that you are starting from 2, not 1. But you do include 0. If it's a big problem it can be fixed but it makes the code more complicated. idx is just a random example index. I do return the elements as an array, the printing happens outside the function.

Paul Panzer Over a year ago

@Sid, pls check out update. That's all my programming for today.

Sid Over a year ago

why doesn't something like show_terms(a,c[1],True) work? I get the AttributeError: 'numpy.int64' object has no attribute 'bit_length'

Paul Panzer Over a year ago

@SId Could you try c.item(1) instead of c[1]?

Paul Panzer Over a year ago

@Sid there is a subtle difference between Python ints and numpy ints. Sometimes like here it breaks things.

|

Collectives™ on Stack Overflow

Delete certain elements of a numpy array

1 Answer 1

9 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

9 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related