
Given an adjacency list:

adj_list = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]

And an array of indices,

ind_arr = np.array([0, 1, 2])

Goal:

A = np.zeros((3, 3))
for i in ind_arr:
    A[i, adj_list[i]] = 1.0 / adj_list[i].shape[0]

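For reference, with the sample adj_list and ind_arr above the loop should produce a row-normalised matrix like the following (a quick sanity check, not part of the original question):

import numpy as np

adj_list = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]
ind_arr = np.array([0, 1, 2])

A = np.zeros((3, 3))
for i in ind_arr:
    # row i gets 1/degree in the columns listed in adj_list[i]
    A[i, adj_list[i]] = 1.0 / adj_list[i].shape[0]

print(A)
# [[ 0.5         0.5         0.        ]
#  [ 0.33333333  0.33333333  0.33333333]
#  [ 0.5         0.          0.5       ]]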

Currently, I have written:

A[ind_arr[:], adj_list[:]] = 1. / len(adj_list[:])

And tried various configurations of indexing within this scaffold.

  • Would the array elements of adj_list always be in that sequence format - [0,1,2...]? Commented Aug 4, 2017 at 18:32
  • @Divakar they are in ascending order, but they are sparse. Commented Aug 4, 2017 at 18:34

2 Answers


Here's one approach -

# length (degree) of each adjacency array
lens = np.array([len(i) for i in adj_list])
# flattened column indices across all rows
col_idx = np.concatenate(adj_list)
out = np.zeros((len(lens), col_idx.max()+1))
# repeat each row index and each 1/degree value once per neighbour
row_idx = np.repeat(np.arange(len(lens)), lens)
vals = np.repeat(1.0/lens, lens)
out[row_idx, col_idx] = vals

Sample input, output -

In [494]: adj_list = [np.array([0,2]),np.array([0,1,4])]

In [496]: out
Out[496]: 
array([[ 0.5       ,  0.        ,  0.5       ,  0.        ,  0.        ],
       [ 0.33333333,  0.33333333,  0.        ,  0.        ,  0.33333333]])

Sparse matrix as output

Additionally, if you want to save memory and create a sparse matrix instead, that's an easy extension -

In [506]: from scipy.sparse import csr_matrix

In [507]: csr_matrix((vals, (row_idx, col_idx)), shape=(len(lens), col_idx.max()+1))
Out[507]: 
<2x5 sparse matrix of type '<type 'numpy.float64'>'
    with 5 stored elements in Compressed Sparse Row format>

In [508]: _.toarray()
Out[508]: 
array([[ 0.5       ,  0.        ,  0.5       ,  0.        ,  0.        ],
       [ 0.33333333,  0.33333333,  0.        ,  0.        ,  0.33333333]])
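For completeness, running the same steps on the question's own adj_list (a sketch for verification, not from the answer's original session) reproduces the 3x3 matrix that the loop in the question builds:

import numpy as np

adj_list = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]

lens = np.array([len(i) for i in adj_list])
col_idx = np.concatenate(adj_list)
row_idx = np.repeat(np.arange(len(lens)), lens)
vals = np.repeat(1.0 / lens, lens)

out = np.zeros((len(lens), col_idx.max() + 1))
out[row_idx, col_idx] = vals
# out:
# [[ 0.5         0.5         0.        ]
#  [ 0.33333333  0.33333333  0.33333333]
#  [ 0.5         0.          0.5       ]]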



I don't think you can completely eliminate the loop, since the arrays have different lengths, but you can reduce the work to a single Python-level loop:

# using adj_list = [np.array([0,1]), np.array([0,1,2])]
A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
    arr_size = len(arr)
    # fill the first len(arr) columns of row i
    A[i, :arr_size] = 1./arr_size

A
# array([[ 0.5       ,  0.5       ,  0.        ],
#        [ 0.33333333,  0.33333333,  0.33333333]])

Or, if the numbers in the arrays are actually column positions:

A = np.zeros((2, 3))
for i, arr in enumerate(adj_list):
    # use the entries of arr directly as column indices
    A[i, arr] = 1./len(arr)

A
# array([[ 0.5       ,  0.5       ,  0.        ],
#        [ 0.33333333,  0.33333333,  0.33333333]])
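To use this second variant with the three-row adj_list from the question, only the array shape needs to change; a small adaptation (not part of the original answer) that derives the shape from the data:

adj_list = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]

n_rows = len(adj_list)
n_cols = max(arr.max() for arr in adj_list) + 1  # widest column index + 1

A = np.zeros((n_rows, n_cols))
for i, arr in enumerate(adj_list):
    A[i, arr] = 1. / len(arr)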

Another option is MultiLabelBinarizer from sklearn (though it may not be as efficient):

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

adj_list = [np.array([0,1]), np.array([0,1,2])]

# divide each row of the binary indicator matrix by that row's length
sizes = np.fromiter(map(len, adj_list), dtype=int)
mlb.fit_transform(adj_list)/sizes[:,None]

# array([[ 0.5       ,  0.5       ,  0.        ],
#        [ 0.33333333,  0.33333333,  0.33333333]])
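One caveat with the MultiLabelBinarizer route (an assumption worth checking, not raised in the original answer): it infers its columns from the labels it actually sees, so a column index that never appears in any array is silently dropped and the output is narrower than expected. Passing classes explicitly pins the width:

# force columns 0..4 even though 3 and 4 never occur in adj_list
mlb = MultiLabelBinarizer(classes=np.arange(5))
mlb.fit_transform(adj_list) / sizes[:, None]
# shape (2, 5); columns 3 and 4 are all zeros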

