I am trying to convert a list of astropy Tables into a NumPy array of astropy Tables. At first I tried np.asarray(list) and np.array(list), but the astropy Tables inside the list were converted, along with the list itself, into a NumPy ndarray.

Example:

from astropy.table import Table

t1 = Table({'a': [1, 2, 3], 'b': [4, 5, 6]})
t2 = Table({'a': [7, 8, 9], 'b': [10, 11, 12]})
mylist = [t1, t2]
print(mylist)

The output is:

[<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12]

If I then apply np.array(mylist), the output is:

array([[(1,  4), (2,  5), (3,  6)],
       [(7, 10), (8, 11), (9, 12)]], dtype=[('a', '<i8'), ('b', '<i8')])

but I want the following:

array([<Table length=3>
  a     b
int64 int64
----- -----
    1     4
    2     5
    3     6, 
<Table length=3>
  a     b
int64 int64
----- -----
    7    10
    8    11
    9    12])

My current solution is:

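import numpy as np

def to_object_array(mylist):  # illustrative wrapper name; the original snippet ended in a bare return
    if isinstance(mylist, list):
        # Fill a 1-D object array slot by slot so NumPy never converts the Tables themselves
        myarray = np.empty(len(mylist), dtype='object')
        for i, table in enumerate(mylist):
            myarray[i] = table
    else:
        myarray = mylist
    return myarray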

It works, but I suspect NumPy may have something built in for this; I just can't find it.

  • Will you please share the output of the code and your desired output? Commented Oct 2, 2021 at 7:30
  • Why do you want to change a list of Tables into an array of Tables? Generally, that will not be where any speed-up is, and a list is a perfectly fine data structure to hold your Tables. Commented Oct 2, 2021 at 7:43
  • I've updated the question with an example. I want to make this change not for a speed-up but to apply a boolean selection that picks out some of these tables. Since boolean indexing doesn't work on lists, the easiest solution I know is a NumPy array. Commented Oct 2, 2021 at 7:54
  • Unless your list is very long, I would stick to the list, for clarity. If you show what you actually want to do, there is probably an answer that works just as well for lists. (Since it's unclear how you want to do the boolean selection precisely.) Commented Oct 2, 2021 at 7:59
  • I have a boolean array, call it selection, with the same length as the list of astropy Tables, and I want to do my_select_list = mylist[selection]. Commented Oct 2, 2021 at 8:11

1 Answer

1

This looks like an Astropy Table limitation, which I would consider a bug. Astropy's Table prevents coercion to a NumPy array with a given dtype, since that doesn't always work: there is a specific check in the code that raises a ValueError if a dtype is specified when converting a table to a NumPy array.

Here, of course, you are dealing with a list, which raises two issues: NumPy attempts to convert the list to an array, and in doing so also converts each individual element. Without a dtype you get a 2D array; with a dtype specified you get the ValueError again:

ValueError: Datatype coercion is not allowed

The bug (as I see it) is that Astropy rejects any dtype other than None, so even dtype=object raises this error, which I'm not sure it should.
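The NumPy side of this can be seen without Astropy at all. The sketch below uses plain nested lists as stand-ins for the tables (an assumption: like a Table, each inner list is sequence-like, so NumPy descends into it). It shows that, at least for nested lists, passing dtype=object by itself does not keep the elements intact; it only changes the type of the cells:

```python
import numpy as np

# Two equal-length inner lists stand in for two tables with the same number of rows.
items = [[1, 2, 3], [7, 8, 9]]

numeric = np.array(items)                   # NumPy descends into each element
as_objects = np.array(items, dtype=object)  # still descends; only the cell type changes

print(numeric.shape)            # (2, 3)
print(as_objects.shape)         # (2, 3)
print(type(as_objects[0, 0]))   # <class 'int'>: each cell is a number, not an inner list
```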

Your work-around is therefore, in my opinion, fine. Not ideal, but it does the job, and it's basically just 2-3 lines of code.
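Applied to the selection described in the comments, the pre-allocated object array then supports boolean indexing directly. A minimal sketch, again with plain lists standing in for Tables so it runs without Astropy (with real Tables, the question reports that the same fill loop works):

```python
import numpy as np

# Stand-ins for Tables: equal-length sequences that np.array() would merge into 2-D.
mylist = [[1, 2, 3], [7, 8, 9], [4, 5, 6]]

# Allocate first, then assign element by element; assigning to a single slot
# of an object array stores the reference without converting the item.
myarray = np.empty(len(mylist), dtype=object)
for i, item in enumerate(mylist):
    myarray[i] = item

selection = np.array([True, False, True])
my_select_list = myarray[selection]  # 1-D object array holding the selected items
```

If a list is wanted again afterwards, list(my_select_list) converts the result back.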


Since you mention boolean indexing, however, consider the following, which keeps everything in a list (I think that is the better option here: NumPy arrays are really meant for numbers, not so much for arbitrary objects):

indices = [True, False, True, False]
my_list = [....]  # list of tables
selection = [item for item, index in zip(my_list, indices) if index]  # filter all True values

or, for integer indices:

indices = [1, 3, 5, 6]
my_list = [....] # list of tables
selection = [my_list[i] for i in indices]

That's the same number of lines as with NumPy indexing, and unless your list grows to thousands (or millions) of elements, you won't notice a performance difference. (If it does grow to millions of elements, you may need to reconsider your data structures anyway, which requires more rewriting elsewhere in your code.)
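Two existing helpers cover the same selections while keeping a plain list: the standard library's itertools.compress applies a boolean mask, and np.flatnonzero turns a NumPy boolean array (such as the selection array from the comments) into integer positions for the second comprehension above. A minimal sketch, with strings as placeholders for the tables:

```python
from itertools import compress

import numpy as np

my_list = ["t1", "t2", "t3", "t4"]  # placeholders for Table objects

# Boolean mask as a plain list: compress() yields the items whose selector is truthy.
indices = [True, False, True, False]
selection = list(compress(my_list, indices))

# Boolean mask as a NumPy array: convert it to integer positions first.
mask = np.array([False, True, True, False])
selection_from_mask = [my_list[i] for i in np.flatnonzero(mask)]
```

Both results are ordinary lists, so nothing about the tables themselves is converted.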


2 Comments

Interesting and thanks for the insight. It's worth noting that pandas DataFrame also does not easily support making a numpy object array of dataframes. It is surprising to me that numpy is calling Table.__array__ on each of the two list elements in np.array([t1, t2]). Anyway, I'll see if there is some simple fix, but agreed with your suggestions on using plain Python lists.
@TomAldcroft Thanks for the response Tom. I agree it may be an issue with NumPy (as well), but I think that choice was made for ease of converting or combining arrays. You may already have seen the issue filed at github.com/astropy/astropy/issues/12229 .