Weird behavior of numpy array type setting

Question

The code

np.array([100,200,300],dtype=str)

returns:

array(['1', '2', '3'], 
      dtype='|S1')

The documentation says:

dtype : data-type, optional

The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.

Is this a bug?

Can you try using dtype='|S3' and see if that gives what you expect?
– SethMMorton, Commented Aug 29, 2013 at 18:39
This question has been asked fairly recently, although I cannot find it right off. A detailed search of the numpy tag should lead you to it.
– Daniel, Commented Aug 29, 2013 at 18:39
What happens if you have np.array([101,201,301],dtype=str) instead?
– SethMMorton, Commented Aug 29, 2013 at 18:42

Daniel · Accepted Answer · 2013-08-29 19:37:39Z

2

I still cannot find the question, but to get around it:

>>> a=[100,200,300]

>>> np.char.mod('%d', a)
array(['100', '200', '300'],
      dtype='|S3')

This circumvents your problem, since the string width adapts to the longest element:

>>> a=[100,200,3005]
>>> np.char.mod('%d', a)
array(['100', '200', '3005'],
      dtype='|S4')

The documentation is obscure. It should also be noted that this is roughly 4 times slower than choosing dtype="S..", but non-linearly faster than using np.array(map(str,a)) methods.
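As a minimal sketch of the two approaches compared above: on Python 3, `map` returns an iterator, so it has to be wrapped in `list()` before being passed to `np.array`. The variable names `via_mod` and `via_map` are only illustrative, and the result is a unicode array rather than the `|S` byte strings shown in the Python 2 output.

```python
import numpy as np

a = [100, 200, 3005]

# Format every element with '%d'; the string width is chosen for us.
via_mod = np.char.mod('%d', a)

# Pure-Python conversion; list() is needed on Python 3 because
# map() returns an iterator rather than a list.
via_map = np.array(list(map(str, a)))

print(via_mod.dtype, via_mod.tolist())
print(via_map.dtype, via_map.tolist())
```

Both calls should yield the same strings; which one is faster on your data is best measured with `timeit` rather than assumed.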

You can also do some neat things:

>>> a
[1234.5, 123.4, 12345]

>>> np.char.mod('%s',a)
array(['1234.5', '123.4', '12345.0'],
      dtype='|S7')

>>> np.char.mod('%f',a)
array(['1234.500000', '123.400000', '12345.000000'],
      dtype='|S12')

>>> np.char.mod('%d',a) #Note the truncation of decimals here.
array(['1234', '123', '12345'],
      dtype='|S5')

>>> np.char.mod('%s.stuff',a)
array(['1234.5.stuff', '123.4.stuff', '12345.0.stuff'],
      dtype='|S13')

Additional information can be found here.

edited Aug 29, 2013 at 19:37

answered Aug 29, 2013 at 18:46

Daniel


2 Comments

Bitwise Over a year ago

Thanks! Can you add a link to the documentation for this function?

Daniel Over a year ago

I also updated this with a few extra examples; depending on what you are doing, %d might not be optimal for you.

Community · Accepted Answer · 2017-05-23 11:49:51Z

1

The reason you see this behavior is that you have to specify the size of each string element, e.g.:

>>> np.array([100,200,300],dtype='S3')
array(['100', '200', '300'], 
      dtype='|S3')

Otherwise the size of each element string will default to 1, so only the first character of each number is kept.

More info here: Numpy converting array from float to strings
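The default width of 1 is what the NumPy release in use when this was written produced; later releases may infer the width from the data instead. A minimal check, assuming nothing beyond numpy itself, is to print the dtype your installation picks next to an explicitly sized one (the names `inferred` and `explicit` are only illustrative):

```python
import numpy as np

values = [100, 200, 300]

# Let NumPy choose the string width itself.
inferred = np.array(values, dtype=str)

# Request an explicit width of 3 bytes per element, as in the answer above.
explicit = np.array(values, dtype='S3')

print(np.__version__)
print(inferred.dtype, inferred.tolist())
print(explicit.dtype, explicit.tolist())
```

If the inferred dtype has a width of 1, you are seeing the behavior from the question and need the explicit width.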

edited May 23, 2017 at 11:49

CommunityBot


answered Aug 29, 2013 at 18:46

crs17


1 Comment

Bitwise Over a year ago

The problem is that to do this for an arbitrary list of numbers I need to check the lengths of all the numbers first.
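One way to avoid checking the lengths by hand is to compute the width from the data. The sketch below assumes a list of integers; `to_bytes_array` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def to_bytes_array(values):
    """Convert numbers to a fixed-width byte-string array.

    Hypothetical helper: the width is taken from the longest
    string representation, so no element is truncated.
    """
    strings = [str(v) for v in values]
    # default=1 keeps the dtype valid for an empty input list.
    width = max((len(s) for s in strings), default=1)
    return np.array(strings, dtype='S%d' % width)

result = to_bytes_array([100, 200, 3005])
print(result.dtype, result.tolist())
```

This makes one extra pass over the data to build the strings, which is the cost of not knowing the width up front.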
