
I have a dataset which I am querying with SQL. My query returns a long string which simply contains the column names and then the data, with rows separated by newline characters. I then use numpy.genfromtxt to turn this long string into a numpy array.

However, a few columns should be read as strings, so I explicitly pass a dtype array to genfromtxt to make it store those column values correctly. When I inspect the output, every entry that should be a string appears as '', an empty string.

I am declaring the data type of these columns as str. For example, one entry that turns into an empty string is the word GALAXY in the original dataset. The official docs for the dataset list the data type of this column as varchar. I assumed str would be the correct type for this, but apparently it is not.


Edit: Ignore that this has anything to do with SQL. Basically, I have a string that is the result of a query, and I need to pack it into a numpy array using np.genfromtxt. I avoided posting the explicit strings because they are brutal to look at, but here is one:

b'bestObjID,ra,dec,z,zErr,zWarning,class,subClass,rChi2,DOF,rChi2Diff,z_noqso,zErr_noqso,zWarning_noqso,class_noqso,subClass_noqso,rChi2Diff_noqso,velDisp,velDispErr,velDispZ,velDispZErr,velDispChi2\n1237662340012638224,239.58334,27.233419,0.09080672,2.924875E-05,0,GALAXY,,1.104714,3735,1.411605,0,0,0,,,0,272.6187,13.61222,0,0,1815.653\n'

As you can see, it is a bytes object with rows separated by \n and the first row being the column labels.

The result of passing this to np.genfromtxt is

array((1237662340012638224, 239.58334, 27.233419, 0.09080672264099121, 2.9248749342514202e-05, 0, '', '', 1.104714035987854, 3735.0, 1.4116050004959106, 0.0, 0.0, 0, '', '', 0.0, 272.61871337890625, 13.61221981048584, 0.0, 0.0, 1815.6529541015625), dtype=[('bestObjID', '<i8'), ('ra', '<f8'), ('dec', '<f8'), ('z', '<f4'), ('zErr', '<f4'), ('zWarning', '<i8'), ('class', '<c16'), ('subClass', '<c16'), ('rChi2', '<f4'), ('DOF', '<f4'), ('rChi2Diff', '<f4'), ('z_noqso', '<f4'), ('zErr_noqso', '<f4'), ('zWarning_noqso', '<i8'), ('class_noqso', '<c16'), ('subClass_noqso', '<c16'), ('rChi2Diff_noqso', '<f4'), ('velDisp', '<f4'), ('velDispErr', '<f4'), ('velDispZ', '<f4'), ('velDispZErr', '<f4'), ('velDispChi2', '<f4')])

You can see that what should say 'GALAXY' turns into '' when I specify that the data type of this entry is str. If I instead use the c data type, I can recover the G of GALAXY, but nothing more. If I try to use c8 or c16, I get (nan+0j).
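A likely explanation for these symptoms, sketched under the assumption that NumPy's standard dtype codes are in play: c8 and c16 name complex number types rather than character strings, bare c is a single character, and plain str has no length, so the field it creates in a structured dtype is zero characters wide. The field name 'class' is taken from the header above; the variable names are only illustrative.

```python
import numpy as np

# 'c8' and 'c16' are complex-number codes (8 and 16 bytes), not strings,
# which would explain the (nan+0j) values.
assert np.dtype('c8') == np.complex64
assert np.dtype('c16') == np.complex128

# Bare 'c' is a one-byte character, so only the 'G' of 'GALAXY' fits.
char_dtype = np.dtype('c')
print(char_dtype.itemsize)

# Plain str has no length, so the structured field is zero characters wide
# and every value is truncated to ''.
zero_width = np.dtype([('class', str)])
print(zero_width.itemsize)

# A fixed-width code such as 'U6' (6 unicode characters) or 'S6' (6 bytes)
# reserves room for the text.
fixed = np.dtype([('class', 'U6')])
print(fixed.itemsize)
```

If this reading is right, giving the string columns an explicit width such as 'U8' or 'S8' should keep the full word.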

  • Your question sounds misguided: the numpy library is not meant to be used as a DBAPI. If you're manipulating or reading data from a normal SQL db, can you also clarify how you ended up trying to parse the results with numpy? It sounds like that's probably where your real problem is. Also, you may want to read up on how to write an issue with a Minimal, Complete, and Verifiable example. Commented Aug 8, 2016 at 20:22
  • @ThomasTu Well, this is very specific to what I'm working on. I am querying data from the SDSS (Sloan Digital Sky Survey), using a Python script provided on the SDSS website. This script does the querying and returns a string, as I described in my first paragraph. Perhaps I should not have even mentioned SQL; my issue is really just turning a string of entries into a numpy array via genfromtxt. I'll update the post with some more info. Commented Aug 8, 2016 at 20:25
  • Does astropy/astroquery accomplish what you're trying to do for you? If not, try to post an mcve and that will probably help generate a more useful answer. Commented Aug 8, 2016 at 20:30
  • @ThomasTu No, I have a custom script supplied by SDSS to do exactly what I need to do. Everything is fine, except for getting this string to save in the numpy array as a string. That's all. I'm doing the querying correctly otherwise. See my edit, please. Also, what's mcve? Commented Aug 8, 2016 at 20:31
  • Are you specifying a custom list of dtypes for each column? How close does passing dtype=None get you to what you want? An mcve is a minimal, complete, verifiable example, which this is not, hence the downvote someone probably gave you. I voted it up to get it to 0 because this is a legitimate question that just needs some massaging, but you should probably still post your exact np.genfromtxt call. Commented Aug 8, 2016 at 20:53

1 Answer

I'm guessing at how you're using genfromtxt, but this seems to work?

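import numpy as np
from io import BytesIO  # Python 3: the query result is bytes, and the old StringIO module is gone

s = b'bestObjID,ra,dec,z,zErr,zWarning,class,subClass,rChi2,DOF,rChi2Diff,z_noqso,zErr_noqso,zWarning_noqso,class_noqso,subClass_noqso,rChi2Diff_noqso,velDisp,velDispErr,velDispZ,velDispZErr,velDispChi2\n1237662340012638224,239.58334,27.233419,0.09080672,2.924875E-05,0,GALAXY,,1.104714,3735,1.411605,0,0,0,,,0,272.6187,13.61222,0,0,1815.653\n'

# dtype=None lets genfromtxt infer each column's type;
# names=True takes the field names from the header row.
# Depending on the NumPy version, text columns may come back as bytes ('S6') or str ('U6').
np.genfromtxt(BytesIO(s), dtype=None, names=True, delimiter=',')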

outputs

array((1237662340012638224, 239.58334, 27.233419, 0.09080672, 2.924875e-05, 0, 'GALAXY', False, 1.104714, 3735, 1.411605, 0, 0, 0, False, False, 0, 272.6187, 13.61222, 0, 0, 1815.653), 
  dtype=[('bestObjID', '<i8'), ('ra', '<f8'), ('dec', '<f8'), ('z', '<f8'), ('zErr', '<f8'), ('zWarning', '<i8'), ('class', 'S6'), ('subClass', '?'), ('rChi2', '<f8'), ('DOF', '<i8'), ('rChi2Diff', '<f8'), ('z_noqso', '<i8'), ('zErr_noqso', '<i8'), ('zWarning_noqso', '<i8'), ('class_noqso', '?'), ('subClass_noqso', '?'), ('rChi2Diff_noqso', '<i8'), ('velDisp', '<f8'), ('velDispErr', '<f8'), ('velDispZ', '<i8'), ('velDispZErr', '<i8'), ('velDispChi2', '<f8')])

1 Comment

Thank you, this is a legitimate solution, though I ended up still explicitly declaring the data types and using a8 for the strings.
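The explicit-dtype approach mentioned in this comment might look roughly like the sketch below. It is not the commenter's actual call, which was never posted: the three-column sample is a hypothetical subset of the real result, chosen for readability, and 'U8' stands in for a8 (a8 is a byte string of width 8, and newer NumPy releases appear to prefer the 'S' spelling over 'a').

```python
import io
import numpy as np

# Hypothetical three-column subset of the query result, for readability.
raw = b'bestObjID,ra,class\n1237662340012638224,239.58334,GALAXY\n'

# Fixed-width string field: 'U8' holds up to 8 characters
# ('S8' would be the bytes equivalent of the a8 used in the comment).
dtype = [('bestObjID', 'i8'), ('ra', 'f8'), ('class', 'U8')]

# Decode the bytes first so genfromtxt reads text, then skip the header
# row because the field names are already given in dtype.
data = np.genfromtxt(io.StringIO(raw.decode('utf-8')), dtype=dtype,
                     delimiter=',', skip_header=1)

# A single data row comes back as a 0-d array; atleast_1d makes it indexable.
rows = np.atleast_1d(data)
print(rows['class'][0])
```

The width only needs to cover the longest expected value; longer strings would be truncated to the declared width.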
