
Let me start off by saying that I'm fairly new to numpy and pandas. I'm trying to construct a pandas dataframe, but I'm not sure that I'm doing things in an appropriate way.

My setting is that I have a large list of .Net objects (which I have very little control over), and I want to build a time series from them using a pandas DataFrame. In the example below I have replaced the .Net class with a simplified placeholder class for demonstration. The listOfThings in the code is basically what I get from .Net, and I want to convert it into a pandas DataFrame.

My questions are:

  1. I construct the dataframe by first constructing a numpy array. Is this necessary? Also, this array doesn't have the shape 1000x2 that I expect. Is there a better way to use numpy here?
  2. This code doesn't work because I don't seem to be able to cast the string to a datetime64. This confuses me, since the string is in ISO format and parsing it works when I try it like this: np.datetime64(str(np.datetime64('now','us'))).

Code sample:

import numpy as np
import pandas as pd

class PlaceholderClass:
    def time(self):
        return str(np.datetime64('now', 'us'))
    def value(self):
        return 100*np.random.random_sample()


listOfThings = [PlaceholderClass() for i in range(1000)]

arr = np.array([(x.time(), x.value()) for x in listOfThings], dtype=[('time', np.datetime64), ('value', np.float)])

dataframe = pd.DataFrame(data=arr['value'], index=arr['time'])
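On the shape part of Q1: an array built from a list of tuples with a structured dtype is one-dimensional, with one record per tuple, so its shape is (1000,) rather than (1000, 2), and the columns are reached by field name (arr['time'], arr['value']). The sketch below keeps the question's approach but gives the datetime field an explicit unit ('datetime64[us]') and uses the builtin float, since np.float has been removed from newer NumPy releases. Whether the explicit unit is what resolves the cast error in the original environment is an assumption; PlaceholderClass is the question's stand-in class and record_dtype is an illustrative name.

```python
import numpy as np
import pandas as pd


class PlaceholderClass:
    # Stand-in for the .Net objects, copied from the question
    def time(self):
        return str(np.datetime64('now', 'us'))

    def value(self):
        return 100 * np.random.random_sample()


listOfThings = [PlaceholderClass() for i in range(1000)]

# Structured dtype: the datetime field gets an explicit microsecond unit,
# and the builtin float replaces the removed np.float alias
record_dtype = [('time', 'datetime64[us]'), ('value', float)]
arr = np.array([(x.time(), x.value()) for x in listOfThings], dtype=record_dtype)

# One record per tuple, so this is a 1-D array of 1000 records
print(arr.shape)
print(arr.dtype.names)

dataframe = pd.DataFrame(data=arr['value'], index=arr['time'])
```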

Thanks in advance

1 Answer


Q1:

I don't think it is necessary to first make an np.array and then create the dataframe; pandas can build one directly from a list of tuples. This works perfectly fine, for example:

import datetime
from random import randint
import pandas as pd

rd = lambda: datetime.date(randint(2005, 2025), randint(1, 12), randint(1, 28))
df = pd.DataFrame([(rd(), rd()) for x in range(100)])
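Applied to the question's own objects, the same idea might look like the following sketch: build a list of (time, value) tuples, let pandas create the columns, convert the time column with pd.to_datetime, and make it the index with set_index. The column names 'time' and 'value' are chosen for illustration, and PlaceholderClass is the stand-in from the question.

```python
import numpy as np
import pandas as pd


class PlaceholderClass:
    # Stand-in for the .Net objects, copied from the question
    def time(self):
        return str(np.datetime64('now', 'us'))

    def value(self):
        return 100 * np.random.random_sample()


listOfThings = [PlaceholderClass() for i in range(1000)]

# Build the frame straight from Python tuples; no intermediate NumPy array
df = pd.DataFrame([(x.time(), x.value()) for x in listOfThings],
                  columns=['time', 'value'])

# Parse the ISO strings in one call, then use them as the index
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
```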

Added later:

df = pd.DataFrame((x.value() for x in listOfThings), index=(pd.to_datetime(x.time()) for x in listOfThings))
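If only one value column is needed, a Series may be enough. This sketch is an alternative to the line above rather than a copy of it: it collects the strings first and converts them with a single pd.to_datetime call instead of one call per element, which is usually faster for long lists. The names times, values and series are illustrative; PlaceholderClass is the question's stand-in.

```python
import numpy as np
import pandas as pd


class PlaceholderClass:
    # Stand-in for the .Net objects, copied from the question
    def time(self):
        return str(np.datetime64('now', 'us'))

    def value(self):
        return 100 * np.random.random_sample()


listOfThings = [PlaceholderClass() for i in range(1000)]

times = [x.time() for x in listOfThings]
values = [x.value() for x in listOfThings]

# One vectorized conversion of all timestamps into a DatetimeIndex
series = pd.Series(values, index=pd.to_datetime(times), name='value')
```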

Q2:

I noticed that pd.to_datetime('some date') almost always parses ISO-style strings correctly, even without specifying a format. Perhaps this helps.

In [115]: pd.to_datetime('2008-09-22T13:57:31.2311892-04:00')
Out[115]: Timestamp('2008-09-22 17:57:31.231189200')
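How a string carrying an offset such as -04:00 comes back depends on the pandas version: newer versions keep the offset and return a timezone-aware Timestamp instead of the naive UTC value shown above. If a UTC value is wanted regardless of version, utc=True converts explicitly. A small sketch (raw, ts and naive are illustrative names):

```python
import pandas as pd

raw = '2008-09-22T13:57:31.2311892-04:00'

# utc=True converts the -04:00 offset to UTC and returns a tz-aware Timestamp
ts = pd.to_datetime(raw, utc=True)
print(ts)

# Drop the timezone to get a naive UTC value like the one shown above
naive = ts.tz_convert(None)
print(naive)
```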


Thanks for your reply @PandasRocks. But I don't see how I can apply your answer in my case. Given listOfThings from my example, can you demonstrate how to construct the dataframe with the datetimes as the index?
@DoubleTrouble - I have added another line. I think you can pass generator expressions directly; pandas consumes them and builds the index and the values from them, so no intermediate numpy array is needed. If I am not mistaken, a pandas Series is also a separate type from a numpy array.
