1

I need to convert a Google Cloud Datastore query result to a dataframe, to create a chart from the retrieved data. The query:

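from google.cloud import datastore

datastore_client = datastore.Client()

def fetch_times(limit):
    # published_at is stored as an ISO 8601 string, so the date bounds are strings too
    start_date = '2019-10-08'
    end_date = '2019-10-19'
    query = datastore_client.query(kind='ParticleEvent')
    query.add_filter(
        'published_at', '>', start_date)
    query.add_filter(
        'published_at', '<', end_date)
    query.order = ['-published_at']  # newest first
    times = query.fetch(limit=limit)
    return times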

Printing each entity returned by the query gives a JSON-like string:

  • <Entity('ParticleEvent', 5942717456580608) {'gc_pub_sub_id': '438169950283983', 'data': '605', 'event': 'light intensity', 'published_at': '2019-10-11T14:37:45.407Z', 'device_id': 'e00fce6847be7713698287a1'}>

I thought I had found something that would translate the results to JSON, which I could then convert to a dataframe, but I get an error saying that the properties attribute does not exist:

def to_json(gql_object):
    result = []
    for item in gql_object:
        result.append(dict([(p, getattr(item, p)) for p in item.properties()]))
    return json.dumps(result, cls=JSONEncoder)

Is there a way to iterate through the query results and get them into a dataframe, either directly or by converting to JSON first?

2
  • Here is a similar post which will help you. This one is to fetch JSON from Datastore Commented Jan 6, 2020 at 12:05
  • Can you print item and the whole gql_object to the console and share the output, so that we can provide a solution? Commented Jan 6, 2020 at 12:12

4 Answers

4

A Datastore Entity is a subclass of Python's dict, so entities can be treated as plain dictionaries. You should be able to do something as simple as...

df = pd.DataFrame(datastore_entities)

...and pandas will do all the rest.

If you also need the entity key, or any of its attributes, as a column, you can add them to each dictionary separately:

entities = list(entities)  # materialize first: a fetch iterator can only be consumed once

for e in entities:
    e['entity_key'] = e.key
    e['entity_key_name'] = e.key.name  # for example; None when the key uses a numeric id (see e.key.id)

df = pd.DataFrame(entities)
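
As a rough illustration of the idea above, here is a minimal sketch that runs without a Datastore connection. FakeEntity and its key_id attribute are hypothetical stand-ins for datastore.Entity and its key, and the second row is made-up sample data. It builds new plain-dict rows instead of modifying the entities in place, which is a design choice rather than a requirement.

```python
import pandas as pd


class FakeEntity(dict):
    """Hypothetical stand-in for datastore.Entity: a dict that also carries a key."""

    def __init__(self, key_id, **properties):
        super().__init__(**properties)
        # The real Entity exposes e.key.id / e.key.name instead of this attribute.
        self.key_id = key_id


# Made-up sample rows shaped like the entity in the question.
entities = [
    FakeEntity(5942717456580608, data="605", event="light intensity",
               published_at="2019-10-11T14:37:45.407Z"),
    FakeEntity(5942717456580609, data="598", event="light intensity",
               published_at="2019-10-11T14:38:45.407Z"),
]

# Copy each entity into a plain dict and add the key as an extra column,
# leaving the original entities untouched.
rows = [{**e, "entity_id": e.key_id} for e in entities]
df = pd.DataFrame(rows)
print(df)
```

With real entities, the same pattern would presumably read e.key.id or e.key.name in place of e.key_id.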

Comments

2

You can parse the printed query output into a dictionary and build a dataframe from it. (pd.read_json will not work on it directly, because the output is not valid JSON.)

Assuming the output is the string you shared above, the following approach can work.

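import ast
import pandas as pd

# line is one printed entity string ending in "}>"; find where the dictionary begins
startPos = line.find("{")

# Drop the trailing ">" and parse the dict literal (ast.literal_eval is safer than eval)
df = pd.DataFrame([ast.literal_eval(line[startPos:-1])])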

Output looks like :

     gc_pub_sub_id data            event              published_at  \
0  438169950283983  605  light intensity  2019-10-11T14:37:45.407Z   

                  device_id  
0  e00fce6847be7713698287a1 

Here, line[startPos:-1] is the dictionary portion of the input string. Evaluating it as a Python literal (with eval or, more safely, ast.literal_eval) turns it into an actual dictionary, which can then be converted into a dataframe.
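
The string-parsing approach can be sketched with the standard library only. The helper name parse_entity_repr is an assumption made for this example. It locates the outermost braces rather than slicing with -1, so it works whether or not the string ends with ">".

```python
import ast


def parse_entity_repr(line):
    """Return the property dict embedded in a printed entity string."""
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no dictionary found in: %r" % line)
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    return ast.literal_eval(line[start:end + 1])


line = ("<Entity('ParticleEvent', 5942717456580608) "
        "{'gc_pub_sub_id': '438169950283983', 'data': '605', "
        "'event': 'light intensity', "
        "'published_at': '2019-10-11T14:37:45.407Z', "
        "'device_id': 'e00fce6847be7713698287a1'}>")

record = parse_entity_repr(line)
print(record["data"])  # prints 605
```

A list of such records can be passed to pd.DataFrame. Note that this only works while every property value prints as a Python literal; a value such as a datetime object would make literal_eval fail, which is one reason to prefer working with the entities directly.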

5 Comments

Thanks for the input. The query output is not true json, but similar in that it is {keyword: value}.
Could you provide a sample of the data then? Without knowing what the data looks like, it's difficult to give an exact solution.
I included an example query output above: Entity('ParticleEvent', 5942717456580608) {'gc_pub_sub_id': '438169950283983', 'data': '605', 'event': 'light intensity', 'published_at': '2019-10-11T14:37:45.407Z', 'device_id': 'e00fce6847be7713698287a1'}
I have found a (rather dirty) workaround. I convert each item in the query result object to a string, and then manually parse the string to extract the data I need into a list. Looking at the output again, I might have been able to clip off the header portion of the item, Entity('ParticleEvent', 5942717456580608), which would leave just the {key: value} part. That might then be recognized as JSON and convert more easily.
Could you please post your workaround solution as an answer for greater visibility to other community users? Thank you.
1

The original poster found a workaround: convert each item in the query result object to a string, then manually parse the string to extract the needed data into a list.

Comments

1

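The return value of fetch is a google.cloud.datastore.query.Iterator, which yields Entity objects. Since each Entity is a dict subclass, the iterator behaves like an iterable of dicts, so the output of fetch can be passed directly into pd.DataFrame. The iterator can only be consumed once, so wrap it in list() first if you need the entities again afterwards.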

import pandas as pd

df = pd.DataFrame(fetch_times(10))

This is similar to @bkitej's answer, but it uses the original poster's function.
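
Since the goal in the question is a chart, here is a minimal sketch of the steps that would likely follow. fake_fetch is a hypothetical generator standing in for fetch_times, and its rows are made-up sample data. The properties arrive as strings, so the sketch converts them to numeric and datetime types before sorting by time.

```python
import pandas as pd


def fake_fetch():
    """Hypothetical stand-in for fetch_times(): yields dict-like rows once."""
    yield {"data": "605", "published_at": "2019-10-11T14:37:45.407Z"}
    yield {"data": "598", "published_at": "2019-10-11T14:38:45.407Z"}


# Materialize the one-shot iterator so the rows can be reused later.
rows = list(fake_fetch())
df = pd.DataFrame(rows)

# Both properties are strings; convert them so they can be plotted.
df["data"] = pd.to_numeric(df["data"])
df["published_at"] = pd.to_datetime(df["published_at"])

# Index by timestamp in ascending order, the usual shape for a time series.
df = df.sort_values("published_at").set_index("published_at")
print(df)
```

From here, something like df["data"].plot() should draw the series, assuming matplotlib is installed.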

Comments
