I recently started using Google's BigQuery service, and their Python API, to query some large databases. I'm new to SQL, and the BigQuery documentation isn't incredibly helpful for what I'm doing.
Currently I'm looking through the reddit_comments database, and there's a 'created_utc' field that I'm trying to filter by. This created_utc field holds Unix timestamps (e.g. November 1st, 2018, 12:00 AM UTC is 1541030400).
I'd like to grab comments day by day (or between two Unix timestamps) by iterating over each day. Something like:
from datetime import datetime, timedelta

start = datetime.fromtimestamp(1538352000)
end = datetime.fromtimestamp(1541030400)

# Step through the range one day at a time
time = start
while time < end:
    print(time)
    time = time + timedelta(days=1)
Printing the times here yields values like: 2018-09-30 20:00:00
However, in order to query, I have to convert back to a Unix timestamp by calling datetime's timestamp() method, like time.timestamp().
The problem is, I'm trying to use the timestamp() function inside the query like so:
SELECT *
FROM 'fh-bigquery.reddit_comments.2018_10'
...
AND (created_utc >= curr_day.timestamp() AND created_utc <= next_day.timestamp())
However, it throws a BadRequest: 400 Function not found. Is there a way to use built-in Python functions in the query as I've described above, or do I need some alternative?
Everything so far seems pretty intuitive, but it's weird that I can't find much helpful information on this specifically.
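For reference, here is a minimal sketch of the kind of alternative I have in mind, not something I've confirmed against BigQuery: compute the Unix timestamps in Python first and put only the resulting integers into the SQL text, so the query never contains a Python call. The helper names day_ranges and build_query are made up for illustration, the backtick-quoted table name assumes standard SQL, the other conditions from my real query are left out, and the upper bound is exclusive so a comment on a day boundary is not picked up twice.

```python
from datetime import datetime, timedelta, timezone

TABLE = "fh-bigquery.reddit_comments.2018_10"  # table name from the query above


def day_ranges(start_ts, end_ts):
    """Yield (day_start, next_day_start) Unix-timestamp pairs, one per UTC day."""
    day = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ts, tz=timezone.utc)
    while day < end:
        # Clamp the last range so it never runs past the requested end
        next_day = min(day + timedelta(days=1), end)
        yield int(day.timestamp()), int(next_day.timestamp())
        day = next_day


def build_query(lo, hi):
    """Format the integer bounds into the SQL text (hypothetical helper)."""
    return (
        f"SELECT * FROM `{TABLE}` "
        f"WHERE created_utc >= {int(lo)} AND created_utc < {int(hi)}"
    )


for lo, hi in day_ranges(1538352000, 1541030400):
    sql = build_query(lo, hi)
    # Assumption: `sql` would then be passed to the BigQuery client's query call.
    print(sql)
```

Because the bounds are forced through int() before formatting, only plain numbers end up in the query string. Query parameters would presumably be the cleaner route if the client supports them for this case.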