
I want to perform some very simple tasks on BigQuery from a Python script. I found this package, but it does not work well. Indeed, when I try this code:

from bigquery import get_client


project_id = 'txxxxxxxxxxxxxxxxxx9'
# Service account email address as listed in the Google Developers Console.
service_account = '7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
# PKCS12 or PEM key provided by Google.
key = '/home/fxxxxxxxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxxxxxxxxxxxxxxx.pem'
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
# Fetch the table schema.
results = client.get_table_schema('newdataset', 'newtable2')

print(results)

I get this error:

/home/xxxxxx/anaconda3/envs/snakes/bin/python2.7 /home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py
Traceback (most recent call last):
  File "/home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py", line 9, in <module>
    client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 83, in get_client
    readonly=readonly)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 101, in _get_bq_service
    service = build('bigquery', 'v2', http=http)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 196, in build
    cache)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
    resp, content = http.request(actual_url)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 565, in new_request
    self._refresh(request_orig)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 835, in _refresh
    self._do_refresh_request(http_request)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 862, in _do_refresh_request
    body = self._generate_refresh_request_body()
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1541, in _generate_refresh_request_body
    assertion = self._generate_assertion()
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1670, in _generate_assertion
    private_key, self.private_key_password), payload)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/_pycrypto_crypt.py", line 121, in from_string
    pkey = RSA.importKey(parsed_pem_key)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 665, in importKey
    return self._importKeyDER(der)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 588, in _importKeyDER
    raise ValueError("RSA key format is not supported")
ValueError: RSA key format is not supported

Process finished with exit code 1

My question: is there a Python tutorial that shows how to communicate easily with BigQuery: importing a dataset from Google Storage or S3, querying something, and exporting the result to Google Storage?

1 Answer


A lot depends on your environment, and once you've figured that out, everything should be super simple. The only problem I see in the error log you pasted is with authentication.

Python pandas has had support for BigQuery for a while, via the pandas.io.gbq module.
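A minimal sketch of that interface, assuming the read_gbq/to_gbq functions of that era (the project ID, query, and table names below are placeholders):

import pandas as pd
from pandas.io import gbq

# Run a query and pull the result into a DataFrame; pandas drives the
# OAuth flow itself the first time it runs.
df = gbq.read_gbq(
    'SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10',
    project_id='my-project-id')

# Write a DataFrame back out to a BigQuery table.
gbq.to_gbq(df, 'mydataset.mytable', project_id='my-project-id')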

And I did a video with the creators of the module.

Now, the simplest and fastest way these days to launch a Jupyter notebook with all of the Google Cloud goodies you mention is our new Google Datalab project.

The only caveat is that Datalab runs on cloud servers; but if you want a fully managed Jupyter/IPython environment that is totally secure, persistent, and ready to handle BigQuery, storage, etc., give it a try.
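Inside a Datalab notebook, a query can be as short as this hedged sketch, assuming the datalab.bigquery Python module (the query itself is just a sample):

import datalab.bigquery as bq

# Run a query and materialize the result as a pandas DataFrame.
df = bq.Query(
    'SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10'
).to_dataframe()
print(df.head())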


Meanwhile, if you are writing a web application, look at how other web applications solve this task.

For example, re:dash code to connect to BigQuery:

https://github.com/EverythingMe/redash/blob/master/redash/query_runner/big_query.py
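In essence, re:dash signs a JWT for a service account and then calls the BigQuery REST API through the Google API client. A minimal sketch along those lines, assuming the same oauth2client/googleapiclient stack that appears in the traceback above (the account email, key path, and project ID are placeholders):

import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from googleapiclient.discovery import build

# Load the service account's PEM private key (placeholder path).
with open('/path/to/key.pem', 'rb') as f:
    private_key = f.read()

# Build signed-JWT credentials for the BigQuery scope.
credentials = SignedJwtAssertionCredentials(
    'xxxx@developer.gserviceaccount.com',
    private_key,
    scope='https://www.googleapis.com/auth/bigquery')

# Authorize an HTTP client and build the BigQuery v2 service object.
http = credentials.authorize(httplib2.Http())
bigquery = build('bigquery', 'v2', http=http)

# Run a synchronous query and print the raw API response.
response = bigquery.jobs().query(
    projectId='my-project-id',
    body={'query': 'SELECT 17'}).execute()
print(response)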


3 Comments

Thank you for your answer. The problem is that I want to build a web application, not explore some data with a Jupyter-like notebook. I am only looking for an equivalent of the packages boto, psycopg2, or sqlalchemy-redshift, which work very well for Redshift.
context helps a lot! check how re:dash does it: github.com/EverythingMe/redash/blob/master/redash/query_runner/…
meanwhile I haven't tried sqlalchemy-bigquery, but since you mention using sqlalchemy: pypi.python.org/pypi/sqlalchemy_bigquery
