
I want to perform some very simple tasks on BigQuery from a Python script. I found this package, but it does not work well. Indeed, when I try this code:

from bigquery import get_client


project_id = 'txxxxxxxxxxxxxxxxxx9'
# Service account email address as listed in the Google Developers Console.
service_account = '7xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
# PKCS12 or PEM key provided by Google.
key = '/home/fxxxxxxxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxxxxxxxxxxxxxxx.pem'
client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
# Fetch the table schema.
results = client.get_table_schema('newdataset', 'newtable2')

print(results)

I get this error:

/home/xxxxxx/anaconda3/envs/snakes/bin/python2.7 /home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py
Traceback (most recent call last):
  File "/home/xxxxxx/Dropbox/Prog/bigQuery_daily_import/src/main.py", line 9, in <module>
    client = get_client(project_id, service_account=service_account, private_key_file=key, readonly=True)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 83, in get_client
    readonly=readonly)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/bigquery/client.py", line 101, in _get_bq_service
    service = build('bigquery', 'v2', http=http)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/util.py", line 142, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 196, in build
    cache)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
    resp, content = http.request(actual_url)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 565, in new_request
    self._refresh(request_orig)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 835, in _refresh
    self._do_refresh_request(http_request)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 862, in _do_refresh_request
    body = self._generate_refresh_request_body()
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1541, in _generate_refresh_request_body
    assertion = self._generate_assertion()
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/client.py", line 1670, in _generate_assertion
    private_key, self.private_key_password), payload)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/oauth2client/_pycrypto_crypt.py", line 121, in from_string
    pkey = RSA.importKey(parsed_pem_key)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 665, in importKey
    return self._importKeyDER(der)
  File "/home/xxxxxx/anaconda3/envs/snakes/lib/python2.7/site-packages/Crypto/PublicKey/RSA.py", line 588, in _importKeyDER
    raise ValueError("RSA key format is not supported")
ValueError: RSA key format is not supported

Process finished with exit code 1

My question: is there a Python tutorial that shows how to communicate easily with BigQuery: importing a dataset from Google Storage or S3, querying something, and exporting the result to Google Storage?

1 Answer


A lot depends on your environment, and once you've figured that out, everything should be super simple. The only problem I see in the error log you pasted is with authentication.

Python pandas has had support for BigQuery for a while, via the pandas.io.gbq module.
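A minimal sketch of that interface, assuming the read_gbq/to_gbq functions of that era (the project ID, query, and table names below are placeholders):

import pandas as pd
from pandas.io import gbq

# Run a query and pull the result into a DataFrame; pandas drives the
# OAuth flow itself the first time it runs.
df = gbq.read_gbq(
    'SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10',
    project_id='my-project-id')

# Write a DataFrame back out to a BigQuery table.
gbq.to_gbq(df, 'mydataset.mytable', project_id='my-project-id')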

And I did a video with the creators of the module.

Now, the simplest and fastest way these days to launch a Jupyter notebook with all of the Google Cloud goodies you mention is our new Google Datalab project.

The only caveat is that Datalab runs on cloud servers; but if you want a fully managed Jupyter/IPython environment that is totally secure, persistent, and ready to handle BigQuery, storage, etc., give it a try.
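Inside a Datalab notebook, a query can be as short as this hedged sketch, assuming the datalab.bigquery Python module (the query itself is just a sample):

import datalab.bigquery as bq

# Run a query and materialize the result as a pandas DataFrame.
df = bq.Query(
    'SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10'
).to_dataframe()
print(df.head())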


Meanwhile, if you are writing a web application, look at how other web applications solve this task.

For example, re:dash code to connect to BigQuery:

https://github.com/EverythingMe/redash/blob/master/redash/query_runner/big_query.py
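In essence, re:dash signs a JWT for a service account and then calls the BigQuery REST API through the Google API client. A minimal sketch along those lines, assuming the same oauth2client/googleapiclient stack that appears in the traceback above (the account email, key path, and project ID are placeholders):

import httplib2
from oauth2client.client import SignedJwtAssertionCredentials
from googleapiclient.discovery import build

# Load the service account's PEM private key (placeholder path).
with open('/path/to/key.pem', 'rb') as f:
    private_key = f.read()

# Build signed-JWT credentials for the BigQuery scope.
credentials = SignedJwtAssertionCredentials(
    'xxxx@developer.gserviceaccount.com',
    private_key,
    scope='https://www.googleapis.com/auth/bigquery')

# Authorize an HTTP client and build the BigQuery v2 service object.
http = credentials.authorize(httplib2.Http())
bigquery = build('bigquery', 'v2', http=http)

# Run a synchronous query and print the raw API response.
response = bigquery.jobs().query(
    projectId='my-project-id',
    body={'query': 'SELECT 17'}).execute()
print(response)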


3 Comments

Thank you for your answer. The problem is that I want to build a web application, not explore some data with a Jupyter-like notebook. I am only looking for an equivalent of the packages boto, psycopg2, or sqlalchemy-redshift, which work very well for Redshift.
context helps a lot! check how re:dash does it: github.com/EverythingMe/redash/blob/master/redash/query_runner/…
meanwhile I haven't tried sqlalchemy-bigquery, but since you mention using sqlalchemy: pypi.python.org/pypi/sqlalchemy_bigquery
