The BigQuery Client Libraries documentation describes how to set up authentication both from the GCP Console and from the command line.
To use the BigQuery client library you need to authenticate your service account. The gcloud command gcloud iam service-accounts keys create [FILE_NAME].json --iam-account [NAME]@[PROJECT_ID].iam.gserviceaccount.com generates a JSON key file with the private information needed to do so (your project_id, private key, etc.).
When making BigQuery API calls, you need to provide these credentials to your application code. One way is to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the service account JSON file:
export GOOGLE_APPLICATION_CREDENTIALS="PATH/TO/SERVICE_ACCOUNT.json"
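With the variable set, the client library picks the key up automatically through Application Default Credentials, so the code itself does not need to reference the file. A minimal sketch of that approach (the SELECT 1 query is just a placeholder to verify that authentication works):

from google.cloud import bigquery

# The client reads GOOGLE_APPLICATION_CREDENTIALS from the environment
# and authenticates through Application Default Credentials.
bigquery_client = bigquery.Client()

# Placeholder query to check that the authentication works
query_job = bigquery_client.query("SELECT 1")
print(list(query_job.result()))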
However, the environment variable is only set for your current shell session, so if the session expires or you open a new one you will need to set it again. Another way to authenticate is to use the method google.oauth2.service_account.Credentials.from_service_account_file inside your Python script.
In the following Python code the service account is authenticated with the method service_account.Credentials.from_service_account_file, a new BigQuery table is created from a CSV file located in Google Cloud Storage, and new rows are inserted into that table.
from google.cloud import bigquery
from google.oauth2 import service_account
# Path to the service account credentials
key_path = "/PATH/TO/SERVICE-ACCOUNT.json"
credentials = service_account.Credentials.from_service_account_file(
key_path,
scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
# Instantiation of the BigQuery client with the service account credentials
bigquery_client = bigquery.Client(credentials=credentials, project=credentials.project_id)
GCS_URI = "gs://MY_BUCKET/MY_CSV_FILE"
DATASET_ID = "MY_DATASET"
TABLE_ID = "MY_TABLE"
def bq_insert_from_gcs(target_uri=GCS_URI, dataset_id=DATASET_ID, table_id=TABLE_ID):
    """This method inserts a CSV file stored in GCS into a BigQuery table."""
    dataset_ref = bigquery_client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    # Schema autodetection enabled
    job_config.autodetect = True
    # Skipping first row, which corresponds to the field names
    job_config.skip_leading_rows = 1
    # Format of the data in GCS
    job_config.source_format = bigquery.SourceFormat.CSV

    load_job = bigquery_client.load_table_from_uri(target_uri,
                                                   dataset_ref.table(table_id),
                                                   job_config=job_config)
    print('Starting job {}'.format(load_job.job_id))
    print('Loading file {} into the BigQuery table {}'.format(target_uri, table_id))
    # Waits for the load job to complete
    load_job.result()
    return 'Job finished.\n'
def bq_insert_to_table(rows_to_insert, dataset_id=DATASET_ID, table_id=TABLE_ID):
    """This method inserts rows into a BigQuery table."""
    # Prepares a reference to the dataset and table
    dataset_ref = bigquery_client.dataset(dataset_id)
    table_ref = dataset_ref.table(table_id)
    # API request to get the table
    table = bigquery_client.get_table(table_ref)
    # API request to insert the rows_to_insert
    print("Inserting rows into BigQuery table {}".format(table_id))
    errors = bigquery_client.insert_rows(table, rows_to_insert)
    assert errors == []
bq_insert_from_gcs()

rows_to_insert = [(u'Alice', u'cat'),
                  (u'John', u'dog')]
bq_insert_to_table(rows_to_insert)
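If you prefer not to depend on column order, insert_rows also accepts dictionaries keyed by field name. A short sketch, assuming the autodetected columns are called name and pet (adjust the keys to whatever schema your CSV produced):

rows_as_dicts = [{'name': u'Alice', 'pet': u'cat'},
                 {'name': u'John', 'pet': u'dog'}]
bq_insert_to_table(rows_as_dicts)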
Also, I would strongly recommend implementing your script in Python 3, since Python 2 is no longer supported by google-cloud-bigquery as of 01/01/2020.