
Need help loading a JSON object into BigQuery. I'm able to establish the connection, but I'm not able to load the row; it keeps giving the error below. Any suggestions? I tried both a JSON string and a JSON object, and both give the error.

JSON object:

import json
from google.cloud import bigquery

d = {}
d['date'] = date_time()
d['status'] = status
# a = json.dumps(d, indent=2)  # as a JSON string

qc = bigquery.Client(project=project_name)
dataset = qc.dataset(dataset)
table = dataset.table(table)
table_nm = qc.get_table(table)

qc.insert_rows_json(table_nm, d)

Input dict: {"date": "2021-02-01-11.19.55", "status": "Pass "}

Error:

    raise TypeError("json_rows argument should be a sequence of dicts")
TypeError: json_rows argument should be a sequence of dicts

1 Answer


The insert_rows_json method expects a sequence of rows to be written at a time. Pass your structure as a list of JSON objects (dicts) rather than a single object, even when inserting only one row.

from google.cloud import bigquery

d = {}
d['date'] = date_time()
d['status'] = status
# a = json.dumps(d, indent=2)  # as a JSON string

qc = bigquery.Client(project=project_name)
dataset = qc.dataset(dataset)
table = dataset.table(table)
table_nm = qc.get_table(table)

errors = qc.insert_rows_json(
    table_nm,
    [d],  # Must be a list of objects, even if only 1 row.
)
for error in errors:
    print(f"encountered error: {error}")

Note: in the case of errors/retries, the BigQuery documentation on the streaming API states that "De-duplication offered by BigQuery is best effort, and it should not be relied upon as a mechanism to guarantee the absence of duplicates in your data." The documentation therefore recommends periodically removing duplicates from the destination table when using the streaming API, possibly with a scheduled query such as:

#standardSQL
SELECT
  * EXCEPT(row_number)
FROM (
  SELECT
    *,
    ROW_NUMBER()
          OVER (PARTITION BY ID_COLUMN) row_number
  FROM
    `TABLE_NAME`)
WHERE
  row_number = 1

For details on de-duplication, see the BigQuery streaming API guide.
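As a side note (not part of the original answer): the Python client also exposes the streaming API's best-effort de-duplication via the row_ids argument of insert_rows_json, which maps to insertId. A minimal sketch, reusing the names from the snippet above and assuming you can derive a stable key per row (the key construction here is hypothetical):

from google.cloud import bigquery

qc = bigquery.Client(project=project_name)
table_nm = qc.get_table(qc.dataset(dataset).table(table))

# Hypothetical de-duplication key derived from the row's contents.
row_id = f"{d['date']}-{d['status']}"

errors = qc.insert_rows_json(
    table_nm,
    [d],
    row_ids=[row_id],  # one insertId per row; de-duplication is best effort only
)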


4 Comments

Any advice on converting/creating a list object instead, e.g. d = [date_time(), status]?
If date_time() is a datetime.datetime object, you can use the same example, but call qc.insert_rows(...) instead of qc.insert_rows_json(...). The only difference between the two methods is that insert_rows(...) uses the BigQuery table schema to determine how to convert objects into something that can be serialized as JSON. (A short sketch follows after these comments.)
@TimSwast, any advice on best practices for error handling? Is there a built-in rollback on error? Say I have a JSON file containing 10 records and errors occur while loading it into BQ after, say, the 5th record. In that case, does the BQ table end up with 5 records?
According to cloud.google.com/bigquery/docs/streaming-data-into-bigquery, "De-duplication offered by BigQuery is best effort, and it should not be relied upon as a mechanism to guarantee the absence of duplicates in your data." It therefore recommends manually/periodically removing duplicates from the destination table when using the streaming API, possibly with a scheduled query: cloud.google.com/bigquery/docs/… I'll update the answer with this information.
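A minimal sketch of the insert_rows(...) alternative mentioned in the comments above, assuming the destination table has a TIMESTAMP (or DATETIME) column named date and a STRING column named status; insert_rows uses the fetched table schema to serialize Python values such as datetime.datetime:

import datetime
from google.cloud import bigquery

qc = bigquery.Client(project=project_name)
table_nm = qc.get_table(qc.dataset(dataset).table(table))  # schema is required here

row = {"date": datetime.datetime.utcnow(), "status": status}

errors = qc.insert_rows(table_nm, [row])  # still a list, even for one row
for error in errors:
    print(f"encountered error: {error}")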
