
Need help loading a JSON object into BigQuery. I'm able to establish the connection, but I'm not able to load the row; it keeps giving the error below. Any suggestions? I tried both a JSON string and a JSON object, and both give the error.

JSON object:

import json
from google.cloud import bigquery

d = {}
d['date'] = date_time()
d['status'] = status
# a = json.dumps(d, indent=2)  # as a JSON string

qc = bigquery.Client(project=project_name)
dataset = qc.dataset(dataset)
table = dataset.table(table)
table_nm = qc.get_table(table)

qc.insert_rows_json(table_nm, d)

Input dict: {"date": "2021-02-01-11.19.55", "status": "Pass "}

Error:

    raise TypeError("json_rows argument should be a sequence of dicts")
TypeError: json_rows argument should be a sequence of dicts

1 Answer


The insert_rows_json method expects a sequence of rows to be written at a time. Pass your structure as a list of JSON objects (dicts) rather than a single object, even when inserting only one row.

from google.cloud import bigquery

d = {}
d['date'] = date_time()
d['status'] = status
# a = json.dumps(d, indent=2)  # as a JSON string

qc = bigquery.Client(project=project_name)
dataset = qc.dataset(dataset)
table = dataset.table(table)
table_nm = qc.get_table(table)

errors = qc.insert_rows_json(
    table_nm,
    [d],  # Must be a list of objects, even if only 1 row.
)
for error in errors:
    print(f"encountered error: {error}")

Note: in the case of errors/retries, the BigQuery documentation on the streaming API states that "De-duplication offered by BigQuery is best effort, and it should not be relied upon as a mechanism to guarantee the absence of duplicates in your data." The documentation therefore recommends periodically removing duplicates from the destination table when using the streaming API, possibly with a scheduled query such as:

#standardSQL
SELECT
  * EXCEPT(row_number)
FROM (
  SELECT
    *,
    ROW_NUMBER()
          OVER (PARTITION BY ID_COLUMN) row_number
  FROM
    `TABLE_NAME`)
WHERE
  row_number = 1

For details on de-duplication, see the BigQuery streaming API guide.
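As a side note (not part of the original answer): the Python client also exposes the streaming API's best-effort de-duplication via the row_ids argument of insert_rows_json, which maps to insertId. A minimal sketch, reusing the names from the snippet above and assuming you can derive a stable key per row (the key construction here is hypothetical):

from google.cloud import bigquery

qc = bigquery.Client(project=project_name)
table_nm = qc.get_table(qc.dataset(dataset).table(table))

# Hypothetical de-duplication key derived from the row's contents.
row_id = f"{d['date']}-{d['status']}"

errors = qc.insert_rows_json(
    table_nm,
    [d],
    row_ids=[row_id],  # one insertId per row; de-duplication is best effort only
)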


4 Comments

Any advice on converting/creating a list object instead, e.g. d = [date_time(), status]?
If date_time() is a datetime.datetime object, you can use the same example, but call qc.insert_rows(...) instead of qc.insert_rows_json(...). The only difference between the two methods is that insert_rows(...) uses the BigQuery table schema to determine how to convert objects into something that can be serialized as JSON. (A short sketch follows after these comments.)
@TimSwast, any advice on best practices for error handling? Is there a built-in rollback on error? Say I have a JSON file containing 10 records and errors occur while loading it into BQ after, say, the 5th record. In that case, does the BQ table end up with 5 records?
According to cloud.google.com/bigquery/docs/streaming-data-into-bigquery, "De-duplication offered by BigQuery is best effort, and it should not be relied upon as a mechanism to guarantee the absence of duplicates in your data." It therefore recommends manually/periodically removing duplicates from the destination table when using the streaming API, possibly with a scheduled query: cloud.google.com/bigquery/docs/… I'll update the answer with this information.
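A minimal sketch of the insert_rows(...) alternative mentioned in the comments above, assuming the destination table has a TIMESTAMP (or DATETIME) column named date and a STRING column named status; insert_rows uses the fetched table schema to serialize Python values such as datetime.datetime:

import datetime
from google.cloud import bigquery

qc = bigquery.Client(project=project_name)
table_nm = qc.get_table(qc.dataset(dataset).table(table))  # schema is required here

row = {"date": datetime.datetime.utcnow(), "status": status}

errors = qc.insert_rows(table_nm, [row])  # still a list, even for one row
for error in errors:
    print(f"encountered error: {error}")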
