
I'm trying to load JSON directly into a new Google BigQuery table, without converting it to a CSV first.

The JSON I receive is in the following format:

{'columns': ['Month ', 'Product ', 'Customer', 'sales_qty', 'net_sales'],
 'data': [['201701', '001117', 'C00899', 72.0, 27400.0], ['201701', '001117', 'C00954', 72.0, 32640.0]]}

I need to be able to create the table programmatically, as the schema might vary, so I'm using the autodetect option in LoadJobConfig (this works fine with CSVs).

I wrote the following snippet of code:

import json

from google.cloud import bigquery

json_object = json.loads(my_json)
gbqclient = bigquery.Client.from_service_account_json(GBQ_JSON_KEY)
dataset_ref = gbqclient.dataset(GBQ_DATASET)
table_ref = dataset_ref.table(GBQ_TABLE)

# job config for JSON layout
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

# push data
job = gbqclient.load_table_from_json(json_object, table_ref, job_config=job_config)
job.result()  # wait for the load job to complete

This connects to GBQ properly, but the upload job then fails with this error:

Error while reading data, error message: Failed to parse JSON: No active field found.; ParsedString returned false

I understand that the JSON format I'm using is not NEWLINE_DELIMITED_JSON; I'm looking for help on how to convert my JSON into a format readable by the BigQuery APIs.

Thanks!

  • Please let me know if this works for you. Commented Dec 3, 2020 at 20:51

1 Answer
Your JSON object is not in the format BigQuery expects. You need an array of rows, where each row is a dict of key:value pairs, like this:

data = [
  {
    'Month':'201701',
    'Product':'001117',
    'Customer':'C00899',
    'sales_qty':72.0, 
    'net_sales':27400.0
  },
  {
    'Month':'201701',
    'Product':'001117',
    'Customer':'C00954',
    'sales_qty':72.0, 
    'net_sales':32640.0
  },
]

Transform your current JSON to get one like this.

Here is some sample code to translate your entry into the expected format:

    entry = {'columns': ['Month ', 'Product ', 'Customer', 'sales_qty', 'net_sales'],
             'data': [['201701', '001117', 'C00899', 72.0, 27400.0], ['201701', '001117', 'C00954', 72.0, 32640.0]]}

    columns = entry['columns']
    data = []
    for values in entry['data']:
        row = {}
        for col, val in zip(columns, values):
            row[col] = val
        data.append(row)
    print(data)

Note: my python skill is low, but it works
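As a sketch of the same idea in a more compact form: the transform can be a one-liner with zip, and it's worth stripping the column names, since the columns in the question ('Month ', 'Product ') carry trailing spaces that would produce invalid BigQuery field names. The gbqclient, table_ref and job_config names in the trailing comments are the ones from the question's snippet.

```python
def rows_from_columnar(entry):
    """Convert {'columns': [...], 'data': [[...], ...]} into a list of
    row dicts, trimming whitespace from column names so they are valid
    BigQuery field names."""
    columns = [c.strip() for c in entry['columns']]
    return [dict(zip(columns, values)) for values in entry['data']]

entry = {'columns': ['Month ', 'Product ', 'Customer', 'sales_qty', 'net_sales'],
         'data': [['201701', '001117', 'C00899', 72.0, 27400.0]]}
rows = rows_from_columnar(entry)
# rows[0] == {'Month': '201701', 'Product': '001117', 'Customer': 'C00899',
#             'sales_qty': 72.0, 'net_sales': 27400.0}

# The resulting list of dicts can be passed straight to the loader:
# job = gbqclient.load_table_from_json(rows, table_ref, job_config=job_config)
# job.result()  # wait for the load to finish
```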


3 Comments

I know it's not correct; I don't know how to transform the one I have into the correct format. I cannot change the source JSON, so I have to do the transformation in Python.
I added sample code. What's the size of your files?
about 200.000 rows. Works like a charm, thanks a lot!
