
I submitted a load job to Google BigQuery that loads 12 gzip-compressed tabular files from Google Cloud Storage. Each file is about 2 GB compressed. The command I ran was similar to:

bq load --nosync --skip_leading_rows=1 --source_format=CSV \
--max_bad_records=14000 -F "\t" warehouse:some_dataset.2014_lines \
gs://bucket/file1.gz,gs://bucket/file2.gz,gs://bucket/file12.gz \
schema.txt

I'm receiving the following error from my BigQuery load job with no explanation of why:

Error Reason:internalError. Get more information about this error at Troubleshooting Errors: internalError.

Errors: Unexpected. Please try again.

I'm certain that the schema file is correctly formatted, as I've successfully run loads using the same schema against a different set of files.

I'm wondering in what kinds of situations an internal error like this occurs, and what are some ways I could go about debugging it?

My BQ job id: bqjob_r78ca777a8ad4bdd9_0000014e2dc86e0e_1
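
In case it's useful, the job's full metadata (status, statistics, and any error records) can be dumped with the standard bq job-inspection command; for my job that would be:

bq show --format=prettyjson -j bqjob_r78ca777a8ad4bdd9_0000014e2dc86e0e_1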

Thank you!

  • "Someone from support"? I think you're in the wrong place. Commented Jun 27, 2015 at 1:02
  • Stack Overflow is a free community of professionals who exchange questions and answers. No one here is being compensated; we're not a support network in the traditional sense. Google does provide traditional support services if you are willing to pay for them. But hang tight and someone here may have an answer for you. =D Commented Jun 27, 2015 at 1:05
  • So the way to get answers here, from fellow users such as yourself, is to write a minimal, complete, verifiable example of the problem so that other people can reproduce it and try to debug it. However, it isn't obvious that this is even a programming problem. Commented Jun 27, 2015 at 1:27
  • Hmm yes. I thought Google was using this thread to support paying customers. I guess I was mistaken. No, this isn't a programming problem; it's more that I would like more output from Google to understand why. Commented Jun 27, 2015 at 1:43
  • Edited the question a bit. Hope this might help me gain some insight into why and when an unknown internal error happens. Commented Jun 27, 2015 at 2:11

1 Answer


There are some failure cases with large .gz input files that are not always reported with a clear cause. This happens especially (but not exclusively) with highly compressible text, where 1 GB of compressed data expands to an unusually large amount of text.

The documented limit for compressed CSV/JSON (per the BigQuery load-quota documentation) is 1 GB. If that limit is current, I would actually expect an error on your 2 GB inputs; let me check that.
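
In the meantime, one way to double-check the compressed sizes against that limit is to list the objects with gsutil, which prints the size in bytes of each file (using the bucket from your question; adjust the pattern for your actual names):

gsutil ls -l gs://bucket/file*.gz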

Are you able to split these files into smaller pieces and try again?
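
A rough sketch of one way to do that, assuming GNU coreutils and the bucket/table/schema names from your question (adjust paths and chunk size for your data):

# Pull one input down and strip its header row, so every split piece
# can be loaded the same way.
gsutil cp gs://bucket/file1.gz .
gunzip file1.gz
tail -n +2 file1 > file1_body

# Split into ~500 MB line-aligned chunks, recompress, and re-upload.
split -C 500M file1_body file1_part_
gzip file1_part_*
gsutil cp file1_part_*.gz gs://bucket/split/

# Load all pieces via a wildcard; --skip_leading_rows is no longer needed
# because the header was stripped above.
bq load --nosync --source_format=CSV --max_bad_records=14000 -F "\t" \
  warehouse:some_dataset.2014_lines "gs://bucket/split/file1_part_*.gz" \
  schema.txt

Repeating that for each of the 12 files keeps every compressed piece well under the documented limit.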

(Meta: Grace, you are correct that Google says that "Google engineers monitor and answer questions with the tag google-bigquery" on Stack Overflow. I am a Google engineer, but there are also many knowledgeable people here who are not. Google's docs could perhaps give more explicit guidance: the questions that are most valuable to the Stack Overflow community are ones where a future reader can recognize that they're hitting the same problem, and preferably ones that a non-Googler can answer from public information. That's tough in your case because the error is broad and the cause is unclear. But if you're able to reproduce the problem using an input file that you can make public, more people here will be able to take a crack at it. You can also file an issue for questions that really no one outside Google can do much with.)


1 Comment

@eubrant Thanks for your answer, including the suggestions in the meta. That really helps.
