From the course: Introduction to SQL Using Google BigQuery
Introduction to Google BigQuery - SQL Tutorial
- [Instructor] Before we dive deeper, maybe we should take a step back and learn more about the BigQuery service. What is it exactly? How did it come into existence, and why is it such a useful tool in the modern data landscape? This is what the official BigQuery landing page says: "BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data." This sounds amazing, but what does it all mean?

Unlike most traditional SQL databases that run on a single computer or on a large cluster of computers in a physical data center, BigQuery is a cloud-native data warehouse that can be accessed directly through a simple website over the internet. This means that there is no need to set up and maintain complex systems, including physical servers, networking, and security, amongst many other things. The term serverless describes this exact behavior, where all of the setup, configuration, and maintenance is included as part of the BigQuery service. Serverless also extends to the ability of the system to instantaneously scale up or scale down the amount of compute resources required on a query-by-query basis. With multiple data centers connected over the Google network all over the world and access to hundreds of thousands, if not millions, of virtual computers, BigQuery not only stores huge amounts of data, up to the petabyte scale, but also provides the ability to run complex analytical queries on those same massive data sets.

BigQuery actually started as a science experiment in 2010, when a team of engineers wanted to help regular companies run web-scale data analytics projects using in-house technology that was originally purpose-built for internal Google use.

The modern version of BigQuery is powered by four main components: Dremel, Colossus, Borg, and Jupiter. Dremel is a Google-developed parallel query engine built to run complex data analysis jobs on huge amounts of data.
Colossus is the distributed storage system where the vast amount of BigQuery data is stored. Borg is used to power the serverless behavior of the BigQuery service, handling all the infrastructure management, setup, and scaling tasks. And finally, the global Jupiter network connects all these underlying systems, helping them talk to each other at lightning-fast speeds.

BigQuery is more than just a cloud-native SQL engine. It can also connect to dozens of different database systems and alternative cloud providers, making ingestion of new data sources quick and easy. BigQuery can also be used with various downstream data visualization and business intelligence applications, and you can even build advanced machine learning and AI models using SQL queries or customized routines using GCP's Vertex AI service, which, again, relies heavily on the data processing speed and power of BigQuery.

These scalable and somewhat future-proof features have led to an explosive uptake of BigQuery and Google Cloud Platform over recent years, with BigQuery now in the running to be the enterprise data warehouse of choice in the modern cloud computing era. I personally think BigQuery is here to stay, and I'm very excited to help you gain functional working knowledge of this amazing tool so you can see for yourself just how useful it is.