How To Pin Public Datasets to a Project in Google BigQuery

I recently started spending more time in GCP, so this is a short and sweet post on how to pin the bigquery-public-data project in Google BigQuery in three easy steps. I clicked around for ages trying to work this out, but thanks to a little googling and some dumb luck I got there. I’m typing it up here for me and for you!

Step 1

First things first: open up BigQuery and hit the “ADD DATA” button.

Step 2

In the Add data pop-up window, select the option “Star a project by name”.

Step 3

In the Star a project dialog box, enter the project name: bigquery-public-data.

And voila! The bigquery-public-data project will be pinned in the Explorer side panel. Expand it and you will have a wide range of datasets to play with.
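If you’d rather poke around from code, here is a minimal sketch of the “expand the project” step using the `google-cloud-bigquery` Python client. It assumes the library is installed (`pip install google-cloud-bigquery`) and that application default credentials are set up; `list_public_datasets` is a hypothetical helper name for illustration. As far as I know, starring a project is a console-only convenience, so this only lists what the project contains rather than pinning anything.

```python
PUBLIC_PROJECT = "bigquery-public-data"


def list_public_datasets(client=None, project=PUBLIC_PROJECT):
    """Return the sorted dataset IDs in `project` (hypothetical helper).

    A client can be passed in; otherwise one is created from your
    application default credentials.
    """
    if client is None:
        # Imported lazily so the helper also works with any object that
        # has a compatible list_datasets() method.
        from google.cloud import bigquery

        client = bigquery.Client()
    # list_datasets() yields DatasetListItem objects with a dataset_id attribute.
    return sorted(ds.dataset_id for ds in client.list_datasets(project=project))
```

Calling `list_public_datasets()` should return names such as `samples` and `usa_names`, the same datasets you see when expanding the pinned project.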

From there you can query any of its tables just like your own; it’s as simple as that! You can also add third-party projects this way, for example dbt-tutorial. After figuring all this out, I found this YouTube video that walks through adding third-party datasets with steps similar to the above.
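For a basic query from Python, a sketch along these lines should work. It assumes the `google-cloud-bigquery` library is installed and application default credentials are set up. The query runs (and is billed) in your own default project, not in bigquery-public-data. The `usa_names.usa_1910_2013` table and the helper names are example picks; note the fully qualified `project.dataset.table` name wrapped in backticks.

```python
# Fully qualified table name: project.dataset.table (example table choice).
TABLE_ID = "bigquery-public-data.usa_names.usa_1910_2013"


def build_top_names_query(limit=10):
    """SQL for the most common names in one state; @state is a query parameter."""
    return f"""
        SELECT name, SUM(number) AS total
        FROM `{TABLE_ID}`
        WHERE state = @state
        GROUP BY name
        ORDER BY total DESC
        LIMIT {int(limit)}
    """


def top_names(state, limit=10):
    """Run the query and return (name, total) tuples (hypothetical helper)."""
    from google.cloud import bigquery  # lazy import: only needed to run the query

    client = bigquery.Client()  # jobs run and are billed in your default project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("state", "STRING", state)]
    )
    rows = client.query(build_top_names_query(limit), job_config=job_config).result()
    return [(row["name"], row["total"]) for row in rows]
```

For example, `top_names("TX", 5)` would return the five most common names recorded in Texas. Passing the state as a query parameter instead of pasting it into the SQL string keeps the query safe to reuse with other inputs.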

Warning

A word of caution when playing with these public datasets: some of them are big! Be mindful of the table sizes and of what querying them will cost. If you want to explore the public data, always check a table’s size before querying it. As a rule of thumb, a table under 1 GB is generally safe. Remember that the free tier covers only 1 TiB of query processing per month.
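To follow the “check the size first” advice in code, here is a minimal sketch, again assuming `google-cloud-bigquery` and default credentials. `Client.get_table(...).num_bytes` gives a table’s stored size (roughly what a `SELECT *` would scan), while a dry run reports `total_bytes_processed` for a specific query without being charged. That is the better estimate, since BigQuery bills by the bytes in the columns a query reads, and adding `LIMIT` to a `SELECT *` does not reduce that. The 1 GB threshold is the rule of thumb above, treated here as 1 GiB; the helper names are hypothetical.

```python
GIB = 1024**3
TIB = 1024**4

SAFE_TABLE_BYTES = 1 * GIB   # the "under 1 GB" rule of thumb, treated as 1 GiB
FREE_TIER_BYTES = 1 * TIB    # free tier: 1 TiB of query processing per month


def is_safe_to_explore(num_bytes, threshold=SAFE_TABLE_BYTES):
    """True if a table (or a query's scan) is under the rule-of-thumb size."""
    return num_bytes < threshold


def share_of_free_tier(num_bytes):
    """Fraction of the monthly free tier that scanning num_bytes would use."""
    return num_bytes / FREE_TIER_BYTES


def table_size_bytes(table_id):
    """Stored size of a table, e.g. 'bigquery-public-data.samples.shakespeare'."""
    from google.cloud import bigquery

    return bigquery.Client().get_table(table_id).num_bytes


def dry_run_bytes(sql):
    """Bytes a query would scan, estimated by a (free) dry run."""
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = bigquery.Client().query(sql, job_config=config)
    return job.total_bytes_processed
```

For example, `share_of_free_tier(dry_run_bytes(sql))` tells you how much of the month’s free allowance a single run of `sql` would use before you actually run it.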

Well, that’s it; as I said, short and sweet. I must admit I much prefer BigQuery to Athena, but I will save that for another post.