Google BigQuery
The Google BigQuery integration shows how to set up a connection to Google BigQuery using Python. We demonstrate how to read data from a BigQuery table to a Pandas DataFrame and vice versa. Before your Python script can interact with Google BigQuery, we first need to get a Google service account key file from a service account that has access to Google BigQuery.
Get Google service account key file
To set up the Google BigQuery connection on a remote system like AskAnna, you must have a Google service account with permission to access the BigQuery dataset. You can create a new service account via this link if you don't have one. To authenticate, you need to have the associated private JSON key of the service account or create a new service account JSON key.
With the service account, you can get the JSON key via the following steps:
- Click the email address of the service account that you created
- Click the
KEYS
tab - Click the
ADD KEY
drop-down menu, then selectCreate new key
. - Select
JSON
as the Key type and click Create. - This creates and downloads a JSON file that you can use to set up the connection
For more information on service accounts, see the Getting started with authentication on Google Cloud Platform docs.
Python
In the example below, we show how to set up a connection to a Google BigQuery data source using Python. We demonstrate how to read data from a BigQuery table to a Pandas DataFrame and vice versa. To do this, we need Python 3.7 or newer and install the following packages:
We recommend using a virtual environment and adding the packages to a requirements.txt file. In this file, you can add the following:
pandas-gbq # tested with version 0.19.1
python-dotenv # tested with version 0.21.1
Info
For Python, there are other packages that you can use to set up a connection to Google BigQuery. The example we show here demonstrates a method you could use with Pandas and how you could apply it in AskAnna.
The AskAnna platform is flexible. For example, if you don't use Pandas, you could set up the connection directly using the google-cloud-bigquery library. With AskAnna you could use this package, or other packages, to set up a connection as well.
Read from BigQuery table
In the following example code, we set up a connection to Google BigQuery. With pandas-gbq, we read a table from BigQuery to a Pandas DataFrame.
import json
import os
import pandas_gbq
from google.oauth2 import service_account
# More about dotenv in the section `Configure dotenv`
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
credentials = service_account.Credentials.from_service_account_info(
json.loads(os.getenv("GC_CREDENTIAL"))
)
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = "your-project-id"
sql = "SELECT * FROM dataset.table"
df = pandas_gbq.read_gbq(sql)
Write to BigQuery table
We reuse almost to full setup as with reading the data from Google BigQuery. Instead of reading data, we first create a Pandas DataFrame and then write the data to BigQuery.
import json
import os
import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
# More about dotenv in the section `Configure dotenv`
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
credentials = service_account.Credentials.from_service_account_info(
json.loads(os.getenv("GC_CREDENTIAL"))
)
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = "your-project-id"
df = pd.DataFrame({"example": ["value 1", "value 2", "value 3"]})
pandas_gbq.to_gbq(df, "dataset.table")
Configure dotenv
There are multiple ways to authenticate using the JSON key file. For example, you can add an environment variable GOOGLE_APPLICATION_CREDENTIALS
with the value set to the JSON key file path. It is not recommended to use this method because it is not secure to add the JSON key file to your project code and upload it directly to AskAnna.
In our example, we propose making the JSON file's content available via an environment variable. python-dotenv will help to make this setup and configuration smooth.
With python-dotenv you only have to add two lines of code to your Python file:
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
These two lines make it possible to develop your Python code locally, while you can also run the same code in AskAnna when you use project variables. When you add project variables, these variables will become available as environment variables in the run environment.
Locally, you can add a file .env
and when you run the Python code locally, the environment variables are loaded from this file. Read more about this on the project page of python-dotenv.
To run the above example, you need a .env
file with:
GC_CREDENTIAL='
{
"type": "service_account",
...
}
'
Security info
Ensure that the credentials saved in the .env
and JSON key file are not uploaded to AskAnna. You can prevent this by adding the files to askannaignore
.
Add AskAnna project variable
To run the above examples as a job in AskAnna, you should add a project variable GC_CREDENTIAL
. On the project page, go to the tab variables. Here you can create a new variable. To run the examples, you should add a variable named:
- GC_CREDENTIAL
For the value of the variable GC_CREDENTIAL
you can copy-paste the content of the JSON key file.
Warning
Make sure that the variable GC_CREDENTIAL
is set to masked. You don't want to expose this value.