Google Cloud Storage
The Google Cloud Storage integration shows how to set up a connection to Google Cloud Storage using Python. We demonstrate how to download objects from a bucket and how to upload objects to it. Before your Python script can interact with Google Cloud Storage, you need a key file for a Google service account that has access to Google Cloud Storage.
Get Google service account key file
To set up the Google Cloud Storage connection on a remote system like AskAnna, you must have a Google service account with permission to access the bucket. If you don't have one, you can create a new service account in the Google Cloud console. To authenticate, you need the private JSON key associated with the service account, or you can create a new service account JSON key.
With the service account, you can get the JSON key via the following steps:
- Click the email address of the service account that you created
- Click the KEYS tab
- Click the ADD KEY drop-down menu, then select Create new key
- Select JSON as the Key type and click Create
- This creates and downloads a JSON file that you can use to set up the connection
For more information on service accounts, see the Getting started with authentication on Google Cloud Platform docs.
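Once you have downloaded the key file, you can optionally verify that it works. Below is a minimal sketch, assuming the key file is saved as key.json (a hypothetical file name), that loads the key and builds credentials with the google-auth package, which is installed together with google-cloud-storage (see the Python section below):
import json
from google.oauth2 import service_account
# Load the downloaded service account key file (the file name is an assumption)
with open("key.json") as key_file:
    key_info = json.load(key_file)
# Build credentials from the key; this fails if the key file is malformed
credentials = service_account.Credentials.from_service_account_info(key_info)
print(credentials.service_account_email)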
Python
In the example below, we show how to set up a connection to a Google Cloud Storage source using Python. We demonstrate how to download objects from a bucket and how to upload objects to it. To do this, you need Python 3.7 or newer and the packages listed below.
We recommend using a virtual environment and adding the packages to a requirements.txt file. In this file, you can add the following:
google-cloud-storage # tested with version 2.7.0
python-dotenv # tested with version 1.0.0
Download from Google Cloud Storage
In the following example code, we set up a connection to Google Cloud Storage and download an object to a local file. For more information, check the Google Cloud Storage documentation.
import json
import os
from google.cloud import storage
from google.oauth2 import service_account
# More about dotenv in the section `Configure dotenv`
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
bucket_name = "BUCKET NAME" # Bucket to download from
source_object_name = "OBJECT NAME" # Google Cloud Storage object name
target_file_name = "PATH TO FILE" # File to save the object to
credentials = service_account.Credentials.from_service_account_info(
json.loads(os.getenv("GC_CREDENTIAL"))
)
storage_client = storage.Client(credentials=credentials)
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_object_name)
blob.download_to_filename(target_file_name)
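If you prefer to read the object into memory instead of saving it to a file, you can use download_as_bytes. A minimal sketch, reusing the same credentials setup as above:
import json
import os
from google.cloud import storage
from google.oauth2 import service_account
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
credentials = service_account.Credentials.from_service_account_info(
    json.loads(os.getenv("GC_CREDENTIAL"))
)
storage_client = storage.Client(credentials=credentials)
# Read the object's content into memory as bytes instead of writing it to a file
blob = storage_client.bucket("BUCKET NAME").blob("OBJECT NAME")
content = blob.download_as_bytes()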
Upload to Google Cloud Storage
We reuse almost the full setup from downloading objects from Google Cloud Storage. For more information about uploading objects, check the Google Cloud Storage documentation.
import json
import os
from google.cloud import storage
from google.oauth2 import service_account
# More about dotenv in the section `Configure dotenv`
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
bucket_name = "BUCKET NAME" # Bucket to upload to
target_object_name = "OBJECT NAME" # Google Cloud Storage object name
source_file_name = "PATH TO FILE" # File to upload
credentials = service_account.Credentials.from_service_account_info(
json.loads(os.getenv("GC_CREDENTIAL"))
)
storage_client = storage.Client(credentials=credentials)
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(target_object_name)
blob.upload_from_filename(source_file_name)
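If the content you want to upload is already in memory, you don't have to write it to a file first. A minimal sketch, assuming the same credentials setup, that uses upload_from_string:
import json
import os
from google.cloud import storage
from google.oauth2 import service_account
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
credentials = service_account.Credentials.from_service_account_info(
    json.loads(os.getenv("GC_CREDENTIAL"))
)
storage_client = storage.Client(credentials=credentials)
# Upload a string directly to the target object, without a local file
blob = storage_client.bucket("BUCKET NAME").blob("OBJECT NAME")
blob.upload_from_string("example content", content_type="text/plain")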
Configure dotenv
There are multiple ways to authenticate using the JSON key file. For example, you can add an environment variable GOOGLE_APPLICATION_CREDENTIALS with its value set to the JSON key file path. We don't recommend this method, because adding the JSON key file to your project code and uploading it directly to AskAnna is not secure.
In our example, we propose making the JSON file's content available via an environment variable. python-dotenv will help to make this setup and configuration smooth.
With python-dotenv you only have to add two lines of code to your Python file:
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
These two lines make it possible to develop your Python code locally and run the same code in AskAnna using project variables. When you add project variables, they become available as environment variables in the run environment.
Locally, you can add a .env file. When you run the Python code locally, the environment variables are loaded from this file. Read more about this on the project page of python-dotenv.
To run the above example, you need a .env file with:
GC_CREDENTIAL='
{
"type": "service_account",
...
}
'
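To quickly check that the variable is picked up and contains valid JSON, you could run a small sketch like the one below from the directory that contains the .env file:
import json
import os
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
credential = os.getenv("GC_CREDENTIAL")
assert credential is not None, "GC_CREDENTIAL is not set"
key_info = json.loads(credential)  # raises an error if the value is not valid JSON
print(key_info["type"])  # should print: service_account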
Security info
Ensure that the .env file and the JSON key file containing your credentials are not uploaded to AskAnna. You can prevent this by adding both files to askannaignore.
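For example, assuming askannaignore accepts gitignore-style patterns and that your key file is named key.json (adjust the name to your situation), the askannaignore file could contain:
.env
key.json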
Add AskAnna project variable
To run the above examples as a job in AskAnna, you need to add a project variable. On the project page, go to the variables tab, where you can create a new variable. To run the examples, add a variable named:
- GC_CREDENTIAL
For the value of the variable GC_CREDENTIAL, you can copy-paste the content of the JSON key file.
Warning
Make sure that the variable GC_CREDENTIAL is set to masked. You don't want to expose this value.