Let's Work Together

Synchronize Firebase to BigQuery


Synchronize mobile app events into BigQuery

Firestore is perfect for your mobile and web apps, but you need the power of BigQuery to analyze and audit the small amounts of frequently pushed data that the client applications never consume. For instance, say you want to count how many times a customer taps a button in your mobile app, or track which features a user exercises, so you can understand their actions and engagement and provide better customer support.

In this blog post, we will walk through synchronizing Firestore data into BigQuery. We will implement a Cloud Function that is triggered by document create events and publishes the document data to a Pub/Sub topic. Pub/Sub allows for future expansion, and a recently added Pub/Sub feature called BigQuery Subscriptions makes it simple to get the data into BigQuery.

Here are the Google Cloud features we will be covering and the process flow:

Firestore ➔ Cloud Functions ➔ Pub/Sub ➔ BigQuery Subscriptions ➔ BigQuery

Part One: Firestore and Cloud Function

Let’s break this blog up into two parts. First, we will verify the Cloud Function is being triggered by Firestore document create events and getting the data. Second, we will set up Pub/Sub and BigQuery and change the Cloud Function to publish the data.

Enable and set up Firestore

In your Google Cloud (GCP) project, enable Firestore, choose Native mode, set your preferred region, and create a collection named messages.

For our prototyping here, set the Security Rules to the following:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if true;
    }
  }
}

Note: We are omitting Firestore Security Rules for this article. This is not something you should do in your production systems! See the security rules documentation for how to set these up for your project.
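For illustration only, a slightly more restrictive variant might at least require a signed-in user before allowing access to the messages collection. This is a sketch, not a production-grade rule set:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /messages/{messageId} {
      // Example only: require an authenticated user to read or write messages.
      allow read, write: if request.auth != null;
    }
  }
}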

Create a Cloud Function with the basics

In GCP, enable Cloud Functions and create a 1st gen function named copy-data-to-bigquery with a Cloud Firestore trigger on document create events for the messages collection.

Note: We are using 1st generation Cloud Functions because 2nd generation uses Eventarc, and as we reported earlier in a Medium post, Eventarc doesn’t work well with Firestore events.

For the Cloud Function code, select the Node.js runtime, set the entry point to publishToBigQuery, and copy the following into index.js:

const { Logging } = require('@google-cloud/logging');
const logging = new Logging();
const log = logging.log('publishToBigQuery');

// Log the entire Firestore event as JSON so we can inspect it in Logs Explorer.
exports.publishToBigQuery = event => {
  log.info(log.entry(JSON.stringify(event, null, 2)));
};

Copy the following code into package.json:

{
  "name": "sample",
  "version": "1.0.0",
  "dependencies": {
    "@google-cloud/logging": "^10.1.9"
  }
}

Create some data in Firestore

In Firestore, select the messages collection and add a document with the following properties:
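The exact fields don’t matter for this walkthrough; any document added to the messages collection will fire the trigger. If you’d rather create the document from code than from the console, here is a minimal sketch using the Firebase Admin SDK, where the field names are hypothetical:

const admin = require('firebase-admin');
admin.initializeApp();

async function addTestMessage() {
  // Hypothetical fields; any document added to `messages` fires the trigger.
  await admin.firestore().collection('messages').add({
    user: 'test-user',
    action: 'button_tap',
    createdAt: admin.firestore.FieldValue.serverTimestamp(),
  });
}
addTestMessage();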

See output from Cloud Function in Logs Explorer

In GCP, go to Logging and click the Stream logs button.

After some time, you should see the Firestore write event come across, along with the event JSON output from the Cloud Function, similar to the following:

To filter out some noise and just get the logging output from the Cloud Function, click the Log Name button, check publishToBigQuery and click Apply.

You can also use the Resource button to filter the logs, if you’d prefer, but creating our own log has some value.

Expanding the log entry from the Cloud Function, you should see the following:
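For reference, the payload of a 1st gen Firestore document-create event has roughly the following shape; the document name and field values here are illustrative:

{
  "oldValue": {},
  "value": {
    "createTime": "2022-08-01T12:00:00.000000Z",
    "fields": {
      "user": { "stringValue": "test-user" }
    },
    "name": "projects/YOUR_PROJECT/databases/(default)/documents/messages/abc123",
    "updateTime": "2022-08-01T12:00:00.000000Z"
  },
  "updateMask": {}
}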

Part Two: Publishing data to BigQuery

Now that we have the plumbing in place to trigger the Cloud Function from Firestore, we will create a Pub/Sub topic, make a BigQuery Subscription and modify the Cloud Function to publish to the topic.

Enable and set up BigQuery

In GCP, go to BigQuery and create a Dataset with the following properties:

Then create a table by selecting the dataset, choosing Create table from the three-dot menu, and using the following properties:
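The key property is a single column named data: when a BigQuery subscription is not using a topic schema, Pub/Sub writes each raw message into a column with that name. If you’d rather script this step, here is a sketch with the BigQuery client library, where the firestore_export dataset and messages_raw table names are assumptions:

const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function setUpDestination() {
  // One STRING column named `data` receives the raw Pub/Sub message payload.
  const [dataset] = await bigquery.createDataset('firestore_export');
  await dataset.createTable('messages_raw', {
    schema: [{ name: 'data', type: 'STRING' }],
  });
}
setUpDestination();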

Add BigQuery Data Editor role to Pub/Sub service account

In order for a Pub/Sub BigQuery subscription to write the data published to a topic into BigQuery, we have to give the Pub/Sub service account (which has the form service-PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com) some permissions. To do this, perform the following steps:

  1. In GCP, go to IAM & Admin.
  2. Check the Include Google-provided role grants checkbox at the top right, above the table.
  3. Click the pencil icon for the Cloud Pub/Sub Service Account.
  4. Click the ADD ANOTHER ROLE button.
  5. Type BigQuery in the Filter field.
  6. Choose BigQuery Data Editor from the list.
  7. Click Save.

Create Pub/Sub topic

In GCP, go to Pub/Sub and create a topic named firestore-to-BQ-data-field (the Cloud Function code below publishes to this topic) with the following properties:

Create BigQuery Subscription

After the topic is created, click the Export to BigQuery button, keep the defaults, click CONTINUE, and use the following properties:

Leave everything else as defaults.
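If you prefer to script this step, the Pub/Sub client library can create a BigQuery subscription directly. A sketch, assuming the topic from above and the dataset and table names assumed earlier:

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function createBigQuerySubscription() {
  // Routes every message on the topic into the table's `data` column.
  await pubsub.topic('firestore-to-BQ-data-field').createSubscription(
    'firestore-to-BQ-sub', // hypothetical subscription ID
    { bigqueryConfig: { table: 'YOUR_PROJECT.firestore_export.messages_raw' } }
  );
}
createBigQuerySubscription();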

Update the Cloud Function code to write to Pub/Sub topic

In GCP, go to Cloud Functions, click on copy-data-to-bigquery in the list and click the EDIT toolbar button.

Click Next and copy the following code into index.js:

const { Logging } = require('@google-cloud/logging');
const logging = new Logging();
const log = logging.log('publishToBigQuery');

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

exports.publishToBigQuery = async event => {
  try {
    // Publish the new document's data (event.value) to the topic as raw JSON.
    const topic = pubsub.topic('firestore-to-BQ-data-field');
    const data = Buffer.from(JSON.stringify(event.value));
    const messageId = await topic.publishMessage({ data });
    log.info(log.entry(`Document published on pubsub ${messageId}`));
  } catch (error) {
    log.error(log.entry(`Error when publishing document: ${error}`));
  }
};

Copy the following code into package.json:

{
  "name": "sample",
  "version": "1.0.0",
  "dependencies": {
    "@google-cloud/logging": "^10.1.9",
    "@google-cloud/pubsub": "^3.1.0"
  }
}

Click DEPLOY.

Create document in Firestore

Now it’s time to test the rest of the plumbing by creating a document in Firestore and watching the value of that document land in BigQuery.

In GCP, go to Firestore, click the three-dot menu next to the test document in the messages collection and choose Add similar document.

Click Save, then in GCP, go to Logging > Logs Explorer and wait until you see log entries from the Cloud Function, as well as BigQueryWrite.AppendRows entries, similar to the following:

See the results in BigQuery

Once the log entries appear, go to BigQuery, select the table you created earlier, and run a query that selects its data column.

In the Query results section that appears, you should see the data you entered into Firestore.
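You can also verify from code rather than the console. Here is a minimal sketch using the BigQuery client library; the dataset and table names are the ones assumed earlier, so substitute your own:

const { BigQuery } = require('@google-cloud/bigquery');

async function checkRows() {
  // Each row's `data` column holds the raw JSON published by the Cloud Function.
  const [rows] = await new BigQuery().query(
    'SELECT data FROM `firestore_export.messages_raw` LIMIT 10'
  );
  rows.forEach(row => console.log(row.data));
}
checkRows();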

Summary and next steps

While it might seem silly to keep the raw JSON from the Firestore event in a single data column in BigQuery, it has the added benefit of not requiring schema updates as the shape of your documents changes. Lee Doolan’s Medium post covers the pros and cons of using topic schemas, so check that out if you want to learn more.

We’ve also written another blog post that walks through how to populate specific table columns based on the JSON data.

Thanks for reading, and we hope this post has helped you. Keep us in mind for all your mobile application needs.

About Atomic Robot

Atomic Robot brings together the best developers and creators to deliver custom-crafted mobile solutions for iOS, Android, and emerging technologies.

Please reach out to us at [email protected] or go to https://atomicrobot.com/contact/.