Traverse and migrate your Firestore collections efficiently

Anar Kafkas
4 min read · Jun 26, 2021


TL;DR

Use Firewalk’s configurable traverser objects to traverse your Firestore collections in a simple, intuitive, and memory-efficient way using batching. Firewalk is a light, robust, well-typed, and well-documented Node.js library. It fits a variety of use cases: database migration scripts (e.g. when you need to add a new field to all docs), a scheduled Cloud Function that needs to check every doc in a collection periodically, or a locally run script that retrieves some data from a collection.

If you’re using Firestore as the production database for your project, there will eventually come a time when you need to make some changes as an admin to a particular collection. If you don’t already have the id of every document in that collection, which is likely the case, you first need to retrieve the documents before writing to them. Doing so is not a problem if you have only a few hundred documents in the collection. You can just retrieve them with a single call to collectionRef.get() and write to every document returned by that call.

But things get more complicated when you have tens of thousands or millions of documents in the collection. You can’t just get all of them at once, as your program’s memory usage will explode. You need a different traversal logic that goes through the entire collection in batches. You also need to ensure that you don’t miss any documents or process any of them more than once.
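To make the batching idea concrete, here is a minimal sketch of cursor-based traversal over an in-memory array. It is an illustration of the general technique, not Firewalk’s implementation: the Doc type, fetchBatch, and traverseInBatches are hypothetical names, and the array stands in for a collection ordered by a unique id. Because each batch starts strictly after the last id seen, no document is skipped or visited twice as long as the collection does not change during the walk.

```typescript
// Hypothetical in-memory stand-in for a collection, sorted by a unique id.
interface Doc {
  id: string;
  data: Record<string, unknown>;
}

// Returns up to `limit` docs whose id sorts strictly after `cursor`
// (or from the beginning when `cursor` is null). With Firestore, the
// equivalent would be an ordered query combined with startAfter() and limit().
function fetchBatch(sortedDocs: Doc[], cursor: string | null, limit: number): Doc[] {
  if (cursor === null) return sortedDocs.slice(0, limit);
  const after: string = cursor;
  const start = sortedDocs.findIndex((d) => d.id > after);
  return start === -1 ? [] : sortedDocs.slice(start, start + limit);
}

// Walks the whole collection batch by batch, so only one batch is held
// by the caller at a time. Returns the number of docs visited.
function traverseInBatches(
  sortedDocs: Doc[],
  batchSize: number,
  onBatch: (batch: Doc[]) => void
): number {
  let cursor: string | null = null;
  let docCount = 0;
  while (true) {
    const batch: Doc[] = fetchBatch(sortedDocs, cursor, batchSize);
    if (batch.length === 0) break; // no more docs
    onBatch(batch);
    docCount += batch.length;
    cursor = batch[batch.length - 1].id; // resume after the last doc seen
  }
  return docCount;
}
```

The cursor is the only state carried between batches, which is what keeps memory usage proportional to the batch size rather than to the collection size.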

Solution: Firewalk

Firewalk is an open-source Node.js library that solves precisely this problem. It has an easy-to-use, intuitive API and provides you with configurable traverser objects that walk you through a given collection. We are already using it in production in several projects including Proficient AI and finding it extremely useful!

Installation

npm install firewalk

Firewalk is designed to work with the Firebase Admin SDK, so if you haven’t already installed it, run npm install firebase-admin.

Core Concepts

There are only 2 kinds of objects you need to be familiar with when using this library:

  1. Traverser: An object that walks you through a collection of documents (or more generally a Traversable).
  2. Migrator: A convenience object used for database migrations. It lets you easily write to the documents within a given traversable and uses a traverser to do that. You can easily write your own migration logic in the traverser callback if you don’t want to use a migrator.

Quick Start

Suppose we have a users collection and we want to send an email to each user. This is how easy it is to do that efficiently with a Firewalk traverser:
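The embedded snippet did not survive in this copy of the article, so what follows is a reconstruction sketch rather than the original code. To keep it compilable without any packages installed, the traverser is described by a small structural interface (QuickStartTraverser) that is an assumption modelled on the description in this article, not Firewalk’s actual type definitions. The sendEmail function, the email and firstName fields, and the batchCount and docCount result fields are likewise assumptions.

```typescript
// With the real libraries, the traverser would presumably be created like this:
//   const usersCollection = firestore().collection('users'); // firebase-admin
//   const traverser = createTraverser(usersCollection);      // firewalk

// Assumed shapes (not taken from Firewalk's typings).
interface UserSnapshot {
  data(): { email: string; firstName: string };
}
interface QuickStartResult {
  batchCount: number;
  docCount: number;
}
interface QuickStartTraverser {
  traverse(
    callback: (batchDocs: UserSnapshot[], batchIndex: number) => Promise<void>
  ): Promise<QuickStartResult>;
}

// Hypothetical email sender; swap in your own mail client.
type SendEmail = (message: { to: string; content: string }) => Promise<void>;

async function emailAllUsers(
  traverser: QuickStartTraverser,
  sendEmail: SendEmail
): Promise<QuickStartResult> {
  // The async callback runs once per batch of document snapshots.
  const result = await traverser.traverse(async (batchDocs, batchIndex) => {
    await Promise.all(
      batchDocs.map(async (doc) => {
        const { email, firstName } = doc.data();
        await sendEmail({ to: email, content: `Hello ${firstName}!` });
      })
    );
    console.log(`Batch ${batchIndex} done: emailed ${batchDocs.length} users.`);
  });
  console.log(`Traversal done: ${result.docCount} users in ${result.batchCount} batches.`);
  return result;
}
```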

We are doing 3 things here:

  1. Create a reference to the users collection
  2. Pass that reference to the createTraverser() function
  3. Invoke .traverse() with an async callback that is called for each batch of document snapshots

This pretty much sums up the core functionality of this library! The .traverse() method returns a Promise that resolves when the entire traversal finishes, which can take a while if you have millions of docs. The Promise resolves with an object containing the traversal details e.g. the number of docs you touched.

Using a fast traverser

One powerful feature of Firewalk is that it lets you configure your traverser, and a configured traverser can be used with different types of migrators. If you want to trade some memory for speed, you can increase concurrency by raising maxConcurrentBatchCount. When maxConcurrentBatchCount > 1, the traverser processes multiple batches concurrently.
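To illustrate what a concurrency limit on batches means, here is a small sketch of bounded-concurrency batch processing. It is not Firewalk’s implementation: processWithLimit is a hypothetical helper, and the batches are already in memory purely to keep the example runnable, whereas a real traverser fetches them as it goes. The function returns the peak number of batches that were in flight at once, which never exceeds the limit.

```typescript
// Runs `callback` over every batch with at most `maxConcurrentBatchCount`
// callbacks in flight at any moment. Returns the observed peak concurrency.
async function processWithLimit<T>(
  batches: T[][],
  maxConcurrentBatchCount: number,
  callback: (batch: T[], batchIndex: number) => Promise<void>
): Promise<number> {
  let nextIndex = 0; // index of the next batch nobody has claimed yet
  let inFlight = 0;
  let peak = 0;

  // Each worker repeatedly claims the next unclaimed batch and processes it.
  async function worker(): Promise<void> {
    while (nextIndex < batches.length) {
      const index = nextIndex++;
      inFlight++;
      peak = Math.max(peak, inFlight);
      try {
        await callback(batches[index], index);
      } finally {
        inFlight--;
      }
    }
  }

  const workerCount = Math.max(1, Math.min(maxConcurrentBatchCount, batches.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return peak;
}
```

With a limit of 1 the batches are processed strictly one after another. Raising the limit overlaps callback work across batches, at the cost of keeping more batches in memory at the same time.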

Traversal Complexity

Here are the time and space complexities as well as billing info for a traversal. Note that batchSize and maxConcurrentBatchCount come from the traversal config that you specify.

  • Time complexity: O((N / batchSize) * (Q(batchSize) + C(batchSize) / maxConcurrentBatchCount))
  • Space complexity: O(maxConcurrentBatchCount * (batchSize * D + S))
  • Billing: max(1, N) reads

where:

  • N: number of docs in the traversable
  • Q(batchSize): average batch query time
  • C(batchSize): average callback processing time
  • D: average document size
  • S: average extra space used by the callback
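As a rough worked example, the expressions above can be turned into a back-of-the-envelope estimator. This is only a sketch: it treats the big-O expressions as if their constant factors were 1, so the outputs are order-of-magnitude guides, not predictions. The function name, field names, and the sample numbers in the comments are all hypothetical.

```typescript
interface TraversalEstimateInput {
  docCount: number; // N
  batchSize: number;
  maxConcurrentBatchCount: number;
  avgQueryMs: number; // Q(batchSize)
  avgCallbackMs: number; // C(batchSize)
  avgDocBytes: number; // D
  avgCallbackExtraBytes: number; // S
}

interface TraversalEstimate {
  batchCount: number;
  timeMs: number;
  peakBytes: number;
  billedReads: number;
}

function estimateTraversal(input: TraversalEstimateInput): TraversalEstimate {
  // N / batchSize, rounded up because a partial final batch still costs a query.
  const batchCount = Math.ceil(input.docCount / input.batchSize);
  // (N / batchSize) * (Q + C / maxConcurrentBatchCount)
  const timeMs =
    batchCount * (input.avgQueryMs + input.avgCallbackMs / input.maxConcurrentBatchCount);
  // maxConcurrentBatchCount * (batchSize * D + S)
  const peakBytes =
    input.maxConcurrentBatchCount *
    (input.batchSize * input.avgDocBytes + input.avgCallbackExtraBytes);
  // max(1, N) reads
  const billedReads = Math.max(1, input.docCount);
  return { batchCount, timeMs, peakBytes, billedReads };
}

// Example: 1,000,000 docs of ~1,000 bytes, batches of 250, a 200 ms query and
// an 800 ms callback per batch. With maxConcurrentBatchCount = 1 this gives
// 4,000 batches * 1,000 ms = 4,000,000 ms and ~250,000 bytes of docs in memory.
// With maxConcurrentBatchCount = 4 it gives 4,000 * 400 ms = 1,600,000 ms and
// ~1,000,000 bytes. The number of reads is 1,000,000 in both cases.
```

The example shows the trade-off from the previous section in numbers: raising maxConcurrentBatchCount shrinks only the callback term of the time estimate, while the space estimate grows linearly with it.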

More Examples

Here are a few more examples to help you understand how you interact with migrators and traversers.

Add a new field using a migrator

Add a new field derived from the previous fields
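The original snippet for this example is missing from this copy and most likely used a migrator. As an alternative, here is a sketch of the other route mentioned under Core Concepts: writing the migration logic directly in the traverser callback. The structural interfaces below are assumptions made so the code compiles on its own, and the firstName, lastName, and fullName fields are hypothetical. The only call assumed on the real SDK side is update() on a document reference.

```typescript
// Assumed shapes (not taken from Firewalk's or firebase-admin's typings).
interface NameData {
  firstName: string;
  lastName: string;
}
interface NameSnapshot {
  data(): NameData;
  ref: { update(data: { fullName: string }): Promise<unknown> };
}
interface DerivedFieldTraverser {
  traverse(
    callback: (batchDocs: NameSnapshot[]) => Promise<void>
  ): Promise<{ docCount: number }>;
}

// Adds a `fullName` field derived from two existing fields to every doc.
// Returns the number of documents that were written.
async function addFullName(traverser: DerivedFieldTraverser): Promise<number> {
  let updatedCount = 0;
  await traverser.traverse(async (batchDocs) => {
    await Promise.all(
      batchDocs.map(async (doc) => {
        const { firstName, lastName } = doc.data();
        await doc.ref.update({ fullName: `${firstName} ${lastName}` });
        updatedCount++;
      })
    );
  });
  return updatedCount;
}
```

Each document is written individually here, so a failure partway through leaves earlier writes in place. Re-running the script is safe because the derived value is recomputed from the same source fields.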

Use a fast migrator

Change traversal config

Rename a field
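The original snippet for this example is also missing here. As a sketch of what a rename amounts to at the document level, the helper below builds an update payload that copies the old value under the new name and marks the old field for deletion. buildRenameUpdate is a hypothetical helper, not part of Firewalk. The delete marker is passed in as a parameter so the code runs without the SDK; with the Admin SDK you would presumably pass FieldValue.delete() and send the payload with doc.ref.update(payload) inside a traverser callback.

```typescript
// Builds the update payload for renaming `oldField` to `newField` in one doc.
// Returns null when there is nothing to do, so callers can skip the write.
function buildRenameUpdate<S>(
  data: Record<string, unknown>,
  oldField: string,
  newField: string,
  deleteSentinel: S
): Record<string, unknown> | null {
  // Same name: emitting a payload here would delete the field, so bail out.
  if (oldField === newField) return null;
  // Doc does not have the old field (e.g. it was already migrated).
  if (!Object.prototype.hasOwnProperty.call(data, oldField)) return null;
  return {
    [newField]: data[oldField], // copy the value under the new name
    [oldField]: deleteSentinel, // mark the old field for removal
  };
}
```

Skipping documents that no longer contain the old field makes the migration safe to re-run after a partial failure.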

Conclusion

Firewalk is a light and powerful Node.js library that provides you with fast, efficient, and configurable traverser objects, which you can use to read and write to your Firestore collections. You can always find the most up-to-date docs on our GitHub repo at https://github.com/kafkas/firewalk and if you have any feature requests feel free to create an issue. Contributions are welcome and appreciated!


Written by Anar Kafkas

Technologist, building Proficient AI
