Traverse and migrate your Firestore collections efficiently
TL;DR
Use Firewalk’s configurable traverser objects to traverse your Firestore collections in a simple, intuitive, and memory-efficient way using batching. Firewalk is a light, robust, well-typed, and well-documented Node.js library. It suits a variety of use cases, such as database migration scripts (e.g. when you need to add a new field to all docs), a scheduled Cloud Function that needs to check every doc in a collection periodically, or a locally run script that retrieves some data from a collection.
If you’re using Firestore as the production database for your project, there will eventually come a time when you need to make some changes as an admin to a particular collection. If you don’t already have the ID of every document in that collection, which is likely the case, you first need to retrieve the documents before writing to them. That is not a problem if you have only a few hundred documents in the collection: you can retrieve them with a single call to collectionRef.get() and write to every document returned by that call.
But things get more complicated when you have tens of thousands or millions of documents in the collection. You can’t just get all of them at once, as your program’s memory usage will explode. You need a different traversal logic that goes through the entire collection in batches. You also need to ensure that you don’t miss any documents or process any of them more than once.
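To make the problem concrete, here is a minimal sketch of the kind of hand-rolled batched traversal described above. It is not Firewalk's implementation. The fetchPage function is a hypothetical stand-in for a paginated Firestore query (ordered by document ID, starting after a cursor, with a limit), and an in-memory array plays the role of the collection so that the sketch runs on its own.

```typescript
// A document as seen by this sketch: just an ID plus arbitrary data.
interface Doc {
  id: string;
  data: Record<string, unknown>;
}

// Hypothetical page fetcher: returns up to `limit` docs whose ID sorts
// strictly after `afterId` (or from the start when `afterId` is null).
type FetchPage = (afterId: string | null, limit: number) => Promise<Doc[]>;

// In-memory stand-in for a collection, used only to make the sketch runnable.
function makeInMemoryFetcher(docs: Doc[]): FetchPage {
  const sorted = [...docs].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return async (afterId, limit) => {
    const start = afterId === null ? 0 : sorted.findIndex((d) => d.id > afterId);
    return start === -1 ? [] : sorted.slice(start, start + limit);
  };
}

// Walks the whole collection batch by batch, holding one batch in memory at a time.
async function traverseInBatches(
  fetchPage: FetchPage,
  batchSize: number,
  onBatch: (batch: Doc[], batchIndex: number) => Promise<void>
): Promise<{ batchCount: number; docCount: number }> {
  let cursor: string | null = null;
  let batchCount = 0;
  let docCount = 0;
  while (true) {
    const batch: Doc[] = await fetchPage(cursor, batchSize);
    if (batch.length === 0) break; // an empty page means the end was reached
    await onBatch(batch, batchCount);
    batchCount += 1;
    docCount += batch.length;
    // Moving the cursor to the last ID seen means no doc is skipped or repeated.
    cursor = batch[batch.length - 1].id;
  }
  return { batchCount, docCount };
}
```

Even this simple version has to manage a cursor, a stop condition, and bookkeeping, and it does not yet handle retries or concurrency.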
Solution: Firewalk
Firewalk is an open-source Node.js library that solves precisely this problem. It has an easy-to-use, intuitive API and provides you with configurable traverser objects that walk you through a given collection. We already use it in production in several projects, including Proficient AI, and find it extremely useful!
Installation
npm install firewalk
Firewalk is designed to work with the Firebase Admin SDK. If you haven’t already installed it, run:
npm install firebase-admin
Core Concepts
There are only 2 kinds of objects you need to be familiar with when using this library:
- Traverser: An object that walks you through a collection of documents (or more generally a Traversable).
- Migrator: A convenience object used for database migrations. It lets you easily write to the documents within a given traversable and uses a traverser to do that. You can easily write your own migration logic in the traverser callback if you don’t want to use a migrator.
Quick Start
Suppose we have a users collection and we want to send an email to each user. A Firewalk traverser lets us do that efficiently in three steps:
- Create a reference to the users collection
- Pass that reference to the createTraverser() function
- Invoke .traverse() with an async callback that is called for each batch of document snapshots
This pretty much sums up the core functionality of this library! The .traverse() method returns a Promise that resolves when the entire traversal finishes, which can take a while if you have millions of docs. The Promise resolves with an object containing the traversal details, e.g. the number of docs you touched.
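The steps described above might look roughly like the sketch below. It is written against a small structural interface instead of importing the library, so that it runs on its own. The interface shape (a traverse() method whose callback receives the batch docs and a batch index, and which resolves with batchCount and docCount) and the sendEmail helper are assumptions for illustration; check the Firewalk docs for the exact API.

```typescript
// Assumed minimal shapes; the real types come from firebase-admin and firewalk.
interface UserSnapshot {
  id: string;
  data(): { email: string; firstName: string };
}

interface TraverserLike {
  traverse(
    callback: (batchDocs: UserSnapshot[], batchIndex: number) => Promise<void>
  ): Promise<{ batchCount: number; docCount: number }>;
}

// Hypothetical email sender supplied by the caller.
type SendEmail = (message: { to: string; content: string }) => Promise<void>;

// With Firewalk installed, steps 1 and 2 would presumably look like:
//   import { firestore } from 'firebase-admin';
//   import { createTraverser } from 'firewalk';
//   const usersColRef = firestore().collection('users');
//   const traverser = createTraverser(usersColRef);
async function emailAllUsers(traverser: TraverserLike, sendEmail: SendEmail): Promise<number> {
  // Step 3: the callback runs once per batch of document snapshots.
  const { docCount } = await traverser.traverse(async (batchDocs, batchIndex) => {
    await Promise.all(
      batchDocs.map((doc) => {
        const { email, firstName } = doc.data();
        return sendEmail({ to: email, content: `Hello ${firstName}!` });
      })
    );
    console.log(`Batch ${batchIndex} done: emailed ${batchDocs.length} users.`);
  });
  // The promise resolves only after the whole traversal has finished.
  return docCount;
}
```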
Using a fast traverser
One powerful feature of Firewalk is that it lets you configure your traverser, and a configured traverser can be used with different types of migrators. If you want to trade some memory for speed, you can increase concurrency by raising maxConcurrentBatchCount. The traverser processes multiple batches concurrently when maxConcurrentBatchCount > 1.
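To illustrate what processing batches concurrently means, here is a simplified, self-contained sketch of bounded concurrency. It is not Firewalk's internal scheduler: the batches are handed in as a ready-made array, whereas a real traverser fetches them as it goes. Only the name maxConcurrentBatchCount comes from the text above; everything else is hypothetical.

```typescript
// Runs `processBatch` over all batches with at most `maxConcurrentBatchCount`
// invocations in flight at any time.
async function processWithConcurrency<T>(
  batches: T[][],
  maxConcurrentBatchCount: number,
  processBatch: (batch: T[], batchIndex: number) => Promise<void>
): Promise<void> {
  let nextIndex = 0;
  // Each worker repeatedly claims the next unprocessed batch until none remain.
  const worker = async (): Promise<void> => {
    while (nextIndex < batches.length) {
      const index = nextIndex;
      nextIndex += 1;
      await processBatch(batches[index], index);
    }
  };
  // Never start more workers than there are batches, and always at least one.
  const workerCount = Math.max(1, Math.min(maxConcurrentBatchCount, batches.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

Because up to maxConcurrentBatchCount batches are alive at once, memory use grows roughly in proportion to that setting, which is the trade-off mentioned above.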
Traversal Complexity
Here are the time and space complexities, as well as billing info, for a traversal. Note that batchSize and maxConcurrentBatchCount come from the traversal config that you specify.
- Time complexity: O((N / batchSize) * (Q(batchSize) + C(batchSize) / maxConcurrentBatchCount))
- Space complexity: O(maxConcurrentBatchCount * (batchSize * D + S))
- Billing: max(1, N) reads
where:
- N: number of docs in the traversable
- Q(batchSize): average batch query time
- C(batchSize): average callback processing time
- D: average document size
- S: average extra space used by the callback
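As a rough worked example, the following sketch plugs numbers into the formulas above. The function name and the sample figures are made up for illustration, and the results are order-of-magnitude estimates derived from the big-O expressions, not measurements.

```typescript
interface TraversalEstimateInput {
  docCount: number; // N
  batchSize: number;
  maxConcurrentBatchCount: number;
  avgQueryMs: number; // Q(batchSize)
  avgCallbackMs: number; // C(batchSize)
  avgDocBytes: number; // D
  avgCallbackExtraBytes: number; // S
}

// Evaluates the complexity expressions with constant factors taken as 1.
function estimateTraversal(input: TraversalEstimateInput) {
  const batchCount = Math.ceil(input.docCount / input.batchSize);
  const timeMs =
    batchCount * (input.avgQueryMs + input.avgCallbackMs / input.maxConcurrentBatchCount);
  const peakBytes =
    input.maxConcurrentBatchCount *
    (input.batchSize * input.avgDocBytes + input.avgCallbackExtraBytes);
  const billedReads = Math.max(1, input.docCount);
  return { batchCount, timeMs, peakBytes, billedReads };
}

// Hypothetical workload: 100,000 docs of about 2 KB each, 4 concurrent batches of 250.
const estimate = estimateTraversal({
  docCount: 100000,
  batchSize: 250,
  maxConcurrentBatchCount: 4,
  avgQueryMs: 200,
  avgCallbackMs: 400,
  avgDocBytes: 2000,
  avgCallbackExtraBytes: 100000,
});
// → { batchCount: 400, timeMs: 120000, peakBytes: 2400000, billedReads: 100000 }
```

With these sample figures the estimate comes to about two minutes of traversal time and roughly 2.4 MB of peak memory. Raising maxConcurrentBatchCount shrinks only the callback share of the time while growing memory linearly.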
More Examples
Here are a few more examples to help you understand how you interact with migrators and traversers.
Add a new field using a migrator
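A minimal sketch of this case, assuming the migrator exposes an update(field, value) method that resolves with a migratedDocCount. The interface below is a stand-in so that the sketch runs without the library, and the projects collection and isCompleted field are hypothetical.

```typescript
// Assumed minimal shape of a migrator; the real type comes from firewalk.
interface UpdatingMigrator {
  update(field: string, value: unknown): Promise<{ migratedDocCount: number }>;
}

// With Firewalk installed, the migrator would presumably be created like:
//   import { firestore } from 'firebase-admin';
//   import { createMigrator } from 'firewalk';
//   const migrator = createMigrator(firestore().collection('projects'));
async function markAllProjectsIncomplete(migrator: UpdatingMigrator): Promise<number> {
  // Writes isCompleted = false to every doc the migrator visits.
  const { migratedDocCount } = await migrator.update('isCompleted', false);
  return migratedDocCount;
}
```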
Add a new field derived from the previous fields
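A sketch of deriving a new field from existing ones, assuming the migrator offers an updateWithDerivedData(callback) method whose callback receives a document snapshot and returns the fields to write. The interfaces are stand-ins, and the firstName, lastName, and fullName fields are hypothetical.

```typescript
interface UserNameData {
  firstName: string;
  lastName: string;
  fullName?: string;
}

// Assumed minimal shapes; the real types come from firebase-admin and firewalk.
interface UserNameSnapshot {
  data(): UserNameData;
}

interface DerivingMigrator {
  updateWithDerivedData(
    getData: (snapshot: UserNameSnapshot) => Partial<UserNameData>
  ): Promise<{ migratedDocCount: number }>;
}

// With Firewalk installed, the migrator would presumably come from
// createMigrator(firestore().collection('users')).
async function addFullName(migrator: DerivingMigrator): Promise<number> {
  const { migratedDocCount } = await migrator.updateWithDerivedData((snapshot) => {
    // Compute the new field from the fields the doc already has.
    const { firstName, lastName } = snapshot.data();
    return { fullName: `${firstName} ${lastName}` };
  });
  return migratedDocCount;
}
```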
Use a fast migrator
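A sketch of a migrator configured for speed. It assumes the migrator factory accepts a traversal config as its second argument. The factory is injected as a parameter so that the sketch runs without the library, and the wallets collection, the money field, and the value 25 are arbitrary samples.

```typescript
// Only the config key discussed in this article is modelled here.
interface FastMigratorConfig {
  maxConcurrentBatchCount: number;
}

// Assumed minimal shape of a migrator; the real type comes from firewalk.
interface FieldMigrator {
  update(field: string, value: unknown): Promise<{ migratedDocCount: number }>;
}

// Hypothetical factory type; with Firewalk installed one would presumably
// pass its createMigrator function here.
type MigratorFactory<T> = (traversable: T, config: FastMigratorConfig) => FieldMigrator;

async function resetBalancesFast<T>(
  createMigratorFn: MigratorFactory<T>,
  wallets: T
): Promise<number> {
  // More concurrent batches means faster migration at the cost of more memory.
  const migrator = createMigratorFn(wallets, { maxConcurrentBatchCount: 25 });
  const { migratedDocCount } = await migrator.update('money', 0);
  return migratedDocCount;
}
```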
Change traversal config
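The idea of changing a traversal config can be sketched as deriving a new config from an existing one without mutating it. The helper below is hypothetical and only models the two config keys mentioned in this article. Firewalk's own mechanism for reconfiguring a traverser may differ, and the sample values are not claimed to be the library's defaults.

```typescript
// Models only the two config keys discussed in this article.
interface TraversalConfigSketch {
  batchSize: number;
  maxConcurrentBatchCount: number;
}

// Returns a new config with the overrides applied, leaving the original untouched.
function mergeTraversalConfig(
  base: TraversalConfigSketch,
  overrides: Partial<TraversalConfigSketch>
): TraversalConfigSketch {
  const merged = { ...base, ...overrides };
  // Both settings only make sense as positive integers.
  if (!Number.isInteger(merged.batchSize) || merged.batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  if (!Number.isInteger(merged.maxConcurrentBatchCount) || merged.maxConcurrentBatchCount < 1) {
    throw new RangeError('maxConcurrentBatchCount must be a positive integer');
  }
  return merged;
}

const baseConfig: TraversalConfigSketch = { batchSize: 250, maxConcurrentBatchCount: 1 };
// Same batch size, but up to 20 batches processed at once.
const fasterConfig = mergeTraversalConfig(baseConfig, { maxConcurrentBatchCount: 20 });
```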
Rename a field
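A sketch of renaming a field, assuming the migrator exposes a renameField(oldField, newField) method that resolves with a migratedDocCount. The interface is a stand-in so that the sketch runs without the library, and the postcode and postalCode field names are hypothetical.

```typescript
// Assumed minimal shape of a migrator; the real type comes from firewalk.
interface RenamingMigrator {
  renameField(oldField: string, newField: string): Promise<{ migratedDocCount: number }>;
}

// With Firewalk installed, the migrator would presumably come from
// createMigrator(firestore().collection('users')).
async function renamePostcodeField(migrator: RenamingMigrator): Promise<number> {
  // Each visited doc ends up with its value under postalCode instead of postcode.
  const { migratedDocCount } = await migrator.renameField('postcode', 'postalCode');
  return migratedDocCount;
}
```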
Conclusion
Firewalk is a light and powerful Node.js library that provides you with fast, efficient, and configurable traverser objects, which you can use to read from and write to your Firestore collections. You can always find the most up-to-date docs on our GitHub repo at https://github.com/kafkas/firewalk. If you have any feature requests, feel free to create an issue. Contributions are welcome and appreciated!