Notes on Cloud Firestore

… from watching YouTube videos made by the Firebase team. I didn’t actually use Cloud Firestore, so feel free to skip this post.

  1. Firestore is a replacement for the older Realtime Database.
  2. You store data in documents.
  3. A document is like a JSON object, with keys and values, including nested objects, like person.address.city.
  4. Firestore is schemaless: documents in the same collection need not share the same fields, so fields that don’t apply to a document can be left out. For example, if a restaurant doesn’t serve alcohol, you don’t need to have a field serves_alcohol: false, since it serves no purpose.
  5. Store data denormalised, as is the norm for NoSQL databases. The Firestore team made this decision because reads vastly outnumber writes, so you want reads to be fast by not having to join. Instead, do the join once at write time and store the result in the database.
  6. A document can be at most 1 MiB in size, and it’s retrieved in its entirety. If you don’t want that, split it into multiple documents. Example: in a restaurant app, you could store a restaurant and all its reviews and menus and so on in one document for each restaurant. But then, if the user sees a list of 30 restaurants, that could be up to 30 MB of data, which is too much, considering that the user is not going to read reviews in a restaurant list. They’re going to read the reviews when they select one restaurant and go to the restaurant detail page. So you don’t want to put all reviews as part of the restaurant document. Map each screen of your app to one document.
  7. Each document is limited to a sustained rate of about one write per second. Sometimes developers run into this limit because they’ve combined multiple documents edited by different people into one document.
  8. Documents reside inside collections. For example, each restaurant might be modeled as a document, and all of them can reside inside a restaurants collection.
  9. Documents and collections must alternate as you nest them: collection/document/collection/document… You can’t have a document directly within a document, or a collection directly within a collection. For example, a restaurants collection contains documents, each representing one restaurant; each restaurant document contains a reviews collection, whose documents each represent one review.
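
The alternation rule in point 9 can be sketched in a few lines of TypeScript. This is only an illustration of the rule, not Firestore SDK code: classifyPath and reviewPath are hypothetical helpers, and the restaurants/reviews names are just the example collections from above.

```typescript
type PathKind = "collection" | "document";

// Classifies a slash-separated path using the alternation rule:
// segments go collection/document/collection/document...
function classifyPath(path: string): PathKind {
  const segments = path.split("/");
  if (segments.some((s) => s.length === 0)) {
    throw new Error(`Invalid path: "${path}"`);
  }
  // An odd number of segments ends on a collection, an even number on a document.
  return segments.length % 2 === 1 ? "collection" : "document";
}

// Builds the path to one review nested under one restaurant (hypothetical layout).
function reviewPath(restaurantId: string, reviewId: string): string {
  return `restaurants/${restaurantId}/reviews/${reviewId}`;
}
```

Under this rule, restaurants and restaurants/r1/reviews are collections, while restaurants/r1 and restaurants/r1/reviews/v1 are documents.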

Client vs Server

  1. Firestore is a database whose data is synced down to mobile and web clients, unlike traditional databases like MySQL, which are accessed only from the backend. You don’t need to build a CRUD API to expose the database to frontends running on users’ devices, because Firestore already comes with one. In addition, it offers a client SDK with caching, offline support, retries, exponential backoff and conflict resolution.
  2. Firestore has security rules that control who can read, create, update or delete something. You use these to implement the semantics appropriate to your app. For example, if we’re building an app like Google Docs where I have two docs, A and B, where A is private and B is shared with Ramesh, this would be implemented by configuring the security rules to say “Every doc has two fields: an owner and a share list. A user can access a doc if their user ID appears in at least one of them.”
  3. Security rules don’t apply to server-side code like Cloud Functions, which are trusted.
  4. You need to think about what is done on the client side vs the server side. For example, in a restaurant app, when a user rates a restaurant, you need to recalculate the restaurant’s rating by averaging all users’ ratings. Do this server-side, since you don’t want to let users edit the restaurant object. Even if you did make the restaurant object writable to clients, you would have no way of verifying that the average has been calculated correctly. You don’t want fraud where a restaurant owner sets his restaurant’s rating to 5 stars or his competitor’s to 1 star. So, don’t let clients edit the restaurant object. Since Firestore is accessible on both the client and the server side, you can have a different security policy for each.
  5. There are three ways for the client to invoke logic on the server: the first is to expose the Cloud Function via an HTTP endpoint and invoke it, which is the way traditional backends have been built for years.
  6. The second way is to define a callable function, which is a client-side wrapper over the server function. The client library then takes care of networking and sends authorisation information to the server, so the Cloud Function has access to the user ID without having to validate tokens or cookies.
  7. Now let’s understand the third way to invoke server-side logic. Let’s imagine a restaurant app that screens reviews for offensive language. If clients could append directly to the reviews collection, you wouldn’t be able to screen reviews before they appear. An alternative is to keep the reviews collection read-only for clients and create a separate collection called pending_reviews which is appendable. When the user posts a review, it’s appended to pending_reviews, and Firestore syncs this to the server, at which point you arrange for a Cloud Function to be run whenever this collection changes. Then you check the review for offensive language, and if it’s fine, append it to the reviews collection on the server side.
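
Here is a rough sketch of the server-side logic from points 4 and 7: screen a pending review, and only if it passes, append it and update the restaurant’s average. It is a plain in-memory model, not Cloud Functions or Firestore SDK code. The field names (avgRating, numRatings), the function names and the blocked-word list are all assumptions made up for illustration, and it keeps a running average on the assumption that each review carries one new rating.

```typescript
// Hypothetical document shapes; the field names are assumptions.
interface Review { userId: string; rating: number; text: string; }
interface Restaurant { name: string; avgRating: number; numRatings: number; }

// Placeholder word list, purely for illustration.
const BLOCKED_WORDS: string[] = ["darn", "heck"];

// Returns true if the review text contains none of the blocked words.
function isAcceptable(review: Review): boolean {
  const words = review.text.toLowerCase().split(/\W+/);
  return !words.some((w) => BLOCKED_WORDS.indexOf(w) !== -1);
}

// Folds one new rating into the stored average without re-reading every review.
function applyRating(restaurant: Restaurant, rating: number): Restaurant {
  const numRatings = restaurant.numRatings + 1;
  const avgRating =
    (restaurant.avgRating * restaurant.numRatings + rating) / numRatings;
  return { ...restaurant, avgRating, numRatings };
}

// What trusted server code might do when a document lands in pending_reviews:
// reject offensive reviews, otherwise append the review and update the rating.
function processPendingReview(
  restaurant: Restaurant,
  reviews: Review[],
  pending: Review
): { restaurant: Restaurant; reviews: Review[]; accepted: boolean } {
  if (!isAcceptable(pending)) {
    return { restaurant, reviews, accepted: false };
  }
  return {
    restaurant: applyRating(restaurant, pending.rating),
    reviews: reviews.concat([pending]),
    accepted: true,
  };
}
```

In a real app, the client would only be allowed to write to pending_reviews, and something like processPendingReview would run in trusted server code, so clients never touch the restaurant document or the reviews collection directly.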

Queries

  1. You run queries against one or more collections: select restaurants where…
  2. Queries can include nested fields, like where person.address.city = “Bangalore”.
  3. Queries are shallow: when you run a query against a collection, it returns only the documents directly in that collection, not the documents in subcollections nested underneath them.
  4. A query takes runtime proportional to the result size, not the database size. This is different from SQL, where you could have a query that runs for an hour and produces one number as the result. A Firestore query that produces a little data will always be fast. Firestore guarantees this by forcing all queries to use indices. Unlike SQL, Firestore won’t do a table scan if an index that works for the given query isn’t present. It will give an error. The error message will give you a link to create an index with the parameters filled in. Just copy paste that URL into your browser to create the index.
  5. Firestore also looks up the index only once for a given query. So, as of these videos, you couldn’t run a query like Italian OR Mexican restaurants, since that would require looking up the index once for Italian and once for Mexican (Firestore has since added in and OR query operators). The workaround was to run two queries and combine the results in your application code.
  6. Or you can define a composite index for all European restaurants, but this has to be planned ahead of time. You can’t have a UI where the user can select any combination of cuisines they want dynamically by checking and unchecking boxes.
  7. You can both filter and sort, like price < 1000 ORDER BY price. But a range filter and the sort must use the same field: you can’t do price < 1000 ORDER BY rating. All these limitations are consequences of the “look up the index only once” algorithm.
  8. Pricing is per read or write: a million reads cost 60 cents. This is the serverless model, unlike (say) RDS, where you select an instance size (for example, 4 GB), pay for its capacity, and get whatever number of reads it can handle.
  9. You can use a listener, which notifies you whenever a document changes. This lets you build a live updating UI, for example.
  10. The alternative to a listener is a one-time fetch (a get call), which invokes its callback only once. That rules out optimisations that listeners can do, like first delivering cached data so you show something in the UI immediately, and in the background contacting the server and then invoking the callback again to update the UI.
  11. Think about which UX would be appropriate for your use case: traders would want stock prices to update live. On the other hand, if you’re building a Reddit app, you don’t want the number of upvotes and downvotes of a post live updating as you’re trying to read it. That would be distracting and annoying, not helpful. So first figure out which UX you want to build, and then use the appropriate API to enable that UX.
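
To make the “look up the index only once” idea from points 4 to 7 more concrete, here is a toy in-memory model. It is not how Firestore is actually implemented, and none of these names come from the Firestore SDK: buildPriceIndex, lowerBound, cheaperThan and mergeById are hypothetical helpers. The assumption is that an index is a list of entries sorted by one field, so a range filter on that field is one lookup followed by one contiguous read.

```typescript
interface IndexEntry { price: number; id: string; }

// Builds a single-field index: entries sorted by price, ties broken by id.
function buildPriceIndex(docs: { id: string; price: number }[]): IndexEntry[] {
  return docs
    .map((d) => ({ price: d.price, id: d.id }))
    .sort((a, b) => a.price - b.price || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Binary search: position of the first entry with price >= limit.
function lowerBound(index: IndexEntry[], limit: number): number {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].price < limit) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// "price < limit ORDER BY price": one lookup, then one contiguous read.
// The work after the lookup is proportional to the number of results.
function cheaperThan(index: IndexEntry[], limit: number): string[] {
  return index.slice(0, lowerBound(index, limit)).map((e) => e.id);
}

// Combines the results of two separate queries in application code,
// keeping the first occurrence of each document ID.
function mergeById(a: string[], b: string[]): string[] {
  return a.concat(b).filter((id, i, all) => all.indexOf(id) === i);
}
```

The results come back already sorted by price because that is the order of the index. Sorting the same results by rating would need a different index, which is one way to see why the filter and the sort have to agree on the field. mergeById is the “do two queries and combine them” workaround for Italian OR Mexican.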
