How to structure a feed and follow system?

I was using Firebase Realtime Database for my social network app, where you can follow people and receive the posts of the people you follow.

My database :

Users
--USER_ID_1
----name
----email
--USER_ID_2
----name
----email


Posts
--POST_ID_1
----image
----userid
----date
--POST_ID_2
----image
----userid
----date


Timeline
--User_ID_1
----POST_ID_2
------date
----POST_ID_1
------date

Another node, "Content", contained the IDs of all of a user's posts. If "A" followed "B", then all of B's post IDs were added to A's timeline. And if B posted something, it was also added to all of B's followers' timelines.

This has scalability issues:

  • If someone has 10,000 followers, a new post is added to all 10,000 followers' timelines.
  • If someone has a large number of posts, every new follower receives all of them in their timeline.
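
To make the write amplification concrete, here is a minimal JavaScript sketch of the fan-out described above. `buildFanOutUpdate` is a hypothetical helper, not part of any Firebase SDK; it only builds the kind of multi-path update object that could be handed to a Realtime Database `update()` call. The paths reuse the `Posts` and `Timeline` nodes from the structure above, and the sample post values are made up.

```javascript
// Hypothetical helper: builds one multi-path update for a new post.
// Paths follow the structure above: Posts/{postId} and Timeline/{followerId}/{postId}.
function buildFanOutUpdate(postId, post, followerIds) {
  const update = {};
  update[`Posts/${postId}`] = post;
  for (const followerId of followerIds) {
    // One extra write path per follower: this is the part that grows with the audience.
    update[`Timeline/${followerId}/${postId}`] = { date: post.date };
  }
  return update;
}

// Illustrative data: an author with 10,000 followers publishes one post.
const followers = Array.from({ length: 10000 }, (_, i) => `USER_ID_${i + 1}`);
const update = buildFanOutUpdate(
  "POST_ID_3",
  { image: "image.png", userid: "USER_ID_0", date: 1535987818000 },
  followers
);
console.log(Object.keys(update).length); // 10001 paths: 1 post + 10,000 timeline entries
```

The number of paths grows linearly with the follower count, which is the first scalability issue listed above.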

I want to switch to Firestore, as it is claimed to be scalable. How should I structure my database so that these Realtime Database problems are eliminated in Firestore?


I went through some of the Firebase documentation, and I'm confused as to why the suggested implementation at https://firebase.google.com/docs/database/android/structure-data#fanout wouldn't work in your case. Something like this:

users
--userid(somedude)
---name
---etc
---leaders:
----someotherdude
----someotherotherdude


leaders:
--userid(someotherdude)
---datelastupdated
---followers
----somedude
----thatotherdude
---posts
----postid


posts
--postid
---date
---image
---contentid


postcontent
--contentid
---content

The guide goes on to mention "This is a necessary redundancy for two-way relationships. It allows you to quickly and efficiently fetch Ada's memberships, even when the list of users or groups scales into the millions.", so it doesn't seem that scalability is exclusively a Firestore thing.

Unless I'm missing something the main problem seems to be the existence of the timeline node itself. I get that it makes it easier to generate a view of a particular user's timeline, but that comes at the cost of having to maintain all of those relationships and is significantly delaying your project. Is it too inefficient to use queries to build a timeline on the fly from a structure similar to the above, based on a submitted user?

I saw your question a little late, but I will try to provide the best database structure I can think of. I hope you'll find this answer useful.

I'm thinking of a schema that has three top-level collections: users, the users that a user is following, and posts:

Firestore-root
|
--- users (collection)
|     |
|     --- uid (documents)
|          |
|          --- name: "User Name"
|          |
|          --- email: "email@email.com"
|
--- following (collection)
|      |
|      --- uid (document)
|           |
|           --- userFollowing (collection)
|                 |
|                 --- uid (documents)
|                 |
|                 --- uid (documents)
|
--- posts (collection)
       |
       --- uid (documents)
            |
            --- userPosts (collection)
                  |
                  --- postId (documents)
                  |     |
                  |     --- title: "Post Title"
                  |     |
                  |     --- date: September 03, 2018 at 6:16:58 PM UTC+3
                  |
                  --- postId (documents)
                        |
                        --- title: "Post Title"
                        |
                        --- date: September 03, 2018 at 6:16:58 PM UTC+3

If someone has 10,000 followers, a new post is added to all 10,000 followers' timelines.

That will be no problem at all, because this is exactly what collections are meant for in Firestore. According to the official documentation on modeling a Cloud Firestore database:

Cloud Firestore is optimized for storing large collections of small documents.

This is the reason I have added userFollowing as a collection and not as a simple object/map that can hold other objects. Remember, the maximum size of a document according to the official documentation regarding limits and quotas is 1 MiB (1,048,576 bytes). For a collection, there is no limit on the number of documents it can hold. In fact, this is the kind of structure Firestore is optimized for.

So having those 10,000 followers stored in this manner will work perfectly fine. Furthermore, you can query the database in such a way that there is no need to copy anything anywhere.

As you can see, the database is pretty much denormalized, which lets you query it very simply. Let's take some examples, but first let's create a connection to the database and get the uid of the user with the following lines of code:

FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
String uid = FirebaseAuth.getInstance().getCurrentUser().getUid();

If you want to query the database to get all the users a user is following, you can use a get() call on the following reference:

CollectionReference userFollowingRef = rootRef.collection("following/" + uid + "/userFollowing");

So in this way, you can get all user objects a user is following. Having their uid's you can simply get all their posts.

Let's say you want to get the latest three posts of every user on your timeline. The key to solving this problem when using very large data sets is to load the data in smaller chunks. I have explained in my answer from this post a recommended way to paginate queries by combining query cursors with the limit() method. I also recommend you take a look at this video for a better understanding. So first you need to get the first 15 user objects that you are following and then, based on their uids, get their latest three posts. To get the latest three posts of a single user, use the following query:

Query query = rootRef.collection("posts/" + uid + "/userPosts").orderBy("date", Query.Direction.DESCENDING).limit(3);

As you scroll down, load another 15 user objects and get their latest three posts, and so on. Besides the date, you can also add other properties to your post object, like the number of likes, comments, shares and so on.
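
The paging idea above can be sketched without any SDK. The following JavaScript uses plain in-memory arrays as a stand-in for the `following` and `posts/{uid}/userPosts` collections, so `timelinePage`, `latestPosts` and the sample data are hypothetical names for illustration only; in the app, each step would be a Firestore query like the ones shown above.

```javascript
const PAGE_SIZE = 15;     // followed users loaded per page
const POSTS_PER_USER = 3; // latest posts taken from each of them

// Mirrors orderBy("date", DESCENDING).limit(3) for a single user's posts.
function latestPosts(posts, limit) {
  return [...posts].sort((a, b) => b.date - a.date).slice(0, limit);
}

// Builds one page of the timeline: the next 15 followed users, three posts each.
function timelinePage(followingIds, postsByUser, pageIndex) {
  const start = pageIndex * PAGE_SIZE;
  return followingIds
    .slice(start, start + PAGE_SIZE)
    .flatMap((uid) => latestPosts(postsByUser[uid] || [], POSTS_PER_USER));
}

// Example data: 20 followed users with 5 posts each (dates are just counters here).
const followingIds = Array.from({ length: 20 }, (_, i) => `user${i}`);
const postsByUser = Object.fromEntries(
  followingIds.map((uid) => [
    uid,
    Array.from({ length: 5 }, (_, n) => ({ title: `${uid}-post${n}`, date: n })),
  ])
);

const firstPage = timelinePage(followingIds, postsByUser, 0);  // 15 users x 3 posts
const secondPage = timelinePage(followingIds, postsByUser, 1); // remaining 5 users x 3 posts
console.log(firstPage.length, secondPage.length); // 45 15
```

Each page touches at most 15 followed users and 45 posts, regardless of how many users are followed in total.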

If someone has a large number of posts, every new follower receives all of those posts in their timeline.

No way. There is no need to do something like this. I have already explained above why.

Edit May 20, 2019:

Another way to optimize the operation in which a user should see all the recent posts of everyone they follow is to store the posts that the user should see in a document for that user.

So, taking Facebook as an example, you'll need a document containing the Facebook feed for each user. However, if there is more data than a single document can hold (1 MiB), you need to put that data in a collection, as explained above.

There are two situations:

  1. Users in your app have a small number of followers.

  2. Users in your app have a large number of followers. If we store all of the followers in a single array in a single Firestore document, it will hit the Firestore limit of 1 MiB per document.


  1. In the first situation, each user keeps a document that stores the followers list in a single array. By using arrayUnion() and arrayRemove() it is possible to manage the followers list efficiently. And when you post something to your timeline, you must add the list of followers to the post document.

    Then use the query below to fetch posts:

    postCollectionRef.whereArrayContains("followers", userUid).orderBy("date");
    
  2. In the second situation, you need to split the user's followers document based on the size or count of the followers array. Once the array reaches a fixed size, the next follower's id must be added to a new document, and the first document must keep a "hasNext" field, which stores a boolean value. When adding a new post, you must duplicate the post document, with each copy holding one of the follower lists that were split earlier. The same query given above can then be used to fetch the documents.
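
The splitting step in the second situation could look roughly like the JavaScript sketch below. It is plain in-memory code, not Firestore calls: `chunkFollowers` and `fanOutPost` are hypothetical helpers, the chunk size of 10 is chosen only to keep the example small, and setting "hasNext" on every chunk except the last is an assumption that generalizes the single flag described above.

```javascript
// Hypothetical helper: splits a follower list into document-sized chunks.
// Every chunk except the last carries hasNext: true.
function chunkFollowers(followerIds, maxPerDocument) {
  const documents = [];
  for (let i = 0; i < followerIds.length; i += maxPerDocument) {
    documents.push({
      followers: followerIds.slice(i, i + maxPerDocument),
      hasNext: i + maxPerDocument < followerIds.length,
    });
  }
  return documents;
}

// Hypothetical helper: one copy of the post per follower chunk, so that a
// whereArrayContains("followers", userUid) query can match any follower.
function fanOutPost(post, followerDocuments) {
  return followerDocuments.map((doc) => ({ ...post, followers: doc.followers }));
}

const followerIds = Array.from({ length: 25 }, (_, i) => `uid${i}`);
const followerDocs = chunkFollowers(followerIds, 10);
const postCopies = fanOutPost({ title: "Post Title", date: 1535987818000 }, followerDocs);
console.log(followerDocs.map((d) => d.hasNext)); // [ true, true, false ]
console.log(postCopies.length); // 3 copies of the post, one per chunk
```

The trade-off is that every post is stored once per follower chunk, so storage and write cost grow with the author's follower count.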

I've been struggling a bit with the solutions suggested here, mostly due to a technical gap, so I came up with another solution that works for me.

For every user I have a document with all the accounts that they follow, and also a list of all the accounts that follow that user.

When the app starts, I get a hold of the list of accounts that follow this current user, and when a user makes a post, part of the post object is the array of all the users that follow them.

When user B wants to get all the posts of the people they are following, I just add a simple whereArrayContains("followers", currentUser.uid) to the query.

I like this approach because it still allows me to order the results by any other parameters I want.

Based on:

  • 1 MiB per document, which according to a Google search I made holds 1,048,576 bytes (about one byte per character for plain ASCII).
  • The fact that Firebase-generated UIDs seem to be around 28 characters long.
  • The rest of the info in the object doesn't take too much size.

This approach should work for users that have up to approx 37,000 followers.
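
The estimate above comes from a simple division, reproduced here in JavaScript. The calculation assumes one byte per UID character and ignores the rest of the post's fields and any per-element overhead, so the real ceiling is somewhat lower; the constant names are only for illustration.

```javascript
const MAX_DOCUMENT_BYTES = 1048576; // 1 MiB document limit
const UID_LENGTH = 28;              // observed UID length, in characters

// Assumes one byte per character and no overhead for the array or other fields.
const upperBound = Math.floor(MAX_DOCUMENT_BYTES / UID_LENGTH);
console.log(upperBound); // 37449
```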

I think one possibility is to create another top-level collection named "users_following", which contains one document per user, named after the user id, with an array field holding all the users that the user is following. Within that "users_following" document you can keep a sub-collection with all of that user's posts, although a top-level collection will also do the job. The next important thing is to store the most recent post inside the "users_following" document, as an array or map. This denormalized data is then used to populate the feed of the people who follow you. The drawback is that you will only see one post per person, even if that person has added two posts recently; and if you store two or three posts in this denormalized way, all three will be shown at once (three posts by the same user in a row). But it's still a good option if you only need to show one post per user.

The other answers are going to get very costly if you have any decent amount of activity on your network (e.g. People following 1,000 people, or people making 1,000 posts).

My solution is to add a field to every user document called 'recentPosts', this field will be an array.

Now, whenever a post is made, have a cloud function which detects onWrite(), and updates that poster's recentPosts array on their userDocument to have info about that post added.

So, you might add the following map to the front of the recentPosts array:

{
"postId": xxxxxxxxxxx,
"createdAt": tttttt
}

Limit the recentPosts array to 1,000 objects, deleting the oldest entry when going over limit.
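
As a rough illustration of that bookkeeping, here is a small JavaScript sketch that works on a plain array. `addRecentPost` is a hypothetical helper; in the Cloud Function it would run against the array read from the user document before writing it back.

```javascript
const MAX_RECENT_POSTS = 1000;

// Returns a new array with the post at the front, trimmed to the limit,
// so the oldest entries fall off the end.
function addRecentPost(recentPosts, post, limit = MAX_RECENT_POSTS) {
  return [post, ...recentPosts].slice(0, limit);
}

let recentPosts = [];
for (let i = 1; i <= 1002; i++) {
  recentPosts = addRecentPost(recentPosts, { postId: `post${i}`, createdAt: i });
}
console.log(recentPosts.length);      // 1000
console.log(recentPosts[0].postId);   // "post1002" (newest first)
console.log(recentPosts[999].postId); // "post3" (post1 and post2 were dropped)
```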

Now, suppose you are following 1,000 users and want to populate your feed... Grab all 1,000 user documents. This will count as 1k reads.

Once you have the 1,000 documents, each document will have an array of recentPosts. Merge all of those arrays on client into one master array and sort by createdAt.

Now you have up to potentially 1 million post docIDs, all sorted chronologically, for only 1,000 reads. As your user scrolls their feed, simply query those documents by docID as needed, presumably 10 at a time or so.

You can now load a feed of X posts from Y followees for Y + X reads.

So 2,000 posts from 100 followees would only be 2,100 reads.
So 1,000 posts from 1,000 followees would only be 2,000 reads.
etc...
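
The client-side merge and the read count can be sketched in a few lines of JavaScript. This works on plain objects shaped like the user documents described above; `mergeRecentPosts` and `estimateReads` are hypothetical helper names, and the sample users are made up.

```javascript
// Merges every followee's recentPosts array into one list, newest first.
function mergeRecentPosts(userDocs) {
  return userDocs
    .flatMap((userDoc) => userDoc.recentPosts)
    .sort((a, b) => b.createdAt - a.createdAt);
}

// Reads needed: one per followee document plus one per post actually displayed.
function estimateReads(followeeCount, postsShown) {
  return followeeCount + postsShown;
}

const userDocs = [
  { uid: "alice", recentPosts: [{ postId: "a2", createdAt: 40 }, { postId: "a1", createdAt: 10 }] },
  { uid: "bob", recentPosts: [{ postId: "b2", createdAt: 30 }, { postId: "b1", createdAt: 20 }] },
];
const feed = mergeRecentPosts(userDocs);
const firstPage = feed.slice(0, 3).map((p) => p.postId);
console.log(firstPage);                 // [ 'a2', 'b2', 'b1' ]
console.log(estimateReads(100, 2000));  // 2100
console.log(estimateReads(1000, 1000)); // 2000
```

Only the post IDs on the page being displayed need to be fetched as full documents, which is where the "Y + X" figure comes from.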


Edit 1) further optimization. When loading the userDocuments you can batch them 10 at a time by using the in query ... normally this would make no difference because it's still 10 reads even though it's batched... but you can also filter by a field like recentPostsLastUpdatedAt and check that it's greater than your cached value for that user doc, then any user docs that haven't updated their recentPosts array will not get read. This can save you theoretically 10x on base reads.

Edit 2) You can attach listeners to each userDocument too to get new posts as their recentPosts change without querying every single follower each time you need to refresh your feed. (Although 1,000+ snapshot listeners could be bad practice, I don't know how they work under the hood) (Edit3: Firebase limits a project to only 1k listeners so edit2 wasn't a scalable optimization)

UPDATE: 8/28/21

I created a theoretical scalable solution. See here.

And some other options here.


My scalable idea is that users may have 1,000,000+ followers, but a REAL user does not follow more than 1000 people. We could simply aggregate their feed (a collection of posts). Here is my theory:

Collections

/users
/users/{userId}/follows
/users/{userId}/feed
/posts

1. Populate the feed

populateFeed() needs to run first, and should honestly be a Cloud Function. To keep costs down, it only adds new posts to your feed, and skips posts older than 10 days (or whatever cutoff you choose).

populateFeed() - something like this...

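// Assumes `db` is an initialized Firestore instance (Admin SDK) and that each
// document ID in users/{userId}/follows is the ID of a followed user.
async function populateFeed(db, userId) {
  const userRef = db.doc(`users/${userId}`);
  const lastUpdate = (await userRef.get()).get('lastUpdate'); // Timestamp, or undefined on first run
  const tenDaysAgo = new Date(Date.now() - 10 * 24 * 60 * 60 * 1000);
  // Only fetch posts that are both new since the last run and at most 10 days old.
  const since = lastUpdate && lastUpdate.toDate() > tenDaysAgo ? lastUpdate.toDate() : tenDaysAgo;

  const follows = await userRef.collection('follows').get();
  // maybe chunk at 20 here...
  for (const followed of follows.docs) {
    const posts = await db.collection('posts')
      .where('userId', '==', followed.id)
      .where('createdAt', '>', since)
      .get();
    if (posts.empty) continue;
    const batch = db.batch();
    posts.forEach((post) => {
      batch.set(userRef.collection('feed').doc(post.id), { createdAt: post.get('createdAt') });
    });
    await batch.commit();
  }

  // Update users/${userId}/lastUpdate to the current timestamp.
  await userRef.update({ lastUpdate: new Date() });
}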

This way, you don't get too many documents (only 10 days old for example), and you don't waste reads on docs you already have.

2. Read the feed

A feed will be the aggregated posts.

loadFeed() - call this after populateFeed()

db.collection(`users/${userId}/feed`).orderBy('createdAt');

The documents in feed only really need the createdAt date and postId since you can pull the post on the front end, although you could store all data if you don't expect it to change:

postId: {
createdAt: date
}

Your userDoc will also have:

{
numFollowing: number,
lastUpdate: date
}

The app should automatically call loadFeed() on load. There could be a button that runs populateFeed() as a callable cloud function (the best), or locally. If your feed is a firebase observable, it will update automatically as they populate...

Just a thought... I think there might be some other cleaner ways to solve this problem that scale...

J

UPDATE

The more I think about it, the more I think it is possible to update every follower's feed from a post's onWrite trigger. The only constraint is time: a function normally times out after 60s, although the timeout can be raised to 9 min. Really, you just need to make sure you bulk update asynchronously. See my adv-firestore-functions package here:

Alright after some thinking about this problem I came up with a theoretical solution (because I didn't test it yet). I will be using Cloud Firestore for this:

My solution is composed of two parts:

1. Database schema design:

Firestore-root
|
_ _ users (collection):
     |
     _ _ uid (document):
          |
          _ _ name: 'Jack'
          |
          _ _ posts (sub-collection):
          |    |
          |    _ _ postId (document)
          |
          _ _ feed (sub-collection):
          |    |
          |    _ _ postId (document)
          |
          _ _ following (sub-collection):
          |    |
          |    _ _ userId (document)
          |
          _ _ followers (sub-collection):
               |
               _ _ userId (document)

1.1 Explanation:

As you can see here, I have created a collection named users representing each user in the database. Each uid document in the users collection has its own fields, such as name, and its own sub-collections. Each uid document holds the user's own posts in the posts sub-collection, and the posts from the people the user follows in the feed sub-collection. Finally, it contains two sub-collections representing following and followers.

2. Use Cloud Functions:

const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();
const firestore = admin.firestore();

// Runs whenever a user creates a document in their posts sub-collection.
exports.addToUserFeed = functions.firestore
  .document("/users/{uid}/posts/{postId}")
  .onCreate(async (snapshot, context) => {
    const { uid: authorId, postId } = context.params;

    // Cloud Functions run on the server, so there is no signed-in "current user";
    // instead, read everyone who follows the author.
    const followers = await firestore
      .collection("users").doc(authorId).collection("followers").get();
    if (followers.empty) return;

    // Copy the new post into each follower's feed sub-collection.
    // For very large follower counts these writes should be chunked.
    const data = snapshot.data();
    await Promise.all(followers.docs.map((follower) =>
      firestore.collection("users").doc(follower.id)
        .collection("feed").doc(postId).set(data)));
  });

2.1 Explanation:

Here we trigger a Cloud Function whenever a user creates a post in their posts sub-collection. Since we want the post to appear in the feed sub-collection of everyone who follows the author, we first read the author's id from the wildcard uid (available as context.params.uid). Cloud Functions run on the server, where there is no signed-in user, so instead of looking up a single "current user" through Firebase Auth, we query the author's followers sub-collection. This returns a QuerySnapshot. If it is empty, nobody follows the author and there is nothing to do. Otherwise, we write the newly created post into each follower's feed sub-collection, reusing the same postId so that a retried function does not create duplicates.

Alright, that's it. I hope this helps someone. Again, I haven't tested it yet, so something may not work as written, but hopefully it will. Thanks!