System Design Interview 2: News feed like Twitter

By Tomasz Kuczma

March 11, 2021

In previous article , I show you how to design a URL shortener service. I presented the basics of capacity planning, collecting requirements, DB sharding, etc. Now, I would like to cover another interesting system design interview question - How would you design Twitter?

Requirements and capacity planning

We want a system that allows a user to share short text messages (posts). Those messages will be displayed in 2 places:

on user board
on home board

User’s X board displays posts that user X published. Every logged-in user Y has also their own private home board. Home board displays posts of users subscribed by user Y.

You can ask many questions here to clarify e.g.

How users subscribe to each other. Is it mutual (bidirectional like Facebook friends) or not (Twitter follows).
Do we need to support videos and images in posts?

This time, we will use simple text messages and Twitter-like follow (unidirectional) mechanism and focus on building news feed mechanism (user and home boards) as this is the most interesting part. Note that, a similar mechanism could be used to build a Facebook wall.

Images and videos are not a big problem anyway. We could just store links to video/image files in the post and upload binary content to a separate service - CDN-based e.g. on S3.

As not to bore you with capacity math let’s just say that we want to have a highly available, global scale service (check previous article how to do that if you want).

Naive solution - technically those are just 2 SQL tables

In the SQL world, you would have 2 tables posts (storing messages) and subscriptions (storing who subscribes to whom). So user board is just:

select * from posts 
  where author = :user_id

and home board is:

select * from posts
  join subscriptions 
  on subscriptions.subscribed_user = posts.author
  where subscriptions.subscriber = :user_id

(of course both with some limit and order by statements). As you might notice, this approach would not scale much and can become very slow in global scale scenario (mostly because of join operation - check out why joins are “bad” in NoSQL mindset article ).

Better design

To support global scale and high availability use case, everything runs on multiple hosts (including database). If you don’t know how you could shard a DB or how global scale DB systems works - go back to URL shortener article and DynamoDB desing .

The trick in this design is to store both boards precomputed in some distributed memory store (Twitter uses Redis for that).

You can separate reads from writes using CQRS pattern so we will have 2 sub-services here - Feeds reader and publisher. When user X wants to publish a new post, the request will be forwarded to the Feeds publisher which will:

Store that post in Posts DB.
Update user X board.
Fetch all subscribers of user X from Subscribers DB.
Update each subscriber’s home board with this post in Feed memDB.
Publish “post published” event to the new feeds queue

When user X wants to read the home board or other user board, the request will be forwarded to the Feeds reader which will just read it from Feeds memDB - this is very fast! We can accept some delay on write but want to be as fast as possible on read.

Note that, new feeds queue is not needed right now, but this is a good way to separate the core of the system (news feed) from other services like trends, new content discovery, search, etc. Alternatively, other services could just query Feeds reader. It depends on the use case of course. Also, all the data is also stored in persistent DB (Posts DB) so in case of disaster, you should have some tool that will republish all posts to Feeds memDB.

You should also consider high availability here. Data stored in memory is volatile. To mitigate that, you should replicate each data on few nodes e.g. 3.

All of that (storing few copies of pre-computed data in memory) is expensive. That’s the cost of being fast. You should put a lot of attention into what data do you really need to store in memDB! Twitter just stores a list of less than 1k entries on every precomputed board and each entry is just 20 bytes long (tweet_id - 8 bytes, user_id - 8 bytes, internal_bit_field - 4 bytes).

Final word

We just learn how to design a news feed system using a precomputation approach. It is not just about Twitter. It is also used in Facebook and Instagram walls (where the photos are the news) or even in dating apps like Tinder. As a bonus, I leave a link to Raffi Krikorian (Twitter VP) talk how Twitter really works .

Article by Tomasz Kuczma

Software engineer with a passion. Interested in computer networks and large-scale distributed computing. He loves to optimize and simplify software on various levels of abstraction starting from memory ordering through non-blocking algorithms up to system design and end-user experience. Geek. Linux user.

The views I express are my alone and they do not necessarily express the views of my employer or ex-employers. They are not investment advice nor based on any non-public information of any kind. Poglądy, które wyrażam, są tylko moje i niekoniecznie wyrażają opinie mojego pracodawcy lub byłych pracodawców. Nie są poradami inwestycyjnymi ani nie opierają się na jakichkolwiek niepublicznych informacjach.

Requirements and capacity planning

Naive solution - technically those are just 2 SQL tables

Better design

Top problems

Final word