Software Architect Handbook

Design a News Feed System

Overview

Facebook: “News feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups that you follow on Facebook.”

Step 1: Understand the problem and establish design scope

Candidate: Is this a mobile app? Or a web app? Or both? Interviewer: Both.

Candidate: What are the important features? Interviewer: A user can publish a post and see her friends’ posts on the news feed page.

Candidate: Is the news feed sorted by reverse chronological order or any particular order such as topic scores? For instance, posts from your close friends have higher scores. Interviewer: To keep things simple, let us assume the feed is sorted by reverse chronological order.

Candidate: How many friends can a user have? Interviewer: 5000.

Candidate: What is the traffic volume? Interviewer: 10 million DAU.

Candidate: Can the feed contain images, videos, or just text? Interviewer: It can contain media files, including both images and videos.

Step 2: Propose high-level design and get buy-in

The design is divided into two flows: feed publishing and news feed building.

Newsfeed APIs

The news feed APIs are the primary way for clients to communicate with servers. These APIs are HTTP-based and let clients perform actions such as posting a status, retrieving the news feed, and adding friends.

Feed publishing API

To publish a post, an HTTP POST request is sent to the server: POST /v1/me/feed.

Params:

Newsfeed retrieval API

The API to retrieve the news feed is GET /v1/me/feed. It takes an auth_token parameter, which is used to authenticate API requests.
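The two endpoints can be sketched as a single in-memory handler. This is only an illustration under stated assumptions: the `content` parameter name, the `VALID_TOKENS` table, the `Response` type, and the status codes are hypothetical and not taken from the source, and the GET branch returns every stored post rather than only friends' posts.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical token table; a real system would validate tokens with an auth service.
VALID_TOKENS: Dict[str, int] = {"token-alice": 1}

# In-memory stand-in for the post store.
POSTS: List[dict] = []


@dataclass
class Response:
    status: int
    body: dict = field(default_factory=dict)


def handle(method: str, path: str, params: dict) -> Response:
    """Dispatch POST /v1/me/feed and GET /v1/me/feed."""
    user_id = VALID_TOKENS.get(params.get("auth_token", ""))
    if user_id is None:
        return Response(401, {"error": "invalid auth_token"})
    if path != "/v1/me/feed":
        return Response(404, {"error": "unknown path"})
    if method == "POST":
        # "content" is an assumed parameter name for the post text.
        post = {
            "post_id": len(POSTS) + 1,
            "user_id": user_id,
            "content": params.get("content", ""),
        }
        POSTS.append(post)
        return Response(201, post)
    if method == "GET":
        # Reverse chronological order: newest post first.
        return Response(200, {"posts": list(reversed(POSTS))})
    return Response(405, {"error": "method not allowed"})
```

Calling `handle("POST", "/v1/me/feed", {"auth_token": "token-alice", "content": "hi"})` stores a post, and a later GET with the same token returns it at the head of the list.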

Feed publishing flow

Newsfeed building

Step 3: Design deep dive

Feed publishing deep dive

We will focus on two components: web servers and fanout service.

Web servers

Besides communicating with clients, web servers enforce authentication and rate limiting. Only users signed in with a valid auth_token are allowed to make posts. The system also limits the number of posts a user can make within a given period, which is vital for preventing spam and abusive content.
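One way to enforce such a posting limit is a sliding-window log per user. The sketch below is a minimal single-process illustration, not the source's design: the class name `PostRateLimiter`, its parameters, and the injectable `clock` are assumptions, and a production system would typically keep these counters in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PostRateLimiter:
    """Allow at most max_posts per user within any window_seconds span."""

    def __init__(
        self,
        max_posts: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._history: Dict[int, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: int) -> bool:
        now = self.clock()
        history = self._history[user_id]
        # Discard timestamps that have fallen out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_posts:
            return False  # over the limit: reject this post
        history.append(now)
        return True
```

A web server would call `allow(user_id)` before accepting a post and reject the request when it returns `False`.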

Fanout service

Fanout is the process of delivering a post to all friends. There are two fanout models: fanout on write (also called the push model), where the news feed is pre-computed at write time, and fanout on read (also called the pull model), where it is generated at read time. Both models have pros and cons.

We adopt a hybrid approach: since fetching the news feed fast is crucial, we use a push model for the majority of users. For celebrities or users who have many friends/followers, we let followers pull news content on demand to avoid system overload. Consistent hashing is a useful technique to mitigate the hotkey problem, as it helps to distribute requests and data more evenly.
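To make the consistent hashing idea concrete, here is a minimal hash ring with virtual nodes. It is a sketch under assumptions: the class name `HashRing`, the node names, the choice of SHA-256, and the default of 100 virtual nodes are all illustrative and not from the source.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node is placed at `vnodes` positions."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First virtual node clockwise from the key's position, wrapping around.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]
```

With this layout, adding a cache node only reassigns the keys that now fall to the new node; all other keys stay where they were, which is what keeps rebalancing cheap.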

The fanout service works as follows:

  1. Fetch friend IDs from the graph database. Graph databases are well suited to managing friend relationships and friend recommendations.

  2. Get friends info from the user cache. The system then filters out friends based on user settings. For example, if you mute someone, his posts will not show up on your news feed even though you are still friends. Also, a user could selectively share information with specific friends or hide it from other people.

  3. Send friends list and new post ID to the message queue.

  4. Fanout workers fetch data from the message queue and store news feed data in the news feed cache, which can be thought of as a <post_id, user_id> mapping table. To keep the memory size small, we set a configurable limit on the number of entries kept per feed. The chance of a user scrolling through thousands of posts in the news feed is slim, so the cache miss rate is low.

  5. Write <post_id, user_id> to cache.
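The five steps above can be sketched with in-memory stand-ins. Everything named here is an assumption made for illustration: `FRIENDS` stands in for the graph database, `MUTED` for per-user settings in the user cache, `message_queue` for the message queue, and `FEED_LIMIT` for the configurable limit (set very small so the effect is visible).

```python
from collections import defaultdict, deque
from queue import Queue
from typing import Deque, Dict, List, Set

FEED_LIMIT = 3  # assumed per-feed cap; tiny here for demonstration

FRIENDS: Dict[int, List[int]] = {1: [2, 3, 4]}  # stand-in for the graph database
MUTED: Dict[int, Set[int]] = {3: {1}}           # user 3 has muted user 1

# Each queue item is a (post_id, recipient_ids) tuple.
message_queue: "Queue" = Queue()

# user_id -> post IDs, newest first, capped at FEED_LIMIT entries.
news_feed_cache: Dict[int, Deque[int]] = defaultdict(
    lambda: deque(maxlen=FEED_LIMIT)
)


def publish(author_id: int, post_id: int) -> None:
    # Step 1: fetch friend IDs.
    friend_ids = FRIENDS.get(author_id, [])
    # Step 2: filter out friends whose settings exclude this author.
    recipients = [f for f in friend_ids if author_id not in MUTED.get(f, set())]
    # Step 3: send the friends list and new post ID to the message queue.
    message_queue.put((post_id, recipients))


def run_fanout_worker() -> None:
    # Steps 4 and 5: drain the queue and write post IDs into each feed.
    while not message_queue.empty():
        post_id, recipients = message_queue.get()
        for user_id in recipients:
            # appendleft keeps newest first; maxlen evicts the oldest entry.
            news_feed_cache[user_id].appendleft(post_id)
```

Publishing four posts from user 1 and then running the worker leaves users 2 and 4 with only the three newest post IDs, while user 3, who muted the author, receives nothing.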

Newsfeed retrieval deep dive

Media content is stored in a CDN for fast retrieval. The news feed itself is retrieved as follows:

  1. A user sends a request to retrieve his news feed. The request looks like GET /v1/me/feed.

  2. The load balancer distributes requests across web servers.

  3. Web servers call the news feed service to fetch news feed.

  4. News feed service gets a list of post IDs from the news feed cache.

  5. A user’s news feed also contains username, profile picture, post content, post image, etc. Thus, the news feed service fetches the complete user and post objects from caches (user cache and post cache) to construct the fully hydrated news feed.

  6. The fully hydrated news feed is returned in JSON format back to the client for rendering.
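Steps 4 to 6 amount to looking up post IDs and then hydrating them from the user and post caches. The following is a rough sketch with plain dictionaries standing in for the three caches; the field names, sample data, `limit` parameter, and the choice to skip missing entries are assumptions, and a real service would fall back to the database on a cache miss.

```python
import json
from typing import Dict, List

# user_id -> post IDs, newest first (stand-in for the news feed cache).
NEWS_FEED_CACHE: Dict[int, List[int]] = {7: [104, 103, 999]}

POST_CACHE: Dict[int, dict] = {
    103: {"author_id": 1, "content": "hello world", "image_url": None},
    104: {
        "author_id": 2,
        "content": "beach day",
        "image_url": "https://cdn.example.com/posts/104.jpg",
    },
}

USER_CACHE: Dict[int, dict] = {
    1: {"username": "alice", "profile_picture": "https://cdn.example.com/users/1.jpg"},
    2: {"username": "bob", "profile_picture": "https://cdn.example.com/users/2.jpg"},
}


def get_news_feed(user_id: int, limit: int = 20) -> str:
    """Return the fully hydrated news feed for user_id as a JSON string."""
    feed = []
    for post_id in NEWS_FEED_CACHE.get(user_id, [])[:limit]:
        post = POST_CACHE.get(post_id)
        if post is None:
            continue  # cache miss: skipped in this sketch
        author = USER_CACHE.get(post["author_id"])
        if author is None:
            continue  # cache miss: skipped in this sketch
        feed.append(
            {
                "post_id": post_id,
                "username": author["username"],
                "profile_picture": author["profile_picture"],
                "content": post["content"],
                "image_url": post["image_url"],
            }
        )
    return json.dumps({"feed": feed})
```

Because only IDs live in the news feed cache, the same user and post objects can be shared across every feed that references them.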

Cache architecture

Caching is extremely important for a news feed system. We divide the cache tier into five layers:

Step 4: Wrap up

To avoid duplicated discussion, only high-level talking points are listed below: