Design a Messaging Platform Like WhatsApp: A Step-by-Step Guide

Designing a messaging app presents significant engineering hurdles. The premise appears simple on the surface. User A sends a text, and User B receives it. The underlying implementation involves a complex distributed system. You are engineering a system capable of handling over 100 billion messages daily. It must maintain sub-second latency and manage real-time presence for millions of users. In a System Design interview, the interviewer looks beyond building a basic chat app. They want to see if you can architect a global communication utility. This utility must remain available under heavy load.

To visualize the scope, consider the high-level architecture required to support this traffic.

High-level architecture of a scalable messaging platform

Step 1: Understand the problem statement

Define the system’s boundaries before discussing databases or protocols. Jumping straight into a solution without clarifying the scope is a mistake. Determine if you are building a simple MVP or a global platform like WhatsApp. Functional requirements usually include one-to-one messaging and group chats. Delivery acknowledgments and media sharing are also standard features. You must account for non-functional requirements. These are critical in this context. The system requires extremely low latency and high availability. Consistency across devices is essential. Security is a standard requirement. End-to-end encryption (E2EE) is the industry norm.

Perform a quick back-of-the-envelope estimation to ground your design. Assume 1 billion daily active users send 50 messages a day. This results in 50 billion messages daily. At an average of 100 bytes per message, that is roughly 5 TB of text data per day. Media adds far more and pushes storage requirements into petabytes. This necessitates a robust data partitioning strategy.
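The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that uses only the figures assumed in the text (1 billion daily active users, 50 messages each, 100 bytes per message). The constant names are illustrative, and a terabyte is taken as 10^12 bytes.

```python
# Back-of-the-envelope traffic estimate using the figures assumed in the text.
DAILY_ACTIVE_USERS = 1_000_000_000
MESSAGES_PER_USER_PER_DAY = 50
AVG_MESSAGE_BYTES = 100
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY
text_bytes_per_day = messages_per_day * AVG_MESSAGE_BYTES
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY

print(f"messages/day:       {messages_per_day:,}")
print(f"text storage/day:   {text_bytes_per_day / 1e12:.1f} TB")
print(f"average write rate: {avg_messages_per_second:,.0f} messages/s")
```

The average works out to roughly 580,000 messages per second. Peak traffic will be some multiple of that average, so the estimate is a floor for capacity planning rather than a target.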

Tip: Always ask about the read/write ratio and traffic spikes. Messaging apps are write-heavy compared to social feeds. Traffic often spikes during holidays or global events. This requires elastic scalability.

Once the scope is defined, identify the specific features that drive the user experience.

Step 2: Define core features

A successful design prioritizes features that constitute the core loop. The primary requirement is one-to-one messaging. This involves reliable delivery and offline storage. Deduplication ensures messages are not processed twice. Group messaging follows closely. It introduces complexity regarding message fan-out and read receipts. Media sharing requires a separate pipeline. This handles heavy binary data without blocking real-time messaging channels. Presence indicators provide real-time feedback. Typing indicators make the app feel responsive.

Voice calls and multi-device synchronization are often treated as extensions. Keep them in mind early so that core architectural decisions do not rule them out. For example, tightly coupling a phone number to a single device ID makes multi-device support significantly harder to add later.

Watch out: Do not over-engineer the MVP. Features like Stories or Payments are secondary. Focus on the core engineering challenge. The goal is delivering a message from point A to point B reliably.

With the features mapped out, we can begin sketching the system components.

Step 3: High-level architecture

The architecture of a messaging platform relies heavily on persistent connections. The entry point is the client app. It handles local encryption and maintains a local database for chat history. Traffic flows through a load balancer to an API gateway. The gateway handles authentication and routing. The chat service manages connections and message routing. A presence service tracks user status. The media service connects to object storage and a CDN. A notification service wakes up devices on which the operating system has terminated the app's background process.

The following diagram illustrates how these services interact to handle a standard message request.

Service interaction and database separation

Step 4: User management and authentication

WhatsApp-like platforms typically use phone numbers as the primary identity. The User Service manages user profiles. It maps phone numbers to internal User IDs. Authentication usually occurs via an SMS-based One-Time Password (OTP). The server issues a secure session token once the code is verified. The client presents this token when it opens the persistent WebSocket or TCP connection. The system must also manage public keys for encryption. These are stored alongside the user profile. Other users fetch them to initiate secure chats.
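The OTP login described above might look like the following in-memory sketch. The `AuthService` class, its method names, and the `u1`-style user IDs are all hypothetical. A real service would send the code through an SMS gateway, expire it after a short time, and limit retry attempts, none of which is shown here.

```python
import hmac
import secrets


class AuthService:
    """In-memory sketch of OTP login (hypothetical API, no real SMS gateway)."""

    def __init__(self):
        self.pending_otps = {}  # phone -> one-time code awaiting verification
        self.user_ids = {}      # phone -> internal user ID
        self.sessions = {}      # session token -> internal user ID

    def request_otp(self, phone):
        code = f"{secrets.randbelow(10**6):06d}"
        self.pending_otps[phone] = code
        return code  # returned only for the demo; normally delivered by SMS

    def verify_otp(self, phone, code):
        # The code is removed on any attempt, so it can be used at most once.
        expected = self.pending_otps.pop(phone, None)
        if expected is None or not hmac.compare_digest(expected, code):
            return None
        # Map the phone number to an internal user ID on first login.
        user_id = self.user_ids.setdefault(phone, f"u{len(self.user_ids) + 1}")
        token = secrets.token_urlsafe(32)
        self.sessions[token] = user_id
        return token
```

The returned token is what the client would present when it opens its persistent connection. Keeping the phone number separate from the internal user ID also leaves room for attaching several devices to one account later.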

Note: WhatsApp uses a custom variation of the XMPP protocol. XMPP handles user addressing natively. Modern implementations often strip it down to essentials. This minimizes bandwidth usage on mobile networks.

Now that users are authenticated, we need to determine how messages travel between them.

Step 5: Message flow and delivery semantics

The choice of protocol is a critical decision. Standard HTTP is inefficient for real-time chat. It requires polling or long-polling. This drains battery and increases latency. We use persistent connections instead. WebSockets are a common choice for web-based chat. MQTT is often superior for mobile-first solutions. It features a lightweight header and keep-alive mechanisms. WhatsApp historically built its success on a customized version of XMPP. This ran on Erlang to handle millions of concurrent connections.

The message flow follows a specific sequence. The sender encrypts the message and sends it to the chat service. The chat service enforces per-user and per-device rate limits to prevent abuse and ensure fair resource usage across the system. The service stores the message in a database. It attempts to push the message to the recipient. The message is delivered via the open socket if the recipient is online. The message sits in a pending queue if the recipient is offline. The client sends an acknowledgment to the server once the delivery is complete. This updates the status to delivered. A second acknowledgment updates the status to read when the user opens the chat.

Look at the sequence of events during a successful message delivery to visualize this.

Sequence of message delivery and acknowledgment
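The delivery flow described in this step can be sketched as a small in-memory model. Everything here is hypothetical: the `ChatService` class, its method names, and the dictionaries standing in for the database, the open sockets, and the pending queue. Rate limiting and encryption are left out to keep the focus on persistence, queuing, and acknowledgments.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SENT = "sent"
    DELIVERED = "delivered"
    READ = "read"


STATUS_ORDER = [Status.SENT, Status.DELIVERED, Status.READ]


@dataclass
class Message:
    msg_id: str
    sender: str
    recipient: str
    ciphertext: bytes
    status: Status = Status.SENT


class ChatService:
    """In-memory stand-in for the chat service (hypothetical API)."""

    def __init__(self):
        self.store = {}                    # msg_id -> Message (the "database")
        self.online = {}                   # user -> list acting as an open socket
        self.pending = defaultdict(deque)  # user -> msg_ids queued while offline

    def connect(self, user):
        """Open a socket and flush anything queued while the user was offline."""
        inbox = self.online.setdefault(user, [])
        while self.pending[user]:
            inbox.append(self.store[self.pending[user].popleft()])
        return inbox

    def disconnect(self, user):
        self.online.pop(user, None)

    def send(self, msg):
        if msg.msg_id in self.store:       # retried send: ignore the duplicate
            return self.store[msg.msg_id].status
        self.store[msg.msg_id] = msg       # persist before attempting delivery
        if msg.recipient in self.online:
            self.online[msg.recipient].append(msg)
        else:
            self.pending[msg.recipient].append(msg.msg_id)
        return msg.status

    def ack(self, msg_id, status):
        """Apply a client acknowledgment; the status only moves forward."""
        msg = self.store[msg_id]
        if STATUS_ORDER.index(status) > STATUS_ORDER.index(msg.status):
            msg.status = status
        return msg.status
```

Making `ack` monotonic means a late "delivered" acknowledgment cannot overwrite a "read" status, and keying `send` on the message ID gives the deduplication mentioned under core features.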

Network packets often arrive out of order. We cannot rely on server timestamps alone, because clock skew between distributed servers causes inconsistencies. We use sequence numbers or vector clocks instead. In the simplest scheme, each chat session maintains a counter, and the client holds back message 5 until it has seen message 4. This keeps the conversation in its logical order.
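A client-side reorder buffer based on per-chat sequence numbers could look like this. It is a minimal sketch: the `ReorderBuffer` name and its interface are assumptions, and it does not handle gaps that never fill (a real client would eventually request the missing message again).

```python
class ReorderBuffer:
    """Releases messages in sequence order, holding back any that arrive early."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # the sequence number we are waiting for
        self.held = {}             # seq -> payload for messages that arrived early

    def receive(self, seq, payload):
        """Return the payloads that became deliverable, in order."""
        if seq < self.next_seq or seq in self.held:
            return []  # already delivered or already buffered: drop the duplicate
        self.held[seq] = payload
        ready = []
        while self.next_seq in self.held:
            ready.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer(first_seq=4)
print(buf.receive(5, "five"))  # [] because message 4 has not arrived yet
print(buf.receive(4, "four"))  # ['four', 'five']
```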

Note: Erlang was originally designed by Ericsson for telecom switches. It handles massive concurrency with high reliability. This capability became the backbone of the WhatsApp server infrastructure.

Complexity grows sharply when we introduce groups, because every message now has many recipients.

Step 6: Group messaging

Group chats introduce a fan-out problem. The server must deliver a message to 100 different connections if a user messages a group of 100. There are two primary approaches to handling this. These are fan-out on write and fan-out on read. Fan-out on write is preferred for typical groups with limited users. The system looks up all group members. It enqueues the message for each one. This makes delivery fast for recipients. It puts a heavy load on the server during the send process.

Fan-out on write becomes too expensive for massive channels. A hybrid or fan-out-on-read approach is better in those cases. The system stores one copy of the message. Users request updates when they become active. A dedicated Group Service maintains group metadata and membership lists. This data is cached heavily to reduce database hits.
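The two fan-out strategies can be compared in a small hybrid sketch. The `GroupService` class here is an in-memory illustration, and the `FANOUT_WRITE_LIMIT` threshold of 256 is an arbitrary assumption rather than a documented value. The sketch also ignores groups that cross the threshold over time.

```python
from collections import defaultdict

FANOUT_WRITE_LIMIT = 256  # hypothetical cut-over point between the two strategies


class GroupService:
    def __init__(self):
        self.members = defaultdict(set)     # group_id -> user IDs
        self.inboxes = defaultdict(list)    # user_id -> messages (fan-out on write)
        self.group_log = defaultdict(list)  # group_id -> messages (fan-out on read)
        self.cursor = defaultdict(int)      # (user_id, group_id) -> next unread index

    def join(self, group_id, user_id):
        self.members[group_id].add(user_id)

    def send(self, group_id, sender, text):
        message = (group_id, sender, text)
        members = self.members[group_id]
        if len(members) <= FANOUT_WRITE_LIMIT:
            # Fan-out on write: one copy per recipient, paid at send time.
            for user in members:
                if user != sender:
                    self.inboxes[user].append(message)
            return "write"
        # Fan-out on read: store a single copy; readers pull it later.
        self.group_log[group_id].append(message)
        return "read"

    def fetch(self, user_id, group_id):
        """Pull path for large groups: return entries the user has not seen yet."""
        log = self.group_log[group_id]
        start = self.cursor[(user_id, group_id)]
        self.cursor[(user_id, group_id)] = len(log)
        return log[start:]
```

The write path does work proportional to group size on every send, while the read path defers that work until each member becomes active, which is the trade-off the text describes.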

Watch out: Do not send read receipts to everyone for every message in large groups. Aggregate them, or only show which users have read a message in its details view. This saves bandwidth and prevents a notification storm.

Modern users expect to share high-definition media instantly.

Step 7: Media sharing and storage

Handling media requires a decoupled architecture. We never store binary data directly in the primary message database. This would degrade performance. The client uploads an image to an object store via the media service. The media service returns a unique URL or ID. The client sends a standard text message containing this URL to the recipient. The recipient app receives the text message. It asynchronously downloads the image from the URL.

The following diagram details this separation between the control signal and the data plane.

Decoupled media upload and download flow.

A Content Delivery Network (CDN) should sit in front of the object storage to improve performance. For example, users in India can download a viral video from an edge server in Mumbai, while users in London fetch it from an edge server in the UK. This reduces latency and backbone traffic.

Note: Media duplication can increase storage costs. Implement client-side hashing before upload. Skip the upload if the hash exists on the server. Reuse the existing file reference instead.
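The hash-before-upload idea from the note can be sketched as follows. `MediaStore` is a hypothetical stand-in for the media service and object store, and using the SHA-256 digest as the media ID is an assumption made for illustration. With end-to-end encryption the uploaded bytes are ciphertext, so this kind of deduplication would only match when the same encrypted blob is sent again, for example when a message is forwarded.

```python
import hashlib


class MediaStore:
    """Stand-in for the media service plus object store (hypothetical API)."""

    def __init__(self):
        self.blobs = {}  # content hash -> stored bytes

    def exists(self, digest):
        return digest in self.blobs

    def put(self, digest, data):
        self.blobs[digest] = data


def upload_media(store, data):
    """Return (media_id, uploaded); uploaded is False when an existing blob is reused."""
    digest = hashlib.sha256(data).hexdigest()
    if store.exists(digest):
        return digest, False  # skip the transfer and reuse the existing reference
    store.put(digest, data)
    return digest, True
```

In a real client the `exists` check would be a cheap network call made before the expensive upload, so a match saves both bandwidth and storage.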

Users need to know who is available to receive data.

Step 8: Notifications and presence

The presence service manages user status. This is typically handled using a heartbeat mechanism. The client sends a heartbeat to the server every few seconds. The server marks the user as offline if it misses a few heartbeats. Presence data is usually stored in a transient store, such as Redis. At a large scale, presence updates are often coarse-grained or batched to avoid broadcasting frequent state changes across millions of connections. This saves battery and bandwidth.
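A heartbeat-based presence check might be sketched like this. The class name, the 5-second interval, and the allowance of three missed heartbeats are illustrative assumptions. In a store such as Redis, the same effect is usually achieved by writing a key with an expiry on each heartbeat instead of comparing timestamps by hand.

```python
import time


class PresenceService:
    """Marks a user offline once several consecutive heartbeats are missed."""

    def __init__(self, interval=5.0, missed_allowed=3, clock=time.monotonic):
        self.timeout = interval * missed_allowed  # seconds of silence tolerated
        self.clock = clock                        # injectable so tests can fake time
        self.last_seen = {}                       # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self.last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        seen = self.last_seen.get(user_id)
        return seen is not None and self.clock() - seen <= self.timeout
```

Computing the status lazily on read means no timer has to fire per user, which matters when millions of connections are being tracked.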

We rely on third-party push networks for notifications when the app is closed. The notification service acts as a bridge. It formats the payload and sends it to providers like APNs or FCM. Decouple this from the main message processing loop using a message queue. A delay in APNs should not block message delivery.

Note: Typing indicators are ephemeral. They are usually sent as transient signals. These signals are not stored in the database. They pass through the WebSocket connection. The system discards them if the recipient is offline.

We must address the most critical non-functional requirement, which is privacy.

Step 9: End-to-end encryption

The Signal Protocol is the de facto standard for security in modern messaging. It provides end-to-end encryption. The server only routes encrypted binary data. It never sees the plaintext. This is achieved using public-key cryptography. The sender device fetches the recipient's public key bundle from the server. The device derives a shared secret and uses it to encrypt the message. Only the recipient can decrypt it, because only they hold the corresponding private key on their device.

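This system must support Forward Secrecy: an attacker who steals a key today still cannot decrypt past messages. We achieve this by continually rotating session keys. Each message is encrypted with a fresh key derived by the Double Ratchet algorithm, and old keys are deleted after use. Periodic Diffie-Hellman exchanges between the two devices also let the session recover after a compromise, which creates a self-healing security chain.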

Forward secrecy via the Double Ratchet algorithm.
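The per-message key rotation can be illustrated with a symmetric key chain built from HMAC-SHA256. This is a deliberately simplified sketch and not the Signal Protocol: it shows only the one-way chain that yields a new key per message, it omits the Diffie-Hellman step of the Double Ratchet, and the starting secret is a hard-coded demo value. It should not be used as production cryptography.

```python
import hashlib
import hmac


def ratchet_step(chain_key):
    """Derive (next_chain_key, message_key) from the current chain key.

    Both outputs come from a one-way function, so knowing a later chain key
    does not reveal earlier message keys.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(shared_secret, count):
    """Advance the chain `count` times and collect one key per message."""
    chain_key = hashlib.sha256(shared_secret).digest()
    keys = []
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys


# Demo value only; a real session secret comes from a key agreement.
keys = derive_message_keys(b"demo shared secret", 3)
print([k.hex()[:8] for k in keys])
```

Because sender and recipient start from the same secret and apply the same steps, they derive identical message keys without ever transmitting them. Deleting each key after use is what prevents a later compromise from exposing earlier messages.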

Step 10: Scalability considerations

Scaling to billions of users requires horizontal partitioning. Sharding by User ID is a common and effective strategy. All data related to a specific user resides on a specific shard. The system can then route requests efficiently using consistent hashing. Global scaling also requires geo-distribution. You cannot serve users in Brazil from a data center in Japan without incurring significant latency. The architecture should deploy edge servers close to users to terminate the WebSocket connection, while the core data is replicated across regions.
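Routing a User ID to a shard with consistent hashing could be sketched as below. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are assumptions for illustration. SHA-256 is used only as a stable, well-distributed hash, not for security.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        ring = []
        for shard in shards:
            # Several points per shard smooth out the key distribution.
            for i in range(vnodes):
                ring.append((self._hash(f"{shard}#{i}"), shard))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [shard for _, shard in ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, user_id):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, self._hash(user_id))
        return self._owners[idx % len(self._owners)]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))
```

The benefit over a plain `hash(user_id) % N` scheme is that adding a shard moves only the keys that land on the new shard, instead of reshuffling almost every user.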

Cross-region replication is typically asynchronous. This can introduce temporary inconsistencies during regional failovers or network partitions. These inconsistencies are resolved at the client using message sequence numbers and delivery acknowledgments.

Step 11: Reliability and fault tolerance

Reliability means never losing a message. This is achieved through the store-and-forward mechanism. A message is persisted to a write-ahead log or message queue upon entry. Processing begins after persistence. The message remains in the queue if the chat service fails. A standby instance processes it. Messages for offline users are stored in a temporary pending database table. The system queries this table once the user reconnects. It pushes all missed messages in a batch.

Tip: Implement Graceful Degradation. Drop non-essential features first if the system is overloaded. This preserves the core ability to send and receive text messages.

When sustained load exceeds capacity, the system applies admission control rather than allowing queues to grow indefinitely. It may reject new message sends with a retry response, slow down non-essential message types, or temporarily disable features like typing indicators to preserve core delivery guarantees.
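The admission-control behavior described above can be sketched with a bounded queue that sheds non-essential traffic before it rejects messages. The class name, the `"message"` and `"typing"` kinds, and the return strings are hypothetical, and the two thresholds would need tuning against real capacity.

```python
from collections import deque


class AdmissionController:
    """Bounded intake queue that degrades gracefully under load."""

    def __init__(self, capacity, shed_at):
        self.capacity = capacity  # hard limit on queued items
        self.shed_at = shed_at    # depth at which non-essential traffic is dropped
        self.queue = deque()

    def submit(self, kind, payload):
        """Return 'accepted', 'dropped', or 'retry_later'."""
        depth = len(self.queue)
        if kind != "message" and depth >= self.shed_at:
            return "dropped"      # e.g. typing indicators are sacrificed first
        if depth >= self.capacity:
            return "retry_later"  # the client should back off and resend
        self.queue.append((kind, payload))
        return "accepted"
```

Rejecting early with an explicit retry signal keeps queue depth, and therefore latency, bounded for the messages that are admitted.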

We must consider the trade-offs involved in these design choices.

Step 12: Trade-offs and extensions

Every System Design involves trade-offs. The biggest trade-off for WhatsApp is often consistency vs. availability. Availability is preferred in a messaging app. Users must be able to send messages even during a network partition. The system resolves inconsistencies eventually on the client side. Another trade-off is latency vs. battery life. Maintaining a constant connection is fast but drains power. Protocols like MQTT help, but aggressive background process management by mobile operating systems still forces reliance on push notifications, which adds latency.

Supporting multiple devices simultaneously is a major requirement. This complicates E2EE, because the sender must encrypt the message separately for every device the recipient owns. This is often handled with client-side fan-out: the sender client encrypts and sends the message to each recipient device ID. Syncing chat history requires a robust conflict resolution strategy. For example, deleting a message on a phone should also remove it from the desktop.

The following table summarizes the key protocol choices and their trade-offs.

Protocol	Pros	Cons	Best Use Case
HTTP (Short Polling)	Simple to implement.	High latency, server load, and battery drain.	Not recommended for chat.
HTTP (Long Polling)	Better than short polling.	Still overhead heavy; connection setup costs.	Web apps with low traffic.
WebSocket	Full duplex, persistent, low latency.	Requires keeping the connection open; stateful.	Web-based chat clients.
MQTT	Extremely lightweight, battery efficient.	Requires a broker; less robust for heavy media.	Mobile apps & IoT.
XMPP	Decentralized, extensible, mature.	XML is verbose (high bandwidth usage).	Enterprise chat (WhatsApp uses a modified binary version).

Comparison of communication protocols for messaging.

Conclusion

In interviews, focus on the message delivery pipeline first. Clarify latency and reliability requirements, then explain persistent connections, store-and-forward delivery, and encryption. Group messaging, media handling, and multi-device sync can be layered on once the core flow is clear.

Designing a platform like WhatsApp balances massive scale with user experience. We moved from defining basic requirements to architecting a complex system. This system leverages persistent connections and partitioned databases. It uses advanced encryption protocols to securely deliver messages. The key takeaways are to decouple media from text, to rely on the store-and-forward model for reliability, and to use the Signal Protocol for end-to-end privacy.

The next frontier lies in AI integration. This involves running lightweight Large Language Models directly on the device. It offers smart replies without breaking end-to-end encryption. The goal is to build a reliable communication utility.

Updated 1 week ago
Fahim
12 min read
