Designing a real-time chat app is a classic System Design interview challenge — and for good reason. It tests your ability to balance performance, scalability, consistency, and reliability in a user-facing, latency-sensitive system.
From message delivery guarantees to socket management, interviewers want to see how you structure your thinking around building a real-world application under constraints.
In this blog, we’ll walk through how to approach a chat application System Design interview, including core components, trade-offs, and patterns that show you think like a real engineer.
Understand the requirements first
Before diving into architecture, clarify functional and non-functional requirements. This step sets the foundation for everything else you’ll discuss.
Functional requirements:
- One-on-one messaging between users in real time
- Support for group chats and channels
- Message delivery acknowledgment and retry
- Typing indicators, read receipts, and user presence
Non-functional requirements:
- Low latency (ideally < 200ms message round trip)
- High availability (99.99% uptime or better)
- Durability and message persistence
- Scalability to millions of concurrent users
- Optional features like end-to-end encryption and offline sync
Clarifying these early on helps you define scope, prioritize design trade-offs, and align with the interviewer’s expectations.
High-level architecture overview
Start with a simple, modular diagram. Your chat application will consist of:
- A frontend (React/Flutter) using WebSockets for real-time updates
- A load balancer for request routing and failover
- Multiple chat servers that handle connection management and real-time message flow
- A message queue or broker (Kafka, RabbitMQ) for decoupling send and receive logic
- A persistent storage layer for saving messages and metadata (PostgreSQL, Cassandra, DynamoDB)
You can add external systems like push notification services, search indexing, and admin tools as needed. A layered approach communicates that you understand the separation of concerns.
Choosing between WebSockets and polling
WebSockets provide a persistent, bi-directional connection ideal for chat:
- Pros: Low per-message overhead, true real-time push, efficient for many concurrent clients
- Cons: Stateful, requires heartbeat/ping and reconnect logic, struggles on unstable networks
Compare with HTTP long polling:
- Pros: Simple to implement, works on most firewalls
- Cons: Higher latency and resource usage
It’s useful to propose a fallback model where clients attempt WebSockets and gracefully degrade to polling.
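That fallback can be sketched as a small transport-selection routine. The `try_websocket` and `start_polling` callables below are hypothetical stand-ins for real transport code, injected so the logic stays testable without a network:

```python
def connect_with_fallback(try_websocket, start_polling, max_attempts=3):
    """Attempt a WebSocket connection; gracefully degrade to long polling.

    `try_websocket` and `start_polling` are injected callables (hypothetical
    names, not a real client API) so this sketch has no network dependency.
    """
    for _ in range(max_attempts):
        try:
            conn = try_websocket()
            return ("websocket", conn)
        except ConnectionError:
            continue  # a short exponential backoff would go here in production
    # All WebSocket attempts failed: fall back to HTTP long polling
    return ("polling", start_polling())
```

In a real client you would also listen for mid-session disconnects and re-run this selection on reconnect.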
How to manage connections and sessions
Connection handling is critical in a real-time system:
- Each client maintains a WebSocket connection to a chat server
- Store active connections in Redis or an in-memory store for quick lookup
- Track metadata like user ID, session ID, current chat room, and timestamps
- Use heartbeat messages to detect disconnects
Explain how sticky sessions or consistent hashing can route reconnecting users to the same server.
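A minimal consistent-hash ring illustrates that routing idea: the same user ID always maps to the same chat server while the server set is stable, and only a fraction of users move when a server is added or removed. This is a sketch, not a production ring (real systems also handle replication and node health):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring mapping user IDs to chat servers,
    so a reconnecting user lands on the same server while it is alive."""

    def __init__(self, servers, vnodes=100):
        # Virtual nodes smooth out the key distribution across servers
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def route(self, user_id):
        h = self._hash(user_id)
        # First ring position at or after the user's hash, wrapping around
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

Sticky sessions at the load balancer achieve a similar affinity; the ring approach keeps routing deterministic even across load-balancer restarts.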
Message delivery and durability
Ensure that every message:
- Is stored to a durable backend (before or after delivery, depending on trade-offs)
- Gets delivered via pub/sub queues to other connected clients
- Triggers acknowledgments from receivers
- Has retry logic or dead-letter queues for failures
Mention the eventual consistency vs strong consistency trade-offs, especially in group chats.
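The retry-then-dead-letter flow can be shown in a few lines. Here `send` is a hypothetical delivery callable that returns True on acknowledgment; a real system would use a broker's redelivery and DLQ features instead:

```python
from collections import deque

def deliver_with_retry(message, send, dead_letters, max_retries=3):
    """Attempt delivery up to `max_retries` times; on repeated failure,
    park the message in a dead-letter queue for inspection or replay.

    `send` is an injected callable (an assumption of this sketch)
    returning True when the receiver acknowledges the message.
    """
    for _ in range(max_retries):
        if send(message):
            return True
    dead_letters.append(message)  # give up, but never silently drop
    return False
```

Brokers like Kafka and RabbitMQ provide this behavior natively; the sketch just makes the control flow explicit for the interview discussion.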
How to ensure ordering and consistency
For 1:1 chat, you can use:
- Strict message sequence numbers per chat session
- Append-only data stores or ordered logs (Kafka)
For group chat:
- Partial ordering (per-sender) might be acceptable
- Logical clocks can help in tie-breaking
Explain how you’d ensure idempotency by de-duplicating messages via message IDs.
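Both ideas, per-chat sequence numbers and message-ID de-duplication, fit in one small sketch (illustrative only, backed by an in-memory list rather than a real ordered log):

```python
class ChatLog:
    """Per-chat append-only log with strict sequence numbers and
    message-ID de-duplication, so retried sends are idempotent."""

    def __init__(self):
        self._seen = set()    # message IDs already applied
        self._messages = []   # ordered (seq, msg_id, text) tuples

    def append(self, msg_id, text):
        if msg_id in self._seen:
            return None       # duplicate retry: ignore, no double-insert
        self._seen.add(msg_id)
        seq = len(self._messages) + 1  # strict per-chat sequence number
        self._messages.append((seq, msg_id, text))
        return seq
```

In production the seen-ID set would be bounded (e.g., a TTL window), and the log would live in Kafka or an append-only table rather than memory.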
Scaling chat servers
You need to scale both horizontally and elastically:
- Keep chat servers as stateless as possible, using sticky sessions only for connection affinity
- Load balancers (L4 or L7) for traffic routing
- Redis or ZooKeeper to share presence and metadata
Prepare for peak traffic by autoscaling servers and using circuit breakers for graceful degradation.
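The circuit-breaker pattern mentioned above can be sketched as follows; this is a minimal illustration (the clock is injectable purely for testability), not a replacement for a hardened library:

```python
import time

class CircuitBreaker:
    """Tiny circuit breaker: after `threshold` consecutive failures the
    circuit opens and calls are rejected until `reset_after` seconds pass,
    shielding a struggling downstream service from more load."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock        # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0          # success resets the failure count
        return result
```

Chat servers would wrap calls to the database or push-notification service in a breaker like this so one slow dependency degrades gracefully instead of cascading.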
Group chat considerations
To design scalable group chats:
- Implement fan-out on write or fan-out on read, depending on delivery urgency
- Maintain chat room metadata (user list, timestamps) in a central store
- Cache recent group messages for faster delivery
- Use streaming systems (Kafka topics) to publish messages to active members
Discuss failure scenarios like member joins, leaves, and reconnects mid-conversation.
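The fan-out trade-off is easiest to show side by side. Both classes below are toy in-memory sketches of the two strategies, not a production inbox store:

```python
from collections import defaultdict

class FanOutOnWrite:
    """Copy each message into every member's inbox at send time:
    cheap, fast reads; write cost grows with group size."""

    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, members, message):
        for user in members:
            self.inboxes[user].append(message)

    def read(self, user):
        return self.inboxes[user]

class FanOutOnRead:
    """Store the message once in a shared room log; members pull it on
    read: cheap writes, more work (and latency) at read time."""

    def __init__(self):
        self.room_log = []

    def send(self, members, message):
        self.room_log.append(message)

    def read(self, user):
        return list(self.room_log)
```

A common hybrid is fan-out on write for small groups and fan-out on read for very large channels, where copying a message to millions of inboxes is prohibitive.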
Handling offline users
Offline messaging involves:
- Writing messages to persistent queues
- Storing last-read and last-delivered markers
- Replaying messages upon reconnection from those markers
Push notifications via APNs/FCM can alert users to new messages. Be sure to mention expiration policies and queue limits.
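The marker-based replay can be sketched with an append-only log and a per-user last-delivered index (in-memory here for illustration; a real system persists both):

```python
class OfflineQueue:
    """Append-only message log plus a per-user last-delivered marker;
    on reconnection, replay everything after the user's marker."""

    def __init__(self):
        self.log = []        # persisted message log (in-memory here)
        self.delivered = {}  # user -> index just past last delivered message

    def append(self, message):
        self.log.append(message)

    def replay(self, user):
        start = self.delivered.get(user, 0)
        pending = self.log[start:]
        self.delivered[user] = len(self.log)  # advance marker after replay
        return pending
```

In production the marker would advance only after the client acknowledges receipt, and old log entries would be trimmed per the expiration policies mentioned above.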
Optional features to impress
Impress your interviewer with advanced UX features:
- Read receipts: Store last-read timestamp per user/chat
- Typing indicators: Broadcast ephemeral events via in-memory pub/sub
- Presence tracking: Use Redis bitmaps or sets for scalable online status
- End-to-end encryption: Discuss key management, encryption/decryption client-side
These aren’t core to MVP delivery but demonstrate attention to real-world complexity.
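Presence tracking, for instance, reduces to set membership. This in-memory class stands in for a Redis set (SADD/SREM/SISMEMBER in the real system, or bitmaps for dense integer user IDs):

```python
class Presence:
    """In-memory stand-in for a Redis set of online user IDs."""

    def __init__(self):
        self.online = set()

    def connect(self, user_id):
        self.online.add(user_id)      # SADD in Redis

    def disconnect(self, user_id):
        self.online.discard(user_id)  # SREM in Redis

    def is_online(self, user_id):
        return user_id in self.online  # SISMEMBER in Redis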
Trade-offs to discuss with the interviewer
Highlight decisions you’d make based on:
- Latency vs durability (in-memory vs disk writes)
- Message order guarantees vs throughput
- Stateless servers vs sticky sessions
- Simplicity vs feature completeness
Use real-world examples (“In WhatsApp, message ordering can be eventually consistent in group chats…”) to show you think beyond the whiteboard.
Handling user authentication and security
Authentication and authorization must be handled securely:
- Use OAuth 2.0 or JWT tokens to verify users
- Establish secure WebSocket channels (WSS)
- Sanitize messages for XSS, injection, and abuse
Mention anti-spam rate limiting and message throttling to defend against malicious clients.
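A token bucket is the standard shape for that rate limiting. The sketch below allows `rate` messages per second per sender with bursts up to `capacity` (clock injectable for testing only):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/second up to
    `capacity`; each message spends one token or is rejected."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable for testing
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The chat server would keep one bucket per user (or per connection), typically in Redis so limits hold across servers.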
Designing for message search and history
For long-running conversations:
- Store messages with timestamp indexes
- Use Elasticsearch or OpenSearch for keyword-based search
- Allow filtering by chat, user, and time window
Discuss read scaling using read replicas or cached indexes for search-heavy users.
Monitoring and observability
Keep your system visible:
- Track metrics like latency, connection counts, drop rates, retry counts
- Emit logs on connection events, message deliveries, failures
- Build dashboards with Prometheus, Grafana, or Datadog
Use alerting to flag issues before they become outages.
Supporting message deletion and moderation
This is essential for user control and safety:
- Allow message recalls with TTL windows
- Flag and quarantine messages for admin review
- Implement audit logs for moderators
Bring up community guidelines and abuse detection as scalable features.
Versioning and feature rollout
Ensure agility in updates:
- Use semantic versioning in APIs
- Apply feature flags with tools like LaunchDarkly
- Plan for backwards compatibility with older clients
These details prove you’re thinking beyond MVP into long-term product evolution.
Final thoughts
A chat application System Design interview is about more than just WebSockets and messages. It’s about how you reason through real-world problems: connections, scale, latency, failure, and user experience.
Start from the user experience, think through the moving parts, and explain your choices clearly. Focus on key patterns like pub/sub, connection management, and durable messaging. Show your trade-off thinking, and don’t forget to highlight how the system evolves over time.
Master these building blocks, and you’ll ace your next chat application System Design interview.