How To Design A Real-Time Chat Application In A System Design Interview?

Designing a real-time chat app is a classic System Design interview challenge — and for good reason. It tests your ability to balance performance, scalability, consistency, and reliability in a user-facing, latency-sensitive system. 

From message delivery guarantees to socket management, interviewers want to see how you structure your thinking around building a real-world application under constraints.

In this blog, we’ll walk through how to approach a chat application System Design interview, including core components, trade-offs, and patterns that show you think like a real engineer.

Understand the requirements first

Before diving into architecture, clarify functional and non-functional requirements. This step sets the foundation for everything else you’ll discuss.

Functional requirements:

  • One-on-one messaging between users in real time
  • Support for group chats and channels
  • Message delivery acknowledgment and retry
  • Typing indicators, read receipts, and user presence

Non-functional requirements:

  • Low latency (ideally < 200ms message round trip)
  • High availability (99.99% uptime or better)
  • Durability and message persistence
  • Scalability to millions of concurrent users
  • Optional features like end-to-end encryption and offline sync

Clarifying these early on helps you define scope, prioritize design trade-offs, and align with the interviewer’s expectations.

High-level architecture overview

Start with a simple, modular diagram. Your chat application will consist of:

  • A frontend (React/Flutter) using WebSockets for real-time updates
  • A load balancer for request routing and failover
  • Multiple chat servers that handle connection management and real-time message flow
  • A message queue or broker (Kafka, RabbitMQ) for decoupling send and receive logic
  • A persistent storage layer for saving messages and metadata (PostgreSQL, Cassandra, DynamoDB)

You can add external systems like push notification services, search indexing, and admin tools as needed. A layered approach communicates that you understand the separation of concerns.

Choosing between WebSockets and polling

WebSockets provide a persistent, bi-directional connection ideal for chat:

  • Pros: Lower overhead, real-time push, tens of thousands of concurrent connections per server
  • Cons: Stateful, requires heartbeat/ping, not ideal for unstable connections

Compare with HTTP long polling:

  • Pros: Simple to implement, works on most firewalls
  • Cons: Higher latency and resource usage

It’s useful to propose a fallback model where clients attempt WebSockets and gracefully degrade to polling.
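
As a concrete illustration, here is a minimal client-side sketch in TypeScript of that fallback: try a WebSocket first, and drop down to HTTP long polling if the socket never establishes or closes uncleanly. The endpoint URLs and the plain-text message format are assumptions made for the example.

```typescript
// Minimal sketch: WebSocket-first transport with a long-polling fallback.
// Endpoints (wss://chat.example.com/ws, /poll) are illustrative placeholders.
type OnMessage = (msg: string) => void;

function connect(onMessage: OnMessage): void {
  let fellBack = false;
  const ws = new WebSocket("wss://chat.example.com/ws");

  ws.onmessage = (event) => onMessage(String(event.data));

  ws.onclose = (event) => {
    // Fall back once if the socket never worked or dropped uncleanly
    // (old proxies, strict firewalls, flaky networks).
    if (!event.wasClean && !fellBack) {
      fellBack = true;
      longPoll(onMessage);
    }
  };
}

async function longPoll(onMessage: OnMessage): Promise<void> {
  // Each request parks on the server until a message arrives or it times out,
  // then the client immediately polls again.
  while (true) {
    try {
      const res = await fetch("https://chat.example.com/poll");
      if (res.ok) onMessage(await res.text());
    } catch {
      await new Promise((r) => setTimeout(r, 2_000)); // brief backoff on network errors
    }
  }
}
```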

How to manage connections and sessions

Connection handling is critical in a real-time system:

  • Each client maintains a WebSocket connection to a chat server
  • Store active connections in Redis or an in-memory store for quick lookup
  • Track metadata like user ID, session ID, current chat room, and timestamps
  • Use heartbeat messages to detect disconnects

Explain how sticky sessions or consistent hashing can route reconnecting users to the same server.
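
To make the connection-tracking idea concrete, here is a rough server-side sketch using the ws and ioredis libraries: each new socket is registered in Redis under its user ID, kept alive with ping/pong heartbeats, and cleaned up on disconnect. The key naming (conn:<userId>), the x-user-id header, and the 30-second interval are illustrative choices, not requirements.

```typescript
// Sketch of connection tracking with "ws" and "ioredis".
import { WebSocketServer, WebSocket } from "ws";
import Redis from "ioredis";

const redis = new Redis();
const wss = new WebSocketServer({ port: 8080 });
const SERVER_ID = "chat-server-1"; // would come from config/env in practice

wss.on("connection", async (socket: WebSocket, request) => {
  // Assume the user ID was resolved during the auth handshake (see the security section).
  const userId = request.headers["x-user-id"] as string;

  // Record which server owns this user's live connection, for message routing.
  await redis.hset(`conn:${userId}`, {
    server: SERVER_ID,
    connectedAt: Date.now().toString(),
  });

  let alive = true;
  socket.on("pong", () => { alive = true; });

  // Heartbeat: ping every 30s; if no pong arrived since the last ping,
  // assume the client is gone and clean up.
  const heartbeat = setInterval(() => {
    if (!alive) {
      socket.terminate();
      return;
    }
    alive = false;
    socket.ping();
  }, 30_000);

  socket.on("close", async () => {
    clearInterval(heartbeat);
    await redis.del(`conn:${userId}`);
  });
});
```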

Message delivery and durability

Ensure that every message:

  • Is stored to a durable backend (before or after delivery, depending on trade-offs)
  • Gets delivered via pub/sub queues to other connected clients
  • Triggers acknowledgments from receivers
  • Has retry logic or dead-letter queues for failures

Mention the eventual consistency vs strong consistency trade-offs, especially in group chats.
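
A simplified sketch of that flow, with storage and transport abstracted behind interfaces, might look like the following: persist first, deliver, then retry on a missing acknowledgment before handing off to a dead-letter path. The retry count and timeout are placeholder values.

```typescript
// Minimal delivery sketch: persist, deliver, wait for an ack, retry a bounded
// number of times, then dead-letter. Store/Transport are abstractions you
// would back with a database and pub/sub in practice.
interface Message { id: string; chatId: string; senderId: string; body: string; }

interface MessageStore { save(msg: Message): Promise<void>; }
interface Transport { deliver(userId: string, msg: Message): Promise<void>; }

const pendingAcks = new Map<string, NodeJS.Timeout>();

async function sendMessage(
  store: MessageStore,
  transport: Transport,
  recipientId: string,
  msg: Message,
  attempt = 1,
): Promise<void> {
  if (attempt === 1) await store.save(msg); // durability before delivery

  await transport.deliver(recipientId, msg);

  // If no ack arrives within 5s, retry up to 3 times, then dead-letter.
  const timer = setTimeout(() => {
    pendingAcks.delete(msg.id);
    if (attempt < 3) sendMessage(store, transport, recipientId, msg, attempt + 1);
    else console.warn(`dead-lettering message ${msg.id}`);
  }, 5_000);
  pendingAcks.set(msg.id, timer);
}

// Called when the recipient's client acknowledges receipt.
function onAck(messageId: string): void {
  const timer = pendingAcks.get(messageId);
  if (timer) clearTimeout(timer);
  pendingAcks.delete(messageId);
}
```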

How to ensure ordering and consistency

For 1:1 chat, you can use:

  • Strict message sequence numbers per chat session
  • Append-only data stores or ordered logs (Kafka)

For group chat:

  • Partial ordering (per-sender) might be acceptable
  • Logical clocks can help in tie-breaking

Explain how you’d ensure idempotency by de-duplicating messages via message IDs.
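
One way to sketch both ideas together: a per-chat sequence counter assigns order on the send path, and receivers drop any message ID they have already processed, which keeps at-least-once delivery idempotent. The in-memory maps below stand in for what would be a database or Redis in a real system.

```typescript
// Sketch of per-chat sequence numbers plus idempotent receive.
const nextSeq = new Map<string, number>();   // chatId -> next sequence number
const seenMessageIds = new Set<string>();    // dedup window (bounded in practice)

interface OrderedMessage { id: string; chatId: string; seq: number; body: string; }

function assignSequence(chatId: string, id: string, body: string): OrderedMessage {
  const seq = nextSeq.get(chatId) ?? 1;
  nextSeq.set(chatId, seq + 1);
  return { id, chatId, seq, body };
}

// Receivers apply messages in seq order and ignore anything they've seen,
// so retries and at-least-once delivery stay idempotent.
function applyIfNew(msg: OrderedMessage, apply: (m: OrderedMessage) => void): void {
  if (seenMessageIds.has(msg.id)) return; // duplicate from a retry: drop it
  seenMessageIds.add(msg.id);
  apply(msg);
}
```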

Scaling chat servers

You need to scale both horizontally and elastically:

  • Keep chat servers stateless apart from the live socket, with sticky sessions for connection affinity
  • Load balancers (L4 or L7) for traffic routing
  • Redis or ZooKeeper to share presence and metadata

Prepare for peak traffic by autoscaling servers and using circuit breakers for graceful degradation.
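
If the interviewer pushes on how reconnecting users find "their" server, a toy consistent-hash ring is easy to sketch: each server gets many virtual points on a ring, a user maps to the first point clockwise from the hash of their ID, and adding or removing a server only remaps a small fraction of users. The MD5-based hash and virtual-node count here are arbitrary illustrative choices.

```typescript
// Toy consistent-hash ring for mapping users to chat servers.
import { createHash } from "crypto";

function hash(key: string): number {
  // Use the first 32 bits of an MD5 digest as the ring position.
  return parseInt(createHash("md5").update(key).digest("hex").slice(0, 8), 16);
}

class HashRing {
  private ring: Array<{ point: number; server: string }> = [];

  constructor(servers: string[], virtualNodes = 100) {
    for (const server of servers) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push({ point: hash(`${server}#${v}`), server });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // The first ring point clockwise from the key's hash owns the key.
  serverFor(userId: string): string {
    const h = hash(userId);
    const node = this.ring.find((n) => n.point >= h) ?? this.ring[0];
    return node.server;
  }
}

const ring = new HashRing(["chat-1", "chat-2", "chat-3"]);
console.log(ring.serverFor("user-42")); // stable across reconnects
```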

Group chat considerations

To design scalable group chats:

  • Implement fan-out on write or fan-out on read, depending on group size and delivery urgency
  • Maintain chat room metadata (user list, timestamps) in a central store
  • Cache recent group messages for faster delivery
  • Use streaming systems (Kafka topics) to publish messages to active members

Discuss failure scenarios like member joins, leaves, and reconnects mid-conversation.
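
Here is a rough fan-out-on-write sketch that builds on the Redis connection registry from earlier: for each group member, look up which server owns their live connection and push the message there, falling back to the offline path when no connection exists. The pushToServer and queueForOffline callbacks are placeholders for RPC and queueing layers you would define separately.

```typescript
// Fan-out-on-write sketch for group chat, reusing conn:<userId> registry keys.
import Redis from "ioredis";

const redis = new Redis();

interface GroupMessage { id: string; groupId: string; senderId: string; body: string; }

async function fanOutOnWrite(
  members: string[],
  msg: GroupMessage,
  pushToServer: (server: string, userId: string, msg: GroupMessage) => Promise<void>,
  queueForOffline: (userId: string, msg: GroupMessage) => Promise<void>,
): Promise<void> {
  await Promise.all(
    members
      .filter((userId) => userId !== msg.senderId)
      .map(async (userId) => {
        const server = await redis.hget(`conn:${userId}`, "server");
        if (server) await pushToServer(server, userId, msg); // live connection: push now
        else await queueForOffline(userId, msg);             // offline: persist for replay
      }),
  );
}
```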

Handling offline users

Offline messaging involves:

  • Writing messages to persistent queues
  • Storing last-read and last-delivered markers
  • Replaying messages upon reconnection from those markers

Push notifications via APNs/FCM can alert users to new messages. Be sure to mention expiration policies and queue limits.
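
A minimal replay sketch, assuming messages already carry per-chat sequence numbers: on reconnection, fetch everything after the user's last-delivered marker, send it, and advance the marker. The in-memory maps are stand-ins for database tables or persistent queues.

```typescript
// Offline-sync sketch: replay messages past the last-delivered marker on reconnect.
interface StoredMessage { seq: number; chatId: string; body: string; }

const chatLog = new Map<string, StoredMessage[]>();  // chatId -> ordered messages
const lastDelivered = new Map<string, number>();     // `${userId}:${chatId}` -> seq

function replayOnReconnect(
  userId: string,
  chatId: string,
  send: (msg: StoredMessage) => void,
): void {
  const marker = lastDelivered.get(`${userId}:${chatId}`) ?? 0;
  const missed = (chatLog.get(chatId) ?? []).filter((m) => m.seq > marker);

  for (const msg of missed) send(msg);

  // Advance the marker only after the missed messages have been handed off.
  if (missed.length > 0) {
    lastDelivered.set(`${userId}:${chatId}`, missed[missed.length - 1].seq);
  }
}
```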

Optional features to impress

Impress your interviewer with advanced UX features:

  • Read receipts: Store last-read timestamp per user/chat
  • Typing indicators: Broadcast ephemeral events via in-memory pub/sub
  • Presence tracking: Use Redis bitmaps or sets for scalable online status
  • End-to-end encryption: Discuss key management, encryption/decryption client-side

These aren’t core to MVP delivery but demonstrate attention to real-world complexity.
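
For presence specifically, a Redis bitmap is a nice talking point because it is both simple and compact: one bit per numeric user ID means a few million users fit in well under a megabyte. A hedged sketch, assuming numeric IDs and an illustrative key name:

```typescript
// Presence sketch using a Redis bitmap: one bit per numeric user ID.
import Redis from "ioredis";

const redis = new Redis();
const PRESENCE_KEY = "presence:online";

async function setOnline(userId: number, online: boolean): Promise<void> {
  // setbit is O(1); flip the user's bit on connect/disconnect.
  await redis.setbit(PRESENCE_KEY, userId, online ? 1 : 0);
}

async function isOnline(userId: number): Promise<boolean> {
  return (await redis.getbit(PRESENCE_KEY, userId)) === 1;
}

// Usage: flip the bit when a socket opens or closes.
async function onSocketOpen(userId: number) { await setOnline(userId, true); }
async function onSocketClose(userId: number) { await setOnline(userId, false); }
```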

Trade-offs to discuss with the interviewer

Highlight decisions you’d make based on:

  • Latency vs durability (in-memory vs disk writes)
  • Message order guarantees vs throughput
  • Stateless servers vs sticky sessions
  • Simplicity vs feature completeness

Use real-world examples (“In WhatsApp, message ordering can be eventually consistent in group chats…”) to show you think beyond the whiteboard.

Handling user authentication and security

Authentication and authorization must be secure:

  • Use OAuth 2.0 or JWTs to verify users
  • Establish secure WebSocket channels (WSS)
  • Sanitize messages for XSS, injection, and abuse

Mention anti-spam rate limiting and message throttling to defend against malicious clients.
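
Putting those points together, a sketch of verifying a JWT at the WebSocket handshake and applying a crude per-connection rate limit might look like this, using the ws and jsonwebtoken packages. The token-in-query-string transport and the 20-messages-per-10-seconds budget are illustrative choices, not recommendations.

```typescript
// Sketch: JWT check at the WebSocket handshake plus a naive per-connection rate limit.
import { WebSocketServer } from "ws";
import jwt from "jsonwebtoken";

const wss = new WebSocketServer({ port: 8443 }); // terminate TLS (WSS) at the LB or here

wss.on("connection", (socket, request) => {
  const url = new URL(request.url ?? "/", "https://chat.example.com");
  const token = url.searchParams.get("token") ?? "";

  let userId: string;
  try {
    const claims = jwt.verify(token, process.env.JWT_SECRET as string) as { sub: string };
    userId = claims.sub;
  } catch {
    socket.close(4401, "unauthorized"); // reject unauthenticated clients
    return;
  }

  // Naive sliding window: at most 20 messages per 10 seconds per connection.
  let windowStart = Date.now();
  let count = 0;

  socket.on("message", (data) => {
    const now = Date.now();
    if (now - windowStart > 10_000) { windowStart = now; count = 0; }
    if (++count > 20) {
      socket.close(4429, "rate limit exceeded");
      return;
    }
    handleMessage(userId, data.toString()); // sanitize/validate before broadcasting
  });
});

// Placeholder for the message-handling pipeline described in earlier sections.
function handleMessage(userId: string, raw: string): void { /* ... */ }
```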

Designing for message search and history

For long-running conversations:

  • Store messages with timestamp indexes
  • Use Elasticsearch or OpenSearch for keyword-based search
  • Allow filtering by chat, user, and time window

Discuss read scaling using read replicas or cached indexes for search-heavy users.
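
If asked to go deeper on search, a sketch of the kind of query you might run against an Elasticsearch index of messages (v8-style client API) could look like the following; the index name and field mappings such as chat_id and sent_at are assumptions.

```typescript
// Sketch of a keyword search over message history with the Elasticsearch client.
import { Client } from "@elastic/elasticsearch";

const es = new Client({ node: "http://localhost:9200" });

async function searchMessages(chatId: string, keyword: string, from: string, to: string) {
  const result = await es.search({
    index: "messages",
    query: {
      bool: {
        must: [{ match: { body: keyword } }],               // full-text match
        filter: [
          { term: { chat_id: chatId } },                    // scope to one chat
          { range: { sent_at: { gte: from, lte: to } } },   // time window
        ],
      },
    },
    sort: [{ sent_at: { order: "desc" } }],
    size: 50,
  });
  return result.hits.hits.map((hit) => hit._source);
}
```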

Monitoring and observability

Keep your system visible:

  • Track metrics like latency, connection counts, drop rates, retry counts
  • Emit logs on connection events, message deliveries, failures
  • Build dashboards with Prometheus, Grafana, or Datadog

Use alerting to flag issues before they become outages.
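
As a sketch of what that instrumentation can look like in code, here is a minimal prom-client setup exposing a connection gauge, a delivery-latency histogram, and a failure counter on a /metrics endpoint; the metric names and bucket boundaries are illustrative.

```typescript
// Observability sketch with "prom-client": gauge, histogram, counter, /metrics.
import http from "http";
import { Counter, Gauge, Histogram, register } from "prom-client";

const openConnections = new Gauge({
  name: "chat_open_connections",
  help: "Currently open WebSocket connections",
});

const deliveryLatency = new Histogram({
  name: "chat_delivery_latency_seconds",
  help: "Time from send to recipient ack",
  buckets: [0.05, 0.1, 0.2, 0.5, 1, 2],
});

const deliveryFailures = new Counter({
  name: "chat_delivery_failures_total",
  help: "Messages that exhausted retries",
});

// Instrumentation hooks called from the connection / delivery code paths.
export const onConnect = () => openConnections.inc();
export const onDisconnect = () => openConnections.dec();
export const onDelivered = (seconds: number) => deliveryLatency.observe(seconds);
export const onFailed = () => deliveryFailures.inc();

// Expose /metrics for Prometheus to scrape.
http.createServer(async (_req, res) => {
  res.setHeader("Content-Type", register.contentType);
  res.end(await register.metrics());
}).listen(9100);
```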

Supporting message deletion and moderation

This is essential for user control and safety:

  • Allow message recalls with TTL windows
  • Flag and quarantine messages for admin review
  • Implement audit logs for moderators

Bring up community guidelines and abuse detection as scalable features.

Versioning and feature rollout

Ensure agility in updates:

  • Use semantic versioning in APIs
  • Apply feature flags with tools like LaunchDarkly
  • Plan for backwards compatibility with older clients

These details prove you’re thinking beyond MVP into long-term product evolution.

Final thoughts

Answering a chat application System Design interview question is about more than just WebSockets and messages. It’s about how you reason through real-world problems: connections, scale, latency, failure, and user experience.

Start from the user experience, think through the moving parts, and explain your choices clearly. Focus on key patterns like pub/sub, connection management, and durable messaging. Show your trade-off thinking, and don’t forget to highlight how the system evolves over time.

Master these building blocks, and you’ll ace your next chat application System Design interview.