Modern Backend Scaling: From Stateless APIs to Serverless Architectures
Building scalable backend systems is not about memorizing a list of tools. It is about understanding where bottlenecks appear, how systems evolve with growth, and which techniques solve which problems.
Modern systems rarely scale with a single technique. Instead, they combine several layers:
Stateless services
Load balancing
Database scaling
Distributed caching
CDNs
Asynchronous processing
Service architecture (monolith vs microservices)
Serverless computing
This article explains these concepts in a practical, interview-ready way, corrects common misconceptions, and aligns them with cloud architecture as practiced in 2025 and beyond.
Statelessness — The Foundation of Horizontal Scaling
Horizontal scaling means adding more servers instead of increasing the power of a single machine.
For example:
Vertical scaling → upgrading a server from 4 GB to 32 GB of RAM
Horizontal scaling → running multiple instances of the same service
For horizontal scaling to work properly, backend services must be stateless.
A stateless service does not store request-specific or user-specific state locally in memory. Every request must be independently processable by any server instance.
Why stateful servers break scaling
If a server stores session data in memory:
Server A memory:
session[user123] = {...}
If the next request is then routed to Server B, the session does not exist there.
Result:
401 Unauthorized
This creates inconsistent behaviour across servers.
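The failure can be reproduced in a few lines. The sketch below is a simulation rather than a real HTTP server: `createServer`, `login`, and `getProfile` are hypothetical names, and each "server" is just an object with its own private `Map`.

```javascript
// Two simulated server instances, each with its own in-memory session map.
function createServer(name) {
  const sessions = new Map() // lives only in this instance's memory
  return {
    name,
    login(userId) {
      sessions.set(userId, { loggedInAt: Date.now() })
      return 200
    },
    getProfile(userId) {
      // The session exists only on the instance that handled the login.
      return sessions.has(userId) ? 200 : 401
    }
  }
}

const serverA = createServer("A")
const serverB = createServer("B")

const loginStatus = serverA.login("user123")       // load balancer picked A
const sameServer = serverA.getProfile("user123")   // routed to A again
const otherServer = serverB.getProfile("user123")  // routed to B instead

console.log({ loginStatus, sameServer, otherServer })
// { loginStatus: 200, sameServer: 200, otherServer: 401 }
```

Whether the user sees a 200 or a 401 depends only on which instance the load balancer happens to choose.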
Correct approach: externalize state
State must be stored in shared infrastructure:
| State Type | Correct Storage |
|---|---|
| Sessions | Redis / distributed cache |
| Files | Object storage (S3, R2, GCS) |
| Persistent data | Managed databases (PostgreSQL, MySQL, etc.) |
| Cache | Redis / Memcached |
Example: storing sessions in Redis (Node.js)
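import express from "express"
import session from "express-session"
// Default export as in connect-redis v7; other major versions export the store differently.
import RedisStore from "connect-redis"
import { createClient } from "redis"

const app = express()

// One shared Redis instance holds the sessions for every app server.
const redisClient = createClient({ url: process.env.REDIS_URL })
await redisClient.connect() // top-level await requires an ES module

app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: process.env.SESSION_SECRET,
    resave: false, // do not rewrite unchanged sessions
    saveUninitialized: false // do not store empty sessions
  })
)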
Now any server instance can verify the session.
Stateless architecture is the prerequisite for scalable backend systems.
Load Balancers — Distributing Traffic
Once multiple backend instances exist, a mechanism is required to distribute traffic among them. That component is the load balancer.
All incoming traffic first goes to the load balancer, which forwards requests to backend servers.
Common production load balancers:
NGINX
HAProxy
AWS ALB / ELB
Cloudflare
Envoy
Common load balancing algorithms
Round Robin
Requests are distributed sequentially.
Server selection pattern:
A → B → C → A → B → C
Works well when:
requests have similar processing cost
servers have equal capacity
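A minimal sketch of the selection logic, assuming a fixed list of servers identified by name. `createRoundRobin` is a hypothetical helper, not part of any load balancer's API.

```javascript
// Returns a function that cycles through the server list in order.
function createRoundRobin(servers) {
  let next = 0
  return function pick() {
    const server = servers[next]
    next = (next + 1) % servers.length // wrap around after the last server
    return server
  }
}

const pick = createRoundRobin(["A", "B", "C"])
const order = Array.from({ length: 6 }, () => pick())
console.log(order.join(" → ")) // A → B → C → A → B → C
```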
Weighted Round Robin
Used when servers have different capacity.
Example configuration:
Server A weight = 2
Server B weight = 1
Server C weight = 1
Traffic ratio:
A : B : C = 2 : 1 : 1
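One simple way to get this ratio is to expand each server into the schedule as many times as its weight. This is a naive sketch under that assumption. Production load balancers typically interleave the picks more smoothly, and `createWeightedRoundRobin` is a hypothetical name.

```javascript
// Builds a repeating schedule in which each server appears `weight` times.
function createWeightedRoundRobin(weights) {
  const schedule = []
  for (const [server, weight] of Object.entries(weights)) {
    for (let i = 0; i < weight; i++) schedule.push(server)
  }
  let next = 0
  return function pick() {
    const server = schedule[next]
    next = (next + 1) % schedule.length
    return server
  }
}

const pickWeighted = createWeightedRoundRobin({ A: 2, B: 1, C: 1 })
const counts = { A: 0, B: 0, C: 0 }
for (let i = 0; i < 8; i++) counts[pickWeighted()]++
console.log(counts) // { A: 4, B: 2, C: 2 }
```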
Least Connections
Requests are sent to the server with the fewest active connections.
Useful when:
request processing time varies
some requests are heavy
This algorithm adapts automatically to server load.
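The bookkeeping behind this can be sketched as a per-server counter that goes up when a request starts and down when it finishes. The names `createLeastConnections`, `acquire`, and `release` are assumptions for illustration only.

```javascript
// Tracks active connections per server and always picks the least busy one.
function createLeastConnections(servers) {
  const active = new Map(servers.map(server => [server, 0]))
  return {
    acquire() {
      let best = null
      for (const [server, count] of active) {
        if (best === null || count < active.get(best)) best = server
      }
      active.set(best, active.get(best) + 1)
      return best
    },
    release(server) {
      active.set(server, Math.max(0, active.get(server) - 1))
    }
  }
}

const lb = createLeastConnections(["A", "B"])
const first = lb.acquire()  // "A" (tie, so the first server wins)
const second = lb.acquire() // "B"
lb.release(second)          // B's request finishes quickly
const third = lb.acquire()  // "B" again, because A is still busy
console.log(first, second, third) // A B B
```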
Health Checks
Load balancers continuously check server health using lightweight endpoints.
Example health endpoint:
app.get("/health", (req, res) => {
res.status(200).send("OK")
})
If a server stops responding with HTTP 200, typically for several consecutive checks, the load balancer removes it from rotation until it passes again.
This keeps new requests away from failing instances.
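The load balancer's side of this can be sketched as a failure counter per server. The threshold of three consecutive failures is an assumption chosen for the example, and `createHealthTracker` is a hypothetical helper. Real load balancers expose these values as configuration.

```javascript
// Removes a server from rotation after `unhealthyThreshold` consecutive failed checks.
function createHealthTracker(servers, unhealthyThreshold = 3) {
  const failures = new Map(servers.map(server => [server, 0]))
  return {
    record(server, statusCode) {
      // Any 200 resets the counter; anything else counts as a failure.
      failures.set(server, statusCode === 200 ? 0 : failures.get(server) + 1)
    },
    inRotation() {
      return servers.filter(server => failures.get(server) < unhealthyThreshold)
    }
  }
}

const tracker = createHealthTracker(["A", "B", "C"])
tracker.record("B", 503)
tracker.record("B", 503)
const afterTwo = tracker.inRotation()      // still A, B, C
tracker.record("B", 503)
const afterThree = tracker.inRotation()    // A, C
tracker.record("B", 200)
const afterRecovery = tracker.inRotation() // A, B, C again
console.log(afterTwo, afterThree, afterRecovery)
```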
Database Scaling
Scaling application servers is relatively easy. Scaling databases is much harder because databases maintain state and consistency.
Two primary strategies exist.
Read Replicas
A primary database handles writes while replicas serve read queries.
Flow:
Write queries → Primary database
Read queries → Replicas
Example architecture:
Primary DB → writes
Replica 1 → reads
Replica 2 → reads
Replica 3 → reads
This dramatically reduces load on the primary database.
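A rough sketch of the routing decision, assuming queries arrive as SQL strings and connections are represented by plain names. Classifying by the leading `SELECT` keyword is deliberately naive (it would misroute `SELECT ... FOR UPDATE`, for example), and `createQueryRouter` is a hypothetical helper.

```javascript
// Sends reads to replicas in round-robin order and everything else to the primary.
function createQueryRouter(primary, replicas) {
  let next = 0
  return function route(sql) {
    const isRead = /^\s*select\b/i.test(sql)
    if (!isRead || replicas.length === 0) return primary
    const replica = replicas[next]
    next = (next + 1) % replicas.length
    return replica
  }
}

const route = createQueryRouter("primary", ["replica-1", "replica-2", "replica-3"])
const targets = [
  route("INSERT INTO users (name) VALUES ('a')"),
  route("SELECT * FROM users"),
  route("select * from orders"),
  route("UPDATE users SET name = 'b'"),
  route("SELECT 1")
]
console.log(targets)
// [ 'primary', 'replica-1', 'replica-2', 'primary', 'replica-3' ]
```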
Many SaaS applications are read-heavy. A commonly cited rough split is:
70–90% reads
10–30% writes
In such workloads, replicas can absorb most of the query traffic.
Replication Lag
Replicas usually synchronize asynchronously, so they can briefly lag behind the primary.
Example:
User updates profile
Write occurs in primary DB
Replica updates after ~200 ms
Immediate read may still return old data
Solutions include:
read-after-write routing to primary
delaying read queries briefly
caching updated data on client
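The first option, read-after-write routing, can be sketched as pinning a user's reads to the primary for a short window after that user writes. The one-second window and the name `createReadRouter` are assumptions. In a multi-instance deployment, the last-write timestamp would have to live in shared storage or a cookie rather than in process memory.

```javascript
// Routes a user's reads to the primary for `pinWindowMs` after their last write.
// `now` is injectable so the example can control time.
function createReadRouter(pinWindowMs = 1000, now = () => Date.now()) {
  const lastWriteAt = new Map()
  return {
    recordWrite(userId) {
      lastWriteAt.set(userId, now())
    },
    targetForRead(userId) {
      const wroteAt = lastWriteAt.get(userId)
      const recentlyWrote = wroteAt !== undefined && now() - wroteAt < pinWindowMs
      return recentlyWrote ? "primary" : "replica"
    }
  }
}

let clock = 0
const router = createReadRouter(1000, () => clock)
const beforeWrite = router.targetForRead("user123") // "replica"
router.recordWrite("user123")
clock = 200
const rightAfter = router.targetForRead("user123")  // "primary"
clock = 1500
const later = router.targetForRead("user123")       // "replica"
console.log(beforeWrite, rightAfter, later) // replica primary replica
```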
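Managed database services handle much of the operational complexity, such as provisioning replicas and failover, although the application still has to tolerate replication lag.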
Popular managed providers:
AWS RDS
Google Cloud SQL
Neon
PlanetScale
CockroachDB
YugabyteDB
Database Sharding
Sharding divides a large dataset across multiple databases.
Instead of storing everything in a single database:
Orders 1-5 → DB shard A
Orders 6-10 → DB shard B
Sharding improves:
query performance
throughput
storage scalability
Choosing a Sharding Key
A sharding key determines how data is distributed.
Examples:
| Sharding key | Typical benefit |
|---|---|
| User ID | Distributes users evenly |
| Geographic region | Keeps data near users |
| Date | Useful for time-series data |
Example query routing:
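// Assumes numeric user IDs; the shard count of 4 is fixed for this example.
function getShard(userId) {
  return userId % 4 // shard index in the range 0..3
}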
This distributes users across 4 shards.
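The modulo approach needs a number. When IDs are strings such as UUIDs, one option is to hash the key first. The sketch below uses an FNV-1a-style 32-bit hash as an illustrative choice (it hashes UTF-16 code units, which matches the byte-wise algorithm for ASCII keys). `fnv1a` and `getShardForKey` are hypothetical helpers.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // multiply by FNV prime, keep unsigned
  }
  return hash
}

// Maps any key to a shard index in the range 0..shardCount-1.
function getShardForKey(key, shardCount = 4) {
  return fnv1a(String(key)) % shardCount
}

const shard = getShardForKey("3f2b6c1e-user")
console.log(shard === getShardForKey("3f2b6c1e-user")) // true: same key, same shard
```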
Sharding introduces complexity such as:
cross-shard queries
distributed transactions
data rebalancing
Therefore most teams use managed distributed databases instead of building this from scratch.
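The rebalancing cost is easy to underestimate. As a small worked example, assuming simple modulo routing and sequential numeric IDs, the sketch below counts how many keys would land on a different shard when growing from 4 to 5 shards. `countMoved` is a hypothetical helper.

```javascript
// Counts the IDs whose shard changes when the shard count changes.
function countMoved(ids, oldCount, newCount) {
  return ids.filter(id => id % oldCount !== id % newCount).length
}

const ids = Array.from({ length: 1000 }, (_, i) => i) // IDs 0..999
const moved = countMoved(ids, 4, 5)
console.log(`${moved} of ${ids.length} keys change shard`)
// 800 of 1000 keys change shard
```

With plain modulo routing, adding a single shard relocates most of the data. Techniques such as consistent hashing exist largely to reduce this movement.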
Content Delivery Networks (CDNs)
CDNs reduce latency by serving content from servers closer to users.
Static assets are the primary use case:
JavaScript bundles
CSS
images
fonts
videos
Instead of requesting assets from a distant server, users receive them from nearby edge locations.
Benefits:
lower latency
reduced origin server load
improved global performance
Modern CDN providers include:
Cloudflare
AWS CloudFront
Fastly
Akamai
Example CDN caching headers
Cache-Control: public, max-age=3600
This allows both CDN nodes and browsers to cache the response for one hour (3600 seconds).
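Different kinds of content usually warrant different lifetimes. The sketch below shows one possible policy, assuming build output uses fingerprinted filenames such as `app.3f9a1c2b.js`. The function name `cacheControlFor`, the filename pattern, and the chosen durations are all assumptions for illustration.

```javascript
// Picks a Cache-Control value based on the request path.
function cacheControlFor(path) {
  // Fingerprinted assets never change under the same name, so cache them for a year.
  if (/\.[0-9a-f]{8,}\.(js|css|woff2|png|jpg|svg)$/.test(path)) {
    return "public, max-age=31536000, immutable"
  }
  // HTML entry points should be revalidated so new deployments show up quickly.
  if (path === "/" || path.endsWith(".html")) {
    return "no-cache"
  }
  // Everything else: cache for one hour.
  return "public, max-age=3600"
}

console.log(cacheControlFor("/assets/app.3f9a1c2b.js")) // public, max-age=31536000, immutable
console.log(cacheControlFor("/index.html"))             // no-cache
console.log(cacheControlFor("/logo.png"))               // public, max-age=3600
```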
CDNs also help protect against large-scale attacks such as DDoS by absorbing massive traffic spikes at the edge.
Edge Computing
Traditional CDNs only served static files.
Edge computing allows code execution at edge locations.
Examples:
Cloudflare Workers
Vercel Edge Functions
Fastly Compute (formerly Compute@Edge)
Edge computing is useful for:
authentication checks
localization
request routing
header manipulation
Example authentication at edge:
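export default async function handler(req) {
  // Only checks that a token is present; real validation
  // (signature, expiry) still has to happen at the edge or the origin.
  const token = req.headers.get("authorization")
  if (!token) {
    // Reject early so the request never reaches the origin.
    return new Response("Unauthorized", { status: 401 })
  }
  // Forward the original request to the origin.
  return fetch(req)
}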
Edge computing reduces unnecessary traffic reaching origin servers.
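Localization, another use case from the list above, often comes down to choosing a language from the `Accept-Language` request header. The sketch below is a simplified parser rather than a complete implementation of the HTTP specification. `pickLocale`, the supported language list, and the fallback are assumptions.

```javascript
// Picks the best supported language from an Accept-Language header value.
function pickLocale(acceptLanguage, supported = ["en", "de", "fr"], fallback = "en") {
  if (!acceptLanguage) return fallback
  const ranked = acceptLanguage
    .split(",")
    .map(part => {
      const [tag, ...params] = part.trim().split(";")
      const qParam = params.find(p => p.trim().startsWith("q="))
      const q = qParam ? parseFloat(qParam.trim().slice(2)) : 1 // default quality is 1
      return {
        lang: tag.trim().toLowerCase().split("-")[0], // "fr-CH" becomes "fr"
        q: Number.isNaN(q) ? 0 : q
      }
    })
    .sort((a, b) => b.q - a.q) // highest preference first
  const match = ranked.find(entry => supported.includes(entry.lang))
  return match ? match.lang : fallback
}

console.log(pickLocale("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7")) // fr
console.log(pickLocale("ja, de;q=0.5"))                        // de
console.log(pickLocale(null))                                  // en
```

Inside an edge handler, the header value would come from `req.headers.get("accept-language")`.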
However, edge environments have limitations:
restricted runtime APIs
limited memory
execution time limits
They are designed for lightweight computation, not heavy workloads.
Asynchronous Processing
Many operations do not require immediate completion.
Performing them synchronously increases latency unnecessarily.
Examples of asynchronous tasks:
sending emails
notifications
video processing
report generation
background data cleanup
Instead of blocking user requests, these tasks are pushed to a message queue.
Background job architecture
Components:
Producer (application server)
Queue (Redis / RabbitMQ / Kafka)
Worker (background processor)
Example with BullMQ
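import { Queue } from "bullmq"

// Assumption: Redis is reachable on localhost:6379; adjust for your environment.
const connection = { host: "localhost", port: 6379 }
const emailQueue = new Queue("email", { connection })

// Producer: enqueue the job and return to the caller immediately.
await emailQueue.add("sendInvite", {
  email: "user@example.com"
})
Worker:
import { Worker } from "bullmq"

// Runs in a separate process and consumes jobs from the same queue.
// sendEmail is an application-specific function that is not shown here.
const connection = { host: "localhost", port: 6379 }
new Worker("email", async job => {
  await sendEmail(job.data.email)
}, { connection })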
Benefits:
faster user response times
improved reliability
easier scaling of background workers
This pattern is heavily used in production systems.
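To make the producer, queue, and worker roles concrete without any infrastructure, here is a toy in-memory version with retries. It is a teaching sketch only: a real queue persists jobs outside the process, and `createQueue`, `drain`, and `flakySendEmail` are hypothetical names.

```javascript
// A toy queue: jobs that throw are retried until `maxAttempts` is reached.
function createQueue(maxAttempts = 3) {
  const jobs = []
  const failed = []
  return {
    // Producer side: enqueue and return immediately.
    add(name, data) {
      jobs.push({ name, data, attempts: 0 })
    },
    // Worker side: process jobs until the queue is empty.
    drain(handler) {
      while (jobs.length > 0) {
        const job = jobs.shift()
        job.attempts++
        try {
          handler(job)
        } catch (err) {
          if (job.attempts < maxAttempts) jobs.push(job) // retry later
          else failed.push(job)                          // give up
        }
      }
      return failed
    }
  }
}

// A handler that fails on its first call to simulate a temporary outage.
let calls = 0
const sent = []
function flakySendEmail(job) {
  calls++
  if (calls === 1) throw new Error("temporary SMTP failure")
  sent.push(job.data.email)
}

const queue = createQueue()
queue.add("sendInvite", { email: "user@example.com" })
const deadLetters = queue.drain(flakySendEmail)
console.log(sent, calls, deadLetters.length) // [ 'user@example.com' ] 2 0
```

The retry is what the reliability benefit refers to: a transient failure delays the email instead of failing the user's request.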
Monolith vs Microservices
A monolith is a single deployable application containing all business logic.
Advantages:
simple architecture
easy debugging
faster development
simpler deployments
Monoliths scale surprisingly well with horizontal scaling.
Microservices divide an application into independent services.
Examples:
authentication service
payment service
notification service
order service
When microservices make sense
Microservices are primarily about organizational scalability, not just performance.
They are useful when:
engineering organizations grow large (on the order of 100+ engineers)
services require independent scaling
different technologies are needed
Example:
image processing service → Rust
API service → Node.js
analytics pipeline → Python
Tradeoffs
Microservices introduce:
network latency
distributed debugging complexity
data consistency challenges
infrastructure overhead
For most startups and mid-scale systems, a well-structured monolith is often the best starting architecture.
Serverless Computing
Serverless abstracts away server management.
Developers deploy functions, and infrastructure automatically executes them when events occur.
Popular platforms:
AWS Lambda
Cloudflare Workers
Vercel Functions
Google Cloud Functions
Serverless execution model
Instead of running servers continuously:
Request arrives
↓
Function instance starts
↓
Code executes
↓
Response returned
↓
Instance kept warm for possible reuse, then terminated when idle
Billing occurs only during execution time.
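As a worked example of execution-based billing, the sketch below estimates a monthly bill from invocation count, average duration, and memory size. It assumes a pricing model of GB-seconds plus a per-request fee. The prices are hypothetical placeholders, not any provider's actual rates, and `estimateMonthlyCost` is an illustrative helper.

```javascript
// Estimates cost as (GB-seconds × compute price) + (requests × request price).
function estimateMonthlyCost({
  invocations,
  avgDurationMs,
  memoryGb,
  pricePerGbSecond,
  pricePerMillionRequests
}) {
  const gbSeconds = (invocations * avgDurationMs / 1000) * memoryGb
  const computeCost = gbSeconds * pricePerGbSecond
  const requestCost = (invocations / 1_000_000) * pricePerMillionRequests
  return { gbSeconds, total: computeCost + requestCost }
}

const estimate = estimateMonthlyCost({
  invocations: 2_000_000,
  avgDurationMs: 100,
  memoryGb: 0.5,
  pricePerGbSecond: 0.00002,   // hypothetical price
  pricePerMillionRequests: 0.2 // hypothetical price
})
console.log(estimate.gbSeconds)        // 100000
console.log(estimate.total.toFixed(2)) // 2.40
```

Idle time between requests adds nothing to the total, which is why this model suits spiky traffic.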
Advantages
automatic scaling
no server management
pay-per-execution pricing
excellent for burst traffic
Cold Start Problem
Because functions are created on demand, the first request may experience startup latency.
Modern improvements such as:
lightweight VM technologies
optimized runtimes
edge execution
have significantly reduced cold start delays.
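Application code can also reduce the impact. Platforms commonly reuse a warm instance for later requests, so expensive setup placed outside the handler runs once per instance rather than once per request. The sketch below simulates this. `getClient` and the fake client are stand-ins for real initialization such as opening a database connection.

```javascript
// Module scope: survives across invocations on the same warm instance.
let initCount = 0
let cachedClient = null

function getClient() {
  if (cachedClient === null) {
    initCount++ // expensive setup happens only on the first (cold) call
    cachedClient = { query: sql => `result of ${sql}` }
  }
  return cachedClient
}

// The handler itself stays cheap because it reuses the cached client.
function handler(event) {
  const client = getClient()
  return { statusCode: 200, body: client.query(event.sql) }
}

const cold = handler({ sql: "SELECT 1" }) // pays the initialization cost
const warm = handler({ sql: "SELECT 2" }) // reuses the existing client
console.log(initCount, cold.body, warm.body)
// 1 result of SELECT 1 result of SELECT 2
```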
Good serverless use cases
Serverless is ideal for:
event processing
image/video processing
webhooks
scheduled jobs
API endpoints with unpredictable traffic
Poor serverless use cases
Serverless is less suitable for:
long running tasks
latency-critical systems
persistent connections
heavy stateful workloads
Practical Scaling Strategy
In real systems, scaling evolves gradually.
Typical progression:
Start with a simple monolith
Add caching
Optimize database queries
Add horizontal scaling
Introduce background jobs
Use CDNs
Scale the database with replicas
Consider microservices if necessary
Most applications never need extreme architectures.
Key Interview Takeaways
Interviewers frequently test conceptual understanding.
Important points to remember:
Horizontal vs Vertical Scaling
Vertical → upgrade machine
Horizontal → add machines
Stateless Services
State must be externalized to enable scaling.
Load Balancers
Distribute traffic and monitor server health.
Database Scaling
Common techniques:
read replicas
sharding
CDN
Improves latency and reduces origin traffic.
Asynchronous Processing
Use message queues for work that does not have to finish within the request.
Microservices
Primarily solve team scaling problems.
Serverless
Best for event-driven and burst workloads.
Final Principles for Scalable Systems
Measure before optimizing
Use observability tools like Prometheus, Grafana, or New Relic.
Prefer simple architectures first
Complexity increases operational cost.
Scale gradually
Optimize based on real bottlenecks.
Design stateless services early
Understand tradeoffs
Every architecture decision introduces new constraints.
Scalable systems are not built through a single technology choice. They emerge through careful measurement, iterative improvements, and disciplined architectural decisions.
Mastering these principles prepares engineers not only for system design interviews but also for building production-grade backend systems.
This article is part of the series Backend First Principles. Next: Understanding Concurrency in Backend Systems

