Modern Backend Scaling: Stateless APIs to Serverless

Building scalable backend systems is not about memorizing a list of tools. It is about understanding where bottlenecks appear, how systems evolve with growth, and which techniques solve which problems.

Modern systems rarely scale with a single technique. Instead, they combine multiple layers:

Stateless services
Load balancing
Database scaling
Distributed caching
CDNs
Asynchronous processing
Service architecture (monolith vs microservices)
Serverless computing

This article explains these concepts in a practical, interview-ready way, correcting common misconceptions and relating each one to the cloud architectures in use today (2025+).

Statelessness — The Foundation of Horizontal Scaling

Horizontal scaling means adding more servers instead of increasing the power of a single machine.

For example:

Vertical scaling → upgrading one server from 4 GB to 32 GB of RAM
Horizontal scaling → running multiple instances of the same service

For horizontal scaling to work properly, backend services must be stateless.

A stateless service does not keep request-specific or user-specific state in its local memory between requests. Any server instance must be able to process any request independently.

Why stateful servers break scaling

If a server stores session data in memory:

Server A memory:
session[user123] = {...}

And the next request goes to Server B, the session will not exist there.

Result:

401 Unauthorized

This creates inconsistent behaviour across servers.
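The failure is easy to reproduce. The following minimal in-process simulation is a sketch, not production code: two hypothetical instances `A` and `B` either keep sessions in their own memory or share one store, with a plain `Map` standing in for Redis. `createInstance` is an illustrative helper.

```javascript
// Sketch: two app instances behind a load balancer. Each instance either keeps
// sessions in its own memory (stateful) or uses one store shared by all (stateless).
// The Map passed as `sharedStore` stands in for Redis; it is an illustration only.
function createInstance(name, sharedStore) {
  const sessions = sharedStore ?? new Map() // private Map when no shared store is given
  return {
    name,
    login(userId) {
      sessions.set(userId, { userId, loggedInAt: Date.now() })
    },
    handle(userId) {
      return sessions.has(userId) ? 200 : 401
    }
  }
}

// Stateful: the login lands on A, the next request lands on B
const a1 = createInstance("A")
const b1 = createInstance("B")
a1.login("user123")
const statefulStatus = b1.handle("user123") // B has never seen this session

// Stateless: both instances read from the same store
const shared = new Map()
const a2 = createInstance("A", shared)
const b2 = createInstance("B", shared)
a2.login("user123")
const statelessStatus = b2.handle("user123")

console.log({ statefulStatus, statelessStatus })
```

Only the storage location differs between the two runs; the request-handling code is identical, which is the whole point of externalizing state.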

Correct approach: externalize state

State must be stored in shared infrastructure:

State Type	Correct Storage
Sessions	Redis / distributed cache
Uploaded files	Object storage (S3, R2, GCS)
Persistent data	Managed databases (PostgreSQL, MySQL, etc.)
Cache	Redis / Memcached

Example: storing sessions in Redis (Node.js)

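import express from "express"
import session from "express-session"
// connect-redis v7 uses a default export; check the import style for your installed major version
import RedisStore from "connect-redis"
import { createClient } from "redis"

const app = express()

// One Redis instance shared by every app server
const redisClient = createClient({ url: process.env.REDIS_URL })
await redisClient.connect() // top-level await requires an ES module

app.use(
  session({
    store: new RedisStore({ client: redisClient }),
    secret: process.env.SESSION_SECRET,
    resave: false,           // skip re-saving sessions that did not change
    saveUninitialized: false // do not store new sessions until something is written to them
  })
)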

Now any server instance can verify the session.

Stateless architecture is the prerequisite for scalable backend systems.

Load Balancers — Distributing Traffic

Once multiple backend instances exist, a mechanism is required to distribute traffic among them. That component is the load balancer.

All incoming traffic first goes to the load balancer, which forwards requests to backend servers.

Common production load balancers:

NGINX
HAProxy
AWS ALB / ELB
Cloudflare
Envoy

Common load balancing algorithms

Round Robin

Requests are distributed sequentially.

Server selection pattern:

A → B → C → A → B → C

Works well when:

requests have similar processing cost
servers have equal capacity
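Round robin needs nothing more than a counter that wraps around. A minimal sketch, assuming an in-memory list of server names (`createRoundRobin` is a hypothetical helper; real load balancers implement this internally):

```javascript
// Minimal round-robin selector: hands out servers in a fixed rotation.
// Server names are illustrative placeholders.
function createRoundRobin(servers) {
  let next = 0
  return function pick() {
    const server = servers[next]
    next = (next + 1) % servers.length // wrap back to the first server
    return server
  }
}

const pick = createRoundRobin(["A", "B", "C"])
const sequence = Array.from({ length: 6 }, () => pick())
console.log(sequence.join(" → ")) // prints: A → B → C → A → B → C
```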

Weighted Round Robin

Used when servers have different capacity.

Example configuration:

Server A weight = 2
Server B weight = 1
Server C weight = 1

Traffic ratio:

A : B : C = 2 : 1 : 1
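A naive implementation could expand the weights into a list (A, A, B, C) and rotate through it. The sketch below instead follows the "smooth" weighted round-robin approach used by NGINX, which interleaves a heavier server's turns with the others. `createWeightedRoundRobin` is an illustrative name, and the weights match the example configuration above.

```javascript
// Smooth weighted round-robin. On each pick, every server's running total grows
// by its weight, the highest total wins, and the winner is pushed back down by
// the sum of all weights. This interleaves a heavy server's turns with the
// others rather than sending its whole share as one burst.
function createWeightedRoundRobin(weights) {
  const servers = Object.entries(weights).map(([name, weight]) => ({
    name,
    weight,
    current: 0
  }))
  const total = servers.reduce((sum, s) => sum + s.weight, 0)

  return function pick() {
    let best = null
    for (const s of servers) {
      s.current += s.weight
      if (best === null || s.current > best.current) best = s
    }
    best.current -= total
    return best.name
  }
}

const pick = createWeightedRoundRobin({ A: 2, B: 1, C: 1 })
const counts = { A: 0, B: 0, C: 0 }
const firstFour = []
for (let i = 0; i < 400; i++) {
  const server = pick()
  counts[server]++
  if (i < 4) firstFour.push(server)
}
console.log(firstFour.join(" → "), counts)
```

The first cycle is A → B → C → A, and every cycle of four picks keeps the ratio at exactly 2 : 1 : 1.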

Least Connections

Requests are sent to the server with the fewest active connections.

Useful when:

request processing time varies
some requests are heavy

This algorithm adapts automatically to server load.
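Least connections only needs a per-server count of in-flight requests. A minimal in-memory sketch, with hypothetical `acquire`/`release` hooks that a proxy would call when a request starts and finishes:

```javascript
// Least-connections selector: track in-flight requests per server and
// send each new request to the server with the fewest.
function createLeastConnections(servers) {
  const active = new Map(servers.map(s => [s, 0]))

  return {
    acquire() {
      let best = null
      for (const [server, count] of active) {
        if (best === null || count < active.get(best)) best = server
      }
      active.set(best, active.get(best) + 1)
      return best
    },
    // Call when a request finishes so the server's count drops again
    release(server) {
      active.set(server, active.get(server) - 1)
    },
    snapshot() {
      return Object.fromEntries(active)
    }
  }
}

const lb = createLeastConnections(["A", "B", "C"])
const heavy = lb.acquire() // A: a slow request that stays open
const r1 = lb.acquire()    // B
const r2 = lb.acquire()    // C
lb.release(r1)             // B finishes quickly
lb.release(r2)             // C finishes quickly
const next = lb.acquire()  // B: A is still busy with its slow request
```

Plain round robin would have sent that fourth request to A regardless of its open connection.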

Health Checks

Load balancers continuously check server health using lightweight endpoints.

Example health endpoint:

import express from "express"
const app = express()

// Lightweight endpoint the load balancer polls
app.get("/health", (req, res) => res.status(200).send("OK"))

If a server fails its health checks (for example, several consecutive timeouts or non-2xx responses), the load balancer removes it from rotation until it recovers.

This keeps user traffic away from failing instances.
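Production load balancers typically expose consecutive-failure and consecutive-success thresholds (HAProxy calls them `fall` and `rise`), so a single slow probe does not flap a server in and out of rotation. The sketch below shows that logic under assumed values (`UNHEALTHY_THRESHOLD = 3`, `HEALTHY_THRESHOLD = 2`); `createHealthTracker` is a hypothetical helper that records probe results, not any product's API.

```javascript
// Sketch of how a load balancer might track health from periodic probe results.
// Threshold values are illustrative assumptions.
const UNHEALTHY_THRESHOLD = 3 // consecutive failures before removal
const HEALTHY_THRESHOLD = 2   // consecutive successes before re-adding

function createHealthTracker(servers) {
  const state = new Map(
    servers.map(s => [s, { healthy: true, failures: 0, successes: 0 }])
  )

  return {
    // Record one probe result, e.g. whether GET /health returned 200 in time
    record(server, ok) {
      const s = state.get(server)
      if (ok) {
        s.failures = 0
        s.successes++
        if (!s.healthy && s.successes >= HEALTHY_THRESHOLD) s.healthy = true
      } else {
        s.successes = 0
        s.failures++
        if (s.healthy && s.failures >= UNHEALTHY_THRESHOLD) s.healthy = false
      }
    },
    // Only healthy servers stay in rotation
    inRotation() {
      return servers.filter(s => state.get(s).healthy)
    }
  }
}

const tracker = createHealthTracker(["A", "B"])
for (let i = 0; i < 3; i++) tracker.record("B", false) // B fails three probes
const afterFailures = tracker.inRotation()             // ["A"]
tracker.record("B", true)
tracker.record("B", true)                              // two successes bring B back
const afterRecovery = tracker.inRotation()             // ["A", "B"]
```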

Database Scaling

Scaling application servers is relatively easy. Scaling databases is much harder because databases maintain state and consistency.

Two primary strategies exist.

Read Replicas

A primary database handles writes while replicas serve read queries.

Flow:

Write queries → Primary database
Read queries → Replicas

Example architecture:

Primary DB → writes
Replica 1 → reads
Replica 2 → reads
Replica 3 → reads

This removes most read traffic from the primary database.

A common rule of thumb for read-heavy SaaS applications:

70–90% reads
10–30% writes

So replicas can absorb most of the query load.
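The routing decision usually lives in the application or in a database proxy. A minimal sketch of the application-side version, assuming one primary and three replicas identified by placeholder names (a real router would hold database client objects, and `createDbRouter` is a hypothetical helper):

```javascript
// Sketch of application-side query routing: writes go to the primary,
// reads rotate across replicas.
function createDbRouter({ primary, replicas }) {
  let next = 0
  return {
    forWrite() {
      return primary
    },
    forRead() {
      if (replicas.length === 0) return primary // fall back if no replicas exist
      const replica = replicas[next]
      next = (next + 1) % replicas.length
      return replica
    }
  }
}

const router = createDbRouter({
  primary: "primary",
  replicas: ["replica-1", "replica-2", "replica-3"]
})
const writeTarget = router.forWrite()
const readTargets = [router.forRead(), router.forRead(), router.forRead(), router.forRead()]
```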

Replication Lag

In most setups, replicas apply changes from the primary asynchronously.

Example:

User updates profile
Write occurs in primary DB
Replica updates after ~200 ms
Immediate read may still return old data

Solutions include:

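routing a user's reads to the primary for a short time after that user writes (read-after-write consistency)
delaying read queries briefly
keeping the updated data on the client instead of re-reading it immediately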
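Read-after-write routing can be sketched as a per-user "pin to primary" window. This is an illustration under stated assumptions: `PIN_WINDOW_MS` is a guessed value that should exceed your observed replication lag, the clock is injectable so the example is deterministic, and the in-memory `Map` of last-write times would itself need to live in shared storage (or a cookie) once there are multiple app instances. `createConsistentRouter` is a hypothetical helper.

```javascript
// Sketch of read-after-write routing: after a user writes, send that user's
// reads to the primary for a window that covers typical replication lag.
const PIN_WINDOW_MS = 1000 // assumed value; tune to the lag you actually observe

function createConsistentRouter({ primary, replica, now = () => Date.now() }) {
  const lastWriteAt = new Map() // userId -> timestamp of that user's last write

  return {
    forWrite(userId) {
      lastWriteAt.set(userId, now())
      return primary
    },
    forRead(userId) {
      const t = lastWriteAt.get(userId)
      const recentlyWrote = t !== undefined && now() - t < PIN_WINDOW_MS
      return recentlyWrote ? primary : replica
    }
  }
}

// Fake clock so the example is deterministic
let clock = 0
const router = createConsistentRouter({
  primary: "primary",
  replica: "replica",
  now: () => clock
})

router.forWrite("user123")                      // user updates their profile at t = 0
const immediateRead = router.forRead("user123") // pinned to the primary
const otherUserRead = router.forRead("user456") // no recent write: replica
clock = 1500
const laterRead = router.forRead("user123")     // window expired: replica
```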

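Managed database services take care of provisioning replicas, running replication, and handling failover, although the application still has to tolerate replication lag.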

Popular managed providers:

AWS RDS
Google Cloud SQL
Neon
PlanetScale
CockroachDB
YugabyteDB

Database Sharding

Sharding divides a large dataset across multiple databases.

Instead of storing everything in a single database:

Orders 1-5 → DB shard A
Orders 6-10 → DB shard B

Sharding improves:

query performance
throughput
storage scalability

Choosing a Sharding Key

A sharding key determines how data is distributed.

Examples:

Sharding key	Why use it
User ID	distributes users evenly
Geographic region	keeps data near users
Date	useful for time-series data (but recent dates can concentrate writes on one shard)

Example query routing:

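const SHARD_COUNT = 4

// Maps a numeric user ID to a shard index from 0 to SHARD_COUNT - 1
function getShard(userId) {
  return userId % SHARD_COUNT
}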

This distributes users across 4 shards, assuming numeric IDs that are spread roughly evenly.
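The catch with modulo sharding appears when the shard count changes. This short script uses the same modulo rule, with the shard count passed in, to count how many of 100,000 hypothetical user IDs land on a different shard when a fifth shard is added:

```javascript
// Why modulo sharding makes rebalancing expensive: count how many keys
// map to a different shard when the shard count grows from 4 to 5.
function getShard(userId, shardCount) {
  return userId % shardCount
}

const TOTAL_USERS = 100000
let moved = 0
for (let userId = 0; userId < TOTAL_USERS; userId++) {
  if (getShard(userId, 4) !== getShard(userId, 5)) moved++
}

console.log(`${moved} of ${TOTAL_USERS} users change shards`)
// prints: 80000 of 100000 users change shards
```

Only IDs whose remainders agree under both 4 and 5 stay put (one in five), so 80% of the data would have to migrate. Schemes such as consistent hashing or lookup-based shard directories exist largely to reduce this movement.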

Sharding introduces complexity such as:

cross-shard queries
distributed transactions
data rebalancing

Therefore, many teams use managed distributed databases instead of building sharding from scratch.
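Cross-shard queries are the most visible of these costs. When a query does not include the shard key, it has to fan out to every shard and merge the partial results. The sketch below uses in-memory arrays as stand-ins for the two order shards from the earlier example; `queryShard` and `findLargeOrders` are hypothetical helpers.

```javascript
// Sketch of a cross-shard (scatter-gather) query. Shards are plain
// in-memory arrays here, standing in for separate databases.
const shards = [
  // Shard A: orders 1-5
  [{ orderId: 2, total: 40 }, { orderId: 4, total: 15 }],
  // Shard B: orders 6-10
  [{ orderId: 7, total: 90 }, { orderId: 9, total: 60 }]
]

// Simulates querying one shard asynchronously, as a real database call would be
async function queryShard(shard, minTotal) {
  return shard.filter(order => order.total >= minTotal)
}

async function findLargeOrders(minTotal) {
  // Fan out to all shards in parallel, then merge and sort the partial results
  const partials = await Promise.all(shards.map(s => queryShard(s, minTotal)))
  return partials.flat().sort((a, b) => b.total - a.total)
}

findLargeOrders(30).then(orders => console.log(orders.map(o => o.orderId)))
```

Every shard does work for every such query, and the slowest shard sets the response time, which is why good shard keys match the most common query patterns.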

Content Delivery Networks (CDNs)

CDNs reduce latency by serving content from servers closer to users.

Static assets are the primary use case:

JavaScript bundles
CSS
images
fonts
videos

Instead of requesting assets from a distant server, users receive them from nearby edge locations.

Benefits:

lower latency
reduced origin server load
improved global performance

Modern CDN providers include:

Cloudflare
AWS CloudFront
Fastly
Akamai

Example CDN caching headers

Cache-Control: public, max-age=3600

This allows browsers and CDN nodes to cache the response for one hour (3600 seconds).
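In practice, different kinds of responses need different cache lifetimes. A sketch of one common policy, assuming fingerprinted build filenames (such as `app.3f9a1c.js`) and a hypothetical `cacheControlFor` helper; the patterns and durations are illustrative, not CDN-specific settings:

```javascript
// Sketch: choose a Cache-Control header per asset type.
function cacheControlFor(path) {
  // Fingerprinted build output (e.g. app.3f9a1c.js) never changes at a given URL
  if (/\.[0-9a-f]{6,}\.(js|css)$/.test(path)) {
    return "public, max-age=31536000, immutable"
  }
  // Other static assets: cache for one hour, as in the header above
  if (/\.(png|jpg|jpeg|webp|svg|woff2?)$/.test(path)) {
    return "public, max-age=3600"
  }
  // HTML and API responses: caches may store them but must revalidate first
  return "no-cache"
}
```

Fingerprinted files can be cached for a year because every new build produces a new filename, while `no-cache` still lets caches store HTML but forces them to revalidate with the origin before reusing it.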

Many CDNs also help mitigate large-scale attacks such as DDoS by absorbing traffic spikes across their global network.

Edge Computing

Traditional CDNs primarily cached and served static files.

Edge computing allows code execution at edge locations.

Examples:

Cloudflare Workers
Vercel Edge Functions
Fastly Compute@Edge

Edge computing is useful for:

authentication checks
localization
request routing
header manipulation

Example authentication at edge:

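// Cloudflare Workers module syntax; other edge platforms use similar handlers
export default {
  async fetch(request) {
    const token = request.headers.get("authorization")

    // Presence check only: a real check would also verify the token (e.g. a JWT signature)
    if (!token) {
      return new Response("Unauthorized", { status: 401 })
    }

    // Forward the authenticated request to the origin
    return fetch(request)
  }
}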

Edge computing reduces unnecessary traffic reaching origin servers.

However, edge environments have limitations:

restricted runtime APIs
limited memory
execution time limits

They are designed for lightweight computation, not heavy workloads.
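A locale redirect is the kind of lightweight logic that fits within these constraints. The sketch below assumes a hypothetical list of supported locales (`SUPPORTED_LOCALES`) and a runtime with the standard `Request`, `Response`, `URL`, and `fetch` globals (as in Cloudflare Workers or Node.js 18+). `pickLocale` and `handleRequest` are illustrative names, and `q` quality values in `Accept-Language` are ignored for brevity.

```javascript
// Sketch of edge localization: choose a locale from Accept-Language and
// redirect "/" to a localized path.
const SUPPORTED_LOCALES = ["en", "de", "fr"]

function pickLocale(acceptLanguage) {
  // e.g. "de-DE,de;q=0.9,en;q=0.8" becomes ["de", "de", "en"]
  const preferred = (acceptLanguage ?? "")
    .split(",")
    .map(part => part.split(";")[0].trim().slice(0, 2).toLowerCase())
  return preferred.find(lang => SUPPORTED_LOCALES.includes(lang)) ?? "en"
}

function handleRequest(request) {
  const url = new URL(request.url)
  if (url.pathname === "/") {
    const locale = pickLocale(request.headers.get("accept-language"))
    // Temporary redirect, since the choice depends on the request header
    return Response.redirect(`${url.origin}/${locale}/`, 302)
  }
  // Everything else passes through to the origin unchanged
  return fetch(request)
}
```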

Asynchronous Processing

Many operations do not require immediate completion.

Performing them synchronously increases latency unnecessarily.

Examples of asynchronous tasks:

sending emails
notifications
video processing
report generation
background data cleanup

Instead of blocking user requests, these tasks are pushed to a message queue.

Background job architecture

Components:

Producer (application server)
Queue (Redis / RabbitMQ / Kafka)
Worker (background processor)

Example with BullMQ

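import { Queue } from "bullmq"

// The producer and the worker must point at the same Redis instance
const emailQueue = new Queue("email", {
  connection: { host: "localhost", port: 6379 }
})

// Enqueue and return to the user right away; make up to 3 attempts, backing off exponentially
await emailQueue.add(
  "sendInvite",
  { email: "user@example.com" },
  { attempts: 3, backoff: { type: "exponential", delay: 1000 } }
)

Worker:

import { Worker } from "bullmq"

// Runs in a separate process; sendEmail is a placeholder for your email client
new Worker(
  "email",
  async job => {
    await sendEmail(job.data.email)
  },
  { connection: { host: "localhost", port: 6379 } }
)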

Benefits:

faster user response times
improved reliability
easier scaling of background workers

This pattern is heavily used in production systems.
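Stripped of Redis, the producer/queue/worker split with retries looks roughly like this. It is an in-memory sketch only (jobs are lost if the process exits, which is exactly what a real queue such as BullMQ prevents), and `createQueue`, `add`, and `work` are hypothetical names, not BullMQ's actual API.

```javascript
// In-memory sketch of the producer → queue → worker pattern with retries.
function createQueue({ maxAttempts = 3 } = {}) {
  const jobs = []
  const failed = []

  return {
    // Producer side: enqueue and return immediately
    add(name, data) {
      jobs.push({ name, data, attempts: 0 })
    },
    // Worker side: drain the queue, re-enqueueing jobs that throw
    async work(handler) {
      while (jobs.length > 0) {
        const job = jobs.shift()
        job.attempts++
        try {
          await handler(job)
        } catch {
          if (job.attempts < maxAttempts) jobs.push(job) // retry later
          else failed.push(job) // give up; a real system would alert or dead-letter
        }
      }
      return failed
    }
  }
}

// Usage: the first two delivery attempts fail, the third succeeds
const queue = createQueue({ maxAttempts: 3 })
queue.add("sendInvite", { email: "user@example.com" })

let calls = 0
const sent = []
const done = queue.work(async job => {
  calls++
  if (calls < 3) throw new Error("email provider timeout")
  sent.push(job.data.email)
})
```

The user-facing request only pays for `add`; transient failures are absorbed by the worker's retries instead of surfacing as errors.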

Monolith vs Microservices

A monolith is a single deployable application containing all business logic.

Advantages:

simple architecture
easy debugging
faster development
simpler deployments

A stateless monolith scales horizontally like any other service: run more instances behind a load balancer.

Microservices divide an application into independent services.

Examples:

authentication service
payment service
notification service
order service

When microservices make sense

Microservices are primarily about organizational scalability, not just performance.

They are useful when:

the engineering organization grows large (often 100+ engineers) and teams need to deploy independently
services require independent scaling
different technologies are needed

Example:

image processing service → Rust
API service → Node.js
analytics pipeline → Python

Tradeoffs

Microservices introduce:

network latency
distributed debugging complexity
data consistency challenges
infrastructure overhead

For most startups and mid-scale systems, a well-structured monolith is often the best starting architecture.

Serverless Computing

Serverless abstracts away server management.

Developers deploy functions, and infrastructure automatically executes them when events occur.

Popular platforms:

AWS Lambda
Cloudflare Workers
Vercel Functions
Google Cloud Functions

Serverless execution model

Instead of running servers continuously:

Request arrives
↓
Function instance starts (or a warm instance is reused)
↓
Code executes
↓
Response returned
↓
Instance stays warm for later requests, then is reclaimed when idle

Billing is typically based on the number of invocations and their execution time, not on idle server capacity.
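To estimate usage, providers such as AWS Lambda charge for duration in GB-seconds (allocated memory multiplied by execution time) plus a per-request fee. The sketch below computes only the usage side; the workload numbers are hypothetical and no prices are included, since they vary by provider and region. `computeUsage` is an illustrative helper.

```javascript
// Back-of-the-envelope serverless compute usage in GB-seconds.
// Multiply the results by your provider's current rates to get a cost.
function computeUsage({ invocations, avgDurationMs, memoryMb }) {
  const seconds = (invocations * avgDurationMs) / 1000
  const gbSeconds = seconds * (memoryMb / 1024)
  return { invocations, seconds, gbSeconds }
}

// Hypothetical workload: 1M requests per month, 200 ms each, 512 MB of memory
const usage = computeUsage({ invocations: 1_000_000, avgDurationMs: 200, memoryMb: 512 })
console.log(usage) // 200,000 seconds of execution = 100,000 GB-seconds
```

Halving either the memory setting or the average duration halves the GB-seconds, which is why trimming function runtime translates directly into cost.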

Advantages

automatic scaling
no server management
pay-per-execution pricing
excellent for burst traffic

Cold Start Problem

Because functions are created on demand, the first request may experience startup latency.

Modern improvements such as:

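lightweight VM technologies (such as the Firecracker microVMs behind AWS Lambda)
optimized runtimes
isolate-based edge execution (such as Cloudflare Workers' V8 isolates)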

have significantly reduced cold start delays.
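Application code can also reduce the cost of cold starts. A widely used pattern is to perform expensive initialization (database clients, SDK setup) at module scope, outside the handler, so it runs once per instance and warm invocations reuse it. In this sketch, `createDbClient` is a hypothetical stand-in for that setup, and `initCount` exists only to show how often it runs.

```javascript
// Sketch: module-scope work runs once per instance (during the cold start),
// while the handler runs once per request.
let initCount = 0

function createDbClient() {
  initCount++ // count how many times the expensive setup actually runs
  return { query: async (sql, params) => ({ sql, params }) }
}

// Runs once when the instance starts, then is reused by every warm invocation
const db = createDbClient()

async function handler(event) {
  return db.query("SELECT * FROM users WHERE id = $1", [event.userId])
}

// Simulate three warm invocations on the same instance
const results = Promise.all([
  handler({ userId: 1 }),
  handler({ userId: 2 }),
  handler({ userId: 3 })
])
```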

Good serverless use cases

Serverless is ideal for:

event processing
image/video processing
webhooks
scheduled jobs
API endpoints with unpredictable traffic

Poor serverless use cases

Serverless is less suitable for:

long-running tasks (AWS Lambda, for example, caps a single invocation at 15 minutes)
latency-critical systems
persistent connections
heavy stateful workloads

Practical Scaling Strategy

In real systems, scaling evolves gradually.

Typical progression:

Start with a simple monolith
Add caching
Optimize database queries
Add horizontal scaling
Introduce background jobs
Use CDNs
Scale the database with replicas
Consider microservices if necessary

Most applications never need extreme architectures.

Key Interview Takeaways

Interviewers frequently test conceptual understanding.

Important points to remember:

Horizontal vs Vertical Scaling

Vertical → upgrade machine
Horizontal → add machines

Stateless Services

State must be externalized to enable scaling.

Load Balancers

Distribute traffic and monitor server health.

Database Scaling

Common techniques:

read replicas
sharding

CDN

Improves latency and reduces origin traffic.

Asynchronous Processing

Use message queues for work that does not need to finish before the response is sent.

Microservices

Primarily solve team scaling problems.

Serverless

Best for event-driven and burst workloads.

Final Principles for Scalable Systems

Measure before optimizing: use observability tools like Prometheus, Grafana, or New Relic.
Prefer simple architectures first: complexity increases operational cost.
Scale gradually: optimize based on real bottlenecks.
Design stateless services early: it keeps horizontal scaling available when you need it.
Understand tradeoffs: every architecture decision introduces new constraints.

Scalable systems are not built through a single technology choice. They emerge through careful measurement, iterative improvements, and disciplined architectural decisions.

Mastering these principles prepares engineers not only for system design interviews but also for building production-grade backend systems.

This is part of the series Backend First Principles. Next: Understanding Concurrency in Backend Systems
