In a monolith, services talk by calling a method. In microservices, they talk across a network — and that changes everything. Security, reliability, observability, and identity all become your problem. Here's how modern architectures solve this, end to end.
PART 01 — The Problem With Point-to-Point
Imagine 10 services that each need to call some subset of the others. Without any governance, you end up with point-to-point connections where every service implements its own retry logic, TLS setup, circuit breaking, and authentication. Ten services means up to 90 direct connections (n × (n − 1) = 10 × 9), each a custom, one-off integration.
The anti-pattern looks like this:
SERVICE A                       SERVICE B (duplicate)
+ retry logic                   + retry logic
+ TLS setup code                + TLS setup code
+ circuit breaker               + circuit breaker
+ auth & token validation       + auth & token validation
+ distributed tracing           + distributed tracing
Every team duplicates the same cross-cutting concerns in application code.
The solution is to move all of that cross-cutting infrastructure out of application code and into a dedicated layer. There are two main ways to do this:
- API Gateway — for north-south traffic (client to service)
- Service Mesh — for east-west traffic (service to service)
Most mature architectures use both.
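To make the split concrete, here is a minimal sketch of the north-south half, assuming Consul's ingress gateway config entry (Consul is the mesh used throughout this piece; the service name and port are illustrative). East-west policy stays in the intentions shown later.

Kind = "ingress-gateway"
Name = "ingress-gateway"

Listeners = [
  {
    Port     = 8443
    Protocol = "http"
    Services = [
      {
        # The only service exposed to outside clients; everything else
        # stays reachable through the mesh alone.
        Name = "order-service"
      }
    ]
  }
]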
PART 02 — The Service Mesh — How It Actually Works
A service mesh solves east-west communication by deploying a lightweight sidecar proxy (typically Envoy) alongside every service instance. The proxy intercepts all network traffic, inbound and outbound, without the application knowing it exists.
The app thinks it's calling http://payment-service:8080. The kernel intercepts that traffic via iptables rules and redirects it to the local Envoy proxy instead, as the diagram and the config sketch below illustrate.
        VM / HOST 1                           VM / HOST 2
┌──────────────────────────┐      ┌──────────────────────────┐
│ Order Service    :8080   │      │ Envoy Sidecar    :15001  │
│ business logic           │      │ iptables intercept       │
│        ↓ localhost       │      │        ↓ localhost       │
│ Envoy Sidecar    :15001  │      │ Payment Service  :8080   │
│ iptables intercept       │      │ business logic           │
└──────────────────────────┘      └──────────────────────────┘
             └──────────── mTLS ENCRYPTED WIRE ────────────┘
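None of this interception lives in app code; it is pushed as mesh configuration. A minimal sketch, assuming Consul's transparent proxy mode (the port matches the diagram; on VMs the iptables rules themselves are typically applied by consul connect redirect-traffic):

Kind = "proxy-defaults"
Name = "global"

# Every sidecar intercepts traffic transparently; services don't have to
# enumerate their upstreams one by one.
Mode = "transparent"

TransparentProxy {
  # Outbound traffic from the app is redirected to this Envoy listener.
  OutboundListenerPort = 15001
}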
The mesh has two planes:
- Data plane — the collection of sidecar proxies that handle actual traffic
- Control plane — Consul, Istio, or Linkerd, pushing config to every proxy: service discovery, routing rules, retry policies, certificates, and authorization policies
Key insight: The application never changes. No SDK. No library. No config file inside the app. The entire mesh is an infrastructure concern — developers just write business logic.
PART 03 — A Service-to-Service Call, Step by Step
When Order Service needs to call Payment Service, here is exactly what happens at the network level:
1. App calls localhost
Order Service calls http://payment-service:8080/charge — a plain HTTP call. The app has no idea what happens next.
2. iptables intercepts
A kernel-level iptables rule redirects all outbound TCP traffic from the app process to port 15001 — the Envoy sidecar — before it leaves the host.
3. Service registry lookup
Envoy queries Consul to find a healthy instance of payment-service. Consul returns the IP and port of a live, passing-health-check instance (see the resolver sketch after this list).
4. Policy check
Envoy checks the authorization policy: is order-service allowed to call payment-service? If no intention exists or the intention says deny, the connection is dropped here — before any app code runs.
5. mTLS connection established
Both sidecars perform a mutual TLS handshake. Each presents a certificate signed by the mesh's internal CA. Both sides verify the other's identity. The wire is encrypted. Neither application touches a certificate.
6. Request delivered
The Payment sidecar decrypts the request and delivers it to the Payment Service on localhost. From Payment's perspective, a plain HTTP request arrived. The full round trip adds ~1ms of latency.
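Step 3's healthy-instance lookup is itself adjustable policy. A hedged sketch of pinning it down with a Consul service-resolver (the timeout and subset name are illustrative):

Kind = "service-resolver"
Name = "payment-service"

ConnectTimeout = "5s"     # how long the sidecar waits when dialing an instance

# Route only to instances whose health checks are fully passing
# (instances in "warning" state are excluded too).
DefaultSubset = "passing"
Subsets = {
  passing = {
    OnlyPassing = true
  }
}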
Bonus — what you get for free: Every hop automatically emits a distributed tracing span to Jaeger/Zipkin. Retries, circuit breaking, and load balancing across healthy instances are all handled by the sidecar — configured once in the control plane, applied everywhere.
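The retry half of that claim could look like this in Consul: a sketch using a service-router config entry (the route and numbers are illustrative, and the destination service must be declared as HTTP in service-defaults for L7 routing to apply):

Kind = "service-router"
Name = "payment-service"

Routes = [
  {
    Match {
      HTTP {
        PathPrefix = "/"
      }
    }
    Destination {
      # Applied by every calling sidecar; no retry code in any application.
      NumRetries            = 3
      RetryOnConnectFailure = true
      RetryOnStatusCodes    = [503]
    }
  }
]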
PART 04 — How mTLS Is Managed at Scale
This is where most explanations stop short. Manually managing certificates across 40+ services would be worse than the problem we started with. The mesh solves this with an internal Certificate Authority built into the control plane.
"Every service gets a short-lived cryptographic identity. No human ever touches a certificate."
The control plane acts as a Root CA. Every sidecar proxy receives what is called an SVID — a SPIFFE Verifiable Identity Document. SPIFFE is an open standard for workload identity. The certificate encodes who the service is, not where it lives:
spiffe://dominos.internal/ns/production/sa/order-service
Stable across reboots, IP changes, and redeployments. Tied to the service — not a hostname or IP address.
Root CA (Control Plane)
├── Intermediate CA (Datacenter East)
│   ├── order-service SVID        ttl: 24h   ↻ auto-rotates
│   ├── payment-service SVID      ttl: 24h   ↻ auto-rotates
│   └── makeline-service SVID     ttl: 24h   ↻ auto-rotates
└── Intermediate CA (Datacenter West)
    └── inventory-service SVID    ttl: 24h   ↻ auto-rotates
Certificates are short-lived — typically 24 hours. The sidecar fetches a fresh certificate from the control plane before expiry. No downtime. No human involvement. If a cert is compromised, it expires fast — the blast radius is tiny.
Compare this to traditional PKI where certificates might be valid for two years, often forgotten, and discovered only when something breaks.
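Where does the 24-hour lifetime come from? In Consul it is CA provider configuration on the control plane. A sketch of the relevant agent config, assuming the built-in CA provider (the rotation values are illustrative):

connect {
  enabled     = true
  ca_provider = "consul"      # built-in CA; Vault is a common alternative

  ca_config {
    # Leaf certificates (the SVIDs handed to sidecars) expire after 24 hours;
    # sidecars fetch replacements automatically before expiry.
    leaf_cert_ttl = "24h"

    # The CA's own signing keys rotate on a longer schedule.
    rotation_period = "2160h"   # 90 days
  }
}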
The authorization layer on top
mTLS proves who you are. Intentions (Consul) or AuthorizationPolicy (Istio) control what you're allowed to do. Even with a valid cert, a service cannot call another unless an explicit allow rule exists. The default is deny-all. This is enforced at the sidecar — the network drops the request before the target application sees a single byte.
# Who can call makeline-service?
Kind = "service-intentions"
Name = "makeline-service"

order-service   →   ALLOW   (explicit caller)
pulse-service   →   ALLOW   (explicit caller)
*               →   DENY    (wildcard catch-all)

Matching is by precedence, not list order: an exact source name always beats the wildcard, so the explicit allows win.
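Intentions can also reach below service identity. A hedged sketch, assuming Consul's L7 permissions (the path and method are illustrative, and the destination must be declared as an HTTP service): even an allowed caller can be limited to specific routes.

Kind = "service-intentions"
Name = "payment-service"

Sources = [
  {
    Name = "order-service"
    Permissions = [
      {
        # order-service may only POST to /charge; any other request
        # from it is denied at the sidecar.
        Action = "allow"
        HTTP {
          PathExact = "/charge"
          Methods   = ["POST"]
        }
      }
    ]
  }
]

Unlike source matching, a Permissions list is evaluated top to bottom, first match wins.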
PART 05 — Adding a New Service to the Mesh
The process is simpler than people expect — and most of it is automated. Two files. That's it.
service.hcl — opts the service into the mesh:
service {
  name = "makeline-service"
  port = 8080

  check {
    http     = "http://localhost:8080/health"
    interval = "10s"
  }

  connect {
    sidecar_service {}   # opts into the mesh
  }
}
intentions.hcl — defines who can call it:
Kind = "service-intentions"
Name = "makeline-service"

Sources = [
  {
    Name   = "order-service"
    Action = "allow"
  },
  {
    Name   = "*"
    Action = "deny"
  },
]
Commit both files. CI runs consul config write before the service starts, so the access policy is live before the first request arrives. From there, everything happens automatically:
- Intentions applied to Consul — consul config write intentions.hcl
- Envoy sidecar launched — consul connect envoy -sidecar-for makeline-service; Envoy receives its full routing config via the xDS API
- CA issues SVID certificate — the service now has a cryptographic identity in the mesh
- iptables rules configured — all outbound traffic from the app is redirected through Envoy; application code is completely unchanged
- Service is live and secure — appears in the Consul registry as healthy; only callers with explicit allow intentions can reach it
PART 06 — Who Owns What — The Governance Model
The most common mistake is treating service mesh config as an "ops problem." It isn't.
| Service Team — Writes | Platform Team — Runs | Security Team — Approves |
|---|---|---|
| service.hcl definition | Consul cluster setup | Reviews intention PRs |
| intentions.hcl (access policy) | Root CA management | Approves cross-domain access |
| Retry & timeout policy | Global deny-all baseline | Audit trail via ACL logs |
| Health check config | Envoy version upgrades | Does NOT write HCL |
| Lives in: service repo | Lives in: infra repo | Lives in: PR review |
Common mistake: Centralizing intention management in a shared "mesh config" repo owned by the platform team. This creates a bottleneck. The team that owns the service must own its access policy — they know its callers, they carry the on-call pager.
The Six Things Worth Remembering
→ Service mesh handles east-west (service-to-service). API Gateway handles north-south (client-to-service). Use both.
→ The sidecar proxy intercepts traffic via iptables — zero application code changes required.
→ mTLS is automatic. The CA issues short-lived SPIFFE SVIDs. No human manages a certificate.
→ Default-deny is the baseline. Nothing talks to anything until you explicitly allow it.
→ Define intentions before the service deploys — not after traffic starts failing.
→ Intentions are stored in Consul, not in the sidecar. They survive restarts. Define once, done.