In a monolith, services talk by calling a method. In microservices, they talk across a network — and that changes everything. Security, reliability, observability, and identity all become your problem. Here's how modern architectures solve this, end to end.
PART 01 — The Problem With Point-to-Point
Imagine 10 services that each need to call some subset of the others. Without any governance, you end up with point-to-point connections where every service implements its own retry logic, TLS setup, circuit breaking, and authentication. Ten services means up to 90 direct connections (n × (n − 1) = 10 × 9), each a custom, one-off integration.
The anti-pattern looks like this:
SERVICE A                       SERVICE B (duplicate)
+ retry logic                   + retry logic
+ TLS setup code                + TLS setup code
+ circuit breaker               + circuit breaker
+ auth & token validation       + auth & token validation
+ distributed tracing           + distributed tracing
Every team duplicates the same cross-cutting concerns in application code.
The solution is to move all of that cross-cutting infrastructure out of application code and into a dedicated layer. There are two main ways to do this:
- API Gateway — for north-south traffic (client to service)
- Service Mesh — for east-west traffic (service to service)
Most mature architectures use both.
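To make the split concrete, here is a minimal sketch of the north-south half, assuming Consul's ingress gateway config entry (Consul is the mesh used throughout this piece; the service name and port are illustrative). East-west policy stays in the intentions shown later.

Kind = "ingress-gateway"
Name = "ingress-gateway"

Listeners = [
  {
    Port     = 8443
    Protocol = "http"
    Services = [
      {
        # The only service exposed to outside clients; everything else
        # stays reachable through the mesh alone.
        Name = "order-service"
      }
    ]
  }
]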
PART 02 — The Service Mesh — How It Actually Works
A service mesh solves east-west communication by deploying a lightweight sidecar proxy (typically Envoy) alongside every service instance. The proxy intercepts all network traffic, inbound and outbound, without the application knowing it exists.
The app thinks it's calling http://payment-service:8080. The kernel intercepts that traffic via iptables rules and redirects it to the local Envoy proxy instead, as the diagram and the config sketch below illustrate.
        VM / HOST 1                           VM / HOST 2
┌──────────────────────────┐      ┌──────────────────────────┐
│ Order Service    :8080   │      │ Envoy Sidecar    :15001  │
│ business logic           │      │ iptables intercept       │
│        ↓ localhost       │      │        ↓ localhost       │
│ Envoy Sidecar    :15001  │      │ Payment Service  :8080   │
│ iptables intercept       │      │ business logic           │
└──────────────────────────┘      └──────────────────────────┘
             └──────────── mTLS ENCRYPTED WIRE ────────────┘
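None of this interception lives in app code; it is pushed as mesh configuration. A minimal sketch, assuming Consul's transparent proxy mode (the port matches the diagram; on VMs the iptables rules themselves are typically applied by consul connect redirect-traffic):

Kind = "proxy-defaults"
Name = "global"

# Every sidecar intercepts traffic transparently; services don't have to
# enumerate their upstreams one by one.
Mode = "transparent"

TransparentProxy {
  # Outbound traffic from the app is redirected to this Envoy listener.
  OutboundListenerPort = 15001
}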
The mesh has two planes:
- Data plane — the collection of sidecar proxies that handle actual traffic
- Control plane — Consul, Istio, or Linkerd, pushing config to every proxy: service discovery, routing rules, retry policies, certificates, and authorization policies
Key insight: The application never changes. No SDK. No library. No config file inside the app. The entire mesh is an infrastructure concern — developers just write business logic.
PART 03 — A Service-to-Service Call, Step by Step
When Order Service needs to call Payment Service, here is exactly what happens at the network level:
1. App calls localhost
Order Service calls http://payment-service:8080/charge — a plain HTTP call. The app has no idea what happens next.
2. iptables intercepts
A kernel-level iptables rule redirects all outbound TCP traffic from the app process to port 15001 — the Envoy sidecar — before it leaves the host.
3. Service registry lookup
Envoy queries Consul to find a healthy instance of payment-service. Consul returns the IP and port of a live, passing-health-check instance (see the resolver sketch after this list).
4. Policy check
Envoy checks the authorization policy: is order-service allowed to call payment-service? If no intention exists or the intention says deny, the connection is dropped here — before any app code runs.
5. mTLS connection established
Both sidecars perform a mutual TLS handshake. Each presents a certificate signed by the mesh's internal CA. Both sides verify the other's identity. The wire is encrypted. Neither application touches a certificate.
6. Request delivered
The Payment sidecar decrypts the request and delivers it to the Payment Service on localhost. From Payment's perspective, a plain HTTP request arrived. The full round trip adds ~1ms of latency.
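Step 3's healthy-instance lookup is itself adjustable policy. A hedged sketch of pinning it down with a Consul service-resolver (the timeout and subset name are illustrative):

Kind = "service-resolver"
Name = "payment-service"

ConnectTimeout = "5s"     # how long the sidecar waits when dialing an instance

# Route only to instances whose health checks are fully passing
# (instances in "warning" state are excluded too).
DefaultSubset = "passing"
Subsets = {
  passing = {
    OnlyPassing = true
  }
}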
Bonus — what you get for free: Every hop automatically emits a distributed tracing span to Jaeger/Zipkin. Retries, circuit breaking, and load balancing across healthy instances are all handled by the sidecar — configured once in the control plane, applied everywhere.
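The retry half of that claim could look like this in Consul: a sketch using a service-router config entry (the route and numbers are illustrative, and the destination service must be declared as HTTP in service-defaults for L7 routing to apply):

Kind = "service-router"
Name = "payment-service"

Routes = [
  {
    Match {
      HTTP {
        PathPrefix = "/"
      }
    }
    Destination {
      # Applied by every calling sidecar; no retry code in any application.
      NumRetries            = 3
      RetryOnConnectFailure = true
      RetryOnStatusCodes    = [503]
    }
  }
]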
PART 04 — How mTLS Is Managed at Scale
This is where most explanations stop short. Manually managing certificates across 40+ services would be worse than the problem we started with. The mesh solves this with an internal Certificate Authority built into the control plane.
"Every service gets a short-lived cryptographic identity. No human ever touches a certificate."
The control plane acts as a Root CA. Every sidecar proxy receives what is called an SVID — a SPIFFE Verifiable Identity Document. SPIFFE is an open standard for workload identity. The certificate encodes who the service is, not where it lives:
spiffe://dominos.internal/ns/production/sa/order-service
Stable across reboots, IP changes, and redeployments. Tied to the service — not a hostname or IP address.
Root CA (Control Plane)
├── Intermediate CA (Datacenter East)
│   ├── order-service SVID        ttl: 24h   ↻ auto-rotates
│   ├── payment-service SVID      ttl: 24h   ↻ auto-rotates
│   └── makeline-service SVID     ttl: 24h   ↻ auto-rotates
└── Intermediate CA (Datacenter West)
    └── inventory-service SVID    ttl: 24h   ↻ auto-rotates
Certificates are short-lived — typically 24 hours. The sidecar fetches a fresh certificate from the control plane before expiry. No downtime. No human involvement. If a cert is compromised, it expires fast — the blast radius is tiny.
Compare this to traditional PKI where certificates might be valid for two years, often forgotten, and discovered only when something breaks.
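Where does the 24-hour lifetime come from? In Consul it is CA provider configuration on the control plane. A sketch of the relevant agent config, assuming the built-in CA provider (the rotation values are illustrative):

connect {
  enabled     = true
  ca_provider = "consul"      # built-in CA; Vault is a common alternative

  ca_config {
    # Leaf certificates (the SVIDs handed to sidecars) expire after 24 hours;
    # sidecars fetch replacements automatically before expiry.
    leaf_cert_ttl = "24h"

    # The CA's own signing keys rotate on a longer schedule.
    rotation_period = "2160h"   # 90 days
  }
}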
The authorization layer on top
mTLS proves who you are. Intentions (Consul) or AuthorizationPolicy (Istio) control what you're allowed to do. Even with a valid cert, a service cannot call another unless an explicit allow rule exists. The default is deny-all. This is enforced at the sidecar — the network drops the request before the target application sees a single byte.
# Who can call makeline-service?
Kind = "service-intentions"
Name = "makeline-service"

order-service   →   ALLOW   (explicit caller)
pulse-service   →   ALLOW   (explicit caller)
*               →   DENY    (wildcard catch-all)

Matching is by precedence, not list order: an exact source name always beats the wildcard, so the explicit allows win.
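Intentions can also reach below service identity. A hedged sketch, assuming Consul's L7 permissions (the path and method are illustrative, and the destination must be declared as an HTTP service): even an allowed caller can be limited to specific routes.

Kind = "service-intentions"
Name = "payment-service"

Sources = [
  {
    Name = "order-service"
    Permissions = [
      {
        # order-service may only POST to /charge; any other request
        # from it is denied at the sidecar.
        Action = "allow"
        HTTP {
          PathExact = "/charge"
          Methods   = ["POST"]
        }
      }
    ]
  }
]

Unlike source matching, a Permissions list is evaluated top to bottom, first match wins.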
PART 05 — Adding a New Service to the Mesh
The process is simpler than people expect — and most of it is automated. Two files. That's it.
service.hcl — opts the service into the mesh:
service {
  name = "makeline-service"
  port = 8080

  check {
    http     = "http://localhost:8080/health"
    interval = "10s"
  }

  connect {
    sidecar_service {}   # opts into the mesh
  }
}
intentions.hcl — defines who can call it:
Kind = "service-intentions"
Name = "makeline-service"

Sources = [
  {
    Name   = "order-service"
    Action = "allow"
  },
  {
    Name   = "*"
    Action = "deny"
  },
]
Commit both files. CI runs consul config write before the service starts, so the access policy is live before the first request arrives. From there, everything happens automatically:
- Intentions applied to Consul — consul config write intentions.hcl
- Envoy sidecar launched — consul connect envoy -sidecar-for makeline-service; Envoy receives its full routing config via the xDS API
- CA issues SVID certificate — the service now has a cryptographic identity in the mesh
- iptables rules configured — all outbound traffic from the app is redirected through Envoy; application code is completely unchanged
- Service is live and secure — appears in the Consul registry as healthy; only callers with explicit allow intentions can reach it
PART 06 — Who Owns What — The Governance Model
The most common mistake is treating service mesh config as an "ops problem." It isn't.
| Service Team — Writes | Platform Team — Runs | Security Team — Approves |
|---|---|---|
| service.hcl definition | Consul cluster setup | Reviews intention PRs |
| intentions.hcl (access policy) | Root CA management | Approves cross-domain access |
| Retry & timeout policy | Global deny-all baseline | Audit trail via ACL logs |
| Health check config | Envoy version upgrades | Does NOT write HCL |
| Lives in: service repo | Lives in: infra repo | Lives in: PR review |
Common mistake: Centralizing intention management in a shared "mesh config" repo owned by the platform team. This creates a bottleneck. The team that owns the service must own its access policy — they know its callers, they carry the on-call pager.
The Six Things Worth Remembering
→ Service mesh handles east-west (service-to-service). API Gateway handles north-south (client-to-service). Use both.
→ The sidecar proxy intercepts traffic via iptables — zero application code changes required.
→ mTLS is automatic. The CA issues short-lived SPIFFE SVIDs. No human manages a certificate.
→ Default-deny is the baseline. Nothing talks to anything until you explicitly allow it.
→ Define intentions before the service deploys — not after traffic starts failing.
→ Intentions are stored in Consul, not in the sidecar. They survive restarts. Define once, done.