Networking Essentials

The network facts that move your p99: TCP handshakes and slow start, TLS cost, DNS TTL traps, HTTP/1.1 vs 2 vs 3, head-of-line blocking, and where latency budgets die.

24 min read · updated 2026-06-28

Most “the service is slow” tickets are not slow code. They are a connection being rebuilt from scratch on every request, a DNS record nobody can flush, a TLS handshake burning CPU you didn’t budget, or a single TCP connection serializing a hundred responses behind one dropped packet. The network is not a transparent wire between your services. It is a stack of protocols, each with its own warm-up cost, its own caches, and its own distinctive way of falling over while every dashboard you own stays green.

I have watched a team spend a full week blaming their garbage collector for a p99 that only spiked on cellular traffic. The GC was fine. The problem was three layers down, in a protocol decision they made eighteen months earlier and never revisited. That incident is the spine of this article, because it captures the thing that makes networking hard: the symptom shows up in your application, but the cause lives in a layer your APM tool doesn’t even render.

This is the working mental model for the bytes between your services — enough to read a latency budget and know exactly where the milliseconds went, and enough to recognize a network failure before you’ve wasted a sprint profiling the wrong thing. It pairs tightly with Load Balancing, where these connections terminate and get reused, and Observability, where you have to instrument the connection lifecycle separately from request handling or you will be blind to half of this. The reliability side shows up again in API Design (idempotency, retries) and the coordination side in Consistency & Consensus.

The one belief I want to install: latency is denominated in round trips, not bytes. Bandwidth is cheap and getting cheaper; the speed of light is fixed. Almost every optimization that matters is really “how do I incur fewer round trips before useful data flows.” Hold that and the rest is detail.

A motivating failure

A consumer app ships a new mobile client. The backend is a clean gRPC service behind an HTTP/2 ingress — multiplexed streams, header compression, the modern stack everyone recommends. In the office, on the corporate Wi-Fi, on every synthetic test, it is fast. p50 is 40ms, p99 is 90ms. They ship.

Within days, support tickets pile up from users on the move: “app freezes on the train,” “spins forever in the elevator,” “fine on Wi-Fi, useless on 4G.” The dashboards disagree with the users. Server-side p50 is healthy. CPU is flat. The database is bored. But the client-reported p99 has quietly tripled to 300ms+, and only for cellular sessions, and only sometimes.

The team chases the application for a week. They blame the GC, then a slow downstream, then thread-pool starvation. They add tracing, and the traces look great — because the traces measure the server, and the server is innocent.

The real story is one protocol layer down. HTTP/2 multiplexes every concurrent request onto a single TCP connection. TCP guarantees in-order delivery. On a lossy cellular link with ~1% packet loss, when packet #4 is dropped, TCP holds packets #5 through #40 in the kernel buffer, already received and fully intact, and refuses to hand any of them to the application until #4 is retransmitted one round trip later, even though most of them carry data for other streams. Every concurrent request on that connection stalls behind one lost packet for one mobile-RTT. That is transport-layer head-of-line blocking, and HTTP/2 made it worse than the old protocol it replaced, because the old one spread requests across six independent connections.

Nobody wrote a bug. Every component did exactly what it promised. The outage lived in a protocol tradeoff the team never knew they’d accepted: HTTP/2 trades application-layer head-of-line blocking for transport-layer head-of-line blocking, and on a clean network that’s a pure win — but on a lossy one it concentrates the blast radius of a single dropped packet onto every request in flight. The fix was HTTP/3, which I’ll get to. The lesson is the one this whole article is built around: p50 healthy, p99 erratic, correlated with network quality, is a transport signature, and no amount of application profiling will find it.

The one-sentence mental model

Every request rides a layered stack where each layer adds a round trip and a cache you pay for unless you deliberately amortize both.

A round trip (one packet there, one acknowledgment back) is the atomic unit of network latency, and it is the number you should be counting. Inside one datacenter a round trip is ~0.5ms. Same-region, cross-AZ, it’s ~1–2ms. Cross-region (say us-east to eu-west) it’s 60–90ms. Intercontinental long-haul or satellite-adjacent paths run 150–250ms+. The speed of light in fiber is roughly 200,000 km/s, which is a hard floor nobody at any company gets to renegotiate. So the only lever you actually have is how many round trips you incur before the first useful byte arrives.

flowchart TB
  App[Application\nHTTP / gRPC] --> TLS[TLS\nencryption handshake]
  TLS --> TCP[TCP\nreliable ordered stream]
  TCP --> IP[IP\nrouting, best-effort]
  IP --> Link[Link\nEthernet / Wi-Fi]
  Link -. one RTT\nthere + back .-> Link

Unpack the sentence clause by clause, because each is an operational constraint:

A layered stack → a failure or a delay can hide at any boundary, and the layer that shows the symptom is usually not the layer with the cause. The OSI seven-layer model is interview trivia; in production you reason about four — link, IP (routing, packet loss), TCP (the reliable byte stream and its handshakes), and application (HTTP, gRPC, your protocol) — with TLS wedged between TCP and the application as its own handshake.
Each layer adds a round trip → a cold HTTPS request can cost a TCP handshake, a TLS handshake, and a DNS lookup before your handler runs. Cross-region, that’s easily 200ms of pure setup spent on zero bytes of payload.
And a cache → DNS is cached at several levels (the application runtime, the OS stub resolver, the recursive resolver), TCP connections are pooled, TLS sessions are resumable. Every one of those caches is a performance win and a correctness hazard, because a stale cache points at the wrong place with total confidence.
Unless you deliberately amortize → none of this is free by default. Connection reuse, session resumption, and short TTLs are choices you make, and the teams that get paged are the ones who left the defaults on.

How it actually works

The TCP handshake and slow start

Before a single byte of your request moves, TCP completes a three-way handshake: SYN → SYN-ACK → ACK. That is one full round trip of pure setup before any data. Cross-region, that’s ~60ms you’ve spent saying hello.

Then slow start kicks in, and this is the part people forget. TCP has no idea how much bandwidth the path can carry, so it starts cautiously and probes upward. The initial congestion window (initcwnd) is ~10 packets on modern Linux — about 14KB of data — and it roughly doubles every round trip until it sees loss or hits the receiver’s window. So a 200KB response on a brand-new connection doesn’t fly out at line rate; it ramps: 14KB, then 28KB, then 56KB, and so on, each step costing a round trip.

sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: SYN
  S->>C: SYN-ACK
  C->>S: ACK (1 RTT gone, no data)
  C->>S: GET /resource
  S->>C: ~14KB (initcwnd ~10)
  Note over C,S: window doubles per RTT
  S->>C: ~28KB
  S->>C: ~56KB until loss or cap

The operational lesson is blunt: a brand-new connection is slow, and it is not your server’s fault. A request that would take 5ms on a warmed connection can take 60ms+ cold, entirely in handshake and ramp. This is precisely why connection reuse is the single highest-impact network optimization most teams skip — they pay the handshake-plus-slow-start tax on every request because their HTTP client defaults to closing connections, or their pool is too small to keep any warm.
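The slow-start ramp described above can be sketched numerically. This is a simplified model, not how a real TCP stack behaves: it assumes a 1,460-byte segment size, a window that exactly doubles each round trip, and no loss or receiver-window cap. The function name and its defaults are illustrative.

```python
def slow_start_round_trips(response_bytes: int, initcwnd: int = 10, mss: int = 1460) -> int:
    """Round trips needed to deliver a response under idealized slow start.

    Assumes the congestion window starts at `initcwnd` segments of `mss`
    bytes and doubles every round trip, with no loss and no receiver cap.
    """
    window = initcwnd * mss   # bytes the sender may put in flight this round trip
    sent = 0
    round_trips = 0
    while sent < response_bytes:
        sent += window
        window *= 2           # idealized doubling per round trip
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    for size_kb in (14, 200, 1024):
        trips = slow_start_round_trips(size_kb * 1024)
        print(f"{size_kb} KB: {trips} round trips after the handshake")
```

Under these assumptions a 200KB response needs four round trips of data on a fresh connection, on top of the handshake. A warmed connection with an already-open window would need far fewer.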

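You can inspect the routes on a Linux host and raise the initial window on one of them:

ip route show          # find the route and its gateway
ip route change default via <gw> initcwnd 30 initrwnd 30   # raise the initial congestion and receive windows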

Raising initcwnd on a known-fat internal path (datacenter, dedicated interconnect) lets large internal responses skip several slow-start steps. Do not do this blindly on the public internet — you’ll just induce loss and make things worse.

Head-of-line blocking, in detail

TCP delivers bytes in order, no exceptions. If segment #3 is lost, segments #4 through #10 sit in the kernel’s receive buffer, received and completely intact (and, with selective acknowledgments, even reported to the sender as received), but undeliverable to your application until #3 is retransmitted, which takes at least one round trip (often more, governed by the retransmission timeout). That’s head-of-line (HOL) blocking at the transport layer: one lost packet stalls everything queued behind it on that connection.

On a clean wired path with 0.01% loss, you’ll basically never notice. On a cellular or cross-continent path with 1% loss, and a protocol that piles many requests onto one connection, it’s the difference between a healthy p50 and a p99 that makes users uninstall. This single property is why HTTP/3 abandoned TCP entirely and rebuilt reliable delivery on top of UDP — covered below.
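A rough way to see why the loss rate matters so much once many requests share one connection: estimate the chance that a burst of packets contains at least one loss. The sketch below assumes each packet is dropped independently, which real networks violate (loss tends to be bursty), and the 40-packet burst size is an illustrative choice.

```python
def stall_probability(loss_rate: float, packets_in_flight: int) -> float:
    """Chance that at least one in-flight packet is lost,
    assuming each packet is dropped independently at `loss_rate`."""
    return 1.0 - (1.0 - loss_rate) ** packets_in_flight


if __name__ == "__main__":
    scenarios = (("clean wired, 0.01% loss", 0.0001), ("cellular, 1% loss", 0.01))
    for label, loss in scenarios:
        p = stall_probability(loss, packets_in_flight=40)
        print(f"{label}: {p:.1%} of 40-packet bursts contain a lost packet")
```

With these inputs the clean path stalls on roughly 0.4% of bursts and the lossy path on roughly a third of them, which is why the effect hides from p50 and dominates p99.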

TLS handshake cost

TLS adds its own negotiation on top of the TCP handshake. With TLS 1.2 that’s two extra round trips: cipher negotiation, key exchange, certificate exchange. TLS 1.3 cut a full new handshake to one round trip, and added 0-RTT resumption where a returning client sends application data in its very first packet using keys cached from a prior session.

Setup	Round trips before data	Notes
Plain TCP	1	handshake only
TCP + TLS 1.2	3	TCP (1) + TLS (2)
TCP + TLS 1.3	2	TCP (1) + TLS (1)
TCP + TLS 1.3 resumption	2 (1 with 0-RTT)	TCP (1) + resumed TLS (1, or 0 with early data); reuses prior session keys
QUIC (HTTP/3) new	1	transport + crypto combined
QUIC 0-RTT	0	data in first flight

Mandate TLS 1.3 and turn on session resumption. The difference is not academic: three round trips cross-region is ~180ms of latency before your handler runs, versus one for QUIC. There’s also a CPU dimension people ignore — the asymmetric crypto in a full handshake (an RSA or ECDHE key exchange) costs real cycles, and at high connection-churn rates a server can spend more CPU on handshakes than on actual work. Session resumption sidesteps the expensive asymmetric step, which is half the reason it matters.
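To turn the round-trip counts into milliseconds, multiply by the path RTT. The sketch below copies the full-handshake and QUIC rows from the table and uses a 60ms cross-region RTT as an example; the dictionary and function names are illustrative.

```python
# Round trips before application data, copied from the table above.
SETUP_ROUND_TRIPS = {
    "Plain TCP": 1,
    "TCP + TLS 1.2": 3,
    "TCP + TLS 1.3": 2,
    "QUIC (HTTP/3) new": 1,
    "QUIC 0-RTT": 0,
}


def setup_latency_ms(setup: str, rtt_ms: float) -> float:
    """Latency spent on connection setup before the first request byte."""
    return SETUP_ROUND_TRIPS[setup] * rtt_ms


if __name__ == "__main__":
    for name in SETUP_ROUND_TRIPS:
        print(f"{name:20s} {setup_latency_ms(name, 60):5.0f} ms at 60 ms RTT")
```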

DNS resolution and the TTL trap

DNS turns a name into an address, and it is cached at every layer: the OS stub resolver, the recursive resolver (your ISP’s or 8.8.8.8), and — the one that bites — the application runtime itself. The TTL (time-to-live) on a record dictates how long each layer is permitted to cache it.

flowchart LR
  App[App resolves\ndb.internal] --> Stub[OS stub\ncache]
  Stub --> Rec[Recursive\nresolver cache]
  Rec --> Auth[Authoritative\nTTL=30s]
  Auth -. cached at\nevery hop .-> App
  Fail[Failover:\nnew IP hidden] -. until all\ncaches expire .-> App

Here’s where it goes wrong. You fail a database over by repointing a DNS record to the standby’s IP. But the record had TTL 3600, so clients keep hammering the dead address for up to an hour. Worse — and I have personally lost an afternoon to this — many runtimes cache DNS independent of the record’s TTL. The classic offender is the JVM, which with the security setting networkaddress.cache.ttl=-1 caches a successful lookup for the entire life of the process. You can set a 30-second TTL on the record, fail over cleanly, and the JVM will cheerfully keep talking to the dead host until you restart every instance.

The fixes are concrete:

Set short TTLs (30–60s) on any record you intend to fail over.
Verify your runtime honors TTL. For the JVM, set networkaddress.cache.ttl=30 (or lower) explicitly; do not trust the default.
Prefer failover mechanisms that don’t depend on client DNS re-resolution at all — a load balancer or virtual IP in front, so the name stays stable and the indirection happens server-side.
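The difference between a runtime that honors TTL and one that caches forever can be shown with a small cache model. This is a sketch of the behavior, not any real resolver's implementation: the class name, the injected `resolve` and `clock` callables, and the `db.internal` addresses are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Caches name lookups for ttl_seconds; ttl_seconds=None caches forever."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, cached_at)

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None:
            address, cached_at = entry
            if self._ttl is None or now - cached_at < self._ttl:
                return address            # still fresh, or cached for process lifetime
        address = self._resolve(name)     # missing or expired: ask the resolver again
        self._entries[name] = (address, now)
        return address


if __name__ == "__main__":
    records = {"db.internal": "10.0.0.1"}   # hypothetical record
    now = [0.0]                             # fake clock so the demo runs instantly
    honoring = TtlDnsCache(records.__getitem__, ttl_seconds=30, clock=lambda: now[0])
    forever = TtlDnsCache(records.__getitem__, ttl_seconds=None, clock=lambda: now[0])
    honoring.lookup("db.internal")
    forever.lookup("db.internal")
    records["db.internal"] = "10.0.0.2"     # failover repoints the record
    now[0] = 31.0                           # 31 seconds later
    print("honors TTL:    ", honoring.lookup("db.internal"))
    print("caches forever:", forever.lookup("db.internal"))
```

The TTL-honoring cache picks up the new address once 30 seconds have passed; the cache-forever variant keeps returning the old one until the process restarts.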

DNS is also a single point of failure with a wide blast radius. A misconfigured or DDoSed authoritative server takes down name resolution for everything behind it, which looks like a total outage even though every server is healthy. That’s why critical zones run on multiple independent providers.

The tradeoffs that bite

These are the decisions that look free at design time and bill you in an incident.

Tradeoff	The free-looking choice	What it actually costs
Connection reuse vs freshness	Pool and keep connections warm	A pooled conn to a failed-over host keeps hitting the dead IP until recycled
HTTP/2 multiplexing vs HOL	One conn, many streams	One lost packet stalls every stream (the opening story)
Short DNS TTL vs resolver load	`TTL 30s` for fast failover	More lookups, an extra resolution RTT more often
0-RTT vs replay safety	TLS 1.3 early data	0-RTT data is replayable; never send non-idempotent requests in it
TLS offload vs internal trust	Terminate TLS at the edge	Plaintext inside the perimeter unless you run mTLS in the mesh
Aggressive retries vs amplification	Retry on any failure	A retry storm turns a blip into a self-inflicted outage

Two of these deserve a second look. Connection reuse vs freshness is the subtle one: pooling is mandatory for performance, but a pool is a cache of TCP connections, and like every cache it can point at stale truth. After a failover, connections in the pool are still wired to the old host’s IP at the socket level. Re-resolving DNS doesn’t help, because the name was resolved once, when the connection was opened. You need a pool that caps connection lifetime (maxLifetime) and recycles, so connections to failed-over hosts actually drain.
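A lifetime cap is simple to model. The sketch below is a toy pool, not any particular library's implementation: `LifetimeCappedPool`, `PooledConnection`, and the injected `connect` and `clock` callables are hypothetical names, and `connect` stands in for "resolve the name and dial", returning the IP the new connection lands on.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PooledConnection:
    host_ip: str
    created_at: float


@dataclass
class LifetimeCappedPool:
    """Hands out idle connections, discarding any older than max_lifetime_s."""
    connect: Callable[[], str]            # returns the IP a fresh connection lands on
    max_lifetime_s: float
    clock: Callable[[], float] = time.monotonic
    idle: List[PooledConnection] = field(default_factory=list)

    def acquire(self) -> PooledConnection:
        now = self.clock()
        while self.idle:
            conn = self.idle.pop()
            if now - conn.created_at < self.max_lifetime_s:
                return conn               # young enough to reuse
            # too old: drop it so the next connect resolves and dials again
        return PooledConnection(self.connect(), created_at=now)

    def release(self, conn: PooledConnection) -> None:
        self.idle.append(conn)
```

Without the age check, a connection opened before a failover would be reused indefinitely. With it, the stale connection is dropped at most `max_lifetime_s` after it was created, and the replacement lands on whatever the name resolves to at that point.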

Aggressive retries vs amplification is the one that turns small problems into outages. When a downstream slows down, naive clients retry, which doubles or triples the load on the thing that was already struggling, which makes it slower, which triggers more retries. Always pair retries with exponential backoff and jitter, cap total attempts, and use a circuit breaker. This is a networking concern as much as an application one, and it connects directly to rate limiting and API design.
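One common way to combine backoff, jitter, and an attempt cap is the "full jitter" scheme: each sleep is drawn uniformly between zero and an exponentially growing, capped ceiling. The sketch below computes only the sleep schedule; the function name and default values are illustrative, and a circuit breaker would sit around the retry loop rather than inside this function.

```python
import random
from typing import List, Optional


def backoff_delays(max_attempts: int, base_s: float = 0.1, cap_s: float = 5.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Sleep durations between attempts: exponential backoff with full jitter.

    Returns max_attempts - 1 delays, because no sleep follows the final attempt.
    """
    rng = rng or random.Random()
    delays = []
    for retry in range(max_attempts - 1):
        ceiling = min(cap_s, base_s * (2 ** retry))   # exponential growth, capped
        delays.append(rng.uniform(0.0, ceiling))      # jitter spreads clients apart
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(max_attempts=5), start=1):
        print(f"sleep before retry {i}: {delay:.3f}s")
```

The randomization matters as much as the exponent: without it, every client that failed at the same moment retries at the same moment, and the downstream sees synchronized waves instead of a spread-out trickle.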

HTTP/1.1 vs 2 vs 3

Property	HTTP/1.1	HTTP/2	HTTP/3 (QUIC)
Transport	TCP	TCP	UDP + QUIC
Concurrency	1 req/conn at a time	multiplexed streams	multiplexed streams
HOL blocking	per connection	transport-level (shared TCP)	none (per-stream)
Header overhead	plaintext, repeated	HPACK compression	QPACK compression
Handshake	TCP + TLS separate	TCP + TLS separate	combined, 1-RTT / 0-RTT
Connection migration	no	no	yes (survives IP change)

HTTP/1.1’s original sin is one request per connection at a time. Pipelining was supposed to fix this and never worked reliably through real proxies, so browsers worked around it by opening six parallel connections per host — each paying its own handshake, its own slow start, its own TLS negotiation. HTTP/2 fixed concurrency properly with multiplexing: many logical streams over one connection, with HPACK header compression on top (a big deal when every request repeats kilobytes of cookies and headers).

But HTTP/2 inherited TCP’s in-order delivery, and so it inherited transport HOL blocking — exactly the trap in the opening story. HTTP/3 runs over QUIC, a reliable transport built on UDP, where each stream has independent ordering. A lost packet on stream A does not stall stream B, because QUIC tracks delivery per-stream instead of per-connection. QUIC also folds the transport and crypto handshakes into a single round trip, and supports connection migration: a phone moving from Wi-Fi to cellular keeps the same QUIC connection (identified by a connection ID, not the 4-tuple of IPs and ports), so the session survives the network change instead of resetting.
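The ordering difference can be modeled in a few lines. This is a deliberately tiny model, not a protocol implementation: `packets` records which stream each packet belongs to in send order, and the stream labels are made up.

```python
from typing import List, Set


def stalled_streams(packets: List[str], lost_index: int, per_stream_ordering: bool) -> Set[str]:
    """Streams that must wait for a retransmission of packets[lost_index].

    With connection-wide ordering (TCP), every stream with data at or after
    the lost packet waits. With per-stream ordering (QUIC), only the stream
    that owned the lost packet waits.
    """
    if per_stream_ordering:
        return {packets[lost_index]}
    return set(packets[lost_index:])


if __name__ == "__main__":
    in_flight = ["A", "B", "C", "A", "B", "C"]   # three interleaved streams
    print("HTTP/2 over TCP :", sorted(stalled_streams(in_flight, 1, per_stream_ordering=False)))
    print("HTTP/3 over QUIC:", sorted(stalled_streams(in_flight, 1, per_stream_ordering=True)))
```

Losing the second packet, which belongs to stream B, stalls all three streams under connection-wide ordering and only B under per-stream ordering.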

flowchart TB
  L[1 packet lost] --> H2{HTTP/2\nover TCP}
  H2 --> B2[ALL streams\nstall 1 RTT]
  L --> H3{HTTP/3\nover QUIC}
  H3 --> B3[only that stream\nwaits, others flow]

The honest caveat: QUIC lives in userspace, not the kernel, so it can be more CPU-hungry per byte, and some corporate networks or middleboxes block or throttle UDP. HTTP/3 is a clear win for lossy and mobile clients; for clean internal datacenter links, HTTP/2 is usually plenty and simpler to operate.

Latency: round trips vs bandwidth

This is the section to internalize, because it reframes most performance work. For small payloads — API calls, gRPC messages, the typical request/response — you are latency-bound, not bandwidth-bound. Upgrading a link from 1 Gbps to 10 Gbps does nothing for a 2KB JSON response; the bytes were never the bottleneck. The round trips were.

Do the arithmetic on a cold cross-region HTTPS request to eu-west from us-east at ~75ms RTT:

DNS lookup (uncached): 1 RTT → 75ms
TCP handshake: 1 RTT → 75ms
TLS 1.3 handshake: 1 RTT → 75ms
Request + first byte: 1 RTT → 75ms
Total before payload: ~300ms, on a request whose actual processing might be 5ms.

Now warm everything: DNS cached, connection pooled and past slow start, TLS session resumed. The same request costs 1 RTT — 75ms — plus the 5ms of work. You cut 225ms without touching a single line of application code, purely by not re-paying setup costs. That is the whole game.
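The same budget as a small calculation, using the numbers from the example above (75ms RTT, 5ms of processing, one round trip per setup step). The names are illustrative.

```python
RTT_MS = 75          # us-east to eu-west, from the example above
WORK_MS = 5          # server-side processing time

COLD_STEPS = {       # round trips paid by a cold request
    "DNS lookup (uncached)": 1,
    "TCP handshake": 1,
    "TLS 1.3 handshake": 1,
    "Request + first byte": 1,
}
WARM_STEPS = {"Request + first byte": 1}   # everything else is cached or pooled


def total_ms(steps: dict, rtt_ms: float, work_ms: float) -> float:
    """Network round trips plus server processing time."""
    return sum(steps.values()) * rtt_ms + work_ms


if __name__ == "__main__":
    cold = total_ms(COLD_STEPS, RTT_MS, WORK_MS)
    warm = total_ms(WARM_STEPS, RTT_MS, WORK_MS)
    print(f"cold: {cold} ms, warm: {warm} ms, saved: {cold - warm} ms")
```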

The levers, in rough order of impact:

Connection reuse (keep-alive + pooling). Amortizes handshake and keeps you past slow start. Biggest single win, almost always.
Reduce round trips in the protocol. Batch calls, use a single multiplexed connection, avoid chatty request/response ping-pong. One request that returns everything beats ten that each cost an RTT.
Move the endpoint closer. A CDN edge POP 5ms away beats any optimization on a 150ms origin path. The fastest round trip is the short one.
TLS 1.3 + resumption. Removes a round trip and the expensive asymmetric crypto on warm paths.
HTTP/3 on lossy paths. Removes HOL blocking, which is a p99 win specifically, not a p50 one.

Throughput (bandwidth) only becomes the bottleneck for large transfers — file uploads, media, backups, replication streams. There, slow start and the bandwidth-delay product matter, and initcwnd tuning and modern congestion control (BBR instead of the older CUBIC) earn their keep. But for the request/response traffic that dominates most systems, count round trips first and bytes second.
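The bandwidth-delay product is the amount of data that has to be in flight to keep a path full: bandwidth multiplied by round-trip time. A minimal sketch, with the 1 Gbps link and 75ms RTT chosen as example inputs:

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: int, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bits_per_s * rtt_ms / 1000 / 8


if __name__ == "__main__":
    bdp = bandwidth_delay_product_bytes(1_000_000_000, 75)
    print(f"1 Gbps at 75 ms RTT: {bdp / 1_000_000:.3f} MB in flight to fill the pipe")
```

That works out to about 9.4MB for this path, against an initial window of roughly 14KB, which is why a long-haul bulk transfer spends many round trips ramping up before it approaches line rate.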

Failure modes

The network failures that actually page people. Each is symptom → root cause → prevention.

Connection churn. Symptom: p99 dominated by connect/handshake time; server CPU burning on TLS; throughput far below capacity. Root cause: no keep-alive, or an undersized pool, so every request pays a fresh handshake plus slow start. Prevention: enable HTTP keep-alive everywhere, size pools to peak concurrency (not request rate), and measure handshake count as a distinct metric.

Ephemeral port / TIME_WAIT exhaustion. Symptom: cannot assign requested address or connection refused under load; works fine at low traffic. Root cause: a client opening thousands of short-lived connections runs out of source ports (the default ip_local_port_range gives ~28k) or piles up sockets stuck in TIME_WAIT for 60s each. Prevention: reuse connections (this disappears with pooling), widen the port range (net.ipv4.ip_local_port_range), enable net.ipv4.tcp_tw_reuse=1, and watch ss -s socket counts.
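A back-of-the-envelope check on where this wall sits. The sketch assumes the client closes each connection itself (so the client holds the TIME_WAIT socket), every connection goes to the same destination IP and port, the Linux default port range of 32768–60999 is in effect, and tcp_tw_reuse is off. The function name is illustrative.

```python
def max_new_connections_per_second(port_range=(32768, 60999), time_wait_s: int = 60) -> float:
    """Sustained rate of short-lived connections one client IP can open to a
    single destination IP and port before every source port sits in TIME_WAIT."""
    low, high = port_range
    usable_ports = high - low + 1
    return usable_ports / time_wait_s


if __name__ == "__main__":
    print(f"default range: {max_new_connections_per_second():.0f} new connections/s")
    print(f"widened range: {max_new_connections_per_second((1024, 65535)):.0f} new connections/s")
```

Under those assumptions the default range sustains only about 470 new connections per second to one destination, a rate a busy client without pooling can reach easily.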

DNS failover lag. Symptom: traffic keeps hitting a host you failed away from, long after the cutover. Root cause: a long record TTL, or a runtime caching DNS for its process lifetime regardless of TTL (the JVM trap). Prevention: short TTLs on failover records, explicit runtime DNS TTL config, and server-side indirection (a stable VIP/load balancer) so clients never re-resolve at all.

HOL blocking under loss. Symptom: p50 healthy, p99 erratic, correlated with mobile or cross-region traffic (the opening story). Root cause: one dropped packet stalls every stream on a multiplexed HTTP/2 (TCP) connection. Prevention: HTTP/3 for lossy/mobile clients; for HTTP/2, accept that a tiny loss rate amplifies on p99 and instrument accordingly.

MTU / fragmentation black holes. Symptom: small requests succeed, large payloads hang forever, often only across a VPN or tunnel. Root cause: a path with a smaller MTU silently drops oversized packets when the DF (don’t-fragment) bit is set, and the ICMP “fragmentation needed” message that should signal this is filtered by a firewall. The sender never learns to shrink its packets. Prevention: validate path MTU across tunnels, ensure ICMP type 3 isn’t blanket-filtered, and use MSS clamping on tunnel interfaces.
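The value to clamp to follows from the path MTU. A minimal sketch, assuming IPv4 and TCP headers of 20 bytes each with no options; the 1400-byte tunnel MTU is a hypothetical example, so measure the real one on your path.

```python
IPV4_HEADER_BYTES = 20   # without options
TCP_HEADER_BYTES = 20    # without options


def clamped_mss(path_mtu: int) -> int:
    """Largest TCP payload per segment that fits the path MTU over IPv4."""
    return path_mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


if __name__ == "__main__":
    print("standard Ethernet (MTU 1500):", clamped_mss(1500))
    print("hypothetical tunnel (MTU 1400):", clamped_mss(1400))
```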

If your p50 latency is healthy but p99 is erratic on cross-region or mobile traffic, suspect packet loss amplified by head-of-line blocking before you touch application code. A 0.1% loss rate is invisible to p50 and brutal to p99 on a multiplexed TCP connection. I have watched a week of engineering time vanish into profiling a server that was never the problem — the cause was three layers down, in a protocol choice. Instrument the connection layer separately, or you will keep looking where the light is good instead of where the keys are.

Scaling it

What changes as traffic grows, in the order you’ll hit each wall.

Connection pooling is the first and most important move. Keep connections warm and reused so you amortize handshakes and stay past slow start. Size the pool to peak concurrency, cap idle and total connection lifetime so failed-over hosts drain, and remember a pool per instance multiplied by hundreds of instances can overwhelm a backend’s connection limit — coordinate pool sizing with the downstream’s connection model.

Terminate TLS at the edge. Offload TLS at the load balancer or a sidecar so origin servers either speak cheap plaintext internally or hold long-lived mTLS mesh connections that are negotiated once and reused for the life of the pod. This concentrates the expensive crypto where you can scale it horizontally and frees origins to do work.

Push content to the edge. A CDN terminates the connection 5ms from the user at an edge POP instead of forcing a transcontinental round trip to origin. Static assets, cacheable API responses, and increasingly TLS termination all move outward. This is the highest-leverage latency win for a geographically spread user base — you’re not making the network faster, you’re making it shorter.

Anycast for the front door. A single IP announced from many locations (via BGP) routes each client to the topologically nearest POP automatically. It’s how global load balancers and DNS providers give everyone a “local” entry point, and how DDoS traffic gets absorbed across many sites instead of one.

flowchart TB
  U1[User\nTokyo] --> A[Anycast IP\none address]
  U2[User\nLondon] --> A
  A --> P1[Edge POP\nTokyo]
  A --> P2[Edge POP\nLondon]
  P1 --> O[Origin\nregion]
  P2 --> O

Tune congestion control for big internal flows. On high-bandwidth, long-distance internal links (cross-region replication, backups), switch from CUBIC to BBR congestion control and raise initcwnd. BBR models the path’s bandwidth and RTT directly instead of treating any loss as congestion, which can multiply throughput on lossy long-haul links. Don’t apply it reflexively to public-facing short connections, where it buys little.

Budget by round trips at every layer. At scale, the cheapest request is the one that never crosses a region. Co-locate chatty services, collapse multi-call sequences into one, and treat a cross-region round trip as the expensive resource it is.

When to reach for it (and when not to)

Networking is foundational, not optional — but the protocol choices are real decisions with real tradeoffs.

Reach for HTTP/2 for internal service-to-service traffic and most public APIs on reliable paths. Multiplexing kills per-request handshake cost and header compression is a clear win when requests repeat large headers.

Reach for HTTP/3 / QUIC when clients are on lossy or mobile networks, or when connection migration matters — a mobile app whose users switch between Wi-Fi and cellular mid-session, or any traffic where p99 under packet loss is the metric you’re judged on.

Reach for gRPC (which rides HTTP/2) for internal RPC where you want streaming, strong typing, and multiplexing, and you control both ends.

Stay on HTTP/1.1 only where tooling, legacy proxies, or a hard simplicity requirement demand it — and then lean hard on keep-alive so you’re not re-handshaking constantly.

Don’t open a new connection per request, don’t trust a runtime to honor DNS TTL without checking it explicitly, don’t assume HTTP/2 cured head-of-line blocking (it relocated it to TCP), and don’t retry without backoff and jitter.

When to consider alternatives

The jobs adjacent to raw networking, and where they live on this site:

Where connections terminate, get balanced, and fail over → Load Balancing.
Measuring connection lifecycle, RTT, and loss as first-class signals → Observability.
Protecting against retry storms and abusive clients → Rate Limiting.
Idempotency, retries, versioning, and request design → API Design.
A single entry point that routes, authenticates, and shapes traffic → API Gateway.
Serving content from the edge to shorten the round trip → Object Storage + CDN.
The consistency tradeoffs behind cross-region replication → Consistency & Consensus and CAP Theorem.

Operational checklist

Enforce TLS 1.3 and enable session resumption; measure handshake count and handshake CPU, not just request count.
Enable HTTP keep-alive everywhere; size connection pools to peak concurrency and cap idle and total connection lifetime so failed-over hosts drain.
Set DNS TTL to 30–60s on anything you fail over, and verify the runtime honors it (check JVM networkaddress.cache.ttl); prefer a stable VIP over client re-resolution.
Alert on connect/handshake time as a distinct slice of p99, separate from server processing time.
Watch ephemeral port usage and TIME_WAIT counts (ss -s) on chatty clients; widen ip_local_port_range and enable tcp_tw_reuse if needed.
Validate path MTU across VPNs and tunnels; ensure ICMP “fragmentation needed” is not filtered, and clamp MSS on tunnel interfaces.
Default new internal and public services to HTTP/2; pilot HTTP/3 for mobile-heavy or lossy traffic and judge it on p99.
Pair every retry with exponential backoff plus jitter, a cap on attempts, and a circuit breaker to prevent amplification.
For large cross-region flows, evaluate BBR congestion control and a raised initcwnd on known-fat internal paths only.

Summary

The network is a layered stack, and the layer that shows the symptom is rarely the layer with the cause. Latency is denominated in round trips, not bytes, and a cold connection pays a tax — DNS lookup, TCP handshake, TLS handshake, slow start — that a warm one skips entirely, which is why connection reuse is the optimization that beats almost everything else. TLS 1.3 with resumption removes round trips and crypto cost; DNS TTL is a cache that will happily route you to a dead host across a failover, especially through a runtime that caches forever. HTTP/2 fixed concurrency but moved head-of-line blocking down to TCP, where a single dropped packet stalls every stream — the trap behind the “p50 fine, p99 wild on mobile” signature — and HTTP/3 over QUIC fixes it by giving each stream independent delivery. Count round trips, keep connections warm, mandate TLS 1.3, watch your TTLs, retry with backoff, and instrument the connection layer separately from request handling. Do that and most “the service is slow” tickets resolve before they reach the application code you were about to blame.

Appendix: the layers and the units

A quick refresher so the body can stay advanced.

Latency — time for one round trip; the floor is set by distance and the speed of light in fiber (~200,000 km/s). You reduce round trips, never the speed of light.
Bandwidth — bytes per second the link can carry; matters for big transfers, irrelevant for small request/response.
The four layers you actually reason about — link (Ethernet/Wi-Fi frames), IP (best-effort routing of packets, where loss happens), TCP (reliable, ordered byte stream built on top of IP, with handshakes and congestion control), application (HTTP/gRPC/your protocol). TLS sits as a handshake between TCP and the application.
RTT vs throughput — small payloads are latency-bound (count round trips), large payloads are throughput-bound (mind slow start and the bandwidth-delay product).
The 4-tuple — a TCP connection is identified by (source IP, source port, dest IP, dest port). Ephemeral source ports are finite, which is why connection churn exhausts them. QUIC uses a connection ID instead, which is what lets it survive an IP change.
