Avoiding head-of-line blocking with HTTP/2 multiplexing

One of the more counter-intuitive lessons I picked up while running an edge proxy fleet is that HTTP/2 sometimes makes tail latency worse, not better, despite multiplexing being its headline feature. The reason is head-of-line (HOL) blocking, and it shows up in two different places depending on which layer you are looking at. Sorting them out properly took me a few production incidents and a lot of tcpdump.

Under HTTP/1.1, each request occupies a TCP connection until the response is finished. Pipelining was supposed to fix this, but responses still had to come back in request order, and intermediaries handled it so inconsistently that it was effectively dead on arrival. So browsers and clients opened multiple connections per origin, with six per host being the typical browser limit. A slow response on one connection only blocked that connection, which was a crude kind of isolation. HTTP/2 collapsed all of that into a single connection with stream multiplexing on top, and that is where the trouble starts.
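HTTP/1.1 responses carry no request identifier, so a pipelined connection has to return them in the order the requests were sent. A toy model makes the consequence visible. The timings are made up and transmission time is ignored, so treat it as an illustration rather than a measurement:

```python
def pipelined_delivery(ready_ms: list[float]) -> list[float]:
    """Return when each response can actually leave a pipelined HTTP/1.1 connection.

    ready_ms[i] is when the server has response i ready (hypothetical numbers).
    Response i cannot be sent before response i-1, so its delivery time is the
    max of its own ready time and its predecessor's delivery time.
    """
    delivered: list[float] = []
    last = 0.0
    for ready in ready_ms:
        last = max(last, ready)
        delivered.append(last)
    return delivered


if __name__ == "__main__":
    # The first request hits a slow backend (500 ms); the other three are fast.
    print(pipelined_delivery([500.0, 10.0, 12.0, 15.0]))  # [500.0, 500.0, 500.0, 500.0]
```

With separate connections, each of the three fast responses would have gone out at its own ready time; on one pipelined connection they all wait for the slow one.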

What HTTP/2 fixes and what it does not

Inside a single HTTP/2 connection, streams are interleaved as frames. A slow application response no longer blocks other in-flight responses, because the server can send a HEADERS frame for stream 5 while it is still computing the body for stream 3. This is real, measurable HOL avoidance at the application layer. For an API gateway serving lots of small JSON responses, the connection count drops by an order of magnitude and the per-request overhead drops with it.
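What makes the interleaving possible is that every HTTP/2 frame starts with a 9-byte header carrying a 31-bit stream identifier (RFC 9113, section 4.1). The sketch below encodes that header by hand and interleaves frames for streams 3 and 5. The header-block payloads are placeholder bytes rather than real HPACK output, and the helper names (`frame`, `parse_header`, `stream_order`) are purely illustrative:

```python
import struct

# Frame types and flags from RFC 9113 (HTTP/2).
DATA = 0x0
HEADERS = 0x1
END_STREAM = 0x1   # flag on DATA/HEADERS: last frame of this stream
END_HEADERS = 0x4  # flag on HEADERS: header block is complete


def frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Encode one frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload does not fit the 24-bit length field")
    # Layout: 24-bit length (packed as 8 + 16 bits), 8-bit type, 8-bit flags,
    # then a reserved bit (always 0) plus the 31-bit stream identifier.
    header = struct.pack(
        "!BHBBI", length >> 16, length & 0xFFFF, frame_type, flags, stream_id & 0x7FFFFFFF
    )
    return header + payload


def parse_header(buf: bytes) -> tuple[int, int, int, int]:
    """Decode (length, type, flags, stream_id) from the first 9 bytes."""
    hi, lo, frame_type, flags, stream_id = struct.unpack("!BHBBI", buf[:9])
    return (hi << 16) | lo, frame_type, flags, stream_id & 0x7FFFFFFF


def stream_order(wire: bytes) -> list[int]:
    """Walk a buffer of frames and return the stream ID of each frame in order."""
    ids = []
    offset = 0
    while offset < len(wire):
        length, _, _, stream_id = parse_header(wire[offset:offset + 9])
        ids.append(stream_id)
        offset += 9 + length
    return ids


# Stream 3 is waiting on a slow body; stream 5 is finished and goes out
# in between. Header payloads are placeholders, not real HPACK blocks.
wire = b"".join([
    frame(HEADERS, END_HEADERS, 3, b"<header block for stream 3>"),
    frame(HEADERS, END_HEADERS, 5, b"<header block for stream 5>"),
    frame(DATA, END_STREAM, 5, b'{"ok": true}'),
    frame(DATA, 0, 3, b"first chunk of the slow body"),
    frame(DATA, END_STREAM, 3, b"last chunk"),
])

print(stream_order(wire))  # [3, 5, 5, 3, 3]
```

Because the receiver demultiplexes by stream ID rather than by position, stream 5 completes even though stream 3 opened first and is still unfinished.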

What HTTP/2 cannot fix is HOL blocking at the TCP layer. The single connection still runs over a single ordered byte stream. If one TCP segment is dropped, every byte that comes after it has to wait for the retransmission before the kernel will deliver any of it to userspace, including frames belonging to streams that have nothing to do with the lost segment. On a clean wired network this is invisible. On a flaky mobile link, or on a path that crosses a congested transit provider, it is brutal: a 1% packet loss rate can turn a fast API call into a one-second stall, not because its own data was lost but because it is stuck behind an unrelated response's lost segment on the same connection.
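The TCP-layer effect can be modeled with a running maximum: a segment becomes readable only once every earlier segment has arrived. In this sketch the segment timings and the 200 ms retransmission delay are hypothetical, and each segment is assumed to carry bytes for exactly one HTTP/2 stream:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    stream_id: int     # HTTP/2 stream whose frame bytes this segment carries
    arrival_ms: float  # when it reaches the receiver (retransmit arrival if it was lost)


def tcp_stream_completion(segments: list[Segment]) -> dict[int, float]:
    """Completion time per stream when delivery must follow TCP byte order.

    Segments are listed in sequence-number order. Each one is handed to
    userspace only after all earlier segments have arrived, so its delivery
    time is the running max of arrival times up to that point.
    """
    done: dict[int, float] = {}
    readable_at = 0.0
    for seg in segments:
        readable_at = max(readable_at, seg.arrival_ms)
        done[seg.stream_id] = max(done.get(seg.stream_id, 0.0), readable_at)
    return done


segments = [
    Segment(stream_id=1, arrival_ms=20.0),
    Segment(stream_id=1, arrival_ms=220.0),  # lost; arrives after a hypothetical 200 ms retransmit
    Segment(stream_id=3, arrival_ms=21.0),
    Segment(stream_id=5, arrival_ms=22.0),
]
print(tcp_stream_completion(segments))  # {1: 220.0, 3: 220.0, 5: 220.0}
```

Streams 3 and 5 were fully received by 22 ms but are not readable until the lost stream 1 segment fills the gap.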

QUIC was built for this

QUIC, which underpins HTTP/3, moves the stream abstraction down into the transport itself. Each stream has its own ordering and flow control, so packet loss on one stream does not block delivery on the others. The lost data still has to be retransmitted (QUIC resends the lost frames in a new packet rather than resending the original packet), but the receiver can hand the other streams' bytes to userspace immediately. In practice this turns the tail latency story upside down for lossy clients: I have seen p99 page load times improve by 20-30% on mobile networks after enabling HTTP/3, with no change on desktop, where loss is rare.
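Modeling QUIC's behavior with ordering enforced per stream, instead of across the whole connection, shows the difference. This is a simplification under stated assumptions: the timings and the 200 ms retransmission delay are hypothetical, and real QUIC packets can carry frames for several streams, whereas this sketch assumes one stream per packet:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    stream_id: int     # QUIC stream whose data this packet carries (one per packet here)
    arrival_ms: float  # when it reaches the receiver (retransmit arrival if it was lost)


def per_stream_completion(packets: list[Packet]) -> dict[int, float]:
    """Completion time per stream when ordering is enforced per stream only.

    Within one stream, data is still delivered in offset order, so a stream
    completes at the latest arrival among its own packets. A gap on one
    stream does not hold back any other stream.
    """
    done: dict[int, float] = {}
    for pkt in packets:
        done[pkt.stream_id] = max(done.get(pkt.stream_id, 0.0), pkt.arrival_ms)
    return done


packets = [
    Packet(stream_id=1, arrival_ms=20.0),
    Packet(stream_id=1, arrival_ms=220.0),  # lost; arrives after a hypothetical 200 ms retransmit
    Packet(stream_id=3, arrival_ms=21.0),
    Packet(stream_id=5, arrival_ms=22.0),
]
print(per_stream_completion(packets))  # {1: 220.0, 3: 21.0, 5: 22.0}
```

Only the stream that actually lost data pays for the retransmission; streams 3 and 5 finish at their own arrival times.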

The trade-off worth understanding is connection coalescing. Browsers happily share a single HTTP/2 connection across multiple origins if they resolve to the same IP and the certificate covers both names. That saves handshakes and lets prioritization work across the whole document, but it concentrates everything on one TCP connection. If you serve a CDN with hundreds of small assets over a lossy link, that coalescing magnifies the impact of any single packet loss. The cleaner the path, the more coalescing wins; the dirtier the path, the more you want either HTTP/3 or, paradoxically, more parallel connections.
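The coalescing rule described above boils down to two checks. A simplified sketch, assuming a Chrome-style requirement that the new host resolves to the IP the existing connection already uses; browsers differ in the details, and this ignores ALPN and other conditions. The function names and the documentation-range addresses are illustrative:

```python
def name_matches(pattern: str, host: str) -> bool:
    """Certificate name check: exact match, or a wildcard covering exactly one leftmost label."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        label, dot, rest = host.partition(".")
        return bool(label) and dot == "." and "." + rest == suffix
    return pattern == host


def can_coalesce(host: str, host_ips: set[str], conn_ip: str, cert_names: list[str]) -> bool:
    """Reuse an existing HTTP/2 connection for `host` only if DNS for host
    includes the connection's IP and the connection's certificate covers host."""
    return conn_ip in host_ips and any(name_matches(n, host) for n in cert_names)


CERT_NAMES = ["example.com", "*.example.com"]  # hypothetical SAN list
CONN_IP = "203.0.113.10"                       # hypothetical address of the open connection

print(can_coalesce("img.example.com", {"203.0.113.10"}, CONN_IP, CERT_NAMES))  # True
print(can_coalesce("img.example.com", {"198.51.100.7"}, CONN_IP, CERT_NAMES))  # False
```

Every host that passes both checks ends up on the same TCP connection, which is exactly why a single lost segment can stall assets from several origins at once.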

Tuning nginx for the right concurrency

The other knob worth tuning is the per-connection stream limit, which the server advertises to clients as SETTINGS_MAX_CONCURRENT_STREAMS. The default in most servers is conservative because each open stream consumes memory and scheduling time. For clients that fan out aggressively and benefit from many concurrent in-flight requests on one connection, such as a single-page app firing dozens of API calls at once, raising it pays for itself. Here is a stripped-down nginx config from one of our edge tiers:

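http {
    http2_max_concurrent_streams 256;

    # Since nginx 1.19.7, http2_max_field_size, http2_max_header_size,
    # http2_recv_timeout and http2_idle_timeout are obsolete; nginx only
    # logs a warning for them. These generic directives replace them.
    large_client_header_buffers 2 16k;   # replaces http2_max_field_size / http2_max_header_size
    client_header_timeout 30s;           # replaces http2_recv_timeout

    keepalive_timeout 75s;               # also the HTTP/2 idle timeout now (replaces http2_idle_timeout)
    keepalive_requests 10000;

    server {
        listen 443 ssl;
        listen 443 quic reuseport;
        http2 on;                        # "listen ... http2" is deprecated since nginx 1.25.1
        add_header Alt-Svc 'h3=":443"; ma=86400';

        # ssl_certificate and ssl_certificate_key omitted from this excerpt
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_prefer_server_ciphers off;

        location /api/ {
            # HTTP/1.1 plus an empty Connection header lets nginx reuse upstream
            # connections; that also needs a "keepalive" directive in the
            # upstream_api block, which is not shown here.
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://upstream_api;
        }
    }
}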

Two notes on this config. First, http2_max_concurrent_streams at 256 is high; do not copy it blindly. Each stream allocates memory, and a single client can keep that many requests in flight at once, which multiplies the load one slow or misbehaving client can push onto the worker and the upstreams. For a typical web app I would leave it at the default of 128. Second, the Alt-Svc header is how you advertise HTTP/3 availability without forcing it on every client; ma=86400 tells clients they may remember the advertisement for 24 hours. Browsers that support HTTP/3 will switch to it on subsequent connections, while older clients keep using HTTP/2 happily.
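The Alt-Svc value is a small structured header defined in RFC 7838: a protocol ID, a quoted authority (an empty host means the same host as the origin), and optional parameters such as `ma`, the max age in seconds, which defaults to 24 hours when absent. A minimal parser sketch; the `AltService` type and `parse_alt_svc` name are illustrative, and it deliberately skips edge cases such as commas inside quoted strings and percent-encoded protocol IDs:

```python
from dataclasses import dataclass


@dataclass
class AltService:
    protocol: str   # ALPN protocol ID, e.g. "h3"
    host: str       # empty string means "same host as the origin"
    port: int
    max_age_s: int  # how long the client may remember this alternative


def parse_alt_svc(value: str) -> list[AltService]:
    """Parse a simple Alt-Svc header value into its alternatives."""
    value = value.strip()
    if value == "clear":  # "clear" tells the client to forget all alternatives
        return []
    services = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(";")]
        proto, _, authority = parts[0].partition("=")
        host, _, port = authority.strip('"').rpartition(":")
        max_age = 86400  # RFC 7838 default when "ma" is absent
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "ma":
                max_age = int(val.strip())
        services.append(AltService(proto.strip(), host, int(port), max_age))
    return services


print(parse_alt_svc('h3=":443"; ma=86400'))
# [AltService(protocol='h3', host='', port=443, max_age_s=86400)]
```

Applied to the header from the config above, the result says: same host, UDP port 443, remember for a day. That is all a client needs to attempt HTTP/3 on its next connection.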

How I decide which transport to push

My rough rule is to ship HTTP/2 to everyone and HTTP/3 to clients whose network path is likely to be lossy. For a B2B API where every client is a server in a known datacenter, HTTP/2 is plenty and the operational simplicity is worth more than the marginal latency improvement. For anything serving phones, the calculus flips: enabling HTTP/3 there has been one of the highest-ROI infrastructure changes I have made in the last couple of years. Just remember that you still need HTTP/2 as a fallback, because QUIC runs over UDP/443, and a surprising number of corporate networks block it and allow only TCP/443 outbound. The transport story is rarely all-or-nothing.
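The fallback requirement is easiest to see from the client side. A sequential sketch, assuming hypothetical connector coroutines and an arbitrary 300 ms QUIC budget; real browsers typically race the two transports rather than trying them one after the other:

```python
import asyncio
from typing import Awaitable, Callable

# A connector attempts a connection and returns the negotiated protocol name.
Connector = Callable[[], Awaitable[str]]


async def connect_with_fallback(
    try_h3: Connector, try_h2: Connector, h3_timeout_s: float = 0.3
) -> str:
    """Prefer HTTP/3, but fall back to HTTP/2 over TCP if QUIC fails or stalls.

    A network that silently drops UDP/443 shows up as a timeout; one that
    rejects it shows up as an OSError. Both lead to the TCP path.
    """
    try:
        return await asyncio.wait_for(try_h3(), timeout=h3_timeout_s)
    except (asyncio.TimeoutError, OSError):
        return await try_h2()


# --- Simulated attempts, for illustration only ---
async def quic_silently_dropped() -> str:
    await asyncio.sleep(10)  # no reply ever arrives, as on a UDP-blocking network
    return "h3"


async def quic_ok() -> str:
    return "h3"


async def tcp_ok() -> str:
    return "h2"


if __name__ == "__main__":
    print(asyncio.run(connect_with_fallback(quic_ok, tcp_ok)))  # h3
    print(asyncio.run(connect_with_fallback(quic_silently_dropped, tcp_ok, h3_timeout_s=0.05)))  # h2
```

Without the TCP path, the blocked-UDP case would simply fail, which is why HTTP/2 stays on the menu even for clients that usually get HTTP/3.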