Multiplexing allows multiple HTTP exchanges to share a single TCP connection without interfering with each other. By sending and receiving multiple HTTP requests over one connection instead of opening a new connection for each request, it is possible to significantly reduce processing load and latency under heavy network traffic.
Internet traffic is dominated by the exchange of HTTP (application-layer protocol) messages over TCP, a connection-oriented transport protocol. A TCP connection is established with a three-way handshake, which adds processing load and at least one round-trip time (RTT) of latency before any application data can flow.
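As a rough illustration, the following Python sketch times socket.connect(), which for TCP blocks until the three-way handshake completes, so its duration approximates one round trip (the hostname is just an example):

    import socket
    import time

    def measure_handshake(host: str, port: int = 80) -> float:
        """Time the TCP three-way handshake by timing connect()."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        start = time.perf_counter()
        sock.connect((host, port))   # blocks until SYN, SYN-ACK, ACK complete
        elapsed = time.perf_counter() - start
        sock.close()
        return elapsed

    # Every fresh connection pays this cost before any HTTP data is sent.
    print(f"handshake took {measure_handshake('example.com') * 1000:.1f} ms")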
To reduce the overhead of establishing and closing a TCP connection for each HTTP request, persistent connections (a.k.a. HTTP keep-alive) are the default in HTTP/1.1. By keeping TCP connections open and reusing them for subsequent HTTP requests, web servers and network devices consume fewer resources, and latency between HTTP messages drops.
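For example, Python's standard http.client speaks HTTP/1.1 with keep-alive, so a single connection object can carry several requests in sequence (hostname and paths are illustrative):

    import http.client

    # One TCP connection (one handshake) serves several requests in sequence.
    conn = http.client.HTTPConnection("example.com")
    for path in ("/", "/about", "/contact"):
        conn.request("GET", path)        # reuses the same socket each time
        response = conn.getresponse()
        body = response.read()           # drain the response before reusing
        print(path, response.status, len(body))
    conn.close()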
A standard HTTP/1.1 implementation monitors its persistent connections so it can respond appropriately when either the client or the server signals termination. During temporary overloads it is advisable to keep persistent connections open, since opening and closing a TCP connection for every HTTP request would only add to the congestion.
HTTP/2 (based on SPDY) enables more efficient use of network resources by multiplexing requests and responses, avoiding the head-of-line blocking problem of HTTP/1.x. It allows multiple concurrent HTTP exchanges on the same TCP connection, without the FIFO ordering constraints of HTTP/1.1 pipelining. Each request/response exchange is carried on its own stream; streams are independent of each other, so a blocked or stalled stream does not prevent progress on the others.
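A minimal sketch of multiplexing in practice, using the third-party httpx package (installable with pip install 'httpx[http2]'; the URLs are illustrative): three requests are issued concurrently, and with http2=True they travel as independent streams on one connection rather than queuing FIFO.

    import asyncio
    import httpx

    async def main() -> None:
        # One client, one HTTP/2 connection per host; each request below
        # becomes its own stream, so a slow response does not block the rest.
        async with httpx.AsyncClient(http2=True) as client:
            urls = [f"https://example.com/item/{i}" for i in range(3)]
            responses = await asyncio.gather(*(client.get(u) for u in urls))
            for r in responses:
                print(r.url, r.http_version, r.status_code)

    asyncio.run(main())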
The basic protocol unit in HTTP/2 is the frame. Binary message framing enables more efficient processing of messages than the text-based framing of HTTP/1.x.
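Every HTTP/2 frame begins with a fixed 9-octet binary header (RFC 7540, section 4.1), which is what makes parsing cheap compared with scanning HTTP/1.x text. A small decoder:

    import struct

    def parse_frame_header(header: bytes) -> dict:
        """Decode the 9-octet HTTP/2 frame header (RFC 7540, section 4.1)."""
        length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(
            ">BHBBI", header
        )
        return {
            "length": (length_hi << 16) | length_lo,  # 24-bit payload length
            "type": frame_type,                       # 0x0 DATA, 0x1 HEADERS, ...
            "flags": flags,
            "stream_id": stream_id & 0x7FFFFFFF,      # top bit is reserved
        }

    # A HEADERS frame: 13-byte payload, END_HEADERS flag set, on stream 1.
    print(parse_frame_header(bytes.fromhex("00000d010400000001")))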
The number of simultaneous connections that a web server can keep open is limited, whether by the maximum number of sockets or by the memory consumed by their buffers.
The more connections a web server keeps open and the more requests it receives in the same window of time, the higher its overall response time becomes; operating close to the maximum number of supported connections can degrade overall web application performance.
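One such ceiling is easy to inspect: each open connection consumes a file descriptor, and on Unix-like systems Python's resource module (not available on Windows) reports and adjusts the per-process limit:

    import resource

    # Each open TCP connection holds one file descriptor, so this limit
    # caps the number of simultaneous connections this process can keep.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"file descriptor limit: soft={soft}, hard={hard}")

    # The soft limit can be raised up to the hard limit without privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))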
An appropriate balance must be struck between the costs and benefits of keeping connections open on a web server. Because a TCP connection must be terminated explicitly, policies are needed to control concurrency: a maximum number of connections, a maximum number of requests per connection, and a fixed retention time per connection (a timeout).
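A minimal sketch of how such policies might be enforced after each request; the limit values here are hypothetical, not any particular server's defaults:

    import time
    from dataclasses import dataclass, field

    MAX_REQUESTS_PER_CONNECTION = 100     # hypothetical policy values
    KEEPALIVE_TIMEOUT_SECONDS = 15.0

    @dataclass
    class ConnectionState:
        requests_served: int = 0
        last_activity: float = field(default_factory=time.monotonic)

    def should_keep_alive(conn: ConnectionState) -> bool:
        """Decide, after serving a request, whether to reuse or close."""
        if conn.requests_served >= MAX_REQUESTS_PER_CONNECTION:
            return False                  # per-connection request quota spent
        if time.monotonic() - conn.last_activity > KEEPALIVE_TIMEOUT_SECONDS:
            return False                  # connection idle past its timeout
        return True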
By using a Content Delivery Network (CDN) between clients and origin servers, you can maximise the benefits of HTTP multiplexing even further, by reusing already-established origin connections across many distinct client connections. A single TCP connection between a CDN node and an origin server can carry HTTP requests from different users, meaning fewer open connections at the origin.
Origin servers have limited capacity to maintain simultaneous connections. A CDN regulates the traffic that reaches the origin by using faster connections and reusing already-established ones, freeing origin capacity to serve more concurrent users.
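The idea can be sketched as a shared pool of upstream connections keyed by origin; the pool below is illustrative, not how any particular CDN is built:

    import http.client
    import threading

    class OriginPool:
        """Reuse one upstream connection per origin across many client requests."""

        def __init__(self) -> None:
            self._connections: dict[str, http.client.HTTPConnection] = {}
            self._lock = threading.Lock()

        def fetch(self, origin: str, path: str) -> bytes:
            with self._lock:               # serialize use of the shared socket
                conn = self._connections.get(origin)
                if conn is None:           # first request pays the handshake...
                    conn = http.client.HTTPConnection(origin)
                    self._connections[origin] = conn
                conn.request("GET", path)  # ...every later one reuses the socket
                return conn.getresponse().read()

    pool = OriginPool()
    # Requests from many distinct users collapse onto one origin connection.
    for path in ("/a", "/b", "/c"):
        print(len(pool.fetch("origin.example.com", path)))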
User connections can saturate a web server's capacity when each client requires a dedicated process (or thread) that lives on the server for as long as the TCP connection remains established. The situation gets worse with slow clients, whose connections must stay active while the server deals with them, leaving the server waiting around with its connections tied up by users.
Letting a CDN handle users' slow connections and fetch responses from the origin is a much smarter approach: it leaves room to serve more concurrent users from the CDN with fewer connections to the origin, reduces the origin's connection-management overhead, and offloads the work of dealing with client network conditions away from the web servers. A CDN can receive a complete and valid request from the client before forwarding it to the origin, over a new, faster connection or an already-established one. With bigger pipes to the origin, a CDN fetches responses faster than a slow client could, so it does not tie up the origin nearly as long as a regular client would.
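The buffering step might look like this sketch, where client_sock stands for an accepted (possibly slow) client socket and the request is completed locally before any origin connection is touched:

    def read_full_request(client_sock) -> bytes:
        """Buffer the whole request from a slow client before forwarding."""
        buffer = b""
        while b"\r\n\r\n" not in buffer:     # wait for the header terminator
            data = client_sock.recv(4096)    # tolerates a slow trickle
            if not data:
                break                        # client closed the connection
            buffer += data
        return buffer                        # request bodies omitted in this sketch

    # Only now is the origin contacted, so its connection is held for
    # milliseconds instead of the many seconds a slow client would need.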
A web application server's response time grows sharply (non-linearly) with the number of requests it must handle at once: the more requests in the same interval, the slower each one is processed. Letting a CDN leverage its caching capabilities for a web application substantially reduces the number of requests that reach the origin servers.
Serving the same number of HTTP requests with fewer open TCP connections at the origin leaves room to open more connections for additional user requests. It also reduces the total number of servers needed to handle the same number of users, which translates into improved per-server capacity.
Requiring fewer web servers leaves room for growth without new investment, and frees existing hardware for other uses. This lowers total cost of ownership (TCO) and reduces operating and management expenses. It also reduces IT complexity by avoiding a larger infrastructure maintained only to absorb unexpected traffic loads.
May 17, 2016 by Fernando Garza