Monday, April 15, 2013


Sensible Defaults for Apache HttpClient

Defaults for HttpClient

Before coming to Bizo, I wrote a web service client that retrieved daily XML reports over HTTP using the Apache DefaultHttpClient. Everything went fine until one day the connection simply hung forever. We found this odd because we had set the connection timeout. It turned out we also needed to set the socket timeout (HttpConnectionParams.SO_TIMEOUT). The default for both connection timeout (max time to wait for a connection) and socket timeout (max time to wait between consecutive data packets) is infinity. The server was accepting the connection but then not sending any data so our client hung forever without even reporting any errors. Rookie mistake, but everyone is a rookie at least once. Even if you are an expert with HttpClient, chances are there will be someone maintaining your code in the future who is not.

Another problem with defaults using HttpClient is with PoolingClientConnectionManager. PoolingClientConnectionManager has two attributes: MaxTotal and MaxPerRoute. MaxTotal is the maximum total number of connections in the pool. MaxPerRoute is the maximum number of connections to a particular host. If the client attempts to make a request and either of these maximums have been reached, then by default the client will block until a connection is free. Unfortunately the default for MaxTotal is 20 and the default MaxPerRoute is only 2. In a SOA, it is common to have many connections from a client to a particular host. The limit of 2 (or even 1) connections per host makes sense for a polite web crawler, but in a SOA, you are likely going to need a lot more. Even the 20 maximum total connections in the pool is likely much lower than desired.

If the client does reach the MaxPerRoute or the MaxTotal connections, it will block until the connection manager timeout (ClientPNames.CONN_MANAGER_TIMEOUT) is reached. This timeout controls how long the client will wait for a connection from the connection manager. Fortunately, if this timeout is not set directly, it will default to the connection timeout if that is set, which will prevent the client from queuing up requests indefinitely.

What would a better set of defaults be?

A good default is something that is "safe". A safe default for a connection timeout is long enough to not give up waiting when things are working normally, but short enough to not cause system instability when the is down. Unfortunately safe is context dependent. Safe for a daily data sync process and safe for an in thread service request handler are very different. Safe for a request that is critical to the correct functioning of the program is different than safe for a some ancillary logging that is ok to miss 1% of the time. A default for timeouts that is safe in all cases is not really possible.

Safe defaults for PoolingClientConnectionManager's MaxTotal and MaxPerRoute should be big enough that they won’t be hit unless there is a bug. New to version 4.2 is the fluent-hc API for making http requests. This uses a PoolingClientConnectionManager with defaults of 200 MaxTotal and 100 MaxPerRoute. We are using these same defaults for all our configurations.

Note that the fluent-hc API is very nice, but requires setting the connection timeouts on each request. This is perfect if you need to tune the settings for each request but does not provide a safety check against accidentally leaving the timeout infinite.

How can you help out a new dev implementing a new HTTP client?

If you can't have a safe default and the existing defaults are decidedly not safe, then it is best to require a configuration. We created a wrapper for PoolingClientConnectionManager that requires the developer to choose a configuration instead of letting the defaults silently take effect. One way to require a configuration is to force passing in the timeout values. However, it can be a hard to know the right values especially when stepping into a new environment. To help a developer implementing a new client at Bizo, we created some canonical configurations in the wrapper based on our experience working in our production environment on AWS. The configurations are:

Configuration
Connection timeout
Socket timeout
MaxTotal
MaxPerRoute
SameRegion
125 ms
125 ms
200
100
SameRegionWithUSEastFailover
1 second
1 second
200
100
CrossRegion
10 seconds
10 seconds
200
100
MaxTimeout
1 minute
5 minutes
200
100

Clients with critical latency requirements can use the SameRegion configuration and need to make sure they are connecting to a service in the same AWS region. Back end processes that can tolerate latency can use the MaxTimeout configuration. Now when a developer is implementing a new client, the timeouts used by other services are readily available without having to hunt through other code bases. The developer can compare these with the current use case and choose an appropriate configuration. Additionally, if we learn that some of these configurations need to be tweaked, then we can easily modify all affected code.

Commonly the socket timeout will need to be adjusted for a specific service. After a connection is established, a service will not typically start sending its response until it has finished whatever calculation was requested. This can vary greatly even for different parameters on the same service endpoint. The socket timeout will need to be set based on the expected response times of the service.

It is easy to miss a particular setting even if you know it is there. At Bizo, we are always looking for ways to solve a problem in one place. We are hopeful that this will eliminate any issues we have had with bad defaults in our HttpClients.

No comments: