I lead a team of talented developers and we’ve been working on a few Node.js projects. We already had one project in production (code named coltrane) and had arrived at the point of deploying a second project (code named maps). This second project makes use of websockets using the socket.io library. It was at this point that I unexpectedly found the need to put on my network architecture hat.
Our current setup uses Nginx for SSL termination and load balancing.
                       +-------------+         +----------------------------+
https://dev-coltrane   |             |         |                            |
  +------------------->| nginx - 443 |+------->| node.js - coltrane - 8000  |
                       +-------------+         +----------------------------+
Much to my surprise, even the latest development branch of Nginx does not properly support websockets. Additionally, with the deployment of the second project, I want to make sure that we have proper routing while both projects coexist on the same machine. Finally, I want to ensure that we have a manageable scale path for when these projects ultimately get separated to different machines and require additional nodes for load balancing.
Digging around, I found the excellent post on the subject of websockets and Node.js here. It uses a mix of Stunnel, Varnish and Nginx along with Node.js. I followed the instructions and configured the coltrane and maps projects accordingly.
https://dev-coltrane
  +------+
         |   +---------------+    +--------------+    +--------------+
         +-->|               |    |              |    |              |
             | stunnel - 443 |+-->| varnish - 80 |+-->| nginx - 8080 |
         +-->|               |    |              |    |              |
         |   +---------------+    +--------------+    +--------------+
https://dev-maps    |                              http |         | http
                    |                                   v         v
                    |                    +----------------+   +----------------+
                    |                    | node.js - 3000 |   | node.js - 8000 |
                    |     websocket      |      maps      |   |    coltrane    |
                    +------------------->+----------------+   +----------------+
This setup works well and provides a reasonable scale path. However, the configuration seemed overly complex. We also lose the ability to load balance the WebSocket connections.
Looking further, it seemed that I should be able to simplify the configuration and meet all the goals for the deployment using Stunnel and HAProxy. Below is a diagram of the configuration that I came to.
https://dev-coltrane
  +------+
         |   +---------------+    +--------------+    http       +----------------+
         +-->|               |    |              |+------------->| node.js - 8000 |
             | stunnel - 443 |+-->| haproxy - 80 |               |    coltrane    |
         +-->|               |    |              |+-----+        +----------------+
         |   +---------------+    +--------------+      |
https://dev-maps                                        | http + websocket
                                                        v
                                                 +----------------+
                                                 | node.js - 3000 |
                                                 |      maps      |
                                                 +----------------+
Later, I’ll review the configuration files, but first, let’s review a few pros and cons of this configuration.
Unlike Nginx or Varnish, HAProxy doesn’t support caching or serving static files. For us, that isn’t an issue – at least not yet. If it becomes important to cache or serve static files outside of Node.js, I could always add Varnish behind HAProxy.
On the plus side, Stunnel is super easy to configure and HAProxy has a very clear configuration file as well. The ability to route to the proper Node.js instance when running multiple applications on the same machine required a little bit of config gymnastics with HAProxy, but the resultant configuration file is still very readable.
Let’s take a look at the configuration files.
Stunnel has emerged as a high-quality, focused SSL termination server. Below is the configuration for stunnel that we have in use now.
1  pid = /var/run/stunnel.pid
2  cert = /etc/ssl/cert.key_pem
3  fips = no
4  [https]
5  accept = 443
6  connect = 80
The pid file reference on line 1 is important for configuring the monit service to ensure that Stunnel is always running.
Stunnel requires that you concatenate key and pem files of your SSL cert into a single file. Following the instructions in the original article, this is accomplished like so:
cat cert.key cert.pem > /etc/ssl/cert.key_pem
FIPS validation was more than I cared to get into for this exercise. I just turned it off on line 3 of the configuration file.
Lines 4 – 6 define the service and basically say that Stunnel will accept secure connections on port 443 and will ferry the decrypted traffic to the local machine on port 80.
Pretty simple, right?
HAProxy is a fast server for high availability and load balancing. It, too, has a very clean configuration, although, as we will see, you have to jump through some hoops to support routing traffic to multiple applications.
For our configuration, we wanted to achieve 3 primary goals:

- terminate SSL so that browser connections are always secure
- properly support WebSocket traffic
- route requests to the correct application while multiple projects coexist on the same machine
The configuration we ended up with has the added bonus of allowing us to scale to more Node.js instances either on the same machine or other remote machines.
Here is the configuration file for HAProxy that we have in use now.
 1  global
 2      maxconn 4096 # Total Max Connections. This is dependent on ulimit
 3      daemon
 4      nbproc 2
 5
 6  defaults
 7      mode http
 8      log 127.0.0.1 local1
 9      option httplog
10
11  frontend all 0.0.0.0:80
12      timeout client 86400000
13
14      acl local src 127.0.0.1
15
16      acl is_websocket hdr(Upgrade) -i WebSocket
17      acl is_websocket hdr_beg(Host) -i ws
18
19      # identify apps by hostname
20      acl is_coltrane hdr_dom(host) -i dev-coltrane01
21      acl is_maps hdr_dom(host) -i dev-maps01
22
23      # only local traffic (from stunnel) should be coming in.
24      # if it's not of local origin, then redirect
25      redirect prefix https://dev-coltrane01 if !local is_coltrane
26      redirect prefix https://dev-maps01 if !local is_maps
27
28      # hit websocket backends per app as needed
29      use_backend maps_websocket_backend if is_maps is_websocket
30
31      # hit http backends per app as needed
32      use_backend maps_http_backend if is_maps
33      use_backend coltrane_http_backend if is_coltrane
34
35  backend maps_websocket_backend
36      balance source
37      option forwardfor # This sets X-Forwarded-For
38      option httpclose
39      timeout queue 86400000
40      timeout server 86400000
41      timeout connect 86400000
42      server maps_websocket_server localhost:3000 weight 1 maxconn 1024 check inter 10000
43
44  backend maps_http_backend
45      balance source
46      option forwardfor # This sets X-Forwarded-For
47      option httpclose
48      option httpchk
49      timeout queue 100000
50      timeout server 100000
51      timeout connect 100000
52      server maps_http_server localhost:3000 weight 1 maxconn 1024 check inter 10000
53
54  backend coltrane_http_backend
55      balance source
56      option forwardfor # This sets X-Forwarded-For
57      option httpclose
58      option httpchk
59      timeout queue 100000
60      timeout server 100000
61      timeout connect 10000
62      server coltrane_http_server localhost:8000 weight 1 maxconn 1024 check inter 10000
Let’s break this down section by section and line by line.
Line 2 specifies that the server will handle up to 4096 simultaneous connections.
Line 3 specifies that the server will put itself in the background when launched.
Line 4 specifies that the server will launch 2 processes.
Line 7 indicates that our default mode will be http (as opposed to tcp).
Line 8 tells HAProxy that all logs should go to the syslog facility on the local machine using the log designation local1. HAProxy logs all messages exclusively through the syslog facility.
Line 9 tells HAProxy to use the built in httplog format. HAProxy supports a very flexible log format language, but also has a number of built in definitions. A typical http log style is fine for our purposes.
The next sections define a single frontend and a number of backends. Incoming traffic will be processed through the frontend configuration and will be passed on to one of the backends depending on the matching logic found in the frontend.
Line 11 tells HAProxy to bind to all network adapters and listen on port 80, the standard HTTP port.
Line 12 sets a very high client timeout because WebSocket connections can be very long running. We don’t want plain HTTP connections to linger that long, so a smaller timeout is set in the backend definitions for HTTP traffic.
In the HAProxy configuration language, you define a number of acl’s (access control lists) that match against various aspects of the traffic. This includes headers, source and destination and many other aspects. You can see a complete list of what you can match acl’s against here. After you’ve defined acl’s, you can then route traffic to backends using logic constructs against those acl’s.
Line 14 defines our first acl. It will be true if the src of the traffic is from the localhost.
Lines 16 and 17 define acl’s to determine if the incoming traffic is WebSocket traffic. Notice that the two lines have the same acl name. HAProxy ORs together acl lines that share a name, so a match on either line makes the acl true. A typical WebSocket handshake might start like this:
HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
Host: ws://wsock.example.com/bin/demo
...
The acl defined on lines 16 and 17 looks for either the Upgrade header having the value WebSocket or the Host header beginning with the protocol identifier ws.
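In JavaScript terms, the combined acl behaves roughly like the hypothetical helper below. This is an illustration of the matching logic only, not HAProxy code.

```javascript
// Illustrative sketch of the two same-named acl lines. HAProxy ORs
// acl lines that share a name, so either condition alone is enough
// to classify the request as WebSocket traffic.
function isWebSocket(headers) {
  // header names are lowercased, as Node.js does with req.headers
  const upgrade = (headers['upgrade'] || '').toLowerCase();
  const host = (headers['host'] || '').toLowerCase();
  return upgrade === 'websocket' || host.startsWith('ws');
}
```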
Line 20 defines an acl to match traffic bound for the coltrane project. For our development environment, the user will have typed in a url like:

https://dev-coltrane01
The host header will contain where the user intended to browse to.
Line 21 has the final acl of our configuration and matches traffic bound for the maps project.
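The two hostname acl’s amount to classifying a request by its Host header. A simplified JavaScript sketch of the idea follows; note that hdr_dom actually performs a domain-token match, while this hypothetical helper just does an exact, case-insensitive comparison with any port suffix stripped.

```javascript
// Simplified host matching in the spirit of hdr_dom(host) -i <name>.
// Hypothetical helper for illustration; not how HAProxy implements it.
function hostMatches(hostHeader, name) {
  // drop any ":port" suffix and compare case-insensitively
  const host = (hostHeader || '').toLowerCase().split(':')[0];
  return host === name.toLowerCase();
}
```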
The next group of lines in the frontend section makes use of the acl’s we defined to route traffic properly.
Lines 25 and 26 ensure that all traffic is secure. HAProxy should only ever route traffic that originates from the local machine, since that traffic will be coming from Stunnel. If anyone tries to browse directly to an application as in:

http://dev-maps01
Line 26 ensures that the request is redirected to an https connection. The prefix modifier of the redirect directive tells HAProxy to replace the protocol and host part (the prefix) of the url, but to keep the rest of the url intact. The end of the line asserts the rules by which the redirect will happen. HAProxy has built-in “and” logic, so
redirect prefix https://dev-maps01 if !local is_maps
can be read as “if the traffic is not local and is destined for the maps application, then redirect with the prefix https://dev-maps01 and keep the rest of the url intact”.
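The mechanics of prefix rewriting can be sketched in a couple of lines of JavaScript. The redirectPrefix helper here is hypothetical, purely for illustration: keep whatever path and query the browser sent, and bolt the configured prefix on the front.

```javascript
// Hypothetical sketch of "redirect prefix": the scheme and host are
// replaced by the configured prefix; the path and query are untouched.
function redirectPrefix(prefix, requestUri) {
  // requestUri is the path-and-query the browser sent, e.g. "/view?zoom=3"
  return prefix + requestUri;
}
```

So a request for /view?zoom=3 arriving from a non-local source would be redirected to https://dev-maps01/view?zoom=3.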
Unlike Varnish, HAProxy’s processing language is somewhat crude. It is for this reason that we need to have the per-application directives on lines 25 and 26. While this is a minor annoyance, the configuration is still readable and it would not be hard to add an additional application.
Assuming the “if” statements on lines 25 and 26 do not match, the remaining lines determine which backend the traffic will be routed to.
Line 29 determines if the traffic should go to a WebSocket backend based on the acl definitions. Right now, the maps project is the only one that has WebSocket traffic. Since the WebSocket acl matches any WebSocket traffic, it would be easy to add an additional application that had WebSocket traffic and direct it to the right backend.
If the traffic is not a WebSocket connection, lines 32 and 33 ensure that the traffic is routed to the appropriate HTTP backend.
The WebSocket backend is similar to the HTTP backends. A key difference is the timeout values. Since we expect that WebSocket connections can be long lived, we give a much larger timeout on lines 39 – 41 than we do for the HTTP connections, as found on lines 49 – 51 and lines 59 – 61. NOTE: Each of these timeout parameters should be tuned to the type of connection. They should probably have different values that make sense for the type of timeout they are managing.
For each of the backend definitions, we chose the “balance source” directive as seen on lines 36, 45, 55. This tells HAProxy to load balance across the specified servers, but to keep established connections pinned to whatever server was initially chosen.
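The idea behind “balance source” can be made concrete with a short sketch: hash the client’s source address and use it to index into the server list, so the same client keeps landing on the same server while the list is stable. The hash below is illustrative only; HAProxy’s actual algorithm and server-map handling are more sophisticated.

```javascript
// Illustrative source-hash balancing: map a client IP to one of the
// configured servers deterministically. Not HAProxy's real hash.
function pickServer(sourceIp, servers) {
  let hash = 0;
  for (const ch of sourceIp) {
    // simple 31-based rolling hash, kept in unsigned 32-bit range
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return servers[hash % servers.length];
}
```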
The “server” definitions describe the destination that HAProxy will route the traffic to. To start with, we only have once instance of a server for each project running on the local machine, but we can easily expand to multiple servers running on different machines. This gives us a very nice scale path. Traffic destined for the maps project (HTTP or WebSocket) will be sent to localhost:3000 (lines 42 and 52). Traffic destined for the coltrane project will be sent to localhost:8000 (line 62).
The last bits of the backend definitions concern source forwarding and health checks.

The “forwardfor” and “httpclose” options (lines 37-38, 46-47 and 56-57) ensure that the origin ip address of the traffic is inserted into the headers being passed on to the backend servers. This ensures that the Node.js logs have the right information for the source of the traffic. Note that you will have to do some additional configuration on your destination backend server (Node.js in our case) to make sure that the X-Forwarded-For header information is included in the log.

The HTTP backend definitions also have a health check option: “httpchk”. This causes HAProxy to connect to the backend server at regular intervals to ensure that it is up and receiving connections. The “check inter” clause of the “server” directive (lines 42, 52 and 62) tells HAProxy how often to check, in milliseconds. If it cannot connect, HAProxy will automatically take that server definition out of rotation. This is a powerful feature for load balancing.
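That extra Node.js configuration could be as simple as a small helper like the one below. The clientAddress name is ours, hypothetical, not a Node.js or socket.io API; it prefers the X-Forwarded-For header that HAProxy inserts and falls back to the socket address.

```javascript
// Hypothetical helper: resolve the real client address behind a proxy.
// X-Forwarded-For may carry a comma-separated chain of addresses; the
// original client is the first entry.
function clientAddress(req) {
  const forwarded = req.headers['x-forwarded-for'];
  if (forwarded) {
    return forwarded.split(',')[0].trim();
  }
  return req.socket ? req.socket.remoteAddress : undefined;
}
```

A request handler could then log clientAddress(req) instead of the socket address, which would otherwise always show the proxy.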
For more information on the HAProxy configuration, go here.
Since a primary motivation of this change in our architecture was ensuring that WebSocket traffic would work properly, we did some testing.
When running Node.js with socket.io locally, we typically see a WebSocket handshake via the console that looks like this:
debug - client authorized
info - handshake authorized 4838183271912048813
debug - setting request GET /socket.io/1/websocket/4838183271912048813
debug - set heartbeat interval for client 4838183271912048813
debug - client authorized for
debug - websocket writing 1::
debug - emitting heartbeat for client 4838183271912048813
debug - websocket writing 2::
debug - set heartbeat timeout for client 4838183271912048813
debug - got heartbeat packet
debug - cleared heartbeat timeout for client 4838183271912048813
debug - set heartbeat interval for client 4838183271912048813
When I first tried to deploy our maps project to our normal production environment (the one using the Nginx architecture), I was seeing debug output like this:
 1  debug - client authorized
 2  info - handshake authorized 18330621741893556948
 3  debug - setting request GET /socket.io/1/websocket/18330621741893556948
 4  debug - set heartbeat interval for client 18330621741893556948
 5  warn - websocket connection invalid
 6  info - transport end (undefined)
 7  debug - set close timeout for client 18330621741893556948
 8  debug - cleared close timeout for client 18330621741893556948
 9  debug - cleared heartbeat interval for client 18330621741893556948
10  debug - setting request GET /socket.io/1/xhr-polling/18330621741893556948?t=1344283228992
11  debug - setting poll timeout
12  debug - client authorized for
13  debug - clearing poll timeout
14  debug - xhr-polling writing 1::
15  debug - set close timeout for client 18330621741893556948
16  debug - setting request GET /socket.io/1/xhr-polling/18330621741893556948?t=1344283229011
17  debug - setting poll timeout
18  debug - discarding transport
19  debug - cleared close timeout for client 18330621741893556948
You can see on line 5 that we run into trouble right away. Line 10 shows that socket.io is falling back to alternate polling methods in place of a WebSocket connection.
Performing the same testing with the Stunnel/HAProxy architecture, I see the output as in the first sample above.
Using Stunnel for SSL termination and HAProxy for load balancing and routing, we have achieved a highly scalable environment that supports multiple applications and multiple protocols (HTTP and WebSocket). It provides a secure browser connection for our production environments where we have sensitive information. The configuration is exceedingly simple and easily extended to new applications.
I would be interested in any feedback on the downsides of this architecture, but so far it has been working like a champ for us.