Tuning Solaris TCP parameters

File handles are extremely important for the smooth operation of a heavily loaded Internet server, and they are indirectly tied to several TCP-related parameters. A TCP connection goes through different phases in its lifetime, and each phase has a timeout attached to it; the default timeouts are set too generously for Internet servers. Without adjusting these values you will run out of file handles on a relatively heavily loaded Internet server. The ndd program on Solaris changes the values of the different parameters. You need to add these entries to a startup script, since they are reset after a reboot.
The biggest problem for Internet servers is that most sessions/connections have a very short lifetime, so large numbers of different connections are generated within a small period of time. When you have to wait on TIME_WAITs, for instance, those file handles are still in use until the set time expires. In the meantime such a file handle cannot be used for other connections.

TIME_WAIT
The TIME_WAIT state is the state TCP is put in after a close of the socket has been issued and after it has passed through the FIN_WAIT_1 and FIN_WAIT_2 states. There are two reasons for the TIME_WAIT state that Richard Stevens points out to the network programmer:

- To terminate a full-duplex TCP connection reliably
- To let old duplicate segments expire throughout the network

This state can, however, become a problem when you have a lot of connections to a certain socket that are closed rapidly, for instance on a web server or an email server. A file descriptor remains in use while a connection waits in this state, so on a heavily loaded server this can quickly result in file handle starvation. Adding more file handles would only be a temporary solution; what you should do is reduce this wait state.

When netstat -an|grep TIME_WAIT shows a lot of TIME_WAIT connections, you should reduce this value.

The TIME_WAIT state has been set to 2MSL (2 * Maximum Segment Lifetime) in the TCP standard. Solaris, however, allows us to configure this value separately from the MSL value. I initially set it to 60 seconds, which is sufficient in most cases; however, on Internet sites that get lots of hits, 60 seconds might even be too long. My rule of thumb is that I want at least 30% of the file handles available in normal operation, which leaves room for sudden bursts of connections. There are also some software development tricks to avoid the TIME_WAIT state, for instance SO_LINGER and sending an RST on the socket (see the sketch after the commands below). Personally I'm not too fond of this myself, but in some cases the end justifies the means.

ndd -set /dev/tcp tcp_time_wait_interval 60000
ndd -get /dev/tcp tcp_time_wait_interval  
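
For completeness, here is a minimal C sketch of the SO_LINGER trick mentioned above (my own illustration, not something prescribed by Sun): a zero linger time makes close() send an RST instead of a FIN, so the socket never enters TIME_WAIT. The reliability guarantees Stevens describes are lost, so use it only when you accept that trade-off.

/* Minimal sketch: an abortive close via SO_LINGER.
   l_onoff = 1 with l_linger = 0 makes close() send an RST instead of
   a FIN, so the socket skips TIME_WAIT entirely. */
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

static void abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff  = 1;    /* linger on close ...                            */
    lg.l_linger = 0;    /* ... for zero seconds => RST, no TIME_WAIT      */
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(fd);
}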

FIN_WAIT_2
The FIN_WAIT_2 timeout should be set lower than the default value (675000), otherwise resource starvation could become critical. This happened with earlier versions of Apache in combination with certain browsers: Apache actively closed connections, leaving them standing in the FIN_WAIT_2 state when the client did not send its FIN reply. There is no timeout value described for this in the TCP standards. Even though the FIN_WAIT_2 state is a rare occurrence, it still should not be set too low! A half-closed socket (which is in fact FIN_WAIT_2) is a documented situation for a TCP socket and can occur in real life (see the sketch after the command below).

When netstat -an|grep FIN_WAIT2 still shows a lot of FIN_WAIT_2 states, you might consider bringing the value down even more. 67500 is the BSD default, which is kinder than the Sun default value.

ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 5000
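
To illustrate why FIN_WAIT_2 is a legitimate state, here is a minimal C sketch (my own example) of a deliberate half-close: the side that calls shutdown() sends its FIN and, once that FIN is acknowledged, sits in FIN_WAIT_2 until the peer finally closes its own end.

/* Minimal sketch: a deliberate half-close. shutdown(fd, SHUT_WR) sends our
   FIN but keeps the read side open; this end then waits in FIN_WAIT_2 for
   the peer's FIN. This is the legitimate case that the flush interval must
   not cut off too aggressively. */
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

static void half_close_and_drain(int fd)
{
    char    buf[4096];
    ssize_t n;

    shutdown(fd, SHUT_WR);                        /* send FIN, keep reading     */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;                                         /* drain until the peer's FIN */
    close(fd);                                    /* now fully closed           */
}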

TCP maximum connection queue
Also known as SOMAXCONN. This value represents the number of sockets that may be queued for servicing while they are so-called "half open": the SYN has been received but the three-way handshake is not yet completed. Instead of preventing the client from connecting to the server because all sockets are in use, you can put it in a queue to be serviced when resources become available. Most Unixes have this value set to 128, which is too small for heavily visited web servers. Usually you set this value to 256 for a moderate web server; some very heavily visited servers should set it to 512. This value, just like the TCP send and receive buffers, is a maximum: when the application/server/service does not request it in its listen function call, the setting is useless (see the sketch after the commands below). Luckily it does not consume a lot of resources, since it only holds socket structures. Here too you might need to set the value in a separate configuration for your server software; sometimes it is hardcoded, so you might need to change it in the code and recompile.

Unlike other Unixes, which can only specify the "half open" connection queue, Solaris is able to configure both the "half open" queue and the completely connected maximum queue size. Solaris calls the SOMAXCONN parameter tcp_conn_req_max_q0; its default value is 1024, which is in fact a bit overdone, even though the amount of memory allocated is small since it only stores socket structures. Estimating the tcp_conn_req_max_q0 needed for your situation is possible by simply running netstat -an|grep SYN_RCVD|wc -l; this shows the number of sockets that are "half open", and when this number is close to your maximum you should increase the maximum.

ndd -set /dev/tcp tcp_conn_req_max_q0 256
ndd -set /dev/tcp tcp_conn_req_max_q 128 (the actual listen queue of completely connected sockets)
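
As noted above, the kernel maximum only helps if the server actually asks for a comparable backlog in its listen() call. A minimal C sketch; the port argument and the backlog of 256 are assumptions for illustration, and error handling is omitted for brevity:

/* Minimal sketch: the backlog passed to listen() is what tcp_conn_req_max_q
   caps. If the application asks for less, raising the ndd value changes
   nothing for that application. */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

int make_listener(unsigned short port)
{
    int                fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 256);    /* requested backlog, capped by tcp_conn_req_max_q */
    return fd;
}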

TCP transmit buffer size
Choosing the transmit buffer size depends on what data you are sending. If you are sending small amounts of data, such as with an HTTP server and, 96% of the time, with SMTP, then a 16KB buffer is sufficient. FTP servers might gain a lot more performance by setting this value to 32KB or even 64KB. When the applications do not set the buffer to your selected maximum, it is a waste of memory; programmers can select the receive and send buffers with the setsockopt function call. Good programmers will query the buffer sizes with the getsockopt function and use the maximum buffer value available. Even better programmers allow you to set these values yourself via a configuration file. So cranking this value up when the application doesn't use the resource is just a waste of memory, since every connection will allocate the set amount of buffer space (it is not dynamic, unlike the TCP max buffer). When you serve a lot of clients at the same time, you run out of memory and the system starts to swap, leaving you with a bigger problem than the one you tried to avoid (unnecessary system calls). So it is important to know how much outgoing traffic an average connection generates in order to choose a sensible value. Larger is not always better in this case.

Solaris 8 already sets this value to 16KB, which in my opinion is sufficient for web servers and SMTP servers. For FTP servers you might want to crank it up to 32KB or even 64KB. Sizes larger than 64KB may not be supported by the clients, so resources would just be wasted. It is also a good idea to check whether you can set the SO_SNDBUF size of an application from a configuration file; otherwise you should read the code of the application to see what SO_SNDBUF size it uses. This value can be increased in the source code, after which the application should be recompiled (see the sketch after the command below).

ndd -set /dev/tcp tcp_xmit_hiwat 16384 (the default for Solaris 8)
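
As a sketch of the setsockopt/getsockopt calls mentioned above (the 32KB request is just an illustrative figure for a bulk-transfer socket such as an FTP data connection, and it remains subject to the system-wide maximum):

/* Minimal sketch: request a 32KB send buffer, then read back what the
   kernel actually granted. */
#include <sys/types.h>
#include <sys/socket.h>

void tune_send_buffer(int fd)
{
    int       wanted  = 32 * 1024;
    int       granted = 0;
    socklen_t len     = sizeof(granted);

    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof(wanted));
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len);
    /* 'granted' now holds the send buffer size actually in effect */
}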

TCP receive buffer size
The receive buffer size works the same way as the TCP send buffer size mentioned above, except that it is used for buffering received data. The receive buffer is not really interesting when it comes to web servers, since the requests are strings which are not large. To save memory for the filesystem cache you could think about tuning this value down to 4KB, for instance. However, when you have a mail server that receives emails (which are on average 6-20KB and often larger), a larger receive buffer, for instance 16KB, can reduce the number of read system calls dramatically and improve overall performance (see the sketch after the command below). Just as with the send buffer, you need to know how much data an average connection generates. Here too, larger is not always better.

Solaris 8 has a default value of 24576, which I think is a strange value. For a system that is solely a web server you can easily bring it down to 4KB or even 1KB. The 24KB default is a nice middle value for an SMTP server or an NFS client; an even larger value here will increase the NFS write performance of the client, assuming your average files are larger than approximately 24K. In the case of an NFS client I would suggest a 64KB buffer for read and write (again depending on the average file size).

ndd -set /dev/tcp tcp_recv_hiwat 16384
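
And the receive-side equivalent, again only a hedged sketch, with the 16KB figure taken from the mail-server example above; it is commonly set on the listening socket so that accepted connections pick it up:

/* Minimal sketch: ask for a 16KB receive buffer so a typical mail message
   arrives in fewer read() calls. Overrides the tcp_recv_hiwat default for
   this socket, subject to the system-wide maximum. */
#include <sys/types.h>
#include <sys/socket.h>

void tune_recv_buffer(int fd)
{
    int size = 16 * 1024;

    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
}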

Ephemeral ports
Most Unixes set the number of ports available for connecting to servers to approximately 4000. This value is far too small for busy servers these days. Assuming you have a TIME_WAIT value of 60 seconds, it means you can handle at most 4000 connections per minute. You might think you could solve this by further decreasing the TIME_WAIT, but the actual problem is the lack of ephemeral ports. My rule of thumb is to set the range from port 49152 to port 65535, which allows for 16384 connections. The default for Solaris 8 is even more generous: it starts at port 32768 and ends at port 65535, which is in fact 50% of all available ports. I would suggest tuning the start value up to my 49152, but leaving it at its default isn't bad either (a quick worked example follows the commands below).

ndd -set /dev/tcp tcp_smallest_anon_port 49152
ndd -set /dev/tcp tcp_largest_anon_port 65535
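
A quick back-of-the-envelope check of how these two settings interact (my own figures, for illustration): the range 49152-65535 contains 65535 - 49152 + 1 = 16384 ports, and with a 60 second TIME_WAIT each port stays unusable for a minute after the connection closes, so a single host can sustain at most roughly 16384 / 60 ≈ 273 new outgoing connections per second to the same remote service before it runs out of ephemeral ports.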


Client/Registered ports
These client ports should not be mistaken for the ephemeral ports. The client ports are the port numbers a client gets when connecting to a server. So when you telnet to a host, the client gets a client port number from 1024 upwards, with a usual maximum of 5000. Usually you don't have to tune these values, since most Unixes allow 4000 client ports, which allows 4000 client connections to be made from that single host. Starting 4000 client programs will probably run into a memory problem or the maximum number of processes allowed to run before it runs into a client-port problem. Should you, however, feel the need to increase the number of available client ports, only increase the upper value and never drop below 1024, since ports smaller than 1024 are so-called "well known ports" that only the root user can bind to.

Solaris doesn't allow you to set the end value of the registered/client ports; it simply holds that the client ports end where the ephemeral ports start. You can, however, tune the starting value.

ndd -set /dev/tcp tcp_smallest_nonpriv_port 1024 (the Solaris default)

Forwarding source routed packets
This has nothing to do with performance, but with security. Source routed packets allow the sender to set a routing path in the packet, and that path must then be taken; a malicious attacker could use this to bypass firewalls, for instance. Solaris forwards source routed packets by default, in my opinion a major mishap! So please be sure to switch it off.

ndd -set /dev/ip ip_forward_src_routed 0
