Sunday, February 17, 2013

How I collect network statistics during benchmarks.


How I collect network statistics during benchmarks.
by the hardworking DBA…
as of 2/17/2013

I am benchmarking an Oracle stored procedure these days.  We have a 1 second SLA on this procedure.   It returns a rather large payload, sometimes exceeding 10,000 records and 5MB of data.   I have challenged the justification for that, and I am reassured that is what the application requires.  

I was especially curious how many network packets it takes to move that much data.   The ping command told me the average network round-trip speed between the server host and the client host is 0.14s.    That might be OK for a small payload, but if it takes 250,000 packets per call, at 0.14ms per packet, that would threaten my SLA.

If I wanted to check the impact of this stored procedure on the network, how could I do that?

Linux has a command, “netstat –s”, that produces the report shown in Listing 1 below.    It reports a lot of statistics about network usage.   I could run that before and after, and compare the results, right? 

The challenge is that the statistics are relative to when the machine last started.   I only care about statistics during a short period of time.   I cannot easily compare two 10-digit numbers that show total packets received, and tell if the difference is significant.    And a lot of the statistics reported don’t change at all.   I only care about the measures that changed.  

The solution is to
            1) take a “netstat –s” snapshot before the benchmark.
            2) run the benchmark.
            3) take a second “netstat –s” snapshot after the benchmark, and
            4) report any differences in a meaningful, human-readable way.

The result is a report such as the one shown in Listing 2.   While I was developing the script,  I followed the netstat-differential report with some other statistics from sar and a list of the machines connected to my node.   Those things have helped me talk to few other issues.  It doesn’t hurt to know such things.  

The script that produces Listing 2 is shown in Listing 3.

Check it out.  Maybe that will give you an idea?

Best wishes to always have good network throughput!
~ the hard-working DBA



Listing 1 – Typical output from the netstat –s command.

> netstat -s
Ip:
    1346387939 total packets received
    2943839 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    1282689297 incoming packets delivered
    3791124361 requests sent out
    19013 outgoing packets dropped
    6 fragments dropped after timeout
    128 reassemblies required
    26 packets reassembled ok
    6 packet reassembles failed
    136 fragments received ok
    19012 fragments failed
    422 fragments created
Icmp:
    74159 ICMP messages received
    255 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 24427
        timeout in transit: 7
        echo requests: 3238
        echo replies: 46462
        timestamp request: 17
        address mask request: 8
    93967 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 25376
        echo request: 65521
        echo replies: 3053
        timestamp replies: 17
IcmpMsg:
        InType0: 46462
        InType3: 24427
        InType8: 3238
        InType11: 7
        InType13: 17
        InType17: 8
        OutType0: 3053
        OutType3: 25376
        OutType8: 65521
        OutType14: 17
Tcp:
    18330381 active connections openings
    10578845 passive connection openings
    31217 failed connection attempts
    804128 connection resets received
    150 connections established
    1104945674 segments received
    3666265534 segments send out
    2514600 segments retransmited
    31 bad segments received.
    242582 resets sent
Udp:
    122170701 packets received
    16624 packets to unknown port received.
    0 packet receive errors
    122252674 packets sent
TcpExt:
    89760 invalid SYN cookies received
    2411 resets received for embryonic SYN_RECV sockets
    65 packets pruned from receive queue because of socket buffer overrun
    211 ICMP packets dropped because they were out-of-window
    14535660 TCP sockets finished time wait in fast timer
    181501 time wait sockets recycled by time stamp
    1370 packets rejects in established connections because of timestamp
    126238132 delayed acks sent
    43908 delayed acks further delayed because of locked socket
    Quick ack mode was activated 1727250 times
    372 times the listen queue of a socket overflowed
    372 SYNs to LISTEN sockets ignored
    521838861 packets directly queued to recvmsg prequeue.
    3963876152 packets directly received from backlog
    3457491267 packets directly received from prequeue
    293190 packets dropped from prequeue
    3307290762 packets header predicted
    3884415728 packets header predicted and directly queued to user
    886041005 acknowledgments not containing data received
    1618307249 predicted acknowledgments
    1 times recovered from packet loss due to fast retransmit
    549898 times recovered from packet loss due to SACK data
    3 bad SACKs received
    Detected reordering 163 times using FACK
    Detected reordering 3012 times using SACK
    Detected reordering 613 times using time stamp
    616 congestion windows fully recovered
    7910 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 459
    10024 congestion windows recovered after partial ack
    2549350 TCP data loss events
    TCPLostRetransmit: 28
    1 timeouts after reno fast retransmit
    5440 timeouts after SACK recovery
    1123 timeouts in loss state
    2193866 fast retransmits
    93844 forward retransmits
    99062 retransmits in slow start
    71073 other TCP timeouts
    TCPRenoRecoveryFail: 1
    12967 sack retransmits failed
    265222 times receiver scheduled too late for direct processing
    17038 packets collapsed in receive queue due to low socket buffer
    1721513 DSACKs sent for old packets
    49 DSACKs sent for out of order packets
    10316 DSACKs received
    550 DSACKs for out of order packets received
    29 connections reset due to unexpected SYN
    770680 connections reset due to unexpected data
    53290 connections reset due to early user close
    4355 connections aborted due to timeout
IpExt:
    InMcastPkts: 862471
    InBcastPkts: 54622317

Listing 2 – An example of the output from the script.
================================================================================
Netstat Differential Report.

Ip:
                                      total packets received: 8210
                                      with invalid addresses: 1
                                  incoming packets delivered: 8173
                                           requests sent out: 8670
Icmp:
Icmp_input:
Icmp_output:
IcmpMsg:
Tcp:
                                 active connections openings: 68
                                 passive connection openings: 31
                                  connection resets received: 2
                                           segments received: 8012
                                           segments send out: 8522
Udp:
                                            packets received: 148
                                                packets sent: 148
TcpExt:
                TCP sockets finished time wait in fast timer: 53
                                           delayed acks sent: 211
                packets directly queued to recvmsg prequeue.: 4746
                     packets directly received from prequeue: 2240426
                                    packets header predicted: 1601
        packets header predicted and directly queued to user: 4731
                acknowledgments not containing data received: 622
                                   predicted acknowledgments: 5877
                    connections reset due to unexpected data: 2
IpExt:
                                                 InBcastPkts: 13

End of Netstat Differential Report.
================================================================================
Recent Network stats reported by SAR.
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
07:50:01 PM        lo     15.97     15.97  49533.85  49533.85      0.00      0.00      0.00
07:50:01 PM      eth0    139.20    179.11  90875.86 131371.47      0.00      0.00      0.44
07:50:01 PM      eth1     34.23     35.50   6752.12  18519.29      0.00      0.00      0.44
07:50:01 PM      eth2      1.75      0.01    159.03      0.65      0.00      0.00      0.44
07:50:01 PM      eth3      1.99      0.00    183.95      0.11      0.00      0.00      0.44
07:50:01 PM      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
================================================================================
Connections are seen to the following hosts.
 g3db2v.sys.cigna.com.
 gvoz0858.internal.cigna.com.
 localhost.
 lsh1010a.sys.cigna.com.
 lshnap001a.internal.cigna.com.
 lshslx001a.sys.cigna.com.
 lshslx002a.sys.cigna.com.
 lshslx003a.sys.cigna.com.
 lshslx014a.sys.cigna.com.
 lshslx015a.sys.cigna.com.
 psystcp1.sys.cigna.com.

Listing 3 – Korn-shell script that produces the netstat differential report shown above.

> cat netstat_differential_report.ksh

#!/bin/ksh
# This script simplifies taking a netstat snapshot before a benchmark, taking
# another netstat snapshot after a benchmark, and then reporting a
# comparison of just the network statistics that changed during the benchmark.
#
# See the show_usage function for more explanation.
#
# History
# Feb 2013 SWC  Wrote it.
#==============================================================================

export NET_FILE_1
export NET_FILE_2
export NET_STRING_FILE
#==============================================================================
show_usage ()
{
cat     <<-EOF
Usage:
        $0 -n SNAPSHOT_NUMBER [-h|-?]
where
        -n enables specification of a snapshot number.

        When you say "-n 1", the script will run netstat -s and save the
                output for future reference.
        When you say "-n 2", the script will run netstat -s, and display a
                report of the differences in network statistics since
                the first snapshot was saved.

The temporary files used by this script are saved in a subdirectory called logs.

Best wishes.
EOF
exit 1
}
#==============================================================================
# Parse the command line parameters.
if [ $# -eq 0 ]; then
        show_usage
fi

while getopts n: option
do
        case $option in
                n) SNAPSHOT_NUM=$OPTARG
                        ;;
                h) show_usage
                        ;;
                \?) show_usage
                        ;;
        esac
done

#==============================================================================

if [ "$SNAPSHOT_NUM." = "." ]; then
        echo "Error: Please specify which snapshot_number to process."
        show_usage
        exit 1
fi

DIR=`dirname $0`
cd $DIR
DIR=`pwd`
BASE=`basename $0 .ksh`

if [ ! -d $DIR/logs ]; then
        mkdir $DIR/logs
fi

#==============================================================================
# Use this version when the netstat statistic is at the start of the line.
report_netstat_diff_v1 ()
{
S=$1

        grep "$S$" $NET_FILE_1 | awk '{print $1}' | read V1
        grep "$S$" $NET_FILE_2 | awk '{print $1}' | read V2
        let V=V2-V1
        if [ $V -ne 0 ]; then
                printf "%60.60s: %d\n" "$S" $V

                echo    $S      >> $NET_STRING_FILE
        fi
}
#==============================================================================
# Use this version when the netstat statistic follows a colon.
report_netstat_diff_v2 ()
{
S=$1

        grep "$S" $NET_FILE_1 | cut -f2 -d':' | read V1
        grep "$S" $NET_FILE_2 | cut -f2 -d':' | read V2
        let V=V2-V1
        if [ $V -ne 0 ]; then
                printf "%60.60s: %d\n" "$S" $V

                echo    $S      >> $NET_STRING_FILE
        fi
}
#==============================================================================
# Use this version when the netstat statistic is in the middle of the line.
report_netstat_diff_v3 ()
{
S1=$1
FIELD_NUM=$2
S2=$3

        grep "$S1" $NET_FILE_1 | grep "$S2" | cut -f$FIELD_NUM -d' ' | read V1
        grep "$S1" $NET_FILE_2 | grep "$S2" | cut -f$FIELD_NUM -d' ' | read V2
        let V=V2-V1
        if [ $V -ne 0 ]; then
                printf "%60.60s _ %s: %d\n" "$S1" "$S2" $V

                echo    $S      >> $NET_STRING_FILE
        fi
}

#==============================================================================
if [ $SNAPSHOT_NUM -eq 1 ]; then
        rm -f $DIR/logs/netstat.*

        NET_FILE=$DIR/logs/netstat.1.log

        netstat -s > $NET_FILE

        if [ ! -f $NET_FILE ]; then
                print "Error: Failed to save snapshot 1."
                exit 1
        fi

        print "`date`: Saved netstat snapshot number 1."

elif [ $SNAPSHOT_NUM -eq 2 ]; then
        NET_FILE=$DIR/logs/netstat.2.log
        netstat -s > $NET_FILE

        print "`date`: Saved netstat snapshot number 2."

        # Separate the file contents by section
        for N in 1 2; do
                cat $DIR/logs/netstat.$N.log |
                while read LINE; do
                        if [ "$LINE" = "Ip:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Ip.$N
                        elif [ "$LINE" = "Icmp:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Icmp.$N
                        elif [ "$LINE" = "ICMP input histogram:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Icmp_input.$N
                        elif [ "$LINE" = "ICMP output histogram:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Icmp_output.$N
                        elif [ `echo "$LINE" | grep -c "ICMP messages sent"` -gt 0 ]; then
                                # Continue putting lines back in the regular Icmp file
                                OUT_FILE=$DIR/logs/netstat.Icmp.$N
                        elif [ "$LINE" = "IcmpMsg:" ]; then
                                OUT_FILE=$DIR/logs/netstat.IcmpMsg.$N
                        elif [ "$LINE" = "Tcp:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Tcp.$N
                        elif [ "$LINE" = "Udp:" ]; then
                                OUT_FILE=$DIR/logs/netstat.Udp.$N
                        elif [ "$LINE" = "TcpExt:" ]; then
                                OUT_FILE=$DIR/logs/netstat.TcpExt.$N
                        elif [ "$LINE" = "IpExt:" ]; then
                                OUT_FILE=$DIR/logs/netstat.IpExt.$N
                        fi

                        echo $LINE >> $OUT_FILE
                done
        done

        printf "%80s\n" " " | tr ' ' '='
        print  "Netstat Differential Report."
        print

        # Now compare the different sections one at a time.
        for TYPE in Ip Icmp Icmp_input Icmp_output IcmpMsg Tcp Udp TcpExt IpExt; do
                NET_FILE_1=$DIR/logs/netstat.$TYPE.1
                NET_FILE_2=$DIR/logs/netstat.$TYPE.2
                NET_STRING_FILE=$DIR/logs/netstat.strings.txt
                rm -f $NET_STRING_FILE
                touch $NET_STRING_FILE

                print "$TYPE:"
                case $TYPE in
                        Ip)
                                report_netstat_diff_v1 'total packets received'
                                report_netstat_diff_v1 'with invalid addresses'
                                report_netstat_diff_v1 'forwarded'
                                report_netstat_diff_v1 'incoming packets discarded'
                                report_netstat_diff_v1 'incoming packets delivered'
                                report_netstat_diff_v1 'requests sent out'
                                report_netstat_diff_v1 'outgoing packets dropped'
                                report_netstat_diff_v1 'fragments dropped after timeout'
                                report_netstat_diff_v1 'reassemblies required'
                                report_netstat_diff_v1 'packets reassembled ok'
                                report_netstat_diff_v1 'packet reassembles failed'
                                report_netstat_diff_v1 'packet reassembles failed'
                                report_netstat_diff_v1 'fragments received ok'
                                report_netstat_diff_v1 'fragments failed'
                                report_netstat_diff_v1 'fragments created'
                                ;;
                        Icmp)
                                report_netstat_diff_v1 'ICMP messages received'
                                report_netstat_diff_v1 'input ICMP message failed.'
                                report_netstat_diff_v1 'ICMP messages sent'
                                report_netstat_diff_v1 'ICMP messages failed'
                                ;;
                        Icmp_input)
                                report_netstat_diff_v2 'destination unreachable'
                                report_netstat_diff_v2 'timeout in transit'
                                report_netstat_diff_v2 'echo requests'
                                report_netstat_diff_v2 'echo replies'
                                report_netstat_diff_v2 'timestamp request'
                                report_netstat_diff_v2 'address mask request'
                                ;;
                        Icmp_output)
                                report_netstat_diff_v2 'destination unreachable'
                                report_netstat_diff_v2 'echo request'
                                report_netstat_diff_v2 'echo replies'
                                report_netstat_diff_v2 'timestamp replies'
                                ;;
                        IcmpMsg)
                                report_netstat_diff_v2 'InType0'
                                report_netstat_diff_v2 'InType3'
                                report_netstat_diff_v2 'InType8'
                                report_netstat_diff_v2 'InType11'
                                report_netstat_diff_v2 'InType13'
                                report_netstat_diff_v2 'InType17'
                                report_netstat_diff_v2 'OutType3'
                                report_netstat_diff_v2 'OutType8'
                                report_netstat_diff_v2 'OutType14'
                                ;;
                        Tcp)
                                report_netstat_diff_v1 'active connections openings'
                                report_netstat_diff_v1 'passive connection openings'
                                report_netstat_diff_v1 'failed connection attempts'
                                report_netstat_diff_v1 'connection resets received'
                                report_netstat_diff_v1 'connections established'
                                report_netstat_diff_v1 'segments received'
                                report_netstat_diff_v1 'segments send out'
                                report_netstat_diff_v1 'segments retransmited'
                                report_netstat_diff_v1 'bad segments received.'
                                report_netstat_diff_v1 'resets sent'
                                ;;
                        Udp)
                                report_netstat_diff_v1 'packets received'
                                report_netstat_diff_v1 'packets sent'
                                report_netstat_diff_v1 'packet receive errors'
                                report_netstat_diff_v1 'packets to unknown port received.'
                                ;;
                        TcpExt)
                                report_netstat_diff_v1 'invalid SYN cookies received'
                                report_netstat_diff_v1 'resets received for embryonic SYN_RECV sockets'
                                report_netstat_diff_v1 'packets pruned from receive queue because of socket buffer overrun'
                                report_netstat_diff_v1 'ICMP packets dropped because they were out-of-window'
                                report_netstat_diff_v1 'TCP sockets finished time wait in fast timer'
                                report_netstat_diff_v1 'time wait sockets recycled by time stamp'
                                report_netstat_diff_v1 'packets rejects in established connections because of timestamp'
                                report_netstat_diff_v1 'delayed acks sent'
                                report_netstat_diff_v1 'delayed acks further delayed because of locked socket'
                                report_netstat_diff_v3 'Quick ack mode was activated'    6 'times'
                                report_netstat_diff_v1 'times the listen queue of a socket overflowed'
                                report_netstat_diff_v1 'SYNs to LISTEN sockets ignored'
                                report_netstat_diff_v1 'packets directly queued to recvmsg prequeue.'
                                report_netstat_diff_v1 'packets directly received from backlog'
                                report_netstat_diff_v1 'packets directly received from prequeue'
                                report_netstat_diff_v1 'packets dropped from prequeue'
                                report_netstat_diff_v1 'packets header predicted'
                                report_netstat_diff_v1 'packets header predicted and directly queued to user'
                                report_netstat_diff_v1 'acknowledgments not containing data received'
                                report_netstat_diff_v1 'predicted acknowledgments'
                                report_netstat_diff_v1 'times recovered from packet loss due to fast retransmit'
                                report_netstat_diff_v1 'times recovered from packet loss due to SACK data'
                                report_netstat_diff_v1 'bad SACKs received'
                                report_netstat_diff_v3 'Detected reordering' 3 'times using FACK'
                                report_netstat_diff_v3 'Detected reordering' 3 'times using SACK'
                                report_netstat_diff_v3 'Detected reordering' 3 'times using time stamp'
                                report_netstat_diff_v1 'congestion windows fully recovered'
                                report_netstat_diff_v1 'congestion windows partially recovered using Hoe heuristic'
                                report_netstat_diff_v2 'TCPDSACKUndo'
                                report_netstat_diff_v1 'congestion windows recovered after partial ack'
                                report_netstat_diff_v1 'TCP data loss events'
                                report_netstat_diff_v2 'TCPLostRetransmit'
                                report_netstat_diff_v1 'timeouts after reno fast retransmit'
                                report_netstat_diff_v1 'timeouts after SACK recovery'
                                report_netstat_diff_v1 'timeouts in loss state'
                                report_netstat_diff_v1 'fast retransmits'
                                report_netstat_diff_v1 'forward retransmits'
                                report_netstat_diff_v1 'retransmits in slow start'
                                report_netstat_diff_v1 'other TCP timeouts'
                                report_netstat_diff_v2 'TCPRenoRecoveryFail'
                                report_netstat_diff_v1 'sack retransmits failed'
                                report_netstat_diff_v1 'times receiver scheduled too late for direct processing'
                                report_netstat_diff_v1 'packets collapsed in receive queue due to low socket buffer'
                                report_netstat_diff_v1 'DSACKs sent for old packets'
                                report_netstat_diff_v1 'connections reset due to unexpected SYN'
                                report_netstat_diff_v1 'connections reset due to unexpected data'
                                report_netstat_diff_v1 'connections reset due to early user close'
                                report_netstat_diff_v1 'connections aborted due to timeout'
                                ;;
                        IpExt)
                                report_netstat_diff_v2 'InMcastPkts'
                                report_netstat_diff_v2 'InBcastPkts'
                                ;;
                esac

                # Show any strings that might have been missed in the case-statement above.
                diff $NET_FILE_1 $NET_FILE_2            |
                        grep -v '\-\-\-'                |
                        grep -v '[0-9],[0-9]'           |
                        grep -v '[0-9]c[0-9]'           |
                        grep -v -f $NET_STRING_FILE

        done
        print
        print  "End of Netstat Differential Report."
        printf "%80s\n" " " | tr ' ' '='

        print  "Recent Network stats reported by SAR."
        sar -n DEV | grep IFACE | head -1
        sar -n DEV | grep -v Average | tail -6

        printf "%80s\n" " " | tr ' ' '='
        print  "Connections are seen to the following hosts."
        IP_FILE=$DIR/logs/$$.ip
        netstat -nat | awk '{ print $5}' | cut -d: -f1 | sed -e '/^$/d' | sort -u > $IP_FILE

        NS_FILE=$DIR/logs/$$.ns
        grep '\.' $IP_FILE |
        while read IP; do
                nslookup $IP 2>&1 | grep name | cut -f2 -d'=' >> $NS_FILE
        done
        sort -u $NS_FILE

        # Cleanup
        rm $NS_FILE
        rm $IP_FILE

        rm $DIR/logs/netstat.*
fi

…End of another blog by the hardworking DBA!