Problem 1: UDP, VLANs and Lack of Flow Control


Problem

    VLAN devices do not support scatter-gather (this can be checked with ethtool, as shown below)

    This means that each skb needs to be linearised, and thus cloned, if it is transmitted on a VLAN device

    Cloning results in the original fragments being released

    This breaks Xen's netfront/netback flow-control
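
    A quick way to check this on a running system is to look at the device's feature flags with ethtool (eth4.100 here is just an example VLAN interface; the output is illustrative):

        # ethtool -k eth4.100 | grep scatter-gather
        scatter-gather: off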


Result

    A guest can flood dom0 with packets

    Very effective DoS attack on dom0 and other domUs


Work-Around

    Use the credit scheduler to limit the rate of a domU's virtual interface to something close to the rate of the physical interface:

        vif = [ "mac=00:16:36:6c:81:ae,bridge=eth4.100,script=vif-bridge,rate=950Mb/s" ]

    Still uses quite a lot of dom0 CPU if domU sends a lot of packets

    But the DoS is mitigated


Partial Solution

    Scatter-gather-enabled VLAN interfaces

    The problem is resolved for VLANs whose underlying physical device supports scatter-gather (see the check below)

    Still a problem for any other device that doesn't support scatter-gather
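
    Once the VLAN interface has scatter-gather enabled, the same ethtool check now reports it (illustrative output, using eth4.100 as before):

        # ethtool -k eth4.100 | grep scatter-gather
        scatter-gather: on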


Patches

    Included in v2.6.26-rc4

        "Propagate selected feature bits to VLAN devices" and

        "Use bitmask of feature flags instead of seperate feature bit" by Patrick McHardy.

        "igb: allow vlan devices to use TSO and TCP CSUM offload" by Jeff Kirsher

    Patches for other drivers have also been merged


Problem 2: Bonding and Lack of Queues


Problem

    By default bond devices have no queue: their txqueuelen is 0 (see the sysfs check below)

        This is because it is a software device, and generally queuing doesn't make sense on software devices

    qdiscs default to the queue length of their device, so a qdisc added to a bond device ends up with an effectively zero-length queue
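
    The default can be inspected via sysfs (bond0 is an example device):

        # cat /sys/class/net/bond0/tx_queue_len
        0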

   

Result

    It was observed that netperf TCP_STREAM only achieves 45-50Mbit/s when controlled by a class with a ceiling of 450Mbit/s (a sketch of such a setup is shown below)

    A 10x degradation!
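
    A minimal sketch of the kind of setup involved, assuming an HTB hierarchy on bond0 and a hypothetical netserver at 192.168.0.2 (classids and rates are illustrative):

        # tc qdisc add dev bond0 root handle 1: htb default 100
        # tc class add dev bond0 parent 1: classid 1:100 htb rate 450Mbit ceil 450Mbit
        # netperf -H 192.168.0.2 -t TCP_STREAM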


Solution 1a

    Set the queue length of the bonding device before adding qdiscs

    ip link set txqueuelen 1000 dev bond0

   

Solution 1b

    Set the queue length of the qdisc explicitly

    tc qdisc add dev bond0 parent 1:100 handle 1100: pfifo limit 1000
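
    The configured limit can then be confirmed from the qdisc statistics:

        # tc -s qdisc show dev bond0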


Problem 3: TSO and Lack of Accounting Accuracy


Problem

    If a packet is significantly larger than the MTU of the class, it is accounted as being approximately the size of the MTU.

    And the giants counter for the class is incremented (this can be seen with tc -s, as shown below)

    By default, the MTU used for this accounting is 2047 bytes

    But TCP Segmentation Offload (TSO) packets can be much larger: 64kbytes

    By default Xen domUs will use TSO
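
    The effect is visible in the per-class statistics, where the giants counter is reported (peth2 is an example device, matching the later tc command):

        # tc -s class show dev peth2 | grep giants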


Result

    The result is similar to having no bandwidth control of TCP at all


Workaround 1


    Disable TSO in the guest, but the guest can re-enable it


    # ethtool -k eth0 | grep "tcp segmentation offload"

    tcp segmentation offload: on

    # ethtool -K eth0 tso off

    # ethtool -k eth0 | grep "tcp segmentation offload"

    tcp segmentation offload: off


Workaround 2


    Set the MTU of classes to 40000

    Large enough to give sufficient accuracy

    Larger values will result in a loss of accuracy when accounting smaller packets

    # tc class add dev peth2 parent 1:1 classid 1:101 rate 10Mbit ceil 950Mbit mtu 40000


Solution

    Account for large packets

    Instead of truncating the index into the rate table, use rtab values multiple times:

        rtab[255] * (index >> 8) + rtab[index & 0xFF]

    i.e. the high bits of the index scale the largest table entry and the low bits index the table as before

    "Make HTB scheduler work with TSO" by Ranjit Manomohan was included in 2.6.23-rc1