1. Flow Control dilemma and
NetApp’s changing recommendations
“Turn the flow-control valve anticlockwise in order to disable it on the NetApp storage”
Just joking…
Please note: views, thoughts, and opinions expressed in this article belong solely to the author, and
not necessarily to the author's employer, organization, committee or other group or individual. If you
spot any incorrect information in this article, please feel free to correct me.
2. Timeline of changing recommendations
JAN 2013 | TR-3802 - Disable flow-control: Between SWITCH & STORAGE
MAR 2015 | TR-4392 - Disable flow-control: END-2-END: Server/ESX <> SWITCH <> STORAGE
JAN 2016 | TR-4182 - Disable flow-control: on Cluster Ports only; for the rest, you decide!
3. This article recommends ‘dilemma’ # 1
Philosophy of dilemma # 1 = Let flow control be managed higher up the stack in the form of
congestion control. Applications can do this much better, because hardware-based flow control is not
application aware.
Application level: A TCP connection uses the end-to-end connection to determine the window size,
which can take into account the bandwidth, buffer space, and round-trip time, and can therefore deal
with congestion much more efficiently.
Hardware level: The switch port or NIC decides when to send a PAUSE frame and for what duration,
taking into account only the link between SWITCH & STORAGE; unfortunately, 'no upper-level
protocols are considered'.
For these reasons, it's recommended to disable flow-control on SWITCH Ports and STORAGE NODE NICs.
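To make the application-level point concrete, here is a minimal Python sketch of the arithmetic behind TCP windowing. It assumes the standard bandwidth-delay relationship (window = bandwidth x round-trip time). The function names and the example link figures are hypothetical and are not taken from any NetApp document.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput for one connection with a fixed window."""
    return window_bytes * 8 / rtt_seconds


# Hypothetical path: 10 Gb/s link with a 0.5 ms round-trip time.
window = bdp_bytes(10e9, 0.0005)
print(f"Window needed to fill the path: {window / 1024:.0f} KiB")

# Hypothetical sender limited to a 64 KiB window on a 1 ms path.
limit = max_throughput_bps(64 * 1024, 0.001)
print(f"Throughput ceiling with a 64 KiB window: {limit / 1e6:.0f} Mb/s")
```

Because the window is derived from the whole path, the sender slows down or speeds up for that one connection only, which a per-link PAUSE frame cannot do.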
In simple, smaller networks [similar hardware], the flow-control method may work well. However, with
the introduction of larger and larger networks along with more advanced and faster network equipment
and software, technologies such as TCP windowing and increased switch buffering work better.
Please note, the recommendations mentioned here are based purely on the theoretical assumption that
“flow control is handled better higher up the stack”.
WISDOM: flow-control can only be disabled on a dedicated 10G Ethernet NIC. The setting is not
applicable to Converged Network Adapter (CNA/UTA) cards, where it cannot be disabled. If you disable
flow control on the switch port, flow-control is automatically disabled for devices such as CNAs. You may
not realize it, but depending on the SWITCH settings it may already be ‘none’ because the SWITCH Port
is set to ‘none’.
You may be looking at the VLAN or IFGRP Port, which might show ‘Full’, and conclude that the
flow-control setting for the 10G Port is ‘full’. You can safely ignore that, because you are looking in the
wrong place: as long as ‘flow-control’ for the Physical Ports shows ‘none’, you are done.
You can use the ‘-type physical’ switch with the ‘port show’ command to list only physical ports:
::> port show -fields type,flowcontrol-admin,flowcontrol-oper -speed-oper 10000 -type physical
4. What is Ethernet flow control?
Ethernet flow control is a layer 2 network mechanism that is used to manage the rate of data
transmission between two endpoints. It provides a mechanism for one network node to control the
transmission speed of another so that the receiving node is not overwhelmed with data.
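As a rough illustration of how coarse a hardware PAUSE is, the sketch below converts a PAUSE frame's timer value into wall-clock time. It assumes the IEEE 802.3 definition, in which the pause timer is a 16-bit count of quanta and one quantum is 512 bit times. The function name is hypothetical.

```python
PAUSE_QUANTUM_BITS = 512    # one pause quantum = 512 bit times (IEEE 802.3 Annex 31B)
MAX_PAUSE_QUANTA = 0xFFFF   # the pause timer is a 16-bit field


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Time a link partner is asked to stay silent for a given pause timer value."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause timer must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


# Longest single PAUSE at two common link speeds.
for speed in (1e9, 10e9):
    ms = pause_seconds(MAX_PAUSE_QUANTA, speed) * 1000
    print(f"{speed / 1e9:.0f} Gb/s: longest single PAUSE = {ms:.3f} ms")
```

A timer value of 0 tells the partner to resume immediately. Either way, the pause applies to the whole link, not to one connection.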
MOST IMPORTANT INFO for NetApp:
You can modify the MTU, autonegotiation, duplex, flow control, speed, and health settings of a physical
network port only. For a VLAN or IFGRP, the only one of these parameters that you
can modify is the MTU size.
Difference between flow-control admin & operational value:
Flow-control-admin is the administrative value, which you control; it is configured on
the STORAGE NODE’s Physical Ports.
Flow-control-oper is the operational state of flow-control as reported after negotiation with the
SWITCH PORT; you have no direct control over it.
Hence, if you disable flow-control on the Physical Ports on the storage side but flow-control-oper still
says FULL, it means the Network Switch port needs to be updated to have flow-control fully disabled, or set
to ‘none’.
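The admin/oper rule above can be sketched as a small Python check. This is only an illustration: the dictionary keys and the sample values are hypothetical and do not represent the actual 'port show' output format.

```python
def ports_needing_switch_change(ports):
    """Return ports where admin is 'none' but the negotiated state is not."""
    return [
        p["port"]
        for p in ports
        if p["flowcontrol_admin"] == "none" and p["flowcontrol_oper"] != "none"
    ]


# Hypothetical values, as you might note them down from the storage node.
sample = [
    {"port": "e0c", "flowcontrol_admin": "none", "flowcontrol_oper": "none"},
    {"port": "e0d", "flowcontrol_admin": "none", "flowcontrol_oper": "full"},
]
print(ports_needing_switch_change(sample))  # ['e0d']
```

Any port the check returns is one where the storage side is done and the remaining work is on the switch port.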
Best practice recommended by NetApp:
What are the flow control best practices for 10g Ethernet?
https://kb.netapp.com/app/answers/answer_view/a_id/1002403/loc/en_US#__highlight
Historically (7-Mode/clustered Data ONTAP): NetApp recommended that flow control be
disabled on all network ports [cluster & data] within a NetApp Data ONTAP cluster. This is no
longer the case. Guidance in this area has since changed; the new recommended best practice is as
follows:
Disable flow control on cluster network ports in the Data ONTAP cluster: flow-control on cluster
ports should be set to 'none'.
Flow-control on the remaining network ports (the ports that provide data, management, and
intercluster connectivity) should be configured to match the settings within the rest of your
environment,
i.e., whichever of 'none/receive/send/full' is in use should be applied end-2-end. However, NetApp SMEs still recommend
disabling flow-control for normal data ports as well, citing performance improvements reported by
various clients.
5. Why should flow control be disabled in clustered Data ONTAP?
First: Buffer limitations on some switches.
Second: More data, better hardware.
Third: Congestion control.
For details on each point, read the following post [Courtesy: Justin Parisi]
https://blogs.cisco.com/perspectives/to-flow-or-not-to-flow
The general idea is to let the flow control be managed higher up the stack in the form of congestion
control.
Maintenance window recommended:
Keep in mind that changing flow control on a port will result in a brief blip in connectivity, as the port
resets to apply the new configuration. It is therefore best to change flow control in a
maintenance window.
IMPORTANT: Flow-control should be disabled throughout the network, i.e., from source to destination.
Otherwise, it will not bring any benefit and may even worsen performance.
Attention: Please make sure the switch ports are also set to disable/none for flow-control, to match
the flow-control settings on the storage Ethernet ports.
In the following exercise we disable flow-control on the dedicated Physical 10G Data Ports serving
CIFS.
Steps to set the flow-control settings to ‘none’:
1. Run the following commands to identify the Physical Ports on which you want to disable
flow-control.
2. Start by identifying the LIF, VLAN & IFGRP that the CIFS LIF belongs to:
LIF:
::> network interface show -vserver svm
The output of this command provides the VLAN name [provided a VLAN exists].
VLAN:
::> vlan show
The output of this command provides the IFGRP name [provided an IFGRP exists].
IFGRP:
::> ifgrp show
Finally, the output of this command provides the names of the Physical Ports that we are looking for.
From the output of the above commands, we now have the
information needed to change flow-control on the ports used for CIFS:
LIF: cluster-01_cifs_1 [Used for serving CIFS connections] - This LIF sits on an IFGRP.
IFGRP: a0a : e0c & e0d [Physical Ports bonded together to form a VIF/IFGRP]
VLAN : a0a-189 [To which this IFGRP belongs]
Please note: It’s only the ‘Physical 10G Ports e0c & e0d’ that we are concerned with because, as stated
earlier, you can only change the flow-control settings for Physical Ports.
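The LIF -> VLAN -> IFGRP -> Physical Port lookup can be sketched in Python as below. This is a minimal illustration that assumes you have already copied the VLAN and IFGRP relationships out of the command output by hand. The function name and the dictionary layout are hypothetical.

```python
def physical_ports(home_port, vlans, ifgrps):
    """Resolve the port a LIF sits on down to the physical ports underneath it."""
    # A VLAN port resolves to its parent port; anything else is left unchanged.
    parent = vlans.get(home_port, home_port)
    # An IFGRP resolves to its member ports; a physical port resolves to itself.
    return ifgrps.get(parent, [parent])


# Relationships from the example in this article.
vlans = {"a0a-189": "a0a"}
ifgrps = {"a0a": ["e0c", "e0d"]}

print(physical_ports("a0a-189", vlans, ifgrps))  # ['e0c', 'e0d']
```

The returned ports are the only ones whose flow-control setting you need to change.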
Run the following command to note the current 'flow-control' settings:
::> port show -fields type,flowcontrol-admin,flowcontrol-oper -speed-oper 10000 -type physical
Steps to disable flow-control:
1. Migrate the LIF which is serving CIFS to the Partner Node.
First make sure auto-revert is set to false until you finish the task.
::> net int modify -vserver svm -lif cluster-01_cifs_1 -auto-revert false
Then migrate the LIF to the Partner Node before you go ahead.
::> net int migrate -vserver svm -lif cluster-01_cifs_1 -dest-node cluster-02 -dest-port xx
2. Remove: It is recommended to remove the physical ports from the IFGRP one at a time, set
flow-control to 'none' on the removed port, and then add it back to the IFGRP.
IFGRP : a0a consists of Physical Ports: e0c & e0d.
a. Remove e0c from IFGRP a0a
::> ifgrp remove-port -node clust-01 -ifgrp a0a -port e0c
b. Set the flow control to 'none'
::> network port modify -node clust-01 -port e0c -flowcontrol-admin none
c. Add e0c back to IFGRP a0a
::> ifgrp add-port -node clust-01 -ifgrp a0a -port e0c
Repeat the exercise for e0d, then migrate the LIF back to clust-01 and repeat the same steps for the partner
Node.
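The remove / modify / add-back sequence above repeats for every member port, so it can help to generate the command list up front and review it before the maintenance window. The Python sketch below only builds the command strings exactly as they are written in this article. It does not connect to or run anything on the cluster, and the function name is hypothetical.

```python
def flowcontrol_plan(node, ifgrp, ports):
    """Build the ordered command list for disabling flow-control port by port."""
    commands = []
    for port in ports:
        # One port at a time: take it out of the IFGRP, change it, put it back.
        commands.append(f"ifgrp remove-port -node {node} -ifgrp {ifgrp} -port {port}")
        commands.append(f"network port modify -node {node} -port {port} -flowcontrol-admin none")
        commands.append(f"ifgrp add-port -node {node} -ifgrp {ifgrp} -port {port}")
    return commands


# Node, IFGRP and port names from the example in this article.
for command in flowcontrol_plan("clust-01", "a0a", ["e0c", "e0d"]):
    print("::>", command)
```

Generating the list per node keeps the order fixed, so the other member port stays in the IFGRP while one is being changed.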
3. Once done, verify that flow-control indeed shows 'none', which means
disabled.
::> port show -fields type,flowcontrol-admin,flowcontrol-oper -speed-oper 10000 -type physical
ashwinwriter@gmail.com
July, 2018