1. Introduction to The Nutanix Clusters
By Asawari Khedkar, asawaridani@yahoo.com
The Nutanix cluster is a Virtual Computing Platform: a distributed system that can run
multiple virtual machines (VMs). Nutanix appliances include high-performance server nodes, each
integrating a core processor, local storage and flash. The storage subsystem consists of software-based
control logic called the Nutanix Control Virtual Machine (CVM), also known simply as the VM controller,
which communicates with a hypervisor. The local storage (SATA and disk drives) is attached via a direct
data path to the VM controller. The local flash (Fusion-IO, a PCIe-attached flash card) has a VM direct
path to the Nutanix VM controller.
This document provides a high-level explanation of how data is written and read in a Nutanix cluster.
It also explains the process of data replication and recovery.
Anatomy of a Write Operation
A guest VM initiates a write request to the Nutanix VM controller. The VM
controller passes the data directly to the local PCIe-attached Fusion-IO
flash drive. Data is stored in the local flash drive to provide faster
performance. Consequently, for the majority of requests, the data never
traverses the network. Older, less frequently accessed data, known as cold
data, is migrated to the more economical hard drive. If the VM starts
requesting cold data more frequently, the data is brought back to the local
flash drive and is then referred to as hot data. Once the write is
completed, the VM controller replicates the data synchronously to other
nodes of the cluster. All data delivered to the local storage is
checksummed to protect it against disk faults.
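
The write path described above can be illustrated with a minimal Python sketch. It is a toy model, not Nutanix code: the class name `NodeStorage`, the block-count capacity limit, the least-recently-used demotion policy and the use of CRC32 as the checksum are all assumptions made for illustration. It also promotes cold data on the first read, which is simpler than the frequency-based promotion the text describes.

```python
import zlib


class NodeStorage:
    """Toy model of one node's flash and hard-disk tiers (hypothetical)."""

    def __init__(self, flash_capacity=2):
        self.flash_capacity = flash_capacity  # assumed limit, in blocks
        self.flash = {}  # key -> (data, checksum); dict order tracks recency
        self.hdd = {}    # key -> (data, checksum)

    def write(self, key, data):
        # New writes land in local flash; a checksum is stored with the data.
        self.hdd.pop(key, None)
        self.flash.pop(key, None)
        self.flash[key] = (data, zlib.crc32(data))
        self._demote_cold()

    def read(self, key):
        tier = self.flash if key in self.flash else self.hdd
        data, checksum = tier[key]  # KeyError if the key was never written
        if zlib.crc32(data) != checksum:
            raise IOError(f"checksum mismatch for {key!r}")
        # Mark the block as most recently used; cold blocks become hot again.
        del tier[key]
        self.flash[key] = (data, checksum)
        self._demote_cold()
        return data

    def _demote_cold(self):
        # Move the least recently used blocks to the cheaper hard-disk tier.
        while len(self.flash) > self.flash_capacity:
            oldest = next(iter(self.flash))
            self.hdd[oldest] = self.flash.pop(oldest)
```

With a flash capacity of two blocks, writing three blocks pushes the oldest one to the hard-disk tier, and reading that block again brings it back to flash while the next-coldest block is demoted.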
Anatomy of a Read Operation
A guest VM initiates a read request to the Nutanix VM controller. Data is
accessed directly from the PCIe-attached Fusion-IO flash, which provides a
high-speed direct data path.
Data Replication
Data is synchronously replicated on more than one node of the cluster when
the VM controller executes a write operation. This process takes place in
the background.
Data is never mirrored in its entirety on a single node in the cluster. It
is divided into chunks that are stored on different nodes, which ensures
that the data exists in at least two independent locations and is fault
tolerant. For a read request for data residing on a failed disk, the VM
controller accesses multiple nodes. This approach eliminates hot spots of
requests and accesses to one node. Each VM controller in the cluster keeps
track of the addresses of the data stored on all the local hard disk
drives. Replicating data on different nodes of the cluster has multiple
advantages. The main advantages are high availability and data protection
in case of node downtime, node failure or disk failure.
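
The chunked placement described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Nutanix placement algorithm: the function names, the fixed chunk size, the round-robin choice of replica node and the rule that one copy of every chunk stays on the writing node are all hypothetical.

```python
def place_chunks(data, local_node, nodes, chunk_size=4):
    """Split data into chunks and give each chunk two independent locations."""
    remotes = [n for n in nodes if n != local_node]
    if not remotes:
        raise ValueError("need at least one other node for replication")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    placement = {}
    for index, chunk in enumerate(chunks):
        # Assumed policy: spread the replicas round-robin over the other nodes,
        # so no single remote node holds a complete mirror.
        replica_node = remotes[index % len(remotes)]
        placement[index] = (chunk, [local_node, replica_node])
    return placement


def read_after_failure(placement, failed_node):
    """Reassemble the data from surviving copies; report which nodes served it."""
    parts = []
    sources = set()
    for index in sorted(placement):
        chunk, holders = placement[index]
        survivor = next(n for n in holders if n != failed_node)
        sources.add(survivor)
        parts.append(chunk)
    return b"".join(parts), sources
```

In a three-node cluster, a failure of the writing node leaves the data readable from the two remaining nodes, and the read load is shared between them rather than concentrated on one.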
Data Recovery
Data recovery is essential after a disk failure. When a disk fails, the VM
controller reroutes the guest VM to another node. This routing is
transparent to the hypervisor and the guest VM. When a VM is migrated from
a source node to a destination node, the data becomes remote to the VM.
The VM controller that is now local to the newly migrated VM knows the
location of the remote data. The controller fetches the data to its local
Fusion-IO flash storage. Data locality is the key to VM performance.
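
The way a migrated VM regains data locality can be sketched in a few lines of Python. The `Controller` class, the shared `cluster_map` dictionary and the `remote_fetches` counter are hypothetical names introduced for illustration. The sketch assumes the whole block is copied on first access, which the text does not specify.

```python
class Controller:
    """Toy VM controller that pulls remote data into local storage on first read."""

    def __init__(self, name, cluster_map):
        self.name = name
        self.local = {}                 # stands in for local flash storage
        self.cluster_map = cluster_map  # key -> controller holding the data
        self.remote_fetches = 0

    def write(self, key, data):
        self.local[key] = data
        self.cluster_map[key] = self

    def read(self, key):
        if key not in self.local:
            # Data is remote after a migration: fetch it once over the network.
            owner = self.cluster_map[key]
            self.local[key] = owner.local[key]
            self.remote_fetches += 1
        return self.local[key]
```

After a VM moves from one node to another, only the first read of a block crosses the network. Later reads are served from the new node's local copy, which is the data locality benefit described above.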