Accelerating Data Management - Dave Fellinger - RDAP12
- 2. Data Management
Data management can consist of:
► Access management
• Client access and maintenance
• User permissions
• Policies including data security
• Policies including data continuance
► Data manipulation utilizing microservices
• Data checking
• Implementing processes such as reduction and filtering
• Data registration and metadata extraction
• Data migration
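The data-manipulation services listed above can be pictured as small functions applied to each stored object in turn. The following is a minimal Python sketch. It assumes a hypothetical `DataObject` type and hypothetical service names (`checksum_service`, `extract_metadata_service`). It is not iRODS code. It only illustrates data checking and metadata extraction chained together as microservices.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical stand-in for a stored object: a name, its bytes, its metadata."""
    name: str
    payload: bytes
    metadata: dict = field(default_factory=dict)


def checksum_service(obj: DataObject) -> DataObject:
    # Data checking: record a SHA-256 digest so later reads can be verified.
    obj.metadata["sha256"] = hashlib.sha256(obj.payload).hexdigest()
    return obj


def extract_metadata_service(obj: DataObject) -> DataObject:
    # Metadata extraction: derive simple attributes from the object itself.
    obj.metadata["size_bytes"] = len(obj.payload)
    obj.metadata["extension"] = obj.name.rsplit(".", 1)[-1] if "." in obj.name else ""
    return obj


def verify(obj: DataObject) -> bool:
    # Re-check the current payload against the digest recorded at ingest.
    return hashlib.sha256(obj.payload).hexdigest() == obj.metadata.get("sha256")


# Hypothetical pipeline: services run in order on every object.
PIPELINE = [checksum_service, extract_metadata_service]


def run_pipeline(obj: DataObject) -> DataObject:
    for service in PIPELINE:
        obj = service(obj)
    return obj


if __name__ == "__main__":
    obj = run_pipeline(DataObject("scan.dat", b"example payload"))
    print(obj.metadata["size_bytes"], obj.metadata["extension"], verify(obj))
```

Keeping each service as a separate function means a policy can add, remove or reorder steps without touching the others.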
3 ©2012 DataDirect Networks. All Rights Reserved.
- 3. Sources of Service Latency
► Hardware Chain
• Disk drive servo operation
• Multiple SCSI layers
• Multiple bus transitions
• Memory bandwidth limitations
• Network service latencies
► Software Chain
• Memory copies
• Kernel operations
• Layers of consecutive operations, including servicing v-nodes, i-nodes and the FAT
• Serial data transport processes
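The latency sources above add up because a request must pass through each stage in series. The following is a minimal additive model. Its per-stage values are placeholders chosen for illustration, not measurements from the source. It shows why removing stages shortens the whole path.

```python
# Placeholder per-stage latencies in microseconds. These are NOT measurements
# from the source; they only show how a serial chain of stages adds up.
CONVENTIONAL_PATH = {
    "disk servo": 100.0,
    "SCSI layers": 20.0,
    "bus transitions": 10.0,
    "memory copies": 15.0,
    "kernel operations": 10.0,
    "network service": 50.0,
}


def total_latency(stages):
    """Stages run in series, so a request pays the sum of all of them."""
    return sum(stages.values())


def without(stages, removed):
    """Return a copy of the path with the named stages eliminated."""
    return {name: cost for name, cost in stages.items() if name not in removed}


# Hypothetical embedded path: no network hop and no extra memory copies.
embedded = without(CONVENTIONAL_PATH, {"network service", "memory copies"})

print(total_latency(CONVENTIONAL_PATH), total_latency(embedded))
```

With these made-up numbers, the conventional path totals 205 and the shortened path 140. The point is the structure of the sum, not the values.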
- 4. What is 'Embedded Processing'? And why?
► Do data-intensive processing as 'close' to the storage as possible.
• Bring computing to the data instead of bringing data to computing
► Hadoop is an example of this approach.
► Why Embedded Processing?
► Moving data is a lot of work
► A lot of infrastructure is needed
[Figure: a client sends a request to storage (red ball); storage responds with data (blue ball).]
But what we really want is:
► So how do we do that?
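To make "bring computing to the data" concrete, the following minimal Python sketch compares two ways of answering the same query. The record set, threshold and JSON-size measure are hypothetical stand-ins. The sketch shows that filtering next to the storage ships far fewer bytes while producing an identical result.

```python
import json

# Hypothetical record set standing in for data held on the storage system.
RECORDS = [{"id": i, "temp": 20 + (i % 15)} for i in range(1000)]


def transfer_size(records):
    # Approximate bytes on the wire by serializing the records to JSON.
    return len(json.dumps(records).encode())


def client_side_filter(threshold):
    # Bring data to computing: ship every record, then filter on the client.
    shipped = RECORDS
    result = [r for r in shipped if r["temp"] > threshold]
    return result, transfer_size(shipped)


def storage_side_filter(threshold):
    # Bring computing to the data: filter next to the storage, ship only matches.
    result = [r for r in RECORDS if r["temp"] > threshold]
    return result, transfer_size(result)


a, moved_a = client_side_filter(32)
b, moved_b = storage_side_filter(32)
print(a == b, moved_a, moved_b)
```

The result is the same in both cases. Only the amount of data crossing the network differs, and that difference grows with the selectivity of the filter.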
- 5. Storage with Virtual Machines
[Figure: storage controller with virtual machines, top to bottom: 8 x IB QDR/10GbE host ports (no Fibre Channel); interface virtualization; four virtual machines; system memory; RAID processors, high-speed cache and cache link; internal SAS switching.]
- 6. Repurposing Interface Processors
► In the block-based SFA10K platform, the IF (interface) processors are responsible for mapping virtual disks to LUNs on FC or IB
► In the SFA10KE platform, the IF processors run VMs
► The OS running on those VMs uses a driver to access the RAID processors directly
► RAID processors place data (or use data) directly in the VM's memory
► One hop from disk to the VM's memory
► Now the storage is no longer a block device
► It is a storage appliance with processing capabilities
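The "one hop" idea resembles a reader that fills memory the consumer already owns, instead of handing back fresh buffers at every layer. The slides do not describe the SFA driver interface, so this Python sketch is only an analogy. An in-memory stream stands in for the RAID processors. A pre-allocated `bytearray` stands in for the VM's memory and is filled with the standard `readinto` method. The function names are hypothetical.

```python
import io

BLOCK = 4096

# Hypothetical "disk": an in-memory stream standing in for RAID-backed storage.
disk = io.BytesIO(bytes(range(256)) * 64)  # 16 KiB of sample data


def read_with_copies(stream, n):
    """Multi-hop path: each layer hands back a new buffer, so data is copied."""
    raw = stream.read(n)          # layer 1 allocates a bytes object
    staged = bytearray(raw)       # layer 2 copies into its own staging buffer
    return bytes(staged)          # layer 3 copies again for the caller


def read_into_vm_memory(stream, vm_buffer):
    """One-hop path: the reader fills memory the consumer already owns."""
    return stream.readinto(vm_buffer)


vm_memory = bytearray(BLOCK)      # memory "allocated to the VM" up front
disk.seek(0)
copied = read_with_copies(disk, BLOCK)
disk.seek(0)
n = read_into_vm_memory(disk, vm_memory)
print(n, copied == bytes(vm_memory))
```

Both paths deliver the same bytes. The second path avoids the intermediate allocations and copies, which is the software-chain saving the slide describes.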
- 7. Example configuration
► Now we can put iRODS inside the RAID controllers
► The iCAT processor has lots of memory and SSDs for DB storage
► Either use all VMs for iRODS or add a parallel filesystem such as
GPFS for fast scratch
► The filesystem uses SAS for frequently used files and SATA for the rest
► The following example is a mix of iRODS with GPFS
• This gives iRODS the fastest access to the storage because it doesn't have to go over the network to reach a file server. It lives inside the file server.
• The same filesystem is also visible from an external compute cluster via
GPFS running on the remaining VMs
► This is only one controller; the 4 VMs on the other controller need to be given work too
• They see the same storage and can access it at the same speed.
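The SAS/SATA split above is a placement policy. In a real GPFS installation it would normally be written as GPFS information lifecycle management (ILM) policy rules. The Python below only models the decision. It assumes a hypothetical `HOT_ACCESS_THRESHOLD` and uses made-up file paths.

```python
from dataclasses import dataclass

# Hypothetical threshold: files read at least this many times in the
# observation window count as "frequently used". Not from the source.
HOT_ACCESS_THRESHOLD = 10


@dataclass
class FileStats:
    path: str
    accesses: int
    tier: str = "SATA"


def place(f: FileStats) -> str:
    """Choose a pool: SAS for frequently used files, SATA for the rest."""
    return "SAS" if f.accesses >= HOT_ACCESS_THRESHOLD else "SATA"


def migrate(files):
    """Move files whose pool should change; return (path, old_tier, new_tier)."""
    moves = []
    for f in files:
        target = place(f)
        if target != f.tier:
            moves.append((f.path, f.tier, target))
            f.tier = target
    return moves


files = [
    FileStats("/gpfs/scratch/run1.h5", accesses=42),
    FileStats("/gpfs/archive/2011.tar", accesses=1),
    FileStats("/gpfs/scratch/old.log", accesses=0, tier="SAS"),
]
for move in migrate(files):
    print(move)
```

Running `migrate` a second time returns no moves, because every file already sits in the pool the policy selects.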
- 8. Example configuration
[Figure: example configuration of one controller, top to bottom: 8 x 10GbE host ports; interface virtualization; four Linux virtual machines, each with an SFA driver, one running iCAT with 16 GB of memory allocated and three running GPFS with 8 GB of memory allocated each, all drawn from system memory; RAID processors, high-speed cache and cache link; internal SAS switching; RAID sets with 2 TB SSD, RAID sets with 300 TB SATA and RAID sets with 30 TB SAS.]
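The diagram's figures can be totalled as a quick sanity check. This short Python snippet uses only the numbers labeled in the diagram. The VM names are hypothetical labels, and the slides do not say whether the capacities are raw or usable.

```python
# Figures as labeled in the example-configuration diagram.
VM_MEMORY_GB = {"iCAT": 16, "GPFS-1": 8, "GPFS-2": 8, "GPFS-3": 8}
RAID_CAPACITY_TB = {"SSD": 2, "SATA": 300, "SAS": 30}

total_vm_memory = sum(VM_MEMORY_GB.values())      # 16 + 3 * 8
total_capacity = sum(RAID_CAPACITY_TB.values())   # 2 + 300 + 30
fast_fraction = (RAID_CAPACITY_TB["SSD"] + RAID_CAPACITY_TB["SAS"]) / total_capacity

print(f"{total_vm_memory} GB VM memory, {total_capacity} TB in RAID sets, "
      f"{fast_fraction:.1%} on SSD/SAS")
```

This gives 40 GB of memory allocated to the four VMs and 332 TB across the three RAID-set types. Roughly 9.6% of that capacity is on SSD or SAS.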
- 9. Running Micro Services as a VM
► Since iRODS runs inside the controller, we can now run iRODS microservices right on top of the storage.
► The storage has become an iRODS appliance 'speaking' iRODS natively.
► iRODS can execute "in-band" operations, registering data and extracting metadata during ingest.
► We could create 'hot' directories that kick off processing depending on the type of incoming data.
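A 'hot' directory amounts to a dispatch table from incoming file type to the processing step it triggers. In iRODS itself, such a hook would normally be a rule at a policy enforcement point such as `acPostProcForPut`. The Python below is a stand-alone sketch. It uses a hypothetical `CATALOG` dictionary in place of the iCAT, and its extensions and handlers are illustrative.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory catalog standing in for the iRODS iCAT database.
CATALOG = {}


def extract_image_metadata(data):
    # Illustrative only: record the object size for binary image data.
    return {"kind": "image", "bytes": len(data)}


def extract_text_metadata(data):
    # Illustrative only: count lines in text-like data.
    text = data.decode("utf-8", errors="replace")
    return {"kind": "text", "lines": len(text.splitlines())}


# Hypothetical mapping from file extension to the step a 'hot' directory triggers.
HANDLERS = {
    ".fits": extract_image_metadata,
    ".txt": extract_text_metadata,
    ".csv": extract_text_metadata,
}


def ingest(path, data):
    """In-band ingest: extract metadata and register the object in one pass."""
    p = PurePosixPath(path)
    handler = HANDLERS.get(p.suffix.lower())
    metadata = handler(data) if handler else {"kind": "unknown"}
    CATALOG[str(p)] = metadata
    return metadata


ingest("/hot/incoming/obs1.fits", b"\x00" * 2880)
ingest("/hot/incoming/notes.txt", b"line one\nline two\n")
for name, meta in CATALOG.items():
    print(name, meta)
```

Unknown types are still registered, just without type-specific metadata. A stricter policy could instead reject or quarantine them.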
- 10. Conclusion
► The elimination of software or hardware layers increases
reliability and decreases latency.
► Automated, policy-based services can be run within a storage environment.
► Policy execution can include data migration, data
checking, or data manipulation by calling or scheduling
microservices.
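Scheduling microservices by policy can be sketched as a priority queue ordered by due time, with periodic policies re-armed after each run. The scheduler class, the policy names and the tick-based clock are all hypothetical. This is not the iRODS delayed-execution mechanism, just an illustration of the idea using Python's `heapq`.

```python
import heapq


class PolicyScheduler:
    """Hypothetical scheduler; queue entries are (due, seq, name, action, every)."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker so equal due times never compare callables

    def schedule(self, due, name, action, every=None):
        heapq.heappush(self._queue, (due, self._seq, name, action, every))
        self._seq += 1

    def run_until(self, now):
        """Run every microservice call that is due at or before `now`."""
        log = []
        while self._queue and self._queue[0][0] <= now:
            due, _, name, action, every = heapq.heappop(self._queue)
            log.append((due, name, action()))
            if every is not None:  # periodic policy: re-arm it
                self.schedule(due + every, name, action, every)
        return log


sched = PolicyScheduler()
sched.schedule(0, "checksum-scan", lambda: "checked", every=10)
sched.schedule(5, "migrate-cold", lambda: "migrated")
for entry in sched.run_until(20):
    print(entry)
```

The periodic checksum scan runs at ticks 0, 10 and 20, while the one-off migration runs once at tick 5.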
Data-intensive operations executed by a separate server cause network traffic and SCSI bus transaction latency. Moving these operations to the storage is efficient and easily managed.
- 11. Questions?
DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler,
xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.