SlideShare uma empresa Scribd logo
1 de 14
Baixar para ler offline
CLSF 2010




            1
Agenda
• Photos
• Attendances Overview
• Key Topics
• Industry Talks




                         2
LSF 2010, Shanghai




                     3
LSF 2010 participant/companies
    Company   # of participant              Key background

  Intel       5                  Kernel performance, SSD, mem mgmt

  EMC         5                  Storage, file system
  Fujitsu     4                  IO Controller, btrfs
  Taobao      3                  Distributed storage, taobao server
  Novell      2                  Suse server, HA
  Oracle      2                  OCFS2 dev/test
  Baidu       2                  Baidu kernel optimization
  Canonical   2
  Redhat      1                  Network driver




                                                                      4
Key topics
 Topic                     Slides?   Description
 Page writeback            Y         Dirty page ratio limit
                                     Control process to write pages

 CFQ, IO controller        Y         CFQ introduction and further features
 BTRFS                     N         Memory consuming, fsck speed
 SSD/Block layer           Y         Block layer issues with SSD
 Kernel Tracing            N         ftrace
 VFS scalability           N         Multi-core challenges
 Kernel testing            Y         Intel kernel auto test framework
 Industrial talk: Taobao   Y         TFS, Tair
 Industrial talk: Baidu    N         The architecture of Baidu search system

 Industrial talk: EMC      N         FSCK


                                                                               5
Writeback - Wu Fengguang
• vmscan is a bottleneck
         decrease dirty ratio under memory pressure, so vmscan can less possibly find
dirty pages on page allocation.
• pageout(page) calls wirtepage() to write to disk, which is a performance killer
since it does random writes
         let flusher write. expand single 4K write to 4MB write. So more dirty pages are
reclaimed and flushed.
• balance_dirty_page() should not write: random write kills performance
         let flusher write and ask process to sleep. Three proposals:
         a). wait io completion: NFS bumpy completion, need smoother sleep method.
         b). sleep (dirtied *3/2 / write_bandwidth)
         c). sleep (dirtied / throttle_bandwidth)
• flusher default write size (4MB -> 128MB), will be dynamic in the future.
         Baidu's practice: SSD random write is really bad. For sequential write, increase
wb size (4MB -> 40MB) will get 120% SSD performance




                                                                                            6
Btrfs - Coly Li
• Has too much love from linux community
         two years ago > now
• Used in MeeGo Project
• Taobao plan to push industrial deployment in 2-3 years
         10T per data server in TFS cluster
         Use on SSD and SATA hybrid data server
         Metadata will be allocated on SSD and data on SATA.
• Dynamic data relocation with hot data tracking patch.
         For generic fs usage, need to deal with device mapper to get device speed
information.
• FSCK
         A difficult must. Currently assigned to Fujitsu.




                                                                                     7
SSD challenges - Li Shaohua
• Throughput: same issue as network
• Disk controller gap and big locks (queue locks & scsi locks)
• Interrupt related:
    a. smp affinity: single queue, one CPU to deal with irqs
    b. blk_iopoll: poll more req in one req
• Need hardware multiqueue
• CFQ needs to be changed to fit multiqueue (e.g. CFQ per queue)
• Queue lock contention vs. cache lock contention
    See Andi Kleen's talk in Tokyo Linux Conf
• Intel is building nextGen PCIE SSD, with many fancy features. Stay tuned




                                                                             8
VFS scalability- Ma Tao
•   With multi-cores, all global locks suck
•   Globle icache/dcache can be adapted to per-CPU
•   CFQ can be adapted to per-queue
•   The less global locks the better




                                                     9
Industry talk – Baidu (Cont.)
• Service types
  a). Indexing: random read, high IOPS, small IO size, read-only. 80M records per data
  node, and process 8-9 K queries per second.
           b). Distributed system: large files, sequential read/write.
           c). cache/KV storage: between a and b
           d). Web Server: CPU bound.
•   For a), read() sucks. Use mmap() to read blocks adhead to void
  kernel/userspace memory copy.
  mmap() can not use page cache LRU. Call readahead() after each mmap() to mark
  pages as read.
  mmap() pagefault is expensive with mm->mmap_sem lock. Use sync_readahead()
  and sync_readaheadv()
  With above, memory is now the bottleneck. Doing 10G+ MB read.
•   Google patch for reducing mm->mmap_sem hold time
  In do_page_fault(), drop mem->sem if page not found, then read it and get the lock
  again.




                                                                                         10
Industry talk – Baidu (Cont.)
• 8K filesystem block size (ext2) with 8K page size. Alloc continuous two pages
  each time
         Get better performance for sequential IO
         OCFS2 uses 1MB fs block size and 4K page size
• PCIE compress card + ECC




                                                                                  11
Industry Talk - Taobao
• TFS
• Tair uses an update server to record updates; apply updates to production
  system during mid-night
• A config server is used to minimize meta server workload
   A versioned bucket table is maintained by config server and stored in each data
  server. Client can manipulate data location with the bucket table returned by config
  server.
• Both TFS and Tair are open source projects now




                                                                                         12
Industry Talk - EMC
• Introduce the recovery methods and check methods use in the file systems
  from ext2 to btrFS
• Emphasis the importance of FSCK; Introduce the issues within FSCK when
  checking a huge file system; Collect the proposals to solve this problem
• pNFS learning notes




                                                                             13
Q&A




      14

Mais conteúdo relacionado

Mais procurados

Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Giuseppe Paterno'
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaGluster.org
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
Comparison between OCFS2 and GFS2
Comparison between OCFS2 and GFS2Comparison between OCFS2 and GFS2
Comparison between OCFS2 and GFS2Gang He
 
Filesystem Showdown: What a Difference a Decade Makes
Filesystem Showdown: What a Difference a Decade MakesFilesystem Showdown: What a Difference a Decade Makes
Filesystem Showdown: What a Difference a Decade MakesPerforce
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudPatrick McGarry
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionGluster.org
 
Bridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign StorageBridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign Storageinside-BigData.com
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
 
Introduction to BTRFS and ZFS
Introduction to BTRFS and ZFSIntroduction to BTRFS and ZFS
Introduction to BTRFS and ZFSTsung-en Hsiao
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storageGluster.org
 
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...Tommy Lee
 

Mais procurados (16)

Bluestore
BluestoreBluestore
Bluestore
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Comparison between OCFS2 and GFS2
Comparison between OCFS2 and GFS2Comparison between OCFS2 and GFS2
Comparison between OCFS2 and GFS2
 
Gluster d2
Gluster d2Gluster d2
Gluster d2
 
Filesystem Showdown: What a Difference a Decade Makes
Filesystem Showdown: What a Difference a Decade MakesFilesystem Showdown: What a Difference a Decade Makes
Filesystem Showdown: What a Difference a Decade Makes
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introduction
 
Bridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign StorageBridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign Storage
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
 
Introduction to BTRFS and ZFS
Introduction to BTRFS and ZFSIntroduction to BTRFS and ZFS
Introduction to BTRFS and ZFS
 
Software defined storage
Software defined storageSoftware defined storage
Software defined storage
 
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...Gluster fs tutorial   part 2  gluster and big data- gluster for devs and sys ...
Gluster fs tutorial part 2 gluster and big data- gluster for devs and sys ...
 

Destaque

MS TechDays 2011 - WCF Web APis There's a URI for That
MS TechDays 2011 - WCF Web APis There's a URI for ThatMS TechDays 2011 - WCF Web APis There's a URI for That
MS TechDays 2011 - WCF Web APis There's a URI for ThatSpiffy
 
NFS updates for CLSF
NFS updates for CLSFNFS updates for CLSF
NFS updates for CLSFbergwolf
 
MS TechDays 2011 - SCDPM 2012 The New Feature of Data Protection
MS TechDays 2011 - SCDPM 2012 The New Feature of Data ProtectionMS TechDays 2011 - SCDPM 2012 The New Feature of Data Protection
MS TechDays 2011 - SCDPM 2012 The New Feature of Data ProtectionSpiffy
 
MS TechDays 2011 - Cloud Management with System Center Application Controller
MS TechDays 2011 - Cloud Management with System Center Application ControllerMS TechDays 2011 - Cloud Management with System Center Application Controller
MS TechDays 2011 - Cloud Management with System Center Application ControllerSpiffy
 
MS TechDays 2011 - Simplified Converged Infrastructure Solutions
MS TechDays 2011 - Simplified Converged Infrastructure SolutionsMS TechDays 2011 - Simplified Converged Infrastructure Solutions
MS TechDays 2011 - Simplified Converged Infrastructure SolutionsSpiffy
 

Destaque (8)

MS TechDays 2011 - WCF Web APis There's a URI for That
MS TechDays 2011 - WCF Web APis There's a URI for ThatMS TechDays 2011 - WCF Web APis There's a URI for That
MS TechDays 2011 - WCF Web APis There's a URI for That
 
NFS updates for CLSF
NFS updates for CLSFNFS updates for CLSF
NFS updates for CLSF
 
MS TechDays 2011 - SCDPM 2012 The New Feature of Data Protection
MS TechDays 2011 - SCDPM 2012 The New Feature of Data ProtectionMS TechDays 2011 - SCDPM 2012 The New Feature of Data Protection
MS TechDays 2011 - SCDPM 2012 The New Feature of Data Protection
 
MS TechDays 2011 - Cloud Management with System Center Application Controller
MS TechDays 2011 - Cloud Management with System Center Application ControllerMS TechDays 2011 - Cloud Management with System Center Application Controller
MS TechDays 2011 - Cloud Management with System Center Application Controller
 
logfs
logfslogfs
logfs
 
Advanced Directory Services Windows Server 2012
Advanced Directory Services Windows Server 2012Advanced Directory Services Windows Server 2012
Advanced Directory Services Windows Server 2012
 
MS TechDays 2011 - Simplified Converged Infrastructure Solutions
MS TechDays 2011 - Simplified Converged Infrastructure SolutionsMS TechDays 2011 - Simplified Converged Infrastructure Solutions
MS TechDays 2011 - Simplified Converged Infrastructure Solutions
 
O365 strategy
O365 strategyO365 strategy
O365 strategy
 

Semelhante a CLSF 2010 Agenda and Key Topics

Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheNicolas Poggi
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...In-Memory Computing Summit
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Community
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Community
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Community
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationBigstep
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailInternet World
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Community
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxssuserabc741
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 

Semelhante a CLSF 2010 Agenda and Key Topics (20)

Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
LUG 2014
LUG 2014LUG 2014
LUG 2014
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
Ceph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash StorageCeph Day Tokyo -- Ceph on All-Flash Storage
Ceph Day Tokyo -- Ceph on All-Flash Storage
 
Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage Ceph Day Taipei - Ceph on All-Flash Storage
Ceph Day Taipei - Ceph on All-Flash Storage
 
Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage Ceph Day KL - Ceph on All-Flash Storage
Ceph Day KL - Ceph on All-Flash Storage
 
Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage Ceph Day Seoul - Ceph on All-Flash Storage
Ceph Day Seoul - Ceph on All-Flash Storage
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
Storage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, WhiptailStorage and performance- Batch processing, Whiptail
Storage and performance- Batch processing, Whiptail
 
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance BarriersCeph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
Ceph Day Berlin: Ceph on All Flash Storage - Breaking Performance Barriers
 
SDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptxSDC20 ScaleFlux.pptx
SDC20 ScaleFlux.pptx
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Ceph on arm64 upload
Ceph on arm64   uploadCeph on arm64   upload
Ceph on arm64 upload
 

Mais de bergwolf

Google Megastore
Google MegastoreGoogle Megastore
Google Megastorebergwolf
 
vmfs intro
vmfs introvmfs intro
vmfs introbergwolf
 
pnfs status
pnfs statuspnfs status
pnfs statusbergwolf
 
linux trim
linux trimlinux trim
linux trimbergwolf
 
network filesystem briefs
network filesystem briefsnetwork filesystem briefs
network filesystem briefsbergwolf
 
gsoc and grub4ext4
gsoc and grub4ext4gsoc and grub4ext4
gsoc and grub4ext4bergwolf
 
grub4ext4 status-plans
grub4ext4 status-plansgrub4ext4 status-plans
grub4ext4 status-plansbergwolf
 

Mais de bergwolf (9)

Linux aio
Linux aioLinux aio
Linux aio
 
RCU
RCURCU
RCU
 
Google Megastore
Google MegastoreGoogle Megastore
Google Megastore
 
vmfs intro
vmfs introvmfs intro
vmfs intro
 
pnfs status
pnfs statuspnfs status
pnfs status
 
linux trim
linux trimlinux trim
linux trim
 
network filesystem briefs
network filesystem briefsnetwork filesystem briefs
network filesystem briefs
 
gsoc and grub4ext4
gsoc and grub4ext4gsoc and grub4ext4
gsoc and grub4ext4
 
grub4ext4 status-plans
grub4ext4 status-plansgrub4ext4 status-plans
grub4ext4 status-plans
 

Último

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

CLSF 2010 Agenda and Key Topics

  • 2. Agenda • Photos • Attendances Overview • Key Topics • Industry Talks 2
  • 4. LSF 2010 participant/companies Company # of participant Key background Intel 5 Kernel performance, SSD, mem mgmt EMC 5 Storage, file system Fujitsu 4 IO Controller, btrfs Taobao 3 Distributed storage, taobao server Novell 2 Suse server, HA Oracle 2 OCFS2 dev/test Baidu 2 Baidu kernel optimization Canonical 2 Redhat 1 Network driver 4
  • 5. Key topics Topic Slides? Description Page writeback Y Dirty page ratio limit Control process to write pages CFQ, IO controller Y CFQ introduction and further features BTRFS N Memory consuming, fsck speed SSD/Block layer Y Block layer issues with SSD Kernel Tracing N ftrace VFS scalability N Multi-core challenges Kernel testing Y Intel kernel auto test framework Industrial talk: Taobao Y TFS, Tair Industrial talk: Baidu N The architecture of Baidu search system Industrial talk: EMC N FSCK 5
  • 6. Writeback - Wu Fengguang • vmscan is a bottleneck decrease dirty ratio under memory pressure, so vmscan can less possibly find dirty pages on page allocation. • pageout(page) calls wirtepage() to write to disk, which is a performance killer since it does random writes let flusher write. expand single 4K write to 4MB write. So more dirty pages are reclaimed and flushed. • balance_dirty_page() should not write: random write kills performance let flusher write and ask process to sleep. Three proposals: a). wait io completion: NFS bumpy completion, need smoother sleep method. b). sleep (dirtied *3/2 / write_bandwidth) c). sleep (dirtied / throttle_bandwidth) • flusher default write size (4MB -> 128MB), will be dynamic in the future. Baidu's practice: SSD random write is really bad. For sequential write, increase wb size (4MB -> 40MB) will get 120% SSD performance 6
  • 7. Btrfs - Coly Li • Has too much love from linux community two years ago > now • Used in MeeGo Project • Taobao plan to push industrial deployment in 2-3 years 10T per data server in TFS cluster Use on SSD and SATA hybrid data server Metadata will be allocated on SSD and data on SATA. • Dynamic data relocation with hot data tracking patch. For generic fs usage, need to deal with device mapper to get device speed information. • FSCK A difficult must. Currently assigned to Fujitsu. 7
  • 8. SSD challenges - Li Shaohua • Throughput: same issue as network • Disk controller gap and big locks (queue locks & scsi locks) • Interrupt related: a. smp affinity: single queue, one CPU to deal with irqs b. blk_iopoll: poll more req in one req • Need hardware multiqueue • CFQ needs to be changed to fit multiqueue (e.g. CFQ per queue) • Queue lock contention vs. cache lock contention See Andi Kleen's talk in Tokyo Linux Conf • Intel is building nextGen PCIE SSD, with many fancy features. Stay tuned 8
  • 9. VFS scalability- Ma Tao • With multi-cores, all global locks suck • Globle icache/dcache can be adapted to per-CPU • CFQ can be adapted to per-queue • The less global locks the better 9
  • 10. Industry talk – Baidu (Cont.) • Service types a). Indexing: random read, high IOPS, small IO size, read-only. 80M records per data node, and process 8-9 K queries per second. b). Distributed system: large files, sequential read/write. c). cache/KV storage: between a and b d). Web Server: CPU bound. • For a), read() sucks. Use mmap() to read blocks adhead to void kernel/userspace memory copy. mmap() can not use page cache LRU. Call readahead() after each mmap() to mark pages as read. mmap() pagefault is expensive with mm->mmap_sem lock. Use sync_readahead() and sync_readaheadv() With above, memory is now the bottleneck. Doing 10G+ MB read. • Google patch for reducing mm->mmap_sem hold time In do_page_fault(), drop mem->sem if page not found, then read it and get the lock again. 10
  • 11. Industry talk – Baidu (Cont.) • 8K filesystem block size (ext2) with 8K page size. Alloc continuous two pages each time Get better performance for sequential IO OCFS2 uses 1MB fs block size and 4K page size • PCIE compress card + ECC 11
  • 12. Industry Talk - Taobao • TFS • Tair uses an update server to record updates; apply updates to production system during mid-night • A config server is used to minimize meta server workload A versioned bucket table is maintained by config server and stored in each data server. Client can manipulate data location with the bucket table returned by config server. • Both TFS and Tair are open source projects now 12
  • 13. Industry Talk - EMC • Introduce the recovery methods and check methods use in the file systems from ext2 to btrFS • Emphasis the importance of FSCK; Introduce the issues within FSCK when checking a huge file system; Collect the proposals to solve this problem • pNFS learning notes 13
  • 14. Q&A 14