ASPLOS10&Vee10 report-suzaki

ACM ASPLOS’10 & Vee’10 Report

at the 22nd Virtualization Implementation Technology Study Group (vimpl)
April 20, 2010

Kuniyasu Suzaki

Overview
• Fifteenth International Conference on Architectural Support for
Programming Languages and Operating Systems (ASPLOS 2010)
– March 15-17, 2010
– Pittsburgh, PA
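– 182 submissions (the highest so far), 32 accepted (18%), 3 Best Papers
• Poster session included. Five posters from Japan (Hiraki Lab at the University of Tokyo, two from Nakajima Lab at Waseda University, Murakami Lab at Kyushu University, Kourai Lab at Kyushu Institute of Technology)
– About 400 attendees.
– The keynote speech was given by Eric Brewer (UCB), recipient of the ACM-Infosys Foundation Award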

• Workshops
– 2nd WIOV (Workshop on I/O Virtualization)
– Workshop on Architecting Memory Technologies (this was a panel)
– Workshop on General-Purpose Computation on Graphics Processing Units (not attended)

• ASPLOS 2011 will be held in Newport Beach, California, March 5-11, 2011
– asplos11.cs.ucr.edu/
– Abstract Deadline: Monday, July 19, 2010
– Full Paper Deadline: Monday, July 26, 2010 (11:59pm EDT)

Program: Day 1
• Session 1: Novel Architectures (Session Chair: Luis Ceze)
– Best Paper! Dynamically Replicated Memory: Building Reliable Systems from Nanoscale Resistive
Memories
• Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda (University of Rochester / Microsoft Research)
– A Power-efficient All-optical On-chip Interconnect Using Wavelength-based Oblivious Routing
• Nevin Kirman and Jose Martinez (Cornell University)
• Session 2: Compilers and Runtime Systems (Session Chair: Michael Hind)
– Best Paper! A Real System Evaluation of Hardware Atomicity for Software Speculation
• Naveen Neelakantam, David Ditzel and Craig Zilles (University of Illinois at Urbana-Champaign; Intel)
– Dynamic filtering: multi-purpose architecture support for language runtime systems
• Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)
• Session 3: Parallel Programming 1 (Session Chair: Yuanyuan Zhou)
– CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
• Tom Bergan, Owen Anderson, Joe Devietti, Luis Ceze and Dan Grossman (University of Washington)
– Speculative Parallelization Using Software Multi-threaded Transactions
• Arun Raman, Hanjun Kim, Thomas R. Mason, Thomas B. Jablin and David I. August (Princeton University)
– Respec: Efficient online multiprocessor replay via speculation and external determinism
• Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter Chen and Jason Flinn (University of Michigan)
• Session 4: Scheduling in Parallel Systems (Session Chair: Tim Harris)
– Probabilistic Job Symbiosis Modeling for SMT Processor Scheduling
• Stijn Eyerman and Lieven Eeckhout (Ghent University)
– Request Behavior Variations
• Kai Shen (University of Rochester)
– Decoupling contention management from scheduling
• Ryan Johnson, Radu Stoica, Anastasia Ailamaki and Todd Mowry (EPFL; Carnegie Mellon University)
– Addressing Shared Resource Contention in Multicore Processors Via Scheduling
• Sergey Zhuravlev, Sergey Blagodurov and Alexandra Fedorova (Simon Fraser University)

Program: Day 2 (1/2)
• Session 5. Software Reliability (Session Chair: Emery Berger)
– SherLog: Error Diagnosis by Connecting Clues from Run-time Logs
• Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuanyuan Zhou and Shankar Pasupathy (University of California, San Diego;
University of Illinois at Urbana-Champaign)

– Analyzing Multicore Dumps to Facilitate Concurrency Bug Reproduction
• Dasarath Weeratunge, Xiangyu Zhang and Suresh Jagannathan (Purdue University)

– A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs
• Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi and Santosh Nagarakatte (Microsoft Research)

– ConMem: Detecting Severe Concurrency Bugs Through an Effect-Oriented Approach
• Wei Zhang, Chong Sun and Shan Lu (University of Wisconsin-Madison)

• Session 6. Hardware Power and Energy (Session Chair: David Wood)
– Characterizing Processor Thermal Behavior
• Francisco J. Mesa-Martínez, Ehsan K. Ardestani and Jose Renau (University of California, Santa Cruz)

– Conservation Cores: Reducing the Energy of Mature Computations
• Ganesh Venkatesh, John Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steve Swanson
and Michael Taylor (University of California, San Diego)

– Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement
• Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian and Al Davis (University of Utah)

Program: Day 2 (2/2)
• Session 7. Data Centers (Session Chair: Scott Mahlke)
– Power Routing: Dynamic Power Provisioning in the Data Center
• Steven Pelley, David Meisner, Pooya Zandevakili, Jack Underwood and Thomas Wenisch (University of Michigan)

– Joint Optimization of Idle and Cooling Power in Data Centers While Maintaining
Response Time
• Faraz Ahmad and T. N. Vijaykumar (Purdue University)

• Session 8. Hardware Monitoring (Session Chair: Peter Chen)
– Butterfly Analysis: Adapting Dataflow Analysis to Dynamic Parallel Monitoring
• Michelle Goodstein, Evangelos Vlachos, Shimin Chen, Phillip Gibbons, Michael Kozuch and Todd Mowry (Carnegie Mellon
University; Intel Labs Pittsburgh)

– ParaLog: Enabling and Accelerating Online Parallel Monitoring of Multithreaded
Applications
• Evangelos Vlachos, Michelle Goodstein, Michael Kozuch, Shimin Chen, Babak Falsafi, Phillip Gibbons and Todd Mowry (Carnegie
Mellon University; Intel Labs Pittsburgh; EPFL)

• Session 9. Parallel Programming 2 (Session Chair: Tim Harris)
– MacroSS: Macro-SIMDization of Streaming Applications
• Amir Hormati, Yoonseo Choi, Mark Woh, Manjunath Kudlur, Rodric Rabbah, Trevor Mudge and Scott Mahlke (University of
Michigan)

– COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders
• Dong Hyuk Woo and Hsien-Hsin Lee (Georgia Institute of Technology)

– Flexible Architectural Support for Fine-grain Scheduling
• Daniel Sanchez, Richard Yoo and Christos Kozyrakis (Stanford University)

Program: Day 3
• Session 10. Parallel Memory Systems (Session Chair: Carl Waldspurger)
– Specifying and Dynamically Verifying Address Translation-Aware Memory Consistency
• Bogdan Romanescu, Alvin Lebeck and Daniel Sorin (Duke University)

– Best Paper! Fairness via Source Throttling: A Configurable and High-Performance
Fairness Substrate for Multi-Core Memory Systems
• Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu and Yale Patt (The University of Texas at Austin)

– An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel
Systems
• Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro and Wen-mei Hwu (University of Illinois at Urbana-Champaign; UPC)

– Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
• Abhishek Bhattacharjee and Margaret Martonosi (Princeton University)
• Session 11. Security and Hardware Reliability (Session Chair: Vikram Adve)
– Orthrus: Efficient Software Integrity Protection on Multi-Cores
• Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)

– Shoestring: Probabilistic Soft-error Resilience on the Cheap
• Shuguang Feng, Shantanu Gupta, Amin Ansari and Scott Mahlke (University of Michigan)

– Virtualized and Flexible ECC for Main Memory
• Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)

Dynamically Replicated Memory: Building Reliable
Systems from Nanoscale Resistive Memories
Engin Ipek, Jeremy Condit, Edmund B. Nightingale, Doug Burger and Thomas Moscibroda
(University of Rochester / Microsoft Research)

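• How to use PCM (Phase Change Memory), a candidate for next-generation main memory
– Can be fabricated below the 40 nm scale and is high density, but once a cell fails it cannot be repaired
– A failed page (primary) is recovered by providing a backup page
– The Physical -> Real translation maps primary pages to backup pages

[Figure: a primary page paired with a backup page. X marks a dead byte, which is identified by its failed parity.]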
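
The primary/backup pairing above can be sketched in a few lines. This is only an illustration under stated assumptions, not the hardware design from the paper: the helper names (`compatible`, `read_byte`, `write_byte`), the tiny page size, and the use of Python sets to stand in for parity-detected dead bytes are all hypothetical.

```python
PAGE_SIZE = 8  # assumption: tiny page so the example is easy to follow


def compatible(dead_a, dead_b):
    """Two faulty pages can be paired if no byte offset is dead in both."""
    return not (dead_a & dead_b)


def write_byte(offset, value, primary, backup, primary_dead, backup_dead):
    """Store the byte in every page of the pair that still works at this offset."""
    if offset not in primary_dead:
        primary[offset] = value
    if offset not in backup_dead:
        backup[offset] = value


def read_byte(offset, primary, backup, primary_dead):
    """Read from the primary page unless that byte is dead (stand-in for a parity failure)."""
    if offset in primary_dead:
        return backup[offset]
    return primary[offset]


primary_dead = {1, 5}   # dead byte offsets in the primary page
backup_dead = {2}       # dead byte offsets in the backup page
primary = [255] * PAGE_SIZE  # 255 stands for stale contents
backup = [255] * PAGE_SIZE

can_pair = compatible(primary_dead, backup_dead)
for off in range(PAGE_SIZE):
    write_byte(off, off * 10, primary, backup, primary_dead, backup_dead)
recovered = [read_byte(off, primary, backup, primary_dead) for off in range(PAGE_SIZE)]
```

In this sketch the pair behaves like one healthy page as long as the two dead-byte sets do not overlap.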

Dynamic filtering: multi-purpose architecture support for
language runtime systems
Tim Harris, Adrian Cristal, Sasa Tomic and Osman Unsal (Microsoft Research)

• Adds "dyfl", a read/write barrier instruction that checks memory accesses, to make GC, Software Transactional Memory, and Control & Data Flow Integrity (XFI [OSDI06], WIT [SP08], DFI [OSDI06]) more efficient

Write barrier used in GC:
void writeBarrier(void **addr, void *tgt) {
  if (inOldGen(addr) && inYoungGen(tgt)) { // T1
    log(addr); // L1
  }
}

Write barrier with dyfl added:
void writeBarrierDyfl(void **addr, void *tgt) {
  if ((!dyfl_card_pair(addr, tgt, 0x1)) && // A1
      (!dyfl_addr(addr, 0x2))) { // A2
    if (inOldGen(addr) && inYoungGen(tgt)) { // T1
      dyfl_set_addr(addr, 0x2); // S2
      log(addr); // L1
    } else {
      dyfl_set_card_pair(addr, tgt, 0x1); // S1
    }
  }
}

T = test, L = log, S = set, A = address

dyfl(i1, i2, mask, tag) // Test dynamic filter
dyfl_set(i1, i2, mask, tag) // Set dynamic filter
dyfl_clear(i1, i2, mask, tag) // Clear specific entry
dyfl_clear(tag) // Clear all with tag

Question: how does this differ from a hardware breakpoint?
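
To see why the filter helps, the dyfl write barrier can be modelled in software. The sketch below is an assumption-laden model, not the instruction semantics from the paper: the `DynamicFilter` class, its FIFO eviction, the 512-byte card size, and the `OLD_GEN_START` boundary are all invented for illustration.

```python
CARD_SHIFT = 9             # assumption: 512-byte cards
OLD_GEN_START = 0x10000    # assumption: old generation starts here (card-aligned)


class DynamicFilter:
    """Software stand-in for the hardware filter: a small table of tagged keys."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = []  # oldest entry first

    def test(self, key, tag):
        return (key, tag) in self.entries

    def set(self, key, tag):
        if (key, tag) in self.entries:
            return
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # eviction only costs a repeated slow-path check
        self.entries.append((key, tag))


def write_barrier(filt, log, stats, addr, tgt):
    card_pair = (addr >> CARD_SHIFT, tgt >> CARD_SHIFT)
    if filt.test(card_pair, 0x1) or filt.test(addr, 0x2):  # A1, A2
        return                                             # filtered: no work
    stats["slow"] += 1
    if addr >= OLD_GEN_START and tgt < OLD_GEN_START:      # T1: old -> young
        filt.set(addr, 0x2)                                # S2
        log.append(addr)                                   # L1
    else:
        filt.set(card_pair, 0x1)                           # S1


filt, log, stats = DynamicFilter(), [], {"slow": 0}
for _ in range(3):
    write_barrier(filt, log, stats, 0x10008, 0x200)  # old -> young pointer store
    write_barrier(filt, log, stats, 0x300, 0x400)    # young -> young pointer store
```

Only the first store of each kind takes the slow path; the repeats are absorbed by the filter.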

Micro-Pages: Increasing DRAM Efficiency
with Locality-Aware Data Placement
Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian
and Al Davis (University of Utah)
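• Motivation: multicore makes memory accesses finer-grained, so the hit rate of the 8 KB DRAM row buffer is falling. [Figure: 64-byte cache blocks]
• Find frequently accessed data and move it so that the hit rate rises (hardware-assisted migration)
• Set the OS page size to 1 KB and use 4 KB SuperPages (the variable page-granularity mechanism in the processor TLB)
– Reference: "Implementation and Performance Evaluation of Linux Super Page for the 2.6 Kernel" http://shimizu-lab.dt.u-tokai.ac.jp/thesis/master/6adgm007.pdf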

•Average performance ↑ 9% (max. 18%)
•Average memory energy consumption ↓ 18% (max. 62%).
•Average row-buffer utilization ↑ 38%
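
The effect of co-locating hot micro-pages can be illustrated with a toy model. This is a sketch under strong assumptions, not the paper's mechanism: a single DRAM bank with an open-row policy, a hypothetical remapping table `page_map`, and a target row that is assumed to be otherwise unused.

```python
from collections import Counter

ROW_SIZE = 8 * 1024      # 8 KB row buffer, as on the slide
MICRO_PAGE = 1024        # 1 KB micro-page, as on the slide
PAGES_PER_ROW = ROW_SIZE // MICRO_PAGE


def row_hit_rate(trace, page_map=None):
    """Fraction of accesses that hit the currently open row (single bank)."""
    open_row = None
    hits = 0
    for addr in trace:
        page = addr // MICRO_PAGE
        if page_map is not None:
            page = page_map.get(page, page)
        row = page // PAGES_PER_ROW
        if row == open_row:
            hits += 1
        open_row = row
    return hits / len(trace)


def colocate_hot_pages(trace, target_row):
    """Remap the most frequently accessed micro-pages into one DRAM row."""
    counts = Counter(addr // MICRO_PAGE for addr in trace)
    hot = [page for page, _ in counts.most_common(PAGES_PER_ROW)]
    return {page: target_row * PAGES_PER_ROW + i for i, page in enumerate(hot)}


# Four hot micro-pages, each in a different row, accessed in turn.
trace = [r * ROW_SIZE for r in range(4)] * 100
before = row_hit_rate(trace)
after = row_hit_rate(trace, colocate_hot_pages(trace, target_row=100))
```

In this contrived trace every access misses the row buffer before migration, and almost every access hits afterwards.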

Orthrus: Efficient Software Integrity
Protection on Multi-Cores
Ruirui Huang, Dan Deng and G. Edward Suh (Cornell University)

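• Creates replica processes whose fine-grained memory layouts differ.
• Runs the two processes and checks that their memory accesses carry the same content (at different addresses), which detects buffer overflows and dangling pointers
– Orthrus is the two-headed dog of Greek mythology, brother of Cerberus.

Related work: both have released their source code
DieHard [PLDI06] http://prisms.cs.umass.edu/emery/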
N-variant [USENIX-Security06] http://www.cs.virginia.edu/nvariant/
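
The replica comparison can be sketched with a toy program run under two layouts. This is a simplified illustration, not Orthrus itself: the function `run_replica`, the object names, and the two layouts are hypothetical, and memory is modelled as a Python dict.

```python
def run_replica(layout, overflow):
    """Run the same toy program with a replica-specific object layout.

    layout maps object names to base addresses. Returns the values the
    program loads, which is what the two replicas are compared on.
    """
    memory = {layout["secret"]: 42, layout["flag"]: 7}
    cells = 4 + (1 if overflow else 0)      # buf has 4 cells; a 5th write overflows
    for i in range(cells):
        memory[layout["buf"] + i] = 0xAA
    return [memory[layout["secret"]], memory[layout["flag"]]]


# Same objects, different placement: the cell after buf holds a different object.
layout_a = {"buf": 0, "secret": 4, "flag": 5}
layout_b = {"buf": 10, "flag": 14, "secret": 15}


def diverges(overflow):
    """True if the replicas load different content, i.e. corruption is detected."""
    return run_replica(layout_a, overflow) != run_replica(layout_b, overflow)
```

A correct run loads identical content in both replicas, while the overflow corrupts a different object in each layout and so shows up as a mismatch.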

Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez (The University of Texas at Austin)

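• ECC normally adds dedicated check bits; this work virtualizes the check bits (Tier 1: simple, Tier 2: strong) so that they can be mapped into the ordinary memory space.
– Benefits: limits the growth in bits; saves power
• Matched to the DIMM configuration (DDR2, burst 4):
– x4 DDR2 burst 4: 64 bit -> 4 B T1EC
– x8 DDR2 burst 4: 64 bit -> 8 B T1EC
• T2 adopts chipkill-correct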

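Impressions and trends
• Unsurprisingly, the accepted papers combine OS work with the latest hardware, or debuggers with the latest hardware.
• Much of the latest hardware work was also memory-related.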

WIOV 2010
Second Workshop on I/O Virtualization
• About 30 attendees. Everyone introduced themselves
• Storage
– SLIM: Network Decongestion for Storage Systems
• Madalin Mihailescu, Gokul Soundararajan and Cristiana Amza (University of Toronto).
– On Disk I/O Scheduling in Virtual Machines
• Mukil Kesavan, Ada Gavrilovska and Karsten Schwan (Georgia Institute of Technology).
• Networking
– Ally: OS-Transparent Packet Inspection Using Sequestered Cores
• Jen-Cheng Huang (Georgia Tech), Matteo Monchiero and Yoshio Turner (HP Labs).
– A Network Interface Card Architecture for I/O Virtualization in Embedded Systems
• Holm Rauchfuss, Thomas Wild and Andreas Herkersdorf (Technische Universität München).
– Architectural support for user-level network interfaces in heavily virtualized systems
• Florian Auernhammer and Patricia Sagmeister (IBM Research).
• Keynote by Paul Congdon (HP)
– Enabling Truly Converged Infrastructure
• Power and Performance Bottlenecks
– Redesigning Xen's Memory Sharing Mechanism for Safe and Efficient I/O
Virtualization
• Kaushik Kumar Ram (Rice University), Jose Renato Santos and Yoshio Turner (HP Labs).
– Power Aware I/O Virtualization
• Kun Tian and Yaozu Dong (Intel).
– I/O Virtualization Bottlenecks in Cloud Computing Today
• Jeffrey Shafer (Rice University).
• Web page: http://sysrun.haifa.il.ibm.com/hrl/wiov2010/
– Slides are available

Enabling Truly Converged Infrastructure
Keynote by Paul Congdon (HP)
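• Introduction to the network virtualization standards now in progress
– I/O virtualization in the hypervisor places a heavy load on the CPU.
– Adapter virtualization
• I/O virtualization done in hardware
– Standardized by PCI-SIG
» SR-IOV: Single Root I/O Virtualization
– Edge virtualization
• Switch virtualization done in hardware
– Standardized as IEEE 802.1Qbg and 802.1Qbh
» VEB: Virtual Ethernet Bridge
» VEPA: Virtual Ethernet Port Aggregator
• Reference: Nikkei Computer, 2010/03/31
• "New network standards that support network virtualization behind the scenes"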

Workshop on Architecting Memory Technologies
• Moderator: Shih-Lien Lu, Intel Labs
• Professor Mattan Erez, University of Texas at Austin
• Professor Bruce Jacob, University of Maryland
• Professor Hsien-Hsin Lee, Georgia Tech University
• Professor Onur Mutlu, Carnegie Mellon University
• Professor Yuan Xie, Pennsylvania State University
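– Web page: http://web.engr.oregonstate.edu/~sllu/asplos2010 (slides available)

• Topics: the move to non-volatile RAM, power consumption, and performance degradation caused by multicore contention

• Optimal storage size per core
– Mattan Erez (Texas Austin)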

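FIT (Failures In Time) is a way of expressing failure rate. Its unit is the number of failures per billion (10^9) hours. For example, if 3 failures occur in one billion hours, the failure rate (FIT) is 3. Typical electronic components have a FIT of about 10 to 100. Because the system failure rate is the sum of the component failure rates, the failure rate rises as the number of components grows.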
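
The FIT arithmetic above can be worked through in a few lines. The function names and the 1000-part system are assumptions chosen for illustration, and the MTBF conversion assumes a constant failure rate.

```python
HOURS_PER_FIT_UNIT = 1e9  # FIT counts failures per billion device-hours


def fit_from_observation(failures, device_hours):
    """FIT computed from an observed failure count over a number of device-hours."""
    return failures * HOURS_PER_FIT_UNIT / device_hours


def system_fit(component_fits):
    """System failure rate is the sum of the component failure rates."""
    return sum(component_fits)


def mtbf_hours(fit):
    """Mean time between failures, assuming a constant failure rate."""
    return HOURS_PER_FIT_UNIT / fit


example_fit = fit_from_observation(3, 1e9)   # the slide's example: FIT of 3
board_fit = system_fit([50] * 1000)          # hypothetical: 1000 parts at FIT 50 each
board_mtbf = mtbf_hours(board_fit)           # hours between failures for that board
```

With these hypothetical numbers the board reaches a FIT of 50000, which corresponds to one failure roughly every 20000 hours, a little over two years.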

Vee Day1
• Keynote Talk “Transistors to Toys: Teaching Systems to
Freshmen”
– Peter M. Chen (University of Michigan)
• Debugging and Replay
– Capability Wrangling Made Easy: Debugging on a Microkernel with
Valgrind
• Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael
Roitzsch, Hermann Härtig
– Multi-Stage Replay with Crosscut
• Jim Chow, Dominic Lucchetti, Tal Garfinkel, Geoffrey Lefebvre, Ryan Gardner, Joshua
Mason, Sam Small, Peter M. Chen (University of Michigan)
– Optimizing Crash Dump in Virtualized Environments
• Yijian Huang (Fudan University), Haibo Chen, Binyu Zang

Vee Day2
• Keynote Talk, “Looking Beyond a Singularity”
– Galen C. Hunt (Microsoft Research)
• Compiler Infrastructure
– Improving Compiler-Runtime Separation with XIR
• Ben L. Titzer (Google), Thomas Würthinger, Doug Simon, Marcelo Cintra
– VMKit: A Substrate for Managed Runtime Environments
• Nicolas Geoffray (Université Pierre et Marie Curie), Gaël Thomas, Julia Lawall, Gilles Muller, Bertil Folliot

• Featured Talk “Spice up your browser: NaCl, Pepper, and beyond”
– Robert Muth (Google)
• Applications of Virtualization
– Neon: System Support for Derived Data Management
• Qing Zhang (University of California, San Diego), John McCullough, Justin Ma, Nabil Schear, Michael Vrable (University of
California, San Diego), Amin Vahdat, Alex C. Snoeren, Geoffrey M. Voelker, Stefan Savage

– Energy-Efficient Storage in Virtual Machine Environments
• Lei Ye (University of Arizona), Gen Lu, Sushanth Kumar, Chris Gniady, John H. Hartman

• Hypervisor Scheduling
– AASH: An Asymmetry-Aware Scheduler for Hypervisors
• Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)
– Supporting Soft Real-Time Tasks in the Xen Hypervisor
• Min Lee (Georgia Institute of Technology), A. S. Krishnakumar (Avaya Laboratories), P. Krishnan,
Navjot Singh, Shalini Yajnik

Vee Day3
• Java
– Efficient Runtime Tracking of Allocation Sites in Java
• Rei Odaira (IBM Research - Tokyo), Kazunori Ogata, Kiyokuni Kawachiya,
Tamiya Onodera (IBM Research - Tokyo), Toshio Nakatani
– Evaluation of a Just-In-Time Compiler Retrofitted for PHP
• Michiaki Tatsubori (IBM Research - Tokyo), Akihiko Tozawa, Toyotaro
Suzumura, Scott Trent, Tamiya Onodera
– Novel Online Profiling for Virtual Machines
• Manjiri A. Namjoshi (University of Kansas), Prasad A. Kulkarni
• Dynamic Binary Translation
– DBT Path Selection for Holistic Memory Efficiency and Performance
• Apala Guha (University of Virginia), Kim Hazelwood, Mary Lou Soffa
– Dynamic Binary Translation Specialized for Embedded Systems
• Goh Kondoh (IBM Research - Tokyo), Hideaki Komatsu

“Looking Beyond a Singularity”
Galen C. Hunt (Microsoft Research)
• The three keys of Singularity
– Software Isolated Processes (SIP)
– Contract-Based Channels
– Manifest-Based Programs

• Successor projects to Singularity
– Menlo: mobile devices that go unnoticed
– Drawbridge: sandboxing
– SafeOS: verifies assembly code
– BTL: fusion of static and dynamic analysis

Capability Wrangling Made Easy: Debugging on a
Microkernel with Valgrind
Aaron Pohle (Technische Universität Dresden), Björn Döbel, Michael Roitzsch, Hermann Härtig

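• How to port Valgrind to Fiasco.OC, an L4-family microkernel
• Memory management differs, so a mechanism to keep the two consistent is needed
– In Valgrind, the memory space of the application (client) is under Valgrind's control, and the OS interface is POSIX
– Fiasco.OC is capability-based

• CapCheck, built on Valgrind, makes it possible to check the delegation of capabilities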

AASH: An Asymmetry-Aware Scheduler for Hypervisors
Vahid Kazempour, Ali Kamali, Alexandra Fedorova (Simon Fraser University)

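• Proposes a hypervisor scheduler for asymmetric multicore (same ISA; two kinds of core, fast and slow)
– Basics:
• Fast cores are allocated fairly
• The core configuration is recognized inside the guest
– Scheduling threads onto fast cores is the guest OS's job

• Fast-core allocation has priorities

– When a fast core is free, it is allocated in preference to a slow core
– Using an MSR (Model-Specific Register) to notify the guest OS of core changes is future work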
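
The fair-sharing rule for fast cores can be sketched as a simple rotation. This is a minimal model under stated assumptions, not the Xen implementation from the paper: the `schedule` function, equal-length time slices, and the absence of priorities are all simplifications.

```python
from collections import deque


def schedule(vcpus, n_fast, n_slow, n_slices):
    """Count how many time slices each vCPU spends on a fast core.

    Fast cores are filled first in every slice, and the queue is rotated
    so that different vCPUs get the fast cores in the next slice.
    """
    fast_time = {v: 0 for v in vcpus}
    queue = deque(vcpus)
    for _ in range(n_slices):
        running = list(queue)[: n_fast + n_slow]
        for position, vcpu in enumerate(running):
            if position < n_fast:  # a free fast core is preferred over a slow one
                fast_time[vcpu] += 1
        queue.rotate(-n_fast)
    return fast_time


# 1 fast core and 7 slow cores, as in the evaluation setup on the next slide.
shared = schedule(["a", "b", "c", "d"], n_fast=1, n_slow=7, n_slices=8)
alone = schedule(["a"], n_fast=1, n_slow=7, n_slices=8)
```

Four vCPUs each receive an equal share of the single fast core, and a lone vCPU stays on the fast core for every slice.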

AASH: An Asymmetry-Aware Scheduler for Hypervisors

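• Implementation
– Modifies the Credit Scheduler of Xen 3.0
– Two 4-core AMD Opterons (8 cores in total)
• 1 fast core at 2 GHz, 7 slow cores at 1 GHz
• Configured with DVFS (Dynamic Voltage and Frequency Scaling)?

• Evaluation
– Results were 36% better than with the original Xen scheduler.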
