Main Conference Program


Day 1: September 23rd, Monday
Keynote – Dr. Kathryn McKinley
Session 1: Best-Papers Session
  MASR: A Modular Accelerator for Sparse RNNs
Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander Rush, Gu-Yeon Wei, David Brooks
  Gluon-Async: A Bulk-Asynchronous System for Distributed and Heterogeneous Graph Analytics
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Vishwesh Jatala, V Krishna Nandivada, Marc Snir, Keshav Pingali
  Optimizing OpenMP Nested Parallel Regions with User-Level Threads
Shintaro Iwasaki, Halim Amer, Kenjiro Taura, Sangmin Seo, Pavan Balaji
  SMT-COP: Defeating Side-Channel Attacks on Execution Units in SMT Processors
Daniel Townley, Dmitry Ponomarev
Session 2A: Compiler Optimization and Code Generation 1
  Type-Directed Program Synthesis and Constraint Generation for Library Portability
Bruce Collie, Philip Ginsbach, Michael F. P. O’Boyle
  Deepframe: A Profile-driven Compiler for Spatial Hardware Accelerators
Apala Guha, Naveen Vedula, Arrvindh Shriraman
  Fast Parallel Equivalence Relations in a Datalog Compiler
Patrick Nappa, David Zhao, Pavle Subotic, Bernhard Scholz
Session 2B: Memory/Storage Systems 1
  Enforcing Last-level Cache Partitioning through Memory Virtual Channels
Jungwook Chung, Yuhwan Ro, Joonsung Kim, Jaehyung Ahn, Jangwoo Kim, John Kim, Jae W. Lee, Jung Ho Ahn
  To Stack or Not to Stack
Richard Afoakwa, Lejie Lu, Hui Wu, Michael Huang
  Enforcing Crash Consistency of Evolving Network Analytics in Non-Volatile Main Memory Systems
Soklong Lim, Zaixin Lu, Bin Ren, Xuechen Zhang
Session 3A: Hardware/Software for Security
  Fooling the Sense of Cross-core Last-level Cache Eviction based Attacker by Prefetching Common Sense
Biswabandan Panda
  SpecShield: Shielding Speculative Data from Microarchitectural Covert Channels
Kristin Barber, Anys Bacha, Li Zhou, Yinqian Zhang, Radu Teodorescu
Session 3B: Hardware/Software for Machine Learning
  MOSAIC: Heterogeneity-, Communication-, and Constraint-Aware Model Slicing and Execution for Accurate and Efficient Inference
Myeonggyun Han, Jihoon Hyun, Seongbeom Park, Jinsu Park, Woongki Baek
  Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity
Xiao Dong, Lei Liu, Xiaobing Feng
Day 2: September 24th, Tuesday
Keynote – Dr. Luis Ceze
Session 4A: Concurrency Management
  Forgive-TM: Supporting Lazy Conflict Detection In Eager Hardware Transactional Memory
Sunjae Park, Christopher Hughes, Milos Prvulovic
  Unfair Scheduling Patterns in NUMA Architectures
Naama Ben-David, Ziv Scully, Guy E. Blelloch
  Optimizing Persistent Memory Transactions
Pantea Zardoshti, Tingzhe Zhou, Yujie Liu, Michael Spear
Session 4B: Heterogeneous Systems and Accelerators 1
  HeTM: Transactional Memory for Heterogeneous Systems
Daniel Castro, Paolo Romano, Aleksandar Ilic, Amin M Khan
  Achieving scalability in a k-NN multi-GPU network service with Centaur
Amir Wated, Alexander Libov, Ohad Shacham, Edward Bortnikov, Mark Silberstein
  Analyzing and Leveraging Remote-core Bandwidth for Enhanced Performance in GPUs
Mohamed Ibrahim, Hongyuan Liu, Onur Kayiran, Adwait Jog
Session 5A: Domain/Application-Specific Hardware/Software
  Specialization Opportunities in Graphical Workloads
Lewis Crawford, Michael O’Boyle
  FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM technology
Farzaneh Zokaee, Mingzhe Zhang, Lei Jiang
  SLAMBooster: An Application-aware Online Controller for Approximation in Dense SLAM
Yan Pei, Swarnendu Biswas, Donald Fussell, Keshav Pingali
Session 5B: Heterogeneous Systems and Accelerators 2
  Exploring Memory Persistency Models for GPUs
Zhen Lin, Mohammad Alshboul, Yan Solihin, Huiyang Zhou
  Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs
Ahmed E. Helal, Ashwin M. Aji, Michael L. Chu, Bradford M. Beckmann, Wu-chun Feng
  EDGE: Event-Driven GPU Execution
Tayler Hetherington, Maria Lubeznov, Deval Shah, Tor M. Aamodt
Session 6A: Compiler Optimization and Code Generation 2
  Generating Portable High-Performance Code via Multi-Dimensional Homomorphisms
Ari Rasch, Richard Schulze, Sergei Gorlatch
  Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot
Tobias Gysi, Tobias Grosser, Torsten Hoefler
Session 6B: Memory/Storage Systems 2
  Reducing Data Movement and Energy in Multilevel Cache Hierarchies without Losing Performance: Can you have it all?
Jiajun Wang, Prakash Ramrakhyani, Wendy Elsasser, Lizy John
  Multiversioned Page Overlays: Enabling Faster Serializable Hardware Transactional Memory
Ziqi Wang, Vivek Seshadri, Todd Mowry, Michael Kozuch
Day 3: September 25th, Wednesday
Keynote – Dr. Peter Kogge
Session 7: Parallel Algorithms and Applications
  Computing Three-dimensional Constrained Delaunay Refinement Using the GPU
Zhenghai Chen, Tiow-Seng Tan
  A Synchronization-Avoiding Distance-1 Grundy Coloring Algorithm for Power-Law Graphs
Jesun Sahariar Firoz, Marcin Zalewski, Andrew Lumsdaine
  Accelerating DCA++ (Dynamical Cluster Approximation) Scientific Application on the Summit supercomputer
Giovanni Balduzzi, Arghya Chatterjee, Ying Wai Li, Peter Doak, Urs Haehner, Ed D’Azevedo, Thomas Maier, Thomas Schulthess
  A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction
Gangyi Zhu, Peng Jiang, Gagan Agrawal
Student Research Competition (SRC) Poster Session