I am an Associate Professor of Computer Science at Stony Brook University. I direct the Computer Architecture Stony Brook (COMPAS) Lab. Prior to joining Stony Brook, I completed my Ph.D. at Carnegie Mellon University (CMU) under the supervision of Babak Falsafi. While completing my dissertation, I spent several years working remotely from École Polytechnique Fédérale de Lausanne (EPFL).
My research interests are in computer architecture, with an emphasis on the design of server systems. I work across the entire computing stack, from server software and operating systems to networks and processor microarchitecture. My current research projects include FPGA accelerator integration into server environments (e.g., Intel HARP, Microsoft Catapult, and Amazon F1), FPGA programmability (e.g., virtual memory and high-level synthesis), accelerators for machine learning (e.g., transformers and convolutional neural networks), efficient network processing and software-defined networking, speculative techniques that improve the performance and energy efficiency of high-performance processors, and programming models and mechanisms for emerging memory technologies (e.g., HBM and 3D XPoint).
If you are a PhD student at Stony Brook and want to work with me, please send me an email to arrange an appointment.
A Case for Specialized Processors for Scale-Out Workloads. In IEEE Micro's Top Picks, 2014. (original at ASPLOS'12)
Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors. In ACM Trans. Comput. Syst., ACM, volume 30, 2012.
Scale-Out Processors. In 39th International Symposium on Computer Architecture (ISCA), 2012.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware. In 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012. (recognized as Best Paper by the program committee and recognized as Top Pick of 2013 by IEEE Micro)
Proactive Instruction Fetch. In 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2011.
Toward Dark Silicon in Servers. In IEEE Micro, volume 31, 2011.
Cuckoo Directory: A Scalable Directory for Many-Core Systems. In 17th IEEE International Symposium on High Performance Computer Architecture (HPCA), 2011. (selected by the program committee for Best Student Papers session)
Spatial Memory Streaming. In Journal of Instruction-Level Parallelism (JILP), volume 13, 2011.
Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures. In IEEE Micro's Top Picks, volume 30, 2010. (original at ISCA'09)
Making Address-Correlated Prefetching Practical. In IEEE Micro's Top Picks, volume 30, 2010. (original at HPCA'09)
TurboTag: lookup filtering to reduce coherence directory power. In International Symposium on Low Power Electronics and Design (ISLPED), 2010.
Reactive NUCA: near-optimal block placement and replication in distributed caches. In 36th International Symposium on Computer Architecture (ISCA), 2009. (recognized as Top Pick of 2009 by IEEE Micro)
Practical Off-Chip Meta-Data for Temporal Memory Streaming. In 15th International Symposium on High Performance Computer Architecture (HPCA), 2009. (recognized as Top Pick of 2009 by IEEE Micro)
Temporal Instruction Fetch Streaming. In 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2008.
Temporal Streams in Commercial Server Applications. In 2008 IEEE International Symposium on Workload Characterization (IISWC), 2008.
Last-Touch Correlated Data Streaming. In 2007 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2007.
Mitigating multi-bit soft errors in L1 caches using last-store prediction. In Workshop on Architectural Support for Gigascale Integration (ASGI), 2007.
SimFlex: Statistical Sampling of Computer System Simulation. In IEEE Micro, volume 26, 2006.
My research is in computer architecture, with particular emphasis on the design of efficient server systems. Most recently, my main focus has been on machine learning accelerators: developing hardware techniques that enable fast and efficient implementations of deep learning, and making FPGA-based accelerators more practical and easier to program. More broadly, my work seeks to understand the fundamental properties and interactions of application software, operating systems, networks, processor microarchitecture, and datacenter dynamics, to enable software and hardware co-design of high-performance, power-efficient, and compact servers.
These days, it seems like everyone's favorite hobby is travel. Below is a map that shows the countries I have visited.
If you need to speak with me, please feel free to drop by my office at any time. However, to ensure that I will be there and not busy, it's always best to send an email ahead of your visit.
If you prefer to schedule an appointment explicitly, please send me an email. You can check my general availability by consulting my calendar.
May 19, 2021: Congratulations to Dr. Cho and Dr. Shen for defending their PhDs!