I am an Associate Professor of Computer Science at Stony Brook University. I direct the Computer Architecture Stony Brook (COMPAS) Lab. Prior to joining Stony Brook, I completed my Ph.D. at Carnegie Mellon University (CMU) under the supervision of Babak Falsafi. While completing my dissertation, I spent several years working remotely from École Polytechnique Fédérale de Lausanne (EPFL).
My research interests are in the area of computer architecture, with emphasis on the design of server systems. I work across the entire computing stack, from server software and operating systems to networks and processor microarchitecture. My current research projects include FPGA accelerator integration into server environments (e.g., Intel HARP, Microsoft Catapult, and Amazon F1), FPGA programmability (e.g., virtual memory and high-level synthesis), accelerators for machine learning (e.g., transformers and convolutional neural networks), efficient network processing and software-defined networking, speculative techniques for improving the performance and energy efficiency of high-performance processors, and programming models and mechanisms for emerging memory technologies (e.g., HBM and 3D XPoint).
If you are a PhD student at Stony Brook and want to work with me, please send me an email to arrange an appointment.
2023
[17] Waverunner: An Elegant Approach to Hardware Acceleration of State Machine Replication, In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI), USENIX Association, 2023. [bib] [pdf]
2021
[16] Practical Model Checking on FPGAs, In ACM Transactions on Reconfigurable Technology and Systems (TRETS), Association for Computing Machinery, volume 14, 2021. [bib] [pdf]
[15] Leveraging FPGA Layout to Minimize Jitter in Statistical Time-to-Digital Converters, In 29th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM, withdrawn due to IP dispute), 2021. [bib]
[14] On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers, In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, 2021. [bib] [pdf]
2020
[13] Flick: Fast and Lightweight ISA-Crossing Call for Heterogeneous-ISA Environments, In 47th International Symposium on Computer Architecture (ISCA), 2020. [bib] [pdf]
[12] FPGA-Accelerated Samplesort for Large Data Sets, In 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2020. [bib] [pdf]
2019
[11] Sorting Large Data Sets with FPGA-Accelerated Samplesort, In 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM poster), 2019. [bib] [pdf]
[10] Argus: An End-to-End Framework for Accelerating CNNs on FPGAs, In IEEE Micro, 2019. [bib] [pdf]
[9] Runtime-Programmable Pipelines for Model Checkers on FPGAs, In 29th International Conference on Field Programmable Logic and Applications (FPL), 2019. (nominated for the Best Paper award) [bib] [pdf]
2018
[8] Medusa: A Scalable Memory Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces, In 28th International Conference on Field Programmable Logic and Applications (FPL), 2018. [bib] [pdf]
[7] FPGASwarm: High Throughput Model Checking Using FPGAs, In 28th International Conference on Field Programmable Logic and Applications (FPL), 2018. [bib] [pdf]
[6] A Full-System VM-HDL Co-Simulation Framework for Servers with PCIe-Connected FPGAs, In 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018. [bib] [pdf]
2017
[5] A VM-HDL Co-Simulation Framework for Systems with PCIe-Connected FPGAs, Technical report #839, Stony Brook CEAS, 2017. [bib] [pdf]
[4] Maximizing CNN Accelerator Efficiency Through Resource Partitioning, In 44th International Symposium on Computer Architecture (ISCA), 2017. [bib] [pdf]
[3] Escher: A CNN Accelerator with Flexible Buffering to Minimize Off-Chip Transfer, In 25th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2017. [bib] [pdf]
2016
[2] Fused-Layer CNN Accelerators, In 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016. [bib] [pdf]
[1] Overcoming Resource Underutilization in Spatial CNN Accelerators, In 26th International Conference on Field Programmable Logic and Applications (FPL), 2016. [bib] [pdf]
Computer architecture, with particular emphasis on the design of efficient server systems. Most recently, my main focus has been on machine learning accelerators: developing hardware techniques that enable fast and efficient implementations of deep learning, and making FPGA-based accelerators more practical and easier to program. More broadly, my work seeks to understand the fundamental properties and interactions of application software, operating systems, networks, processor microarchitecture, and datacenter dynamics, to enable software and hardware co-design of high-performance, power-efficient, and compact servers.
These days, it seems like everyone's favorite hobby is traveling. Below is a map showing the countries I have visited.
If you need to speak with me, please feel free to drop by my office at any time. However, to make sure that I am there and available, it is best to send an email ahead of your visit.
If you prefer to explicitly schedule an appointment, please send me an email. You can check my general availability by consulting my calendar.
March 13, 2025: The MDA funds our work toward Energy-Efficient and Fault-Tolerant Acceleration of Deep Neural Networks.
December 2, 2024: Our paper A Case for Hardware Memoization in Server CPUs will appear in IEEE Computer Architecture Letters (CAL).
October 10, 2024: Ready or Not, Here I Come: Characterizing the Security of Prematurely-public Web Applications will appear at ACSAC'24.
September 6, 2024: Xipeng Shen and I will be serving as co-Program Chairs of the International Conference on Supercomputing (ICS'25). Please submit your best work!
June 1, 2024: The SUNY-IBM AI Research Alliance funds our work on MLISA, an instruction set architecture extension for AI accelerators.
April 26, 2024: Our paper NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches will appear at ICS'24.