Publications

2026

ASPLOS

STARC: Selective Token Access with Remapping and Clustering for Efficient LLM Decoding on PIM Systems

Zehao Fan, Yunzhen Liu, Garrett Gagnon, and 5 more authors

In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026

2025

ICCAD

SAGE: Saliency-Aware Grouping for Efficient Mapping of LLMs on Analog Compute-in-Memory

Yayue Hou, Zhenyu Liu, Garrett Gagnon, and 4 more authors

In Proceedings of the International Conference on Computer-Aided Design, 2025
CAL

Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System

Yunhua Fang, Rui Xie, Asad Ul Haq, and 6 more authors

IEEE Computer Architecture Letters, 2025
CAL

Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure

Rui Xie, Asad Ul Haq, Yunhua Fang, and 5 more authors

IEEE Computer Architecture Letters, 2025
MEMSYS

ZipCXL: CXL-based Main Memory Compression at Low Performance Penalty

Asad Ul Haq, Rui Xie, Linsen Ma, and 3 more authors

In Proceedings of the International Symposium on Memory Systems, 2025
ICS

BitWeaver: Read-Time Truncation in Memory

Garrett Gagnon, Srikanth Malla, Yangwook Kang, and 1 more author

In Proceedings of the ACM International Conference on Supercomputing, 2025
DATE

NORA: Noise-Optimized Rescaling of LLMs on Analog Compute-in-Memory Accelerators

Yayue Hou, Hsinyu Tsai, Kaoutar El Maghraoui, and 3 more authors

In Proceedings of the Design, Automation and Test in Europe Conference, 2025

2024

NeurIPS-W

MAPLE: Memory-Aware Predict and Load for Efficient LLM Inference

Zhenyu Liu, Zhemin Zhang, Zirui Zhang, and 4 more authors

In Workshop on Machine Learning and Compression, NeurIPS, 2024
CAL

SmartQuant: CXL-Based AI Model Store in Support of Runtime Configurable Weight Quantization

Rui Xie, Asad Ul Haq, Linsen Ma, and 5 more authors

IEEE Computer Architecture Letters, 2024
Preprint

Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons

Zhenyu Liu, Garrett Gagnon, Swagath Venkataramani, and 1 more author

arXiv preprint arXiv:2402.04325, 2024

2023

ISCA

ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification

Siqi Li, Fengbin Tu, Liu Liu, and 5 more authors

In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
MLSys

ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs

Guyue Huang, Yang Bai, Liu Liu, and 4 more authors

Proceedings of Machine Learning and Systems, 2023
PPoPP

Dynamic N:M fine-grained structured sparse attention mechanism

Zhaodong Chen, Zheng Qu, Yuying Quan, and 3 more authors

In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

2022

JSSC

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes

Fengbin Tu, Zihan Wu, Yiqi Wang, and 7 more authors

IEEE Journal of Solid-State Circuits, 2022
TC

Dynamic sparse attention for scalable transformer acceleration

Liu Liu, Zheng Qu, Zhaodong Chen, and 3 more authors

IEEE Transactions on Computers, 2022
ISCA

INSPIRE: in-storage private information retrieval via protocol and architecture co-design

Jilan Lin, Ling Liang, Zheng Qu, and 6 more authors

In Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022
ASPLOS

A one-for-all and O(V log(V))-cost solution for parallel merge style operations on sorted key-value arrays

Bangyan Wang, Lei Deng, Fei Sun, and 4 more authors

In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
ASPLOS

DOTA: detect and omit weak attentions for scalable transformer acceleration

Zheng Qu, Liu Liu, Fengbin Tu, and 3 more authors

In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
ISSCC

A 28nm 15.59 µJ/token full-digital bitline-transpose CIM-based sparse transformer accelerator with pipeline/parallel reconfigurable modes

Fengbin Tu, Zihan Wu, Yiqi Wang, and 7 more authors

In 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022

2021

Computer

π-RT: A runtime framework to enable energy-efficient real-time robotic vision applications on heterogeneous architectures

Liu Liu, Jie Tang, Shaoshan Liu, and 3 more authors

Computer, 2021
SC

Efficient tensor core-based GPU kernels for structured sparsity under reduced precision

Zhaodong Chen, Zheng Qu, Liu Liu, and 2 more authors

In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021
MICRO

ENMC: Extreme near-memory classification via approximate screening

Liu Liu, Jilan Lin, Zheng Qu, and 2 more authors

In 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

2020

MICRO

DUET: Boosting deep neural network efficiency on dual-module architecture

Liu Liu, Zheng Qu, Lei Deng, and 6 more authors

In 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
DAC

Computation on sparse neural networks and its implications for future hardware

Fei Sun, Minghai Qin, Tianyun Zhang, and 3 more authors

In 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020
ICML

Boosting deep neural network efficiency with dual-module inference

Liu Liu, Lei Deng, Zhaodong Chen, and 7 more authors

In International Conference on Machine Learning, 2020

2019

ICLR

Dynamic Sparse Graph for Efficient Deep Learning

Liu Liu, Lei Deng, Xing Hu, and 4 more authors

In International Conference on Learning Representations, 2019

2018

TCAD

SemiMap: A semi-folded convolution mapping for speed-overhead balance on crossbars

Lei Deng, Ling Liang, Guanrui Wang, and 7 more authors

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018
Fast object tracking on a many-core neural network chip

Lei Deng, Zhe Zou, Xin Ma, and 7 more authors

Frontiers in Neuroscience, 2018
TNNLS

L1-norm batch normalization for efficient training of deep neural networks

Shuang Wu, Guoqi Li, Lei Deng, and 4 more authors

IEEE Transactions on Neural Networks and Learning Systems, 2018

2017

ASP-DAC

Building energy-efficient multi-level cell STT-RAM caches with data compression

Liu Liu, Ping Chi, Shuangchen Li, and 2 more authors

In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017

2016

ICCAD

NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable memory

Shuangchen Li, Liu Liu, Peng Gu, and 2 more authors

In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016
GLS-VLSI

Leveraging 3D technologies for hardware security: Opportunities and challenges

Peng Gu, Shuangchen Li, Dylan Stow, and 4 more authors

In Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016