Publications
2026
- STARC: Selective Token Access with Remapping and Clustering for Efficient LLM Decoding on PIM Systems. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2026
2025
- SAGE: Saliency-Aware Grouping for Efficient Mapping of LLMs on Analog Compute-in-Memory. In Proceedings of International Conference on Computer-Aided Design, 2025
- Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System. IEEE Computer Architecture Letters, 2025
- Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure. IEEE Computer Architecture Letters, 2025
- ZipCXL: CXL-based Main Memory Compression at Low Performance Penalty. In Proceedings of the International Symposium on Memory Systems, 2025
- BitWeaver: Read-Time Truncation in Memory. In Proceedings of the ACM International Conference on Supercomputing, 2025
- NORA: Noise-Optimized Rescaling of LLMs on Analog Compute-in-Memory Accelerators. In Proceedings of Design, Automation and Test in Europe Conference, 2025
2024
- MAPLE: Memory-Aware Predict and Load for Efficient LLM Inference. In Workshop on Machine Learning and Compression, NeurIPS, 2024
- SmartQuant: CXL-Based AI Model Store in Support of Runtime Configurable Weight Quantization. IEEE Computer Architecture Letters, 2024
- Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons. arXiv preprint arXiv:2402.04325, 2024
2023
- ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification. In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
- ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs. Proceedings of Machine Learning and Systems, 2023
- Dynamic N:M fine-grained structured sparse attention mechanism. In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
2022
- TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes. IEEE Journal of Solid-State Circuits, 2022
- Dynamic sparse attention for scalable transformer acceleration. IEEE Transactions on Computers, 2022
- INSPIRE: in-storage private information retrieval via protocol and architecture co-design. In Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022
- A one-for-all and O(V log(V))-cost solution for parallel merge style operations on sorted key-value arrays. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
- DOTA: detect and omit weak attentions for scalable transformer acceleration. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
- A 28nm 15.59 μJ/token full-digital bitline-transpose CIM-based sparse transformer accelerator with pipeline/parallel reconfigurable modes. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022
2021
- π-rt: A runtime framework to enable energy-efficient real-time robotic vision applications on heterogeneous architectures. Computer, 2021
- Efficient tensor core-based GPU kernels for structured sparsity under reduced precision. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021
- ENMC: Extreme near-memory classification via approximate screening. In 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
2020
- DUET: Boosting deep neural network efficiency on dual-module architecture. In 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
- Computation on sparse neural networks and its implications for future hardware. In 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020
- Boosting deep neural network efficiency with dual-module inference. In International Conference on Machine Learning, 2020
2019
- Dynamic Sparse Graph for Efficient Deep Learning. In International Conference on Learning Representations, 2019
2018
- SemiMap: A semi-folded convolution mapping for speed-overhead balance on crossbars. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018
- Fast object tracking on a many-core neural network chip. Frontiers in Neuroscience, 2018
- L1-norm batch normalization for efficient training of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2018
2017
- Building energy-efficient multi-level cell STT-RAM caches with data compression. In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017
2016
- NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable memory. In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016
- Leveraging 3D technologies for hardware security: Opportunities and challenges. In Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016