publications
2023
- ECSSD: Hardware/Data Layout Co-Designed In-Storage-Computing Architecture for Extreme Classification. In Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
- ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs. Proceedings of Machine Learning and Systems, 2023
- Dynamic N:M fine-grained structured sparse attention mechanism. In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
2022
- TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes. IEEE Journal of Solid-State Circuits, 2022
- Dynamic sparse attention for scalable transformer acceleration. IEEE Transactions on Computers, 2022
- INSPIRE: in-storage private information retrieval via protocol and architecture co-design. In Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022
- A one-for-all and O(V log(V))-cost solution for parallel merge style operations on sorted key-value arrays. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
- DOTA: detect and omit weak attentions for scalable transformer acceleration. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022
- A 28nm 15.59 μJ/token full-digital bitline-transpose CIM-based sparse transformer accelerator with pipeline/parallel reconfigurable modes. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022
2021
- π-RT: A runtime framework to enable energy-efficient real-time robotic vision applications on heterogeneous architectures. Computer, 2021
- Efficient tensor core-based GPU kernels for structured sparsity under reduced precision. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021
- ENMC: Extreme near-memory classification via approximate screening. In 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
2020
- DUET: Boosting deep neural network efficiency on dual-module architecture. In 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
- Computation on sparse neural networks and its implications for future hardware. In 2020 57th ACM/IEEE Design Automation Conference (DAC), 2020
- Boosting deep neural network efficiency with dual-module inference. In International Conference on Machine Learning, 2020
2019
- Dynamic Sparse Graph for Efficient Deep Learning. In International Conference on Learning Representations, 2019
2018
- SemiMap: A semi-folded convolution mapping for speed-overhead balance on crossbars. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018
- Fast object tracking on a many-core neural network chip. Frontiers in Neuroscience, 2018
- L1-norm batch normalization for efficient training of deep neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2018
2017
- Building energy-efficient multi-level cell STT-RAM caches with data compression. In 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017
2016
- NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable memory. In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016
- Leveraging 3D technologies for hardware security: Opportunities and challenges. In Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016