When AI Meets Cache: Intelligent Memory Hierarchy Management in Modern CPUs

Yihan Wang; Suchuan Xing

doi:10.54097/yd6k8n48


Keywords:

Cache replacement policy, Memory hierarchy, Machine learning, Hardware prefetching, Reinforcement learning, Last-level cache, Deep learning, Neural network, Microarchitecture, Prefetcher

Abstract

The persistent and widening gap between processor execution speed and memory access latency, commonly termed the "memory wall", has made efficient cache management one of the most consequential challenges in modern computer architecture. Traditional heuristic-based strategies such as least recently used (LRU), re-reference interval prediction (RRIP), and signature-based hit prediction (SHiP) have long served as foundations of cache control, yet they fundamentally fail to capture complex, application-specific, or workload-diverse memory access patterns. The rapid maturation of artificial intelligence (AI) and machine learning (ML) techniques now offers transformative opportunities to reimagine every tier of the memory hierarchy, from L1 instruction caches to last-level cache (LLC) partitioning and speculative hardware prefetching. This review surveys the state of the art at the intersection of AI and cache management. It examines deep reinforcement learning (RL), long short-term memory (LSTM) networks, transformer-based sequence models, and graph neural networks (GNNs) as applied to three principal problem domains: cache replacement, hardware prefetching, and adaptive resource partitioning. We critically analyze trade-offs between prediction accuracy, inference latency, hardware overhead, and practical deployability, synthesizing empirical findings from recent simulation studies and silicon implementations. We identify key open challenges, including online learning constraints, workload generalization, and the integration of AI inference within microarchitectural timing budgets.


