- Chair of Compiler Construction
Hamid Farzaneh

Phone: +49 (0)351 463 43729
Fax: +49 (0)351 463 39995
Visitor's Address: Helmholtzstrasse 18, 3rd floor, BAR III55, 01069 Dresden
Hamid Farzaneh received his bachelor's degree in Computer Engineering from Shiraz University in August 2019, and his master's degree in Computer Systems and Architecture from Shahid Beheshti University in November 2021.
In August 2022, he joined the chair as a research assistant. He works on high-level compiler frameworks (like MLIR) and on optimizing data and computation mapping onto highly heterogeneous systems with mainstream CPUs, FPGAs, SRAM, DRAM, and emerging NVMs and accelerators.
The volume of data processed by modern applications has skyrocketed in recent years and demands significantly higher off-chip memory bandwidth. However, increasing off-chip bandwidth is becoming prohibitively expensive and is tightly constrained by the chip package and system model. To overcome the memory, capacity, and power walls, computer architects are moving to non-von-Neumann system models such as near-memory and in-memory computing. However, the programmability of these systems has received comparatively little attention. Using the power of compilers, I tackle issues in performance, energy efficiency, and hardware/software cooperation for these systems.
In that regard, my current main topics are:
- Working on high-level compiler frameworks (like MLIR) and optimizing data and computation mapping onto heterogeneous systems
- Developing models for managing workloads in heterogeneous systems
Possible student topics include:
- Front-ends for the MLIR Computing-in-Memory (CIM) Compiler
End-to-end compilation flows for CIM-capable systems exist, but interfaces to high-level languages are missing or limited. The goal of this project is to design and implement front-ends that lower high-level languages and descriptions to the CIM compilers.
- Heterogeneous Systems: Mapping and Optimizations
Also, if you have a related topic in mind, please feel free to reach out.
2025
- Asif Ali Khan, Hamid Farzaneh, Karl F. A. Friebel, Clément Fournier, Lorenzo Chelini, Jeronimo Castrillon, "CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms" (to appear), Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25), Association for Computing Machinery, Mar 2025. [Bibtex & Downloads]
Bibtex
@InProceedings{khan_asplos25,
author = {Khan, Asif Ali and Farzaneh, Hamid and Friebel, Karl F. A. and Fournier, Clément and Chelini, Lorenzo and Castrillon, Jeronimo},
booktitle = {Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'25)},
title = {CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms},
location = {Rotterdam, The Netherlands},
publisher = {Association for Computing Machinery},
series = {ASPLOS '25},
month = mar,
year = {2025},
}

Downloads
No Downloads available for this publication
2024
- Hamid Farzaneh, João Paulo Cardoso De Lima, Ali Nezhadi Khelejani, Asif Ali Khan, Mahta Mayahinia, Mehdi Tahoori, Jeronimo Castrillon, "SHERLOCK: Scheduling Efficient and Reliable Bulk Bitwise Operations in NVMs", Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC'24), Association for Computing Machinery, New York, NY, USA, Jun 2024. [doi] [Bibtex & Downloads]
Abstract
Bulk bitwise operations are commonplace in application domains such as databases, web search, cryptography, and image processing. The ever-growing volume of data and processing demands of these domains often result in high energy consumption and latency in conventional system architectures, mainly due to data movement between the processing and memory subsystems. Non-volatile memories (NVMs), such as RRAM, PCM and STT-MRAM, facilitate conducting bulk-bitwise logic operations in-memory (CIM). Efficient mapping of complex applications to these CIM-capable NVMs is non-trivial and can even lead to slowdowns. This paper presents Sherlock, a novel mapping and scheduling method for efficient execution of bulk bitwise operations in NVMs. Sherlock collaboratively optimizes for performance and energy consumption and outperforms the state-of-the-art by 10× and 4.6×, respectively.
Bibtex
@InProceedings{farzaneh_dac24,
author = {Hamid Farzaneh and Jo{\~a}o Paulo Cardoso De Lima and Ali Nezhadi Khelejani and Asif Ali Khan and Mahta Mayahinia and Mehdi Tahoori and Jeronimo Castrillon},
booktitle = {Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC'24)},
title = {{SHERLOCK}: Scheduling Efficient and Reliable Bulk Bitwise Operations in {NVMs}},
location = {San Francisco, California},
series = {DAC '24},
month = jun,
year = {2024},
isbn = {9798400706011},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3649329.3658485},
doi = {10.1145/3649329.3658485},
abstract = {Bulk bitwise operations are commonplace in application domains such as databases, web search, cryptography, and image processing. The ever-growing volume of data and processing demands of these domains often result in high energy consumption and latency in conventional system architectures, mainly due to data movement between the processing and memory subsystems. Non-volatile memories (NVMs), such as RRAM, PCM and STT-MRAM, facilitate conducting bulk-bitwise logic operations in-memory (CIM). Efficient mapping of complex applications to these CIM-capable NVMs is non-trivial and can even lead to slowdowns. This paper presents Sherlock, a novel mapping and scheduling method for efficient execution of bulk bitwise operations in NVMs. Sherlock collaboratively optimizes for performance and energy consumption and outperforms the state-of-the-art by 10\texttimes{} and 4.6\texttimes{}, respectively.},
articleno = {293},
numpages = {6},
}

Downloads
2406_Farzaneh_DAC [PDF]
- Hamid Farzaneh, João Paulo Cardoso de Lima, Mengyuan Li, Asif Ali Khan, Xiaobo Sharon Hu, Jeronimo Castrillon, "C4CAM: A Compiler for CAM-based In-memory Accelerators", Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24), Volume 3, Association for Computing Machinery, pp. 164–177, New York, NY, USA, May 2024. [doi] [Bibtex & Downloads]
Abstract
Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-based operations that form the foundation for many applications, including K-nearest neighbors (KNN), high-dimensional computing (HDC), recommender systems, and one-shot learning among others. Today, these platforms are designed by hand and can only be programmed with low-level code, accessible only to hardware experts. In this paper, we introduce C4CAM, the first compiler framework to quickly explore CAM configurations and seamlessly generate code from high-level Torch-Script code. C4CAM employs a hierarchy of abstractions that progressively lowers programs, allowing code transformations at the most suitable abstraction level. Depending on the type and technology, CAM arrays exhibit varying latencies and power profiles. Our framework allows analyzing the impact of such differences in terms of system-level performance and energy consumption, and thus supports designers in selecting appropriate designs for a given application.
Bibtex
@InProceedings{farzaneh_asplos24,
author = {Hamid Farzaneh and João Paulo Cardoso de Lima and Mengyuan Li and Asif Ali Khan and Xiaobo Sharon Hu and Jeronimo Castrillon},
booktitle = {Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'24), Volume 3},
title = {C4CAM: A Compiler for CAM-based In-memory Accelerators},
doi = {10.1145/3620666.3651386},
isbn = {9798400703867},
location = {La Jolla, CA, USA},
pages = {164--177},
publisher = {Association for Computing Machinery},
series = {ASPLOS '24},
url = {https://arxiv.org/abs/2309.06418},
abstract = {Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to overcome this von Neumann bottleneck. Platforms based on content-addressable memories (CAMs) are particularly interesting due to their efficient support for the search-based operations that form the foundation for many applications, including K-nearest neighbors (KNN), high-dimensional computing (HDC), recommender systems, and one-shot learning among others. Today, these platforms are designed by hand and can only be programmed with low-level code, accessible only to hardware experts. In this paper, we introduce C4CAM, the first compiler framework to quickly explore CAM configurations and seamlessly generate code from high-level Torch-Script code. C4CAM employs a hierarchy of abstractions that progressively lowers programs, allowing code transformations at the most suitable abstraction level. Depending on the type and technology, CAM arrays exhibit varying latencies and power profiles. Our framework allows analyzing the impact of such differences in terms of system-level performance and energy consumption, and thus supports designers in selecting appropriate designs for a given application.},
address = {New York, NY, USA},
month = may,
numpages = {14},
year = {2024},
}

Downloads
2405_Farzaneh_ASPLOS [PDF]
- Michael Niemier, Zephan Enciso, Mohammad Mehdi Sharifi, X. Sharon Hu, Ian O'Connor, Alexander Graening, Ravit Sharma, Puneet Gupta, Jeronimo Castrillon, João Paulo C. de Lima, Asif Ali Khan, Hamid Farzaneh, Nashrah Afroze, Asif Islam Khan, Julien Ryckaert, "Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, and Compilers", Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE), IEEE, pp. 1–10, Mar 2024. [Bibtex & Downloads]
Bibtex
@InProceedings{niemier_date24,
author = {Michael Niemier and Zephan Enciso and Mohammad Mehdi Sharifi and X. Sharon Hu and Ian O'Connor and Alexander Graening and Ravit Sharma and Puneet Gupta and Jeronimo Castrillon and João Paulo C. de Lima and Asif Ali Khan and Hamid Farzaneh and Nashrah Afroze and Asif Islam Khan and Julien Ryckaert},
booktitle = {Proceedings of the 2024 Design, Automation and Test in Europe Conference (DATE)},
title = {Smoothing Disruption Across the Stack: Tales of Memory, Heterogeneity, and Compilers},
location = {Valencia, Spain},
url = {https://ieeexplore.ieee.org/document/10546772},
pages = {1--10},
publisher = {IEEE},
series = {DATE'24},
month = mar,
year = {2024},
}

Downloads
2403_Niemier_DATE [PDF]
- Asif Ali Khan, João Paulo C. De Lima, Hamid Farzaneh, Jeronimo Castrillon, "The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview", Jan 2024. [Bibtex & Downloads]
Bibtex
@Report{khan_cimlandscape_2024,
author = {Asif Ali Khan and João Paulo C. De Lima and Hamid Farzaneh and Jeronimo Castrillon},
title = {The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview},
eprint = {2401.14428},
url = {https://arxiv.org/abs/2401.14428},
archiveprefix = {arXiv},
month = jan,
primaryclass = {cs.AR},
year = {2024},
}

Downloads
No Downloads available for this publication
2023
- Jörg Henkel, Lokesh Siddhu, Lars Bauer, Jürgen Teich, Stefan Wildermann, Mehdi Tahoori, Mahta Mayahinia, Jeronimo Castrillon, Asif Ali Khan, Hamid Farzaneh, João Paulo C. de Lima, Jian-Jia Chen, Christian Hakert, Kuan-Hsun Chen, Chia-Lin Yang, Hsiang-Yun Cheng, "Special Session – Non-Volatile Memories: Challenges and Opportunities for Embedded System Architectures with Focus on Machine Learning Applications", Proceedings of the 2023 International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES), pp. 11–20, Sep 2023. [doi] [Bibtex & Downloads]
Abstract
This paper explores the challenges and opportunities of integrating non-volatile memories (NVMs) into embedded systems for machine learning. NVMs offer advantages such as increased memory density, lower power consumption, non-volatility, and compute-in-memory capabilities. The paper focuses on integrating NVMs into embedded systems, particularly in intermittent computing, where systems operate during periods of available energy. NVM technologies bring persistence closer to the CPU core, enabling efficient designs for energy-constrained scenarios. Next, computation in resistive NVMs is explored, highlighting its potential for accelerating machine learning algorithms. However, challenges related to reliability and device non-idealities need to be addressed. The paper also discusses memory-centric machine learning, leveraging NVMs to overcome the memory wall challenge. By optimizing memory layouts and utilizing probabilistic decision tree execution and neural network sparsity, NVM-based systems can improve cache behavior and reduce unnecessary computations. In conclusion, the paper emphasizes the need for further research and optimization for the widespread adoption of NVMs in embedded systems, presenting relevant challenges, especially for machine learning applications.
Bibtex
@InProceedings{henkel_cases23,
author = {J\"{o}rg Henkel and Lokesh Siddhu and Lars Bauer and J\"{u}rgen Teich and Stefan Wildermann and Mehdi Tahoori and Mahta Mayahinia and Jeronimo Castrillon and Asif Ali Khan and Hamid Farzaneh and Jo\~{a}o Paulo C. de Lima and Jian-Jia Chen and Christian Hakert and Kuan-Hsun Chen and Chia-Lin Yang and Hsiang-Yun Cheng},
booktitle = {Proceedings of the 2023 International Conference on Compilers, Architecture, and Synthesis of Embedded Systems (CASES)},
title = {Special Session -- Non-Volatile Memories: Challenges and Opportunities for Embedded System Architectures with Focus on Machine Learning Applications},
location = {Hamburg, Germany},
abstract = {This paper explores the challenges and opportunities of integrating non-volatile memories (NVMs) into embedded systems for machine learning. NVMs offer advantages such as increased memory density, lower power consumption, non-volatility, and compute-in-memory capabilities. The paper focuses on integrating NVMs into embedded systems, particularly in intermittent computing, where systems operate during periods of available energy. NVM technologies bring persistence closer to the CPU core, enabling efficient designs for energy-constrained scenarios. Next, computation in resistive NVMs is explored, highlighting its potential for accelerating machine learning algorithms. However, challenges related to reliability and device non-idealities need to be addressed. The paper also discusses memory-centric machine learning, leveraging NVMs to overcome the memory wall challenge. By optimizing memory layouts and utilizing probabilistic decision tree execution and neural network sparsity, NVM-based systems can improve cache behavior and reduce unnecessary computations. In conclusion, the paper emphasizes the need for further research and optimization for the widespread adoption of NVMs in embedded systems presenting relevant challenges, especially for machine learning applications.},
pages = {11--20},
url = {https://ieeexplore.ieee.org/abstract/document/10316216},
doi = {10.1145/3607889.3609088},
isbn = {9798400702907},
series = {CASES '23 Companion},
issn = {2643-1726},
month = sep,
numpages = {10},
year = {2023},
}

Downloads
2309_Henkel_CASES [PDF]
- João Paulo C. de Lima, Asif Ali Khan, Hamid Farzaneh, Jeronimo Castrillon, "Efficient Associative Processing with RTM-TCAMs", In Proceeding: 1st in-Memory Architectures and Computing Applications Workshop (iMACAW), co-located with the 60th Design Automation Conference (DAC'23), 2pp, Jul 2023. [Bibtex & Downloads]
Bibtex
@InProceedings{lima_imacaw23,
author = {Jo{\~a}o Paulo C. de Lima and Asif Ali Khan and Hamid Farzaneh and Jeronimo Castrillon},
booktitle = {1st in-Memory Architectures and Computing Applications Workshop (iMACAW), co-located with the 60th Design Automation Conference (DAC'23)},
title = {Efficient Associative Processing with RTM-TCAMs},
location = {San Francisco, CA, USA},
pages = {2pp},
month = jul,
year = {2023},
}

Downloads
2307_deLima_iMACAW [PDF]