| Title: | Category Theory for Mechanistic Interpretability of Neural Networks |
| Subject: | Computer science |
| Level: | Advanced |
| Description: |
Deep neural networks, especially transformers, are the backbone of current large language models (LLMs) and computer vision models. However, the decision-making processes of these models remain largely opaque. Mechanistic interpretability aims to address this opacity by reverse-engineering neural networks at the sub-circuit level, identifying how individual modules, neurons, or attention heads implement specific computational functions. Category theory is a branch of mathematics built around abstraction, structure, and composition, and it offers a rigorous language for modeling hierarchical, compositional, and modular systems. This thesis aims to develop a category-theoretic framework for the mechanistic interpretability of neural networks; a minimal illustration of this compositional view is sketched after the table below. |
| Start date: | |
| End date: | |
| Prerequisites: | |
| IDT supervisors: | Shaibal Barua |
| Examiner: | |
| Comments: |
- Good knowledge of linear algebra, discrete mathematics, basic logic, and algebraic thinking
- Knowledge of neural network foundations
- Python and PyTorch |
| Company contact: |
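
As a minimal sketch of the compositional view referred to in the description, network layers can be treated as morphisms between objects given by feature dimensions, with layer stacking as categorical composition. The `Morphism` class, the `identity` helper, and the toy layers below are illustrative assumptions introduced here, not part of the proposal or a prescribed framework:

```python
# A minimal sketch (illustrative assumption): objects are feature
# dimensions, morphisms are layers between them, and composition is
# ordinary function composition of layers.

import torch
import torch.nn as nn


class Morphism:
    """A layer f: src -> dst between objects (feature dimensions)."""

    def __init__(self, src: int, dst: int, fn: nn.Module):
        self.src, self.dst, self.fn = src, dst, fn

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x)

    def compose(self, other: "Morphism") -> "Morphism":
        """Return self ∘ other; defined only when the objects match."""
        assert other.dst == self.src, "composition requires matching objects"
        return Morphism(other.src, self.dst, nn.Sequential(other.fn, self.fn))


def identity(n: int) -> Morphism:
    """The identity morphism on the object n."""
    return Morphism(n, n, nn.Identity())


f = Morphism(4, 8, nn.Linear(4, 8))  # f: R^4 -> R^8
g = Morphism(8, 2, nn.Linear(8, 2))  # g: R^8 -> R^2
h = g.compose(f)                     # g ∘ f: R^4 -> R^2

x = torch.randn(1, 4)
assert h(x).shape == (1, 2)
# Composition agrees with applying the layers in sequence,
# and composing with the identity leaves the morphism unchanged.
assert torch.allclose(h(x), g(f(x)))
assert torch.allclose(h.compose(identity(4))(x), h(x))
```

This sketch only exhibits the categorical structure (objects, morphisms, identities, associative composition) on plain feedforward layers; extending such a treatment to transformer circuits and attention heads is what the thesis would investigate.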