Title: | Category Theory for Mechanistic Interpretability of Neural Networks |
Subject: | Computer science |
Level: | Advanced |
Description: |
Deep neural networks, especially transformers, are the backbone of current large language models (LLMs) and computer vision models. However, the decision-making processes of these models remain largely opaque. Mechanistic interpretability aims to address this opacity by reverse-engineering neural networks at the sub-circuit level, identifying how individual modules, neurons, or attention heads implement specific computational functions. Category theory is a branch of mathematics centered on abstraction, structure, and composition, and it offers a rigorous language for modeling hierarchical, compositional, and modular systems. This thesis aims to develop a category-theoretic framework for the mechanistic interpretability of neural networks (a minimal code sketch of this compositional view is given after the listing below). |
Start date: | |
End date: | |
Prerequisites: | |
IDT supervisors: | Shaibal Barua |
Examiner: | |
Comments: |
- Good knowledge of linear algebra, discrete mathematics, basic logic, and algebraic thinking
- Knowledge of neural network foundations
- Python and PyTorch |
Company contact: |
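
The sketch below is not part of the proposal; it is a minimal Python/PyTorch illustration of the compositional view the description refers to: layers are treated as morphisms between objects (vector spaces identified by their dimension), stacking layers is morphism composition, and nn.Identity plays the role of the identity morphism. The names Morphism, compose, and identity are illustrative assumptions, not an established API.

```python
# Illustrative sketch: neural-network layers as morphisms in a category.
# Objects are vector spaces, identified here by their dimension; a layer
# is a morphism dom -> cod, and stacking layers is composition.

import torch
import torch.nn as nn


class Morphism:
    """A neural-network layer viewed as a morphism dom -> cod."""

    def __init__(self, module: nn.Module, dom: int, cod: int):
        self.module, self.dom, self.cod = module, dom, cod

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return self.module(x)


def compose(g: Morphism, f: Morphism) -> Morphism:
    """Categorical composition g ∘ f, defined only when cod(f) == dom(g)."""
    assert f.cod == g.dom, "morphisms are not composable"
    # nn.Sequential applies f.module first, then g.module: exactly g ∘ f.
    return Morphism(nn.Sequential(f.module, g.module), f.dom, g.cod)


def identity(dim: int) -> Morphism:
    """Identity morphism on the object of dimension `dim`."""
    return Morphism(nn.Identity(), dim, dim)


if __name__ == "__main__":
    f = Morphism(nn.Linear(4, 8), dom=4, cod=8)
    g = Morphism(nn.Linear(8, 2), dom=8, cod=2)

    x = torch.randn(1, 4)
    h = compose(g, f)          # g ∘ f : 4 -> 2
    print(h(x).shape)          # torch.Size([1, 2])

    # Identity law: composing with the identity changes nothing.
    assert torch.allclose(compose(f, identity(4))(x), f(x))
```

The point of the sketch is that composability becomes a checked property (the assert on matching dimensions): a circuit-level interpretability analysis can then aim to track how interpretable functions compose along the same morphisms, which is the kind of structure the proposed framework would make precise.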