March 21, 2025
To be held at ESB 2012 and on Zoom: https://ubc.zoom.us/j/68285564037?pwd=R2ZpLy9uc2pUYldHT3laK3orakg0dz09
Meeting ID: 682 8556 4037
Passcode: 636252
Reception and refreshments at 14:30 in the PIMS Lounge (ESB 4th floor).
Deep learning models are often seen as black boxes: their complexity stems from the integration of numerous architectural components across many layers, trained on high-dimensional datasets with carefully tuned hyperparameters. In this talk, I will present recent work uncovering structural invariants in the geometry of deep neural representations, providing insight into how these models learn from data.
I will focus on language models trained with the next-token prediction objective as a case study. Despite the apparent conceptual simplicity of this training paradigm, large models trained on vast text corpora demonstrate an extraordinary ability to capture linguistic structure. I will show that in well-trained language models, representations of words and contexts (i.e., sequences of words) organize themselves into geometries characterized by sparse and low-rank matrix decompositions of training statistics.
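As a rough illustration of the kind of object the abstract refers to, the sketch below (not the speaker's code; the toy corpus, single-word contexts, and variable names are assumptions made here for illustration) builds a context-by-next-token count matrix and inspects its singular values, the simplest way to probe for low-rank structure in training statistics.

```python
# Illustrative sketch only: probe low-rank structure in next-token statistics
# for a toy corpus. All names and modeling choices here are assumptions.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Context-by-word count matrix: rows are (single-word) contexts, columns are next tokens.
counts = np.zeros((len(vocab), len(vocab)))
for prev, nxt in zip(corpus[:-1], corpus[1:]):
    counts[idx[prev], idx[nxt]] += 1

# Row-normalize to empirical next-token probabilities, then examine the spectrum.
probs = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
singular_values = np.linalg.svd(probs, compute_uv=False)
print("singular values:", np.round(singular_values, 3))
# A rapidly decaying spectrum means the training statistics admit a low-rank
# description, the kind of structure the talk relates to representation geometry.
```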
I will also discuss how these findings establish connections with the theoretical frameworks of implicit regularization and neural collapse, contributing to a more principled understanding of how deep learning models extract and encode information from data.
Event Details
March 21, 2025
3:00pm to 4:00pm
ESB 2012 and Zoom