Sarthak Mittal

I am a PhD student at Mila, where I am advised by Guillaume Lajoie and Yoshua Bengio. Throughout my PhD, I have been lucky to collaborate with a number of amazing folks, including Stefan Bauer at Helmholtz, Marcus Brubaker at York University, Priyank Jaini at Google DeepMind, Oleksii Kuchaiev at NVIDIA, and Anderson Schneider at Morgan Stanley.

Prior to joining Mila, I worked briefly at Uber's Advanced Technologies Group in Toronto, Canada. Before that, I was an undergraduate in the Department of Mathematics and Statistics at the Indian Institute of Technology (IIT) Kanpur.

I am interested in designing unified AI systems that, instead of learning to solve specific tasks, implicitly learn general-purpose optimization routines within their inference mechanism. Such systems can then be given arbitrary datasets as context and produce predictions without any further training. Going further, I want to leverage the formalism of Bayesian methodology to learn amortized posterior distributions for a variety of tasks in a zero-shot manner, thereby alleviating the need for repeated sampling or optimization routines on new tasks.

Email  /  GitHub  /  Google Scholar  /  Twitter  /  LinkedIn



Exploring Exchangeable Dataset Amortization for Bayesian Posterior Inference

Sarthak Mittal*, Niels Leif Bracher*, Guillaume Lajoie, Priyank Jaini, Marcus A Brubaker
ICML SPIGM Workshop, 2023

We propose a neural network-based approach that handles exchangeable observations and amortizes over datasets, converting Bayesian posterior inference into a single forward pass of a network.
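As a rough illustration of the idea (not the paper's actual architecture), a permutation-invariant encoder can map an entire dataset to posterior parameters in a single forward pass; all function names and weight shapes below are hypothetical:

```python
import numpy as np

def amortized_posterior(dataset, w_embed, w_out):
    """Illustrative sketch: a DeepSets-style, permutation-invariant encoder
    maps an exchangeable dataset to the parameters of a Gaussian posterior
    in one forward pass, with no per-dataset optimization."""
    h = np.tanh(dataset @ w_embed)   # per-observation embedding
    summary = h.mean(axis=0)         # mean pooling -> order-invariant summary
    out = summary @ w_out            # map summary to posterior parameters
    d = out.shape[0] // 2
    mean, log_std = out[:d], out[d:]
    return mean, log_std
```

Because the pooling step is a mean over observations, shuffling the dataset leaves the output unchanged, which mirrors the exchangeability assumption.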


MixupE: Understanding and Improving Mixup from Directional Derivative Perspective

Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi
UAI, 2023 (Oral)
arxiv / code /

In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than vanilla Mixup.
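For reference, vanilla Mixup (the baseline this work improves upon) can be sketched in a few lines; the function name and default hyperparameters here are illustrative, not taken from the paper:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Vanilla Mixup: train on convex combinations of random example pairs."""
    rng = np.random.default_rng(rng)
    lam = rng.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))         # random pairing of examples
    x_mix = lam * x + (1 - lam) * x[perm]  # mixed inputs
    y_mix = lam * y + (1 - lam) * y[perm]  # mixed (one-hot) labels
    return x_mix, y_mix
```

With one-hot labels, each mixed label row still sums to one, so the result remains a valid soft target for cross-entropy training.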


Leveraging Synthetic Targets for Machine Translation

Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev
ACL Findings, 2023
arxiv /

We provide a recipe for training machine translation models in limited-resource settings by leveraging synthetic target data generated with a large pre-trained model. We show that, consistently across different benchmarks in bilingual, multilingual, and speech-translation setups, training models on synthetic targets outperforms training on the actual ground-truth data.


From Points to Functions: Infinite-dimensional Representations in Diffusion Models

Sarthak Mittal, Guillaume Lajoie, Stefan Bauer, Arash Mehrjou
Preprint, 2023
arxiv /

Diffusion models can be equipped for representation learning, yielding infinite-dimensional, trajectory-based representations of static objects such as images. We provide a thorough analysis of the semantics encoded in different parts of this trajectory representation and propose attention-based methods to automatically filter the important parts of the representation under a given discretization scheme.


Diffusion-Based Representation Learning

Sarthak Mittal*, Korbinian Abstreiter*, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou
ICML, 2023
arxiv /

We leverage diffusion-based models to learn representations of the input that can then be used for downstream applications. Our experiments highlight that learning representations via the task of iterative denoising works better than reconstruction-based methods like autoencoders, and can further be utilized as a pre-training setup for semi-supervised models like LaplaceNet.


Is a Modular Architecture Enough?

Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie
NeurIPS, 2022
arxiv / code /

Through a synthetic mixture-of-experts-style data-generating process, we analyze typical modular systems and uncover their sub-optimality through the lens of the collapse and specialization metrics we propose. We uncover the benefits of optimal specialization, examine whether typical modular systems achieve it, and study its trends with an increasing number of modules and experts.


On Neural Architecture Inductive Biases for Relational Tasks

Giancarlo Kerg, Sarthak Mittal, David Rolnick, Yoshua Bengio, Blake Richards, Guillaume Lajoie
Preprint, 2022
arxiv / code /

We analyze systems that include a separate, dedicated relational reasoning unit, such as Emergent Symbols through Binding in External Memory. We not only uncover the core inductive biases imperative for good out-of-distribution (OoD) generalization, but also propose CorelNet, a novel mechanism that outperforms such baselines on more complex reasoning tasks.


Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
ICLR, 2022 (Spotlight)
arxiv / code / blog /

We propose Compositional Attention, a novel mechanism that disentangles searches and retrievals and allows their flexible re-composition. This enables the model to dynamically retrieve different attributes of objects and addresses certain limitations of multi-head attention.
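A simplified NumPy sketch of the core idea, with a hypothetical weight parameterization (the paper's actual multi-head implementation differs): S searches produce attention maps, R retrievals produce value sets, and a secondary attention lets each search select its own mixture of retrievals at every position.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compositional_attention(x, search_w, retrieval_w, select_w):
    """Sketch: searches and retrievals are computed independently,
    then re-composed by a secondary (selection) attention."""
    # S searches: one attention map over positions per (Wq, Wk) pair
    maps = [softmax((x @ wq) @ (x @ wk).T / np.sqrt(wq.shape[1]))
            for wq, wk in search_w]                      # S maps of (n, n)
    # R retrievals: one value set per Wv
    values = [x @ wv for wv in retrieval_w]              # R sets of (n, dv)
    outputs = []
    for A, (wq2, wk2) in zip(maps, select_w):
        cands = np.stack([A @ v for v in values])        # (R, n, dv) candidates
        q2 = x @ wq2                                     # (n, ds) selection query
        k2 = np.einsum('rnd,ds->rns', cands, wk2)        # (R, n, ds) selection keys
        scores = softmax(np.einsum('ns,rns->nr', q2, k2), axis=-1)
        outputs.append(np.einsum('nr,rnd->nd', scores, cands))
    return np.concatenate(outputs, axis=-1)              # (n, S * dv)
```

The key contrast with standard multi-head attention is that the search (query-key) and retrieval (value) pathways are no longer hard-wired into fixed pairs.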


Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Michael Mozer, Yoshua Bengio, Christopher Pal
NeurIPS Datasets and Benchmarks Track, 2021
arxiv / code /

We propose a benchmark for the systematic evaluation of model-based reinforcement learning algorithms. We evaluate various algorithms on tasks with underlying causal graphs of varying complexity, using metrics based on rankings, reconstructions, and downstream RL performance.


Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio
ICML, 2020
arxiv / code /

We propose BRIMs (Bidirectional Recurrent Independent Mechanisms), an attention-based modular system that allows efficient combination of top-down and bottom-up signals for robust prediction. We empirically demonstrate the benefits of dynamically modulating top-down and bottom-up signals on benchmarks across a variety of domains.


A Modern Take on the Bias-Variance Tradeoff in Neural Networks

Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
ICML Deep Phenomena Workshop, 2019
arxiv /

We study overfitting and generalization in overparameterized neural networks through the lens of the bias-variance tradeoff. We empirically find that with increasing width of a single-layer neural network, both bias and variance decrease, while with increasing depth, variance increases. We further decompose variance into two terms: variance due to optimization and variance due to sampling.

Design and source code from Leonid Keselman's Jekyll fork of Jon Barron's website.