Sarthak Mittal

I am a PhD student at Mila, where I am advised by Guillaume Lajoie and Yoshua Bengio. Throughout my PhD, I have been lucky to collaborate with a number of amazing folks, including Stefan Bauer at Helmholtz, Marcus Brubaker at York University, Priyank Jaini at Google DeepMind, Oleksii Kuchaiev at NVIDIA, and Anderson Schneider at Morgan Stanley.

Prior to joining Mila, I worked briefly at Uber's Advanced Technologies Group in Toronto, Canada. Before that, I was an undergraduate in the Department of Mathematics and Statistics at the Indian Institute of Technology (IIT) Kanpur.

I am interested in designing unified AI systems that, instead of learning to solve specific tasks, implicitly learn general-purpose optimization routines in their inference mechanisms. Such systems can then be given arbitrary datasets as context and provide optimal predictions without any further training. Beyond this, I want to leverage the formalism of Bayesian methodology to learn amortized posterior distributions for a variety of tasks in a zero-shot manner, thereby alleviating the need for repeated sampling or optimization routines on new tasks.

Email / GitHub / Google Scholar / Twitter / LinkedIn

Research

Exploring Exchangeable Dataset Amortization for Bayesian Posterior Inference
Sarthak Mittal, Niels Leif Bracher, Guillaume Lajoie, Priyank Jaini, Marcus A Brubaker
ICML SPIGM Workshop, 2023
We propose a neural network-based approach that handles exchangeable observations and amortizes over datasets, reducing Bayesian posterior inference to a single forward pass of a network.

Mixupe: Understanding and improving mixup from directional derivative perspective
Yingtian Zou, Vikas Verma, Sarthak Mittal, Wai Hoh Tang, Hieu Pham, Juho Kannala, Yoshua Bengio, Arno Solin, Kenji Kawaguchi
UAI, 2023 (Oral)
arxiv / code
We first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this insight, we propose an improved version of Mixup that is theoretically justified to deliver better generalization performance than vanilla Mixup.

Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev
ACL Findings, 2023
arxiv
We provide a recipe for training machine translation models in a limited-resource setting by leveraging synthetic target data generated using a large pre-trained model. We show that, consistently across benchmarks in bilingual, multilingual, and speech translation setups, training models on synthetic targets outperforms training on the actual ground-truth data.

From Points to Functions: Infinite-dimensional Representations in Diffusion Models
Sarthak Mittal, Guillaume Lajoie, Stefan Bauer, Arash Mehrjou
Preprint, 2023
arxiv
Diffusion models can be equipped to do representation learning, yielding infinite-dimensional, trajectory-based representations for static objects like images. We provide a thorough analysis of the kinds of semantics encoded in different parts of this trajectory representation, along with attention-based methods that automatically filter important parts of the representation for a given discretization scheme.

Diffusion-Based Representation Learning
Sarthak Mittal, Korbinian Abstreiter, Stefan Bauer, Bernhard Schölkopf, Arash Mehrjou
ICML, 2023
arxiv
We leverage diffusion-based models to learn representations of the input that can then be used for downstream applications. Our experiments highlight that learning representations for the task of iterative denoising works better than reconstruction-based methods like autoencoders, and that it can also serve as a pre-training setup for semi-supervised models like LaplaceNet.

Is a Modular Architecture Enough?
Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie
NeurIPS, 2022
arxiv / code
Using a synthetic, mixture-of-experts-style data-generating process, we analyze typical modular systems and uncover their sub-optimality through the lens of the collapse and specialization metrics we propose. We examine the benefits of optimal specialization, whether typical modular systems achieve it, and how it trends with an increasing number of modules and experts.

On Neural Architecture Inductive Biases for Relational Tasks
Giancarlo Kerg, Sarthak Mittal, David Rolnick, Yoshua Bengio, Blake Richards, Guillaume Lajoie
Preprint, 2022
arxiv / code
We analyze systems that allow for a separate, dedicated relational reasoning unit, such as Emergent Symbols through Binding in External Memory. We not only uncover the core inductive biases that are imperative for good OoD generalization but also propose a novel mechanism, CorelNet, that outperforms such baselines on more complex reasoning tasks.

Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
ICLR, 2022 (Spotlight)
arxiv / code / blog
We propose Compositional Attention, a novel mechanism that disentangles searches and retrievals and allows for their flexible re-composition. This enables the model to dynamically retrieve different attributes of objects and addresses certain limitations of multi-head attention.

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning
Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Michael Mozer, Yoshua Bengio, Christopher Pal
NeurIPS Datasets and Benchmarks Track, 2021
arxiv / code
We propose a benchmark for systematic evaluation of model-based reinforcement learning algorithms. We evaluate various algorithms on tasks with underlying causal graphs of varying complexity, using metrics based on rankings, reconstructions, and downstream RL performance.

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules
Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio
ICML, 2020
arxiv / code
We propose BRIMs (Bidirectional Recurrent Independent Mechanisms), an attention-based modular system that allows efficient combination of top-down and bottom-up signals for robust prediction. We empirically demonstrate the benefits of dynamic modulation of top-down and bottom-up signals on benchmarks across a variety of domains.

A Modern Take on the Bias-Variance Tradeoff in Neural Networks
Brady Neal, Sarthak Mittal, Aristide Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
ICML Deep Phenomena Workshop, 2019
arxiv
We study overfitting and generalization in overparameterized neural networks through the lens of the bias-variance tradeoff. We empirically find that with increasing width of a single-layered neural network, both bias and variance decrease, while with increasing depth, variance increases. We further decompose variance into two terms: variance due to optimization and variance due to sampling.

Design and source code from Leonid Keselman's Jekyll fork of Jon Barron's website.
