Memory usage is a common issue for large ML models. Especially in academia, we have to use the resources available wisely and make the most of them. While working on my mixture model's KL objective, I had to make some less common optimizations to reduce memory usage.
Setup

The decoder outputs a large tensor \(O\) with dimensionality \((M \times B \times L \times D)\), where \(M\) is the number of clusters, \(B\) is the batch size, \(L\) is the sequence length, and \(D\) is the model output dimension.
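To get a feel for the scale, here is a minimal sketch of the memory such a tensor occupies in fp32. The sizes below are hypothetical, chosen only for illustration; the actual values depend on the model and dataset:

```python
import torch

# Hypothetical sizes for illustration only; actual values depend on the model.
M, B, L, D = 8, 32, 512, 1024  # clusters, batch size, sequence length, model dim

O = torch.randn(M, B, L, D)  # decoder output of shape (M, B, L, D)

# 8 * 32 * 512 * 1024 elements * 4 bytes each = 0.5 GiB, before gradients
# and any intermediate activations are accounted for.
print(f"{O.element_size() * O.nelement() / 2**30:.2f} GiB")
```

Note that this is only the forward activation; with gradients and optimizer state the footprint multiplies, which is why the KL objective over all \(M\) clusters becomes a memory bottleneck.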
Table of contents

- Fairseq How To
- Easy Mode
- Not So Easy Mode

Fairseq How To

Before we start with the extension, let's try to understand how fairseq training works for seq2seq models. In this tutorial I will use only the hydra-train module, which makes it possible to load YAML configs.
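As a rough illustration of what such a YAML config looks like, here is a hypothetical fragment I put together (not a file from the fairseq repo; the `_name` keys follow fairseq's dataclass-based config convention):

```yaml
# Illustrative config fragment (hypothetical, not copied from the fairseq repo)
task:
  _name: translation
  data: /path/to/data-bin      # binarized dataset directory
model:
  _name: transformer
criterion:
  _name: label_smoothed_cross_entropy
  label_smoothing: 0.1
optimization:
  max_update: 100000
  lr: [0.0005]
```

Hydra composes such groups (task, model, criterion, optimization) into a single config object, and any field can still be overridden on the command line.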
Install fairseq

Follow the installation guide on the GitHub page.
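For reference, installation from source typically looks something like the following; verify against the current README, since the instructions may change:

```bash
# Install fairseq from source, as suggested in the README
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
```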
Training with Hydra

Suppose you want to train a translation model using hydra training. The command syntax is the following: