# David Picard

IMAGINE/LIGM

wednesday 30 january 2019

We recently published a paper entitled Distributed optimization for deep learning with gossip exchange with M. Blot, N. Thome and M. Cord. This work is about distributed optimization for deep neural networks in an asynchronous and decentralized setup. We tackle the case where you have several computing resources (e.g., GPUs) and you want to train a single deep learning model. We propose an optimization procedure based on gossip, where each computing node optimizes a local model and occasionally exchanges its weights with a random neighbor. There are several key aspects to this research.
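To give the flavor of the procedure, here is a minimal sketch of a gossip-SGD loop. It is a sequential simulation of the asynchronous process, not the exact algorithm from the paper; the names (`Node`, `gossip_step`, the exchange probability `p`) and the plain pairwise averaging are my assumptions for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch only: each node keeps a local copy of the weights, performs a
// local SGD step, and with probability p averages its weights with a
// randomly drawn neighbor. Pairwise averaging drives all copies toward
// a consensus model.
struct Node {
    std::vector<double> w; // local model weights
};

void gossip_step(std::vector<Node>& nodes, double p, std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, nodes.size() - 1);
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        // local_sgd_update(nodes[i].w); // hypothetical: one SGD step on a local mini-batch
        if (coin(rng) < p) {
            std::size_t j = pick(rng); // random neighbor
            if (j == i) continue;
            for (std::size_t k = 0; k < nodes[i].w.size(); ++k) {
                double avg = 0.5 * (nodes[i].w[k] + nodes[j].w[k]);
                nodes[i].w[k] = avg; // both parties move to the midpoint,
                nodes[j].w[k] = avg; // which is what pushes toward consensus
            }
        }
    }
}
```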

First, we show that the gossiping strategy is, in expectation, equivalent to performing stochastic gradient descent with a mini-batch size equal to the aggregate of all the nodes' batches. This means that you can optimize big models that are notoriously hard to train without a large batch size (I'm looking at you, ResNet) on a collection of small GPUs, rather than having to buy a larger and much more expensive one.
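To make the mini-batch claim concrete, here is the back-of-the-envelope computation (my notation, not the paper's, and assuming the nodes have already reached consensus): averaging the nodes' local SGD steps is exactly one SGD step on the union of their batches.

```latex
% N nodes, each holding weights x_t^{(i)} and drawing a local mini-batch
% B_i of size b. If the nodes are at consensus, x_t^{(i)} = \bar{x}_t,
% then the averaged model after one local step is
\bar{x}_{t+1}
  = \frac{1}{N} \sum_{i=1}^{N} \left( \bar{x}_t - \eta \, \nabla f_{B_i}(\bar{x}_t) \right)
  = \bar{x}_t - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla f_{B_i}(\bar{x}_t)
% i.e. one SGD step with an effective mini-batch of size Nb.
```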

Second, we also show that the gossip mechanism performs some sort of stochastic exploration that, in my opinion, is similar to dropout, but on entire models. In short, it is a way to train an ensemble and to obtain the aggregate of this ensemble thanks to the consensus optimization.

There is a lot of interesting work to be done on this topic, mostly theoretical, that I am very much looking forward to in the future.

friday 24 april 2015

Since we have had a few publications on the topic of distributed machine learning (in particular a Neurocomputing paper on distributed PCA: "Asynchronous Gossip Principal Components Analysis"), let's talk a bit more about it. My Ph.D. student Jérôme Fellus has rolled out the first version of his libagml library. This is a distributed machine learning library in C++ that relies on gossip protocols.

The main page is here: http://perso-etis.ensea.fr/~jerofell/software.html

The way it works is dead simple: you have a mother class that corresponds to a node, and all you have to do is derive it to implement your specific local computation and aggregation procedures. All the networking, instantiation, etc., is handled by the library. Nice, isn't it?
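I won't reproduce the actual libagml API here; the following C++ sketch only illustrates the derive-a-node pattern described above, with hypothetical class and method names (`GossipNode`, `compute`, `aggregate`) standing in for the library's own.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's mother class: the library
// would own networking and instantiation, the user only overrides the
// local computation and the aggregation of a neighbor's state.
class GossipNode {
public:
    virtual ~GossipNode() = default;
    virtual void compute() = 0;                                     // local computation on local data
    virtual void aggregate(const std::vector<double>& remote) = 0;  // merge a neighbor's state
protected:
    std::vector<double> state_;
};

// Example derivation: a node that gossip-averages a vector, the basic
// building block behind gossip PCA-style estimators.
class MeanNode : public GossipNode {
public:
    explicit MeanNode(std::vector<double> local) { state_ = std::move(local); }
    void compute() override {
        // nothing to do for a plain mean; a PCA node would update its
        // local covariance estimate here
    }
    void aggregate(const std::vector<double>& remote) override {
        for (std::size_t k = 0; k < state_.size(); ++k)
            state_[k] = 0.5 * (state_[k] + remote[k]); // pairwise averaging toward consensus
    }
};
```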