Moving to École des Ponts ParisTech!

I'm moving to École des Ponts ParisTech (ENPC) as a senior researcher, in the IMAGINE team at LIGM. This is a full-time research position, which means more time for future collaborations.

Side note: our paper "Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings" has been accepted at ICCV 2019. See you in Seoul!

Amazing PhD students

I feel blessed with my PhD students.

Last week saw the PhD day at our lab, where all second-year PhD students present their work at a small open workshop. This year, the format was changed to cope with the number of presenters (around 20): we had three-minute-thesis presentations for the general flavor of the work, followed by poster sessions for the technical details.

Marie-Morgane won the best oral presentation award for her amazing three-minute presentation (she was the clear winner here). Pierre Jacob won the best poster presentation award for his excellent explanations. I am very happy that both of my second-year students won: they put so much effort into their work that it deserves to be rewarded, especially in the current context where it is so difficult to publish at major conferences. Last year, Diogo Luvizon won the oral presentation award, and three years ago, Jérôme Fellus also won it.

I feel very lucky to work with these talented people.

Optimizing deep learning using Gossip

We recently published a paper entitled "Distributed optimization for deep learning with gossip exchange" with M. Blot, N. Thome and M. Cord. This work is about distributed optimization of deep neural networks in an asynchronous and decentralized setup. We tackle the case where you have several computing resources (e.g., GPUs) and want to train a single deep learning model. We propose an optimization procedure based on gossip, where each computing node optimizes a local model and occasionally exchanges its weights with a random neighbor. There are several key aspects to this research.
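To make the procedure concrete, here is a minimal toy sketch of the idea on a least-squares problem with NumPy: each node runs local SGD on its own data shard and occasionally averages its weights with a random neighbor. The problem, the exchange probability, and the pairwise-averaging rule are simplifying assumptions for illustration, not the exact algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: each node fits y = X @ w_true on its own data shard.
w_true = np.array([2.0, -1.0])
n_nodes, lr, p_exchange = 4, 0.05, 0.3

shards = []
for _ in range(n_nodes):
    X = rng.normal(size=(64, 2))
    y = X @ w_true + 0.01 * rng.normal(size=64)
    shards.append((X, y))

# Each node starts from its own random model.
models = [rng.normal(size=2) for _ in range(n_nodes)]

for step in range(500):
    for i, (X, y) in enumerate(shards):
        # Local SGD step on a random mini-batch of the node's shard.
        idx = rng.choice(len(y), size=8, replace=False)
        grad = 2 * X[idx].T @ (X[idx] @ models[i] - y[idx]) / len(idx)
        models[i] -= lr * grad
        # Occasionally gossip: average weights with a random neighbor.
        if rng.random() < p_exchange:
            j = rng.choice([k for k in range(n_nodes) if k != i])
            avg = 0.5 * (models[i] + models[j])
            models[i], models[j] = avg.copy(), avg.copy()

# The node models drift toward a consensus near w_true.
print(np.round(np.mean(models, axis=0), 2))
```

In the real setting each node would be a separate GPU training a deep network, and the exchange would happen over the network asynchronously; the averaging step is what drives the nodes toward consensus.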

First, we show that the gossiping strategy is, in expectation, equivalent to performing stochastic gradient descent with mini-batches whose size is the aggregate of all the nodes' batches. This means you can optimize big models that are notoriously hard to train without a large batch size (I'm looking at you, ResNet) on a collection of small GPUs, rather than having to buy a larger and much more expensive one.
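A quick numerical sanity check of the intuition behind this equivalence (a toy least-squares gradient, not the paper's analysis): averaging the per-node mini-batch gradients gives exactly the gradient of the concatenated, larger batch.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=3)

# Four "nodes", each holding a mini-batch of 8 samples.
batches = []
for _ in range(4):
    X = rng.normal(size=(8, 3))
    y = rng.normal(size=8)
    batches.append((X, y))

def grad(X, y, w):
    # Gradient of the mean squared error (1/n) * ||Xw - y||^2.
    return 2 * X.T @ (X @ w - y) / len(y)

# Average of the per-node gradients ...
g_avg = np.mean([grad(X, y, w) for X, y in batches], axis=0)

# ... equals the gradient on the concatenated batch of 32 samples.
X_all = np.vstack([X for X, _ in batches])
y_all = np.concatenate([y for _, y in batches])
g_big = grad(X_all, y_all, w)

print(np.allclose(g_avg, g_big))  # True
```

The gossip result is subtler (exchanges are pairwise and random rather than a global average), but this identity is the reason an aggregate over nodes can stand in for a bigger batch.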

Second, we also show that the gossip mechanism performs a kind of stochastic exploration that, in my opinion, is similar to dropout, but on entire models. In short, it is a way to train an ensemble and to obtain the aggregate of that ensemble thanks to the consensus optimization.

There is much interesting work left to be done on this topic, mostly theoretical, that I am very much looking forward to.