Optimizing deep learning using Gossip

We recently published a paper entitled "Distributed optimization for deep learning with gossip exchange" with M. Blot, N. Thome and M. Cord. This work is about distributed optimization for deep neural networks in an asynchronous and decentralized setup. We tackle the case where you have several computing resources (e.g., GPUs) and you want to train a single deep learning model. We propose an optimization procedure based on gossip, where each computing node optimizes a local model and occasionally exchanges its weights with a random neighbor. There are several key aspects to this research.

First, we show that the gossiping strategy is, in expectation, equivalent to performing stochastic gradient descent with a mini-batch size equal to the aggregated batch size of all nodes. This means that you can optimize big models which are notoriously hard to train without a large batch size (I'm looking at you, ResNet) on a collection of small GPUs, rather than having to buy a larger and much more expensive one.
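To make the idea concrete, here is a minimal numpy sketch (illustrative only, not the code or the exact protocol from the paper): each node takes SGD steps on its own mini-batches, and from time to time a random pair of nodes averages their weights.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: linear regression, one local model per node (illustrative only).
    n_nodes, dim, batch = 4, 10, 32
    w_true = rng.normal(size=dim)
    weights = [rng.normal(size=dim) for _ in range(n_nodes)]

    def grad(w, X, y):
        """Gradient of the mean squared error 0.5 * ||Xw - y||^2 / len(y)."""
        return X.T @ (X @ w - y) / len(y)

    lr, p_gossip = 0.1, 0.5
    for step in range(1000):
        # Each node performs one local SGD step on its own mini-batch.
        for i in range(n_nodes):
            X = rng.normal(size=(batch, dim))
            y = X @ w_true + 0.01 * rng.normal(size=batch)
            weights[i] -= lr * grad(weights[i], X, y)
        # Gossip exchange: with some probability, a random pair averages its weights.
        if rng.random() < p_gossip:
            i, j = rng.choice(n_nodes, size=2, replace=False)
            avg = 0.5 * (weights[i] + weights[j])
            weights[i], weights[j] = avg.copy(), avg.copy()

    print("spread across nodes:", np.std(np.stack(weights), axis=0).mean())

The averaging step drives the local models toward a consensus while every node keeps consuming fresh mini-batches, which is where the effective large-batch behaviour comes from.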

Second, we also show that the gossip mechanism performs some sort of stochastic exploration that is, in my opinion, similar to dropout, but applied to entire models. In short, it is a way to train an ensemble and to obtain the aggregate of this ensemble thanks to the consensus optimization.

There is a lot of interesting work left to be done on this topic, mostly theoretical, and I am very much looking forward to it.

ECCV 2018

I'm attending ECCV 2018 right now! My PhD student Marie-Morgane is presenting her accepted paper "Image Reassembly Combining Deep Learning and Shortest Path Problem" at the Tuesday morning poster session. We propose a new task of image reassembly from unordered fragments, together with an associated dataset. Send me an email if you want to test your skills on this challenging task.

2017-2018 in a nutshell

2017-2018 has been a very busy year and this website is completely outdated, so here is a wrap-up of the most important things that happened:
  • I've been on leave at LIP6, Sorbonne Université for the full year
  • I defended my Habilitation in November 2017
  • Jérôme Fellus defended his PhD on decentralized machine learning using gossip protocols in October 2017, with high quality theoretical contributions.
  • We extended the approach to deep learning with colleagues from Sorbonne Université.
  • I started another project during my leave at Sorbonne Université on cross-modal retrieval using deep embeddings. This project, in collaboration with Laure Soulier and Matthieu Cord and part of the PhDs of Remi Cadene and Micael Carvalho, led to a publication at SIGIR 2018.
  • I started a new project on using machine learning and machine vision techniques to reassemble fragments of historical artifacts. In this project, I supervise a new PhD candidate, Marie-Morgane Paumard.
  • I'm organizing a special session on Image processing for Cultural Heritage at ICIP 2018 next October in Athens
  • Speaking of ICIP, I have two papers accepted, one by Pierre Jacob, who started his PhD on deep learning for image retrieval with Aymeric Histace and me in spring last year, and one by Marie-Morgane.
  • I'll be at CVPR next week presenting the latest work of Diogo Luvizon on 2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning. Diogo started a PhD in late 2015 on action recognition in videos, supervised by Hedi Tabia and me.

Some Publications, JKMS, ESANN

2014 is set to be a good year! We already have the reviews for a few papers I've been working on lately. Some are in the ML domain (an ICPR paper with Romain Negrel on supervised sparse subspace learning, an ESANN paper with Jérôme Fellus on decentralized PCA), others in CV (two journal papers in revision on low-level visual descriptors with Olivier Kihl), and one in 3D indexing with Hedi Tabia (CVPR poster).

Other than that, I've been pushing version 2.3 of jkms. I've tagged it the "density edition" since most of the changes are related to density estimators (mostly one-class SVM). I've introduced the density version of SimpleMKL, which could be useful to perform model selection. Basically, if you set C=1, you'll get a Parzen estimator, albeit selecting the kernel from a specific set.
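To see why the C=1 case degenerates into a Parzen estimator, here is a small numpy sketch (plain numpy, not the JKernelMachines API): when every training sample carries the same saturated weight, the one-class density score becomes a uniform average of kernel evaluations, which is exactly a Parzen window.

    import numpy as np

    def gauss_kernel(x, y, gamma=1.0):
        """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def parzen_density(x, samples, gamma=1.0):
        """Parzen-window score: uniform weights 1/n on every training sample.
        This is what the one-class SVM density reduces to when all weights
        saturate at the bound (the C = 1 case mentioned above)."""
        return np.mean([gauss_kernel(x, s, gamma) for s in samples])

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(200, 2))              # training data
    print(parzen_density(np.zeros(2), samples))      # high score near the mean
    print(parzen_density(np.full(2, 5.0), samples))  # low score far away

The MKL part then only selects the kernel (and its bandwidth) from the predefined set, hence the remark about model selection.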

Finally, I'll be in Brugge next week for the ESANN 2014 conference. A good way to start new projects, if anyone volunteers!

JKernelMachines 2.1 release

I released a new version of JKernelMachines with the following features:
  • new algorithms: SDCA (Shalev-Shwartz 2013), SAG (Le Roux 2012)
  • new custom matrix kernel to handle train and test separately
  • add fvec file format
  • add experimental package for linear algebra and corresponding processing (e.g., PCA, KPCA), use at your own risk!
  • add example app to perform VOC style classification
  • Lots of bug fixes

The linear algebra package is at the moment very rough. I find it somewhat useful for some kinds of pre-processing (like a PCA, for example). At the moment, my matrix code is a bit slow. If I ever find the time to write solid matrix operations, I will add some nice features like low-rank approximations of kernels (Nyström).
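For the record, here is a rough numpy sketch of the kind of Nyström approximation I have in mind (illustrative only, not part of the library): approximate the full kernel matrix from a random subset of landmark columns.

    import numpy as np

    def nystrom(K, m, rng=np.random.default_rng(0)):
        """Rank-m Nystroem approximation of an n x n kernel matrix K,
        using m randomly chosen landmark columns: K ~ C @ pinv(W) @ C.T"""
        n = K.shape[0]
        idx = rng.choice(n, size=m, replace=False)
        C = K[:, idx]              # n x m columns at the landmarks
        W = K[np.ix_(idx, idx)]    # m x m block among the landmarks
        return C @ np.linalg.pinv(W) @ C.T

    # Toy check on a Gaussian kernel matrix.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq)
    K_approx = nystrom(K, m=50)
    print("relative error:", np.linalg.norm(K - K_approx) / np.linalg.norm(K))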

Nevertheless, I suggest always picking the latest git version instead of these releases. The API is very stable now and should not change significantly, which means that all the code you write now will be supported in the next few years. Thus, picking the latest git version always ensures you have the bug fixes and so on (I don't release versions only for bug fixes).

One more thing: JKernelMachines has been published in JMLR last month. I encourage you to read the paper and to cite it if you ever use the code for your publications.
