The BnF benchmark is composed of 3017 images digitized by the Bibliothèque Nationale de France. The collection contains images of manuscripts, posters, coins, drawings, etc., labeled with a very precise thesaurus. The full image collections can be browsed freely at


The benchmark consists of two tasks.

The first task is automatic labeling with 569 classes split into 5 categories (Visual, Semantic, Historical, Geographical, and Physical). Performance is evaluated by measuring the mean average precision (mAP) using 5-fold cross-validation.
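For reference, mAP can be sketched as below. This is a generic formulation (mean over queries of the average precision of a ranked result list), not the benchmark's official Java implementation, and the input shape is an assumption for illustration:

```python
import numpy as np

def average_precision(relevant):
    """AP for one ranked list: mean of precision@k taken at each rank
    where a relevant item appears. `relevant` is a boolean sequence in
    rank order (illustrative convention, not the benchmark tool's API)."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)                        # relevant found so far
    precisions = hits[relevant] / (np.flatnonzero(relevant) + 1)
    return float(precisions.mean())

def mean_average_precision(rankings):
    """mAP: average of the per-query (here, per-class) AP values."""
    return float(np.mean([average_precision(r) for r in rankings]))
```

For example, a ranking where the 1st and 3rd results are relevant gives AP = (1/1 + 2/3) / 2.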

The second task consists of simulating interactive search sessions. Each session starts with a single image of a class. Then, following an active learning strategy, 5 samples are annotated each round until 50 samples are annotated. Performance is measured by the evolution of the mAP as the number of labels increases, averaged over 10 sessions for each class.
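The round structure of a session can be sketched as follows. The batch size (5) and budget (50) come from the description above; `score_unlabeled` is a hypothetical callback standing in for whichever active learning strategy is being evaluated, and is not part of the benchmark tool:

```python
import random

def simulate_session(pool, is_positive, score_unlabeled, batch=5, budget=50, seed=0):
    """Simulate one interactive search session (illustrative sketch).

    pool            : list of sample ids
    is_positive     : id -> bool, the ground-truth oracle (the annotator)
    score_unlabeled : (labeled, unlabeled) -> unlabeled ids ranked by the
                      active learning strategy under evaluation (assumed API)
    """
    rng = random.Random(seed)
    positives = [s for s in pool if is_positive(s)]
    # each session starts with a single image of the class
    labeled = {rng.choice(positives): True}
    while len(labeled) < budget:
        unlabeled = [s for s in pool if s not in labeled]
        # annotate the next batch chosen by the strategy
        for s in score_unlabeled(labeled, unlabeled)[:batch]:
            if len(labeled) >= budget:
                break
            labeled[s] = is_positive(s)
    return labeled
```

The mAP would then be recomputed after each round to trace its evolution against the number of labels.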

The program that performs the evaluation is written in Java. A wrapper shell script is provided; simply launch:

./ featurefile.fvec
The features have to be stored in the fvec file format. An additional -v option produces more verbose output.
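Assuming the common little-endian .fvecs layout used by the INRIA evaluation datasets (each vector stored as an int32 dimension followed by that many float32 values) — the benchmark tool may expect a variant, so check its documentation — a feature file can be written and read back like this:

```python
import struct
import numpy as np

def write_fvecs(path, vectors):
    """Write an (n, d) float array in .fvecs layout: for each vector,
    an int32 dimension then d float32 values, little-endian
    (assumed INRIA-style convention, not confirmed by the benchmark)."""
    vectors = np.asarray(vectors, dtype=np.float32)
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<i", len(vec)))
            f.write(vec.tobytes())

def read_fvecs(path):
    """Read a .fvecs file back into an (n, d) float32 array."""
    raw = np.fromfile(path, dtype=np.int32)
    d = raw[0]
    # drop the leading dimension field of every record, reinterpret as floats
    return raw.reshape(-1, d + 1)[:, 1:].copy().view(np.float32)
```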


Coming soon.