RoboMAE is our multimodal annotation environment for robot sensor data. RoboMAE allows human annotators to concentrate on high-level decisions regarding the interpretation of a scene, while at the same time producing full frame-by-frame annotations that cross-link the recognition of the same object across the different modalities. Our approach is based on exploiting spatio-temporal co-occurrence to link the different projections of the same object in the various supported modalities, and on automatically interpolating annotations between explicitly annotated frames. The backend automations interact with the visual environment in real time, giving annotators immediate feedback on their actions. Our approach is demonstrated and evaluated on a dataset collected for the recognition and localization of conversing humans, an important task in human-robot interaction applications. Both the annotation environment and the conversation dataset are made publicly available.
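To illustrate the idea of interpolating annotations between explicitly annotated frames, here is a minimal sketch in Python. It is not RoboMAE's MATLAB implementation: the `Box` type, the `interpolate_boxes` function, and the choice of plain linear interpolation over bounding-box coordinates are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Box:
    """Hypothetical annotation: axis-aligned bounding box in pixels."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Return a box for every frame between the first and last keyframe.

    Frames the annotator labelled explicitly are kept unchanged; frames in
    between are filled by linear interpolation of each box coordinate
    (an assumed scheme, chosen here for simplicity).
    """
    frames = sorted(keyframes)
    result: Dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        span = end - start
        for f in range(start, end):
            t = (f - start) / span  # 0.0 at start, approaches 1.0 at end
            result[f] = Box(
                a.x + t * (b.x - a.x),
                a.y + t * (b.y - a.y),
                a.w + t * (b.w - a.w),
                a.h + t * (b.h - a.h),
            )
    if frames:
        # The last keyframe is not covered by the loop above.
        result[frames[-1]] = keyframes[frames[-1]]
    return result


# Two explicit annotations, at frames 0 and 4; frames 1-3 are filled in.
annotated = {0: Box(0, 0, 10, 10), 4: Box(8, 4, 10, 20)}
dense = interpolate_boxes(annotated)
```

With this scheme the annotator only touches the frames where the object's motion changes, and every frame in between still receives an annotation.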


RoboMAE is developed in MATLAB and distributed under the GPLv2 license. The first version of RoboMAE was developed almost exclusively by Konstantinos Tsiakas, from Spring 2013 up to and including IRSS 2013, the 4th International Research-based Summer School (July 2013, NCSR "Demokritos"). It incorporates the earlier speaker diarization system by Giannakopoulos and Petridis (2012) and third-party libraries under the BSD license.

The MATLAB code can be downloaded from

Konstantinos has also prepared the following video tutorial:


Giannakopoulos, Theodoros and Sergios Petridis (2012) "Fisher linear semi-discriminant analysis for speaker diarization." IEEE Transactions on Audio, Speech, and Language Processing 20.7: 1913-1922.

Sample data

RoboMAE has been used to annotate image, depth, audio, and laser range data collected by Sek during IRSS 2013.

The dataset is available as eight individual recording sessions or as a single package:

All data and annotations (6.4GB)
Session 1: Data (444MB), Annotations
Session 2: Data (562MB), Annotations
Session 3: Data (418MB), Annotations
Session 4: Data (772MB), Annotations
Session 5: Data (289MB), Annotations
Session 6: Data (1.2GB), Annotations
Session 7: Data (1.6GB), Annotations
Session 8: Data (1.3GB), Annotations