4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
arXiv 2024
A framework for training any-to-any multimodal foundation models. Scalable. Open-sourced. Across tens of modalities and tasks.
Hi! I’m a Senior Applied Scientist at Apple. I work primarily in the fields of 3D and multimodal scene understanding. My previous research involved addressing the scarcity of manually labelled 3D data through alternative learning approaches. This resulted in several publications that developed novel weakly and self-supervised learning pipelines for image, point cloud and geometric mesh data. Prior to Apple, I was a Senior Research Scientist at Fujitsu Research of Europe.
I completed a Ph.D. in 3D computer vision at UCL, London, where I was supervised by Prof. Jan Boehm and collaborated closely with Prof. Tobias Ritschel. I spent the summer of 2021 at Adobe as an intern in the Creative Intelligence Lab, London. Whilst there, I worked on geometrically-driven single-image relighting, supervised by Dr. Julien Philip.
Eurographics 2022
We address the problem of single-image relighting. Our work shows that monocular depth estimators can provide sufficient geometry when combined with our novel 3D shadow map prediction module.
International Conference on 3D Vision (3DV) 2021
A novel method for self-supervised monocular 3D object detection. This is achieved through differentiable rendering and a GAN-like critic loss.
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021
A pipeline which demonstrates that Terrestrial Laser Scanning (TLS) 3D data can be automatically labelled using off-the-shelf 2D semantic segmentation networks. With only a simple projection of a panoramic image, strong results can be generated with no additional training.
European Conference on Computer Vision (ECCV) 2020
We present a novel weakly-supervised approach for 3D object detection. Our method can be trained on up to 95% less labeled data and still benefits from unlabeled data.
ISPRS Journal of Photogrammetry and Remote Sensing 2019
Manually labelling buildings for segmentation is a time-consuming task. We show that readily available GIS mapping data can be used as training data. We develop a novel pipeline which uses Active Contours to refine coarse polygons into fine per-pixel label maps.
arXiv preprint 2019
We release a synthetic Mobile Laser Scanning (MLS) point cloud named SynthCity. Every point has a per-class and per-instance classification, along with colour, return intensity, end-of-line indicator and time.
Remote Sensing 2019
A comprehensive review paper on deep learning for 3D sensed data classification.
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2019
A key issue when training deep neural networks for outdoor point clouds is the inevitable large data imbalance. For example, a typical street scene will contain orders of magnitude more ground points than street furniture points. We develop a novel solution that applies a weighted augmentation to reduce the class imbalance.
Progress in Physical Geography: Earth and Environment 2018
Linear topologies can be challenging terrains for SfM pipelines. A key source of error is intrinsic camera distortion. We demonstrate that, through effective camera pre-calibration, these distortions can be significantly reduced.
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2018
An experimental assessment of the ability to train a deep CNN-based object detector (RetinaNet / Faster R-CNN) on a small quantity of training data, specifically in the context of repetitive features (railway track).