Publications




Feel free to browse our previous publications to learn more about our advancing computer vision technology!

OpenSentinelMap: A Large-Scale Land Use Dataset using OpenStreetMap and Sentinel-2 Imagery

Remote sensing data is plentiful, but downloading, organizing, and transforming large amounts of data into a format readily usable by modern machine learning methods is a challenging and labor-intensive task. We present the OpenSentinelMap dataset, which consists of 137,045 unique 3.7 km² spatial cells, each containing multiple multispectral Sentinel-2 images captured over a 4-year time period and a set of corresponding per-pixel semantic labels derived from OpenStreetMap data. The labels are not necessarily mutually exclusive, and contain information about roads, buildings, water, and 12 land-use categories. The spatial cells are selected randomly on a global scale over areas of human activity, without regard to OpenStreetMap data availability or quality, making the dataset ideal for supervised, semi-supervised, and unsupervised experimentation. To demonstrate the effectiveness of the dataset, we (a) train an off-the-shelf convolutional neural network with minimal modification to predict land use and the locations of buildings and roads from multispectral Sentinel-2 imagery and (b) show that the learned embeddings are useful for downstream fine-grained classification tasks without any fine-tuning.

Authors
Noah Johnson, Wayne Treible, Daniel Crispell
Date
25 April 2022
Source
EarthVision 2022 (CVPR Workshop)
Error Propagation in Satellite Multi-Image Geometry

This article describes an investigation of the source of geospatial error in digital surface models (DSMs) constructed from multiple satellite images. In this study, the uncertainty in surface geometry is separated into two spatial components: global error that affects the absolute position of the surface and local error that varies from surface point to surface point. The global error component is caused by inaccuracy in the satellite imaging process, mainly due to uncertainty in the satellite position and orientation (pose) during image collection. The key sources of local error are lack of surface appearance texture, shadows, and occlusion. These conditions prevent successful matches between the corresponding points in the images of a stereo pair. A key result of the investigation is a new algorithm for determining the absolute geoposition of the DSM that reflects the pose covariance of each satellite during image collection. This covariance information is used to weigh the evidence from each image in the computation of the global position of the DSM. The use of covariance information significantly decreases the overall uncertainty in global position and results in a 3-D covariance matrix for the global accuracy of the DSM. This covariance matrix defines a confidence ellipsoid within which the actual error must reside. Moreover, the absolute geoposition of each image is refined to the reduced uncertainty derived from the weighted evidence from the entire image set. This article also describes an approach to the prediction of local error in the DSM surface. The observed variance in surface position within a single stereo surface reconstruction defines the local horizontal error. The variance in the fused set of elevations from multiple stereo pairs at a single DSM location defines the local vertical error. These accuracy predictions are compared to ground truth provided by light detection and ranging (LiDAR) scans of the same geographic region of interest. 
The prediction of global and local error is compared to the actual errors for several geographic locations and mixes of satellite type. The predicted error bounds contain the observed errors according to the allowed percentage of outliers.
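
As a rough illustration of what weighing evidence by covariance can look like, the sketch below fuses several independent 3-D position estimates by inverse-covariance weighting. This is a standard textbook estimator and not necessarily the algorithm used in the article; the function name `fuse_estimates` and all numeric values are hypothetical.

```python
import numpy as np

def fuse_estimates(positions, covariances):
    """Fuse independent 3-D position estimates by inverse-covariance weighting.

    Assumption: the estimates are independent and unbiased. Returns the fused
    position and its 3x3 covariance matrix.
    """
    info_sum = np.zeros((3, 3))
    weighted_sum = np.zeros(3)
    for x, cov in zip(positions, covariances):
        info = np.linalg.inv(cov)   # information matrix = inverse covariance
        info_sum += info
        weighted_sum += info @ x
    fused_cov = np.linalg.inv(info_sum)
    fused_pos = fused_cov @ weighted_sum
    return fused_pos, fused_cov

# Hypothetical per-image position estimates (metres) and their covariances.
positions = [np.array([1.0, 0.0, 2.0]), np.array([3.0, 0.0, 0.0])]
covariances = [np.diag([1.0, 1.0, 4.0]), np.diag([3.0, 1.0, 4.0])]

fused_pos, fused_cov = fuse_estimates(positions, covariances)
print(fused_pos)
print(np.diag(fused_cov))
```

In this toy case each fused variance is smaller than either input variance along the same axis, which mirrors the abstract's point that using covariance information decreases the overall uncertainty in global position.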

Authors
Joseph L. Mundy, Hank J. Theiss
Date
16 November 2021
Source
IEEE Transactions on Geoscience and Remote Sensing
4-D Scene Alignment in Surveillance Video

Designing robust activity detectors for fixed camera surveillance video requires knowledge of the 3-D scene. This paper presents an automatic camera calibration process that provides a mechanism to reason about the spatial proximity between objects at different times. It combines a CNN-based camera pose estimator with a vertical scale provided by pedestrian observations to establish the 4-D scene geometry. Unlike some previous methods, the people do not need to be tracked nor do the head and feet need to be explicitly detected. It is robust to individual height variations and camera parameter estimation errors.

Authors
Robert Wagner, Patrick Feeney, Daniel Crispell, Joseph Mundy
Date
15 October 2019
Source
2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
Learning Dense Stereo Matching for Digital Surface Models from Satellite Imagery

Digital Surface Model generation from satellite imagery is a difficult task that has been largely overlooked by the deep learning community. Stereo reconstruction techniques developed for terrestrial systems, including self-driving cars, do not translate well to satellite imagery, where image pairs vary considerably. In this work we present a neural network tailored for Digital Surface Model generation, a ground truthing and training scheme that maximizes available hardware, and a comparison to existing methods. The resulting models are smooth, preserve boundaries, and enable further processing. This represents one of the first attempts at leveraging deep learning in this domain.

Authors
Wayne Treible, Scott Sorensen, Andrew D. Gilliam, Chandra Kambhamettu, Joseph L. Mundy
Date
11 December 2018
Source
arxiv.org
Pix2Face: Direct 3D Face Model Estimation

An efficient, fully automatic method for 3D face shape and pose estimation in unconstrained 2D imagery is presented. The proposed method jointly estimates a dense set of 3D landmarks and facial geometry using a single pass of a modified version of the popular “U-Net” neural network architecture. Additionally, we propose a method for directly estimating a set of 3D Morphable Model (3DMM) parameters, using the estimated 3D landmarks and geometry as constraints in a simple linear system. Qualitative modeling results are presented, as well as quantitative evaluation of predicted 3D face landmarks in unconstrained video sequences.
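
To make the "simple linear system" idea concrete, here is a minimal sketch that recovers linear shape-model coefficients from observed 3-D points by least squares. It assumes a generic linear model (mean shape plus a basis times coefficients) and ignores pose and camera parameters entirely; the function `fit_3dmm_coefficients`, the random basis, and the optional ridge term `reg` are illustrative assumptions, not the formulation from the paper.

```python
import numpy as np

def fit_3dmm_coefficients(observed, mean_shape, basis, reg=0.0):
    """Solve min_c ||basis @ c - (observed - mean_shape)||^2 + reg * ||c||^2.

    observed, mean_shape: (n_points, 3) arrays of 3-D points.
    basis: (3 * n_points, n_coeffs) linear shape basis (hypothetical here).
    """
    residual = (observed - mean_shape).ravel()
    n_coeffs = basis.shape[1]
    lhs = basis.T @ basis + reg * np.eye(n_coeffs)   # normal equations
    rhs = basis.T @ residual
    return np.linalg.solve(lhs, rhs)

# Synthetic stand-in for a morphable model: random mean shape and basis.
rng = np.random.default_rng(0)
n_points, n_coeffs = 50, 5
mean_shape = rng.normal(size=(n_points, 3))
basis = rng.normal(size=(n_points * 3, n_coeffs))
true_coeffs = np.array([0.5, -1.0, 0.25, 0.0, 2.0])

# "Observed" geometry generated from known coefficients (noise-free).
observed = mean_shape + (basis @ true_coeffs).reshape(n_points, 3)
estimated = fit_3dmm_coefficients(observed, mean_shape, basis)
print(estimated)
```

With noisy observations, a small positive `reg` would keep the coefficients from overfitting, at the cost of a slight bias toward the mean shape.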

Authors
Daniel Crispell and Maxim Bazik
Date
30 March 2018
Source
ICCV 2017: 300 3D Facial-Videos In-The-Wild Challenge Workshop
SatTel: A Framework for Commercial Satellite Imagery Exploitation

This paper presents the innovative SatTel framework, designed to automatically access, collate, process, and exploit commercial satellite imagery from a wide variety of vendors. Established vendors such as DigitalGlobe provide high resolution imagery with limited coverage, while disruptive vendors such as Planet and BlackSky provide low resolution imagery with near global coverage. SatTel provides a single point of entry for exploitation of these contrasting and complementary vendor capabilities. The authors illustrate the value of the SatTel framework via demonstrative change detection capabilities. SatTel change detection from small satellite imagery based on comparison of image to image appearance achieves mean average precision (MAP) above 0.75 for many sites compared to ground truth analyst annotation. SatTel change detection from high resolution satellite imagery based on multidimensional geometric structures achieves an average precision of 0.84 for elevation changes above 3.0 meters compared to ground truth analyst annotation.

Authors
Andrew D. Gilliam, Thomas B. Pollard, Andrew Neff, Yi Dong, Scott Sorensen, Robert Wagner, Selene Chew, Todd V. Rovito, Joseph L. Mundy
Date
12 March 2018
Source
2018 IEEE Winter Conference on Applications of Computer Vision (WACV)
Global-Local Airborne Mapping (GLAM): Reconstructing a City from Aerial Videos

We present a feature-based visual SLAM system for aerial video whose simple design permits near real-time operation, and whose scalability permits large-area mapping using tens of thousands of frames, all on a single conventional computer. Our approach consists of two parallel threads: the first incrementally creates small locally consistent submaps and estimates camera poses at video rate; the second aligns these submaps with one another to produce a single globally consistent map via factor graph optimization over both poses and landmarks. Scale drift is minimized through the use of 7-degree-of-freedom similarity transformations during submap alignment. We quantify our system’s performance on both simulated and real data sets, and demonstrate city-scale map reconstruction accurate to within 2 meters using nearly 90,000 aerial video frames; to our knowledge, this is the largest and fastest such reconstruction to date.
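
A 7-degree-of-freedom similarity transformation combines a rotation (3), a translation (3), and a uniform scale (1). The sketch below estimates one from matched 3-D points using the closed-form least-squares solution of Umeyama (1991). It is only meant to illustrate the transformation type; the paper's factor graph optimization is not reproduced here, and the function name and test values are hypothetical.

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ≈ s * R @ src + t.

    src, dst: (n, 3) arrays of corresponding points (closed form, Umeyama 1991).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # cross-covariance of dst and src
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # guard against a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    scale = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t

# Hypothetical ground-truth transform: 30 degree rotation about z, scale 2.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
s_true, t_true = 2.0, np.array([1.0, -2.0, 0.5])

rng = np.random.default_rng(1)
src = rng.normal(size=(20, 3))
dst = s_true * src @ R_true.T + t_true          # noise-free correspondences

scale, R, t = estimate_similarity(src, dst)
print(scale, t)
```

A rigid 6-degree-of-freedom alignment would fix the scale at 1, so any scale difference between submaps would remain in the map; estimating the scale explicitly is what lets the alignment absorb it.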

Authors
Hasnain Vohra, Maxim Bazik, Matthew Antone, Joseph Mundy, William Stephenson
Date
30 May 2017
Source
Tech Report
Dataset Augmentation for Pose and Lighting Invariant Face Recognition

The performance of modern face recognition systems is a function of the dataset on which they are trained. Most datasets are largely biased toward “near-frontal” views with benign lighting conditions, negatively affecting recognition performance on images that do not meet these criteria. The proposed approach demonstrates how a baseline training set can be augmented to increase pose and lighting variability using semi-synthetic images with simulated pose and lighting conditions. The semi-synthetic images are generated using a fast and robust 3D shape estimation and rendering pipeline which includes the full head and background. Various methods of incorporating the semi-synthetic renderings into the training procedure of a state-of-the-art deep neural network-based recognition system without modifying the structure of the network itself are investigated. Quantitative results are presented on the challenging IJB-A identification dataset using a state-of-the-art recognition pipeline as a baseline.

Authors
Daniel Crispell, Octavian Biris, Nate Crosswhite, Jeffrey Byrne, Joseph L. Mundy
Date
30 May 2017
Source
2016 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
Geo-localization using Volumetric Representations of Overhead Imagery

This paper addresses the problem of determining the location of a ground-level image by using geo-referenced overhead imagery. The input query image is assumed to be given with no metadata, and the content of the image is to be matched to a priori constructed reference representations. The proposed 3D geo-localization framework performs better than the 2D approach for 75% of the query images.

Authors
Ozge C. Ozcanli, Yi Dong, Joseph L. Mundy
Date
04 February 2016
Source
International Journal of Computer Vision (IJCV), Volume 116, Issue 3, pp 226-246
A Comparison of Stereo and Multiview 3D Reconstruction Using Cross-sensor Satellite Imagery

In this paper, an automatic geo-location correction framework that corrects multiple satellite images simultaneously is presented. As a result of the proposed correction process, all the images are effectively registered to the same absolute geodetic coordinate frame. The usability and the quality of the correction framework are shown through probabilistic 3D surface model reconstruction. The models given by the original satellite geo-positioning metadata and the corrected metadata are compared, and the quality difference is measured through an entropy-based metric applied to the high-resolution height maps given by the 3D models.

Authors
Ozge C. Ozcanli, Yi Dong, Joseph L. Mundy
Date
04 February 2016
Source
International Journal of Computer Vision (IJCV), Volume 116, Issue 3, pp 226-246
Automatic Geo-location Correction of Satellite Imagery

Modern satellites tag their images with geolocation information using GPS and star tracking systems. Depending on the quality of the geopositioning equipment, errors may range from a few meters to tens of meters on the ground. In this paper, an automatic geolocation correction framework that corrects images from multiple satellites simultaneously is presented. As a result of the proposed correction process, all the images are effectively registered to the same absolute geodetic coordinate frame.

Authors
Ozge C. Ozcanli, Yi Dong, Joseph L. Mundy, Helen Webb, Riad Hammoud, Victor Tom
Date
23 June 2014
Source
International Journal of Computer Vision (IJCV), Volume 116, Issue 3, pp 263-277
3D Modeling Using Miniscule Volume Elements

A new technique to optimize volumetric representation and advances in graphics processing have enabled efficient construction of 3D models from 2D imagery, while fully capturing the uncertainty in the data.

Authors
Ozge Ozcanli, Daniel Crispell, Joseph Mundy, Vishal Jain, and Tom Pollard
Date
01 August 2012
Source
SPIE Newsroom
Three-Dimensional Volume Representation for Geospatial Data in Voxel Models

Extracting useful geospatial data from imagery is a fundamental challenge that has seen significant growth over the years as technology advances have been brought to bear on the problem. An important component of this problem addresses how the data should be represented to ensure the information content is accurately captured, preserved, and conveyed to consumers. Much of the information contained in the imagery is redundant and should be transformed so that only the essential information is retained and stored, allowing the redundant data to be discarded. An efficient mechanism for achieving this goal is the 3D Voxel model.

Authors
F. Tanner, D. Crispell, and R. Isbell
Date
13 March 2012
Source
ASPRS 2012 Annual Conference
A Variable-Resolution Probabilistic Three-Dimensional Model for Change Detection

Given a set of high-resolution images of a scene, it is often desirable to predict the scene’s appearance from viewpoints not present in the original data for purposes of change detection. When significant 3D relief is present, a model of the scene geometry is necessary for accurate prediction to determine surface visibility relationships. In the absence of an a priori high-resolution model (such as those provided by LIDAR), scene geometry can be estimated from the imagery itself.

Authors
D. Crispell, J. L. Mundy, and G. Taubin
Date
19 January 2012
Source
IEEE Transactions on Geoscience and Remote Sensing
Real-Time Rendering and Dynamic Updating of 3D Volumetric Data


Authors
Andrew Miller, Vishal Jain, & Joseph Mundy
Date
05 March 2011
Source
Fourth Workshop on General Purpose Processing on Graphics Processing Units