NASA GIBS WorldView Similarity Search

A No-Code, Self-Supervised Learning and Active Labeling Toolset to enable searching Petabyte-Scale Imagery

Team

Researchers:

Rudy Venguswamy, Fernando Lisboa, Ajay Krishnan, Tarun Narayanan, Jenessa Peterson, Daniela Fragoso, Kai Priester, Nathan Hilton, Stefan Pessolano, Yujeong Zoe Lee, Surya Ambardar, Aaron Banze, Mike Levy, Abhigya Sodani, Shivam Verma, Suhas Kotha, Deep Patel, Erin Gao, Rajeev Godse, Sarah Chen, Esther Cao, Mandeep Khokhar, Sumanth Ramesh, Walker Stevens, Subhiksha Muthukrishnan, Navya Reddy Sandadi and Leo Silverberg

Faculty:

Anirudh Koul, Satyarth Praveen, Sherin Thomas, Dharini Chandrasekaran, Meher Anand Kasam, Siddha Ganju, Udara Weerasinghe

Posted by:

leosilverberg

Details

Program:

FDL US

Challenge area:

Earth Science

Need & Challenge

Before embarking on a scientific study of a particular phenomenon, such as wildfires, scientists need to collect numerous examples of that phenomenon. Locating these examples requires searching through 197 million square miles of satellite imagery per day, across more than 20 years of data. Such an effort can produce a valuable trove of data, but manually searching the imagery is cumbersome and laborious.

This project aims to empower scientists to search vast amounts of satellite imagery from a single example image, return conceptually similar images, and then build a curated dataset with a human-in-the-loop active labeling system.

The problem is scientifically challenging due to the lack of labeled images, missing swaths of data within imagery, multiple phenomena observed in the same region, highly imbalanced data with rare phenomena, and the inability of ImageNet-pretrained models to find relevant samples.

The sheer size of the data also presents a scalability and cost-effectiveness challenge.

Results:

The project addressed the challenge by using advances in self-supervised learning (SimSiam, SimCLR) to train AI models to represent the data, and then applying Approximate Nearest Neighbors methods, deployable on the cloud, to search at scale.
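To illustrate the search step, here is a minimal sketch of nearest-neighbor retrieval over learned embeddings. It assumes a trained self-supervised encoder has already mapped each image tile to a fixed-length vector (the encoder itself is not shown), and it uses scikit-learn's exact search as a stand-in for a true approximate-nearest-neighbor index such as Annoy or ScaNN.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for embeddings produced by a SimSiam/SimCLR-trained encoder:
# 10,000 tiles, each represented by a 128-dimensional unit vector.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 128))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Build the index. Exact cosine search here; at petabyte scale this would
# be replaced by an approximate index (e.g. Annoy, ScaNN, FAISS).
index = NearestNeighbors(n_neighbors=5, metric="cosine")
index.fit(embeddings)

# Query with a single image's embedding; the query tile itself (id 42)
# ranks first at distance 0, followed by its most similar tiles.
query = embeddings[42:43]
distances, indices = index.kneighbors(query)
print(indices[0])
```

In practice, returning the top-k neighbors of a single query embedding is what turns one example image into a ranked list of candidate matches for the labeling step.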

Additional innovations, such as generating pixels for missing regions and fast cloud identification, were employed as pre-processing steps to focus attention on specific parts of an image (e.g. on transient features like clouds, or on static features like land and ocean).

Progressive sampling techniques for selecting training data aid active learning, improve the quality of representations learned from imbalanced data, and reduce costs by more than three orders of magnitude.
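The project's exact progressive sampling algorithm is not described here; a common baseline such approaches build on is inverse-frequency weighted sampling, sketched below, which oversamples rare phenomena so the model sees them as often as common ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Severely imbalanced toy dataset: 990 common items (label 0),
# 10 rare items (label 1) -- 1% prevalence of the rare class.
labels = np.array([0] * 990 + [1] * 10)

# Weight each item inversely to its class frequency, then normalize
# into a probability distribution over items.
counts = np.bincount(labels)
weights = 1.0 / counts[labels]
weights /= weights.sum()

# Draw a training sample; the rare class now appears in roughly
# half the draws despite its 1% prevalence in the raw data.
sample = rng.choice(len(labels), size=1000, replace=True, p=weights)
rare_share = (labels[sample] == 1).mean()
```

This is an illustrative sketch, not the project's method; the point is that weighting the sampler, rather than the raw data distribution, is what lets rare phenomena contribute meaningfully to the learned representations.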

Tools for fast interactive labeling of images by swiping left or right, combined with a human-in-the-loop active labeling pipeline usable by multiple users simultaneously, significantly reduce the time needed to curate datasets.

During a demonstration of searching 5 million tiles of the Earth for islands, starting from a single image, ~1,000 islands were identified in just 52 minutes. Done manually, this would take an estimated 7,000 hours.
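The back-of-the-envelope speedup implied by those two figures can be checked directly:

```python
# Speedup implied by the demo numbers in the text:
# ~52 minutes with the toolset vs. an estimated 7,000 hours manually.
manual_minutes = 7000 * 60   # 7,000 hours expressed in minutes
tool_minutes = 52
speedup = manual_minutes / tool_minutes
print(round(speedup))        # roughly an 8,000x reduction in search time
```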

The modular pipeline is also applicable to interdisciplinary problems beyond Earth science. For example, it was used with no modifications to train on Hubble Space Telescope data.

All tools are designed to be launched from the command line, without requiring any programming or AI knowledge.

Tools:

GIBS-Downloader: A single-line command to download any GIBS satellite imagery with ease. https://github.com/spaceml-org/GIBS-Downloader

Self-Supervised Learner: A single-line command to train a self-supervised network on any image dataset, without using labels. https://github.com/spaceml-org/Self-Supervised-Learner

Swipe Labeler: A web-based tool to rapidly label an image dataset into two categories (relevant or not relevant) by swiping left or right. https://github.com/spaceml-org/Swipe-Labeler

Next Steps:

Under development: Active Learner