SpaceML can be thought of as a toolbox to help the AI community make our next giant leap by building the Machine Learning (ML) infrastructure needed to streamline and super-charge the intelligent applications, automation and robotics needed to explore deep space and better manage our planetary spaceship for mutual benefit.
SpaceML: Distributed Open-source Research with Citizen Scientists for the Advancement of Space Technology for NASA [arXiv.org:2012.10610]
The SpaceML Steering Team consists of:
Frontier Development Lab (FDL.AI) in the USA and FDLEurope.org in Europe apply AI technologies to science to push the frontiers of research and develop new tools to help solve some of the biggest challenges that humanity faces.
FDL is a public-private partnership with NASA in the USA and ESA in Europe. We bring together some of the brightest minds from space science, AI and the commercial sector, and we’re excited and extremely grateful for the continued support from NASA HQ, NASA ARC, NASA MSFC and our commercial AI partners Google Cloud, IBM, Intel and NVIDIA, as well as our challenge partners Lockheed Martin, the Mayo Clinic, MIT Portugal, USGS, Planet, ISI / Pasteur and the Luxembourg Space Agency.
FDL Europe is run in partnership with ESA’s Phi Lab (ESRIN), ESA Mission Operations (ESOC) and Oxford University with support from NVIDIA / Scan computers, Planet, Airbus, SENSE, D-Orbit and Google Cloud.
FDL (US) and FDL Europe are both run and administered by Trillium Technologies Inc.
05.31.21 VERSION 1.0
PLEASE NOTE THAT THIS DATA POLICY IS PROVISIONAL AT THIS TIME AND WE ARE SOLICITING FEEDBACK FROM THE COMMUNITY. PLEASE CONTACT ANCHAL@FDL.AI WITH ANY QUERIES OR SUGGESTIONS.
1. Introduction, Purpose, and Scope
SpaceML.org’s (“SpaceML”) mission is to enable scientists, scholars and science enthusiasts alike by providing access to Machine Learning (ML) projects, code, MLOps, server-side compute, analysis-ready data snippets and maintained benchmark datasets, in the spirit of open science laid out in NASA’s Strategy for Data Management and Computing for Groundbreaking Science (2019-2024).
https://smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms/files/SDMWG_Full%20Document_v3.pdf
SpaceML has been designed to provide a focus point for a community of researchers who support each other in learning new techniques, enriching open science resources and tackling cross-domain problems in accordance with Exec. Order No. 13,859, 84 Fed. Reg. 3967 (February 14, 2019): Maintaining American Leadership in Artificial Intelligence, by:
Promoting sustained investment in AI R&D in collaboration with industry, academia, international partners and allies, and other non-Federal entities to generate technological breakthroughs in AI and related technologies and to rapidly transition those breakthroughs into capabilities that contribute to our economic and national security.
Enhancing access to high-quality and fully traceable Federal data, models, and computing resources to increase the value of such resources for AI R&D, while maintaining safety, security, privacy, and confidentiality protections consistent with applicable laws and policies.
Reducing barriers to the use of AI technologies to promote their innovative application while protecting American technology, economic and national security, civil liberties, privacy, and values.
Training the next generation of American AI researchers and users through apprenticeships; skills programs; and education in science, technology, engineering, and mathematics (STEM), with an emphasis on computer science, to ensure that American workers, including Federal workers, are capable of taking full advantage of the opportunities of AI.
A key step toward the goals of SpaceML is the provision of high-quality, analysis-ready data from NASA, ESA, USGS and partners of the Frontier Development Lab (FDL.ai). This data is harmonized, and code snippets are provided with detailed notebooks to enable seamless on-ramping for interested researchers, along with server-side compute (within the limits of Google Cloud’s free credits initiative). The full datasets are also available for egress upon request for analysis on your own compute infrastructure.
Rapid advances and the ever-increasing supply of heterogeneous data have led to a fragmented data management environment with multiple disparate archives and invariably a complex ETL (Extract / Transform / Load) workflow. SpaceML seeks to consolidate data management, code and computation in one repository, to enable more seamless Machine Learning experimentation, governance and reproducibility.
2. SpaceML’s Data Environment
SpaceML will process, store and organize data and code using the Google Cloud Platform (GCP) and Google Colab Notebooks. Access to these data products will be granted through SpaceML.org. Colab notebooks with sample data snippets hosted on SpaceML will be available for logged-in users.
Access to full datasets can be requested by submitting a short description of your science goals.
2.1 Key Information
2.1.1 How can I find the data that I’m looking for?
Links to specific datasets will be stored under the corresponding project tab on SpaceML.org. If you are searching for data that isn’t currently hosted on the site, you can submit a new Data Request Form (linked here).
2.1.2 Where does SpaceML store data?
Most data will be stored in a GCP (Google Cloud Platform) account administered by FDL.ai, although some data may be stored on a partner storage platform in line with data management policies specific to that project.
2.1.3 Who is enabling SpaceML?
This data is managed and made accessible through a series of grants from FDL partners administered by the SETI Institute. SpaceML.org is built and maintained by Trillium Technologies Inc.
2.1.4 What licence is used for SpaceML or FDL data, ML models and software?
The spirit of SpaceML.org is open science in the classical practice of scientific discourse. We encourage all users of SpaceML to publish scientific results as scientific papers, tech memos or branched code on GitHub or GitLab, and/or to open source their work under the GNU General Public Licence v3.0 (a free, copyleft licence for software and other kinds of work).
In the case of data products, SpaceML.org data is covered by a blanket open data license requested by our space agency partners:
https://opendatacommons.org/licenses/
If a third party or research team invests in further development of the code or data, they can ask FDL / the SETI Institute for derivatives to be re-licensed under more permissive licenses such as MIT or BSD.
Please note that release of data products may need to be managed for ethical reasons.
2.2. Accredited Researchers: Requesting data for simulation, visualizations and Machine Learning pipelines.
2.2.1 What information will be recorded when I fill in the data request form?
We seek only the minimal personal information needed for you to become an accredited researcher able to download and upload data. We politely request the following:
Your full name
Your contact details
Organization/university you’re associated with
Name of your research supervisor and their contact details (if applicable)
Names, institutions and email addresses of your team members who will be using this data
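As a rough illustration of the fields listed above, a data request could be represented as a simple record. This is only a sketch: the class names, attribute names and the `missing_fields` helper are hypothetical and are not taken from the actual SpaceML request form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TeamMember:
    """A colleague who will also use the data (hypothetical structure)."""
    name: str
    institution: str
    email: str


@dataclass
class DataRequest:
    """Illustrative record mirroring the fields requested in 2.2.1."""
    full_name: str
    contact_details: str
    organization: str
    # Supervisor details are only requested "if applicable".
    supervisor_name: Optional[str] = None
    supervisor_contact: Optional[str] = None
    team_members: List[TeamMember] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are empty or blank."""
        required = {
            "full_name": self.full_name,
            "contact_details": self.contact_details,
            "organization": self.organization,
        }
        return [name for name, value in required.items() if not value.strip()]


# Example with placeholder values.
request = DataRequest(
    full_name="Ada Example",
    contact_details="ada@example.org",
    organization="Example University",
    team_members=[TeamMember("Bo Example", "Example University", "bo@example.org")],
)
print(request.missing_fields())
```

Keeping the supervisor fields optional reflects the "if applicable" wording, while the three required fields are the ones every applicant is asked to supply.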
2.2.2 Will this information be stored and for what duration?
Your information will be stored on our system for the duration of your research and the duration you have access to data. This can be removed on request once the project is completed.
If you produce another dataset through your research, we invite you to upload any related data or further enhancements (improved labels etc.) to SpaceML.org. We will continue to hold your personal information so that you can be duly credited for your work, and a DOI will be created for any new datasets that result from your ongoing work. Should your research end and you no longer require access to data, your access will be revoked and your personal information will only be held for the period required in line with UK/EU GDPR laws.
2.2.3 Why do you need my information for me to access this data?
We request personal information and institution information (“accreditation”) to ensure that we understand who is using the data and to securely manage the use and distribution of data and cumulative egress costs. Additionally, should a citation be found to be missing, we need to be able to contact you to rectify the gap.
2.2.4 Can I request that my information be removed from your records?
Yes, you may request that your information be removed from our records. This can be done by contacting info@fdl.ai and filling out the request for information removal form (xxx). However, please note that you will no longer have access to the dataset you requested and you may be asked to remove those files from your storage. Additionally, should you need access to the data once again, you will be required to resubmit the data request form with your personal details, which will need to be stored for our governance.
3. Data egress and ingress
3.1 How do I acknowledge/reference the creators of the dataset in my research?
The best way to acknowledge your colleagues and the generators of the data is by including a Digital Object Identifier (DOI), a system managed by the International DOI Foundation (https://www.doi.org). This DOI is unique to each dataset and will contain information about the originator of the dataset.
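As a minimal sketch of how a DOI can be turned into a citable link, the snippet below normalizes a DOI string and builds its URL on the public https://doi.org resolver. The function names are illustrative, and the DOI `10.1234/example-dataset` is a placeholder, not a real SpaceML dataset.

```python
from urllib.parse import quote

# Common ways a DOI is written; all are stripped to the bare identifier.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and any resolver or 'doi:' prefix from a DOI string."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a valid DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Return the resolver URL for a DOI, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


# Placeholder DOI for illustration only.
print(doi_url("doi:10.1234/example-dataset"))
```

Including the resolver URL in a paper's data availability statement or reference list lets readers reach the dataset record directly.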
3.2 Can I get help using DOI for citation?
If you are having trouble with DOIs, please follow this link for troubleshooting (xxx).
3.3 I’m writing a new white paper / conducting new research which will result in a new dataset that improves or enhances research on SpaceML. How should I register that new dataset?
We welcome improvements and derivatives of SpaceML data products and encourage you to upload data to SpaceML.org for the benefit of the research community. If you have a new dataset please follow the guide to register a new dataset in the SpaceML troubleshooting section.
If you would like support in producing associated materials, metadata, READMEs or Notebooks, please let us know - we’re here to help.
3.4 I’ve created a new dataset through my research. How do I get help writing a data management plan and how do I share it with fellow users? How will I be acknowledged for my dataset?
We encourage all research projects to establish a solid Data Management Plan to ensure effective continuity and reproducibility.
3.5 If you have trouble registering for data access or data access has been terminated, please contact info@fdl.ai for any troubleshooting queries.
3.6 Are there any fees for researchers using SpaceML.org?
Enhanced data and notebooks with data snippets are provided to accredited researchers at no cost. However, server-side compute (i.e. continued use of GCP beyond the $500 free credits) will require a GCP account and will incur associated fees directly with Google Cloud. SpaceML is not liable for any compute costs incurred by running SpaceML hosted workflows on the cloud or locally.
Datasets hosted on the SpaceML repo for egress to client (local) compute are provided without cost for accredited researchers (see 2.2).
Similarly, for research teams wanting to upload enhanced or further harmonized or labeled data for long-term preservation and reproducibility on SpaceML.org, we absolutely welcome your contribution and provide this service for no fee for appropriate datasets below 1 TB (pending peer review) and will provide you with a DOI number. If your dataset is larger we may be able to accommodate, but consideration will be made on a case-by-case basis.
4. Data sharing policy
4.0 Once I have access to the dataset, can I share it with my colleagues?
You are able to share your dataset within your research or project team. However, you will need to provide the name, institution and contact details of each team member who will have access to this data, and we politely request that any egress (download) is performed only once.
Additionally, all data is governed by the project data management plan (see below), and users must adhere to the strict use and distribution policy. Any instance of misuse will result in all data access being revoked and could result in an internal inquiry with the original data publisher.
5. Project Data management plan
We are currently developing an in-depth data management plan which is being informed by the NASA Heliophysics Data Policy: https://hpde.gsfc.nasa.gov/Heliophysics_Data_Policy_v1.2_2016Oct04.html