Artificial intelligence and geospatial data are crucial for identifying greenhouse gas emission hotspots, monitoring climate risks, and proposing adaptation strategies to combat global warming and move towards a sustainable future. Satellites capture vast amounts of data every day, and processing and harmonizing that data is a challenge because formats, resolutions, and reference systems vary. IBM Research has developed the Tensor Lake House (TLH), a prototype tool that integrates advanced techniques to streamline the analysis of geospatial data and make data management more efficient.
TLH minimizes data access through indexes, optimizes block size and data layout on disk, and excels in handling scientific data like images and climate data. It organizes data into hyperdimensional cubes and allows queries to extract subcubes, speeding up data retrieval significantly. TLH also supports cloud-native infrastructures and grid computing for parallel data ingestion, reducing data ingestion times substantially.
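To make the cube-and-subcube idea concrete, here is a minimal sketch in Python. It is not TLH's API: an in-memory NumPy array stands in for the stored cube, and the axis layout (time, latitude, longitude), the grid spacing, and the helper names `axis_slice` and `subcube` are all assumptions made for illustration. The point is only that a query expressed in coordinate ranges can be turned into index ranges on sorted axes, so that only the requested block is touched.

```python
import numpy as np

# Hypothetical coordinate axes for a small (time, lat, lon) cube.
times = np.arange(np.datetime64("2024-01-01"), np.datetime64("2024-01-11"))  # 10 daily steps
lats = np.arange(-90.0, 91.0, 10.0)    # 19 points on a 10-degree grid
lons = np.arange(-180.0, 181.0, 10.0)  # 37 points on a 10-degree grid

# Stand-in for stored data: one float value per (time, lat, lon) cell.
cube = np.arange(times.size * lats.size * lons.size, dtype=np.float32)
cube = cube.reshape(times.size, lats.size, lons.size)


def axis_slice(coords, lo, hi):
    """Map an inclusive coordinate range [lo, hi] to an index slice on a sorted axis."""
    start = np.searchsorted(coords, lo, side="left")
    stop = np.searchsorted(coords, hi, side="right")
    return slice(int(start), int(stop))


def subcube(data, time_range, lat_range, lon_range):
    """Extract the block of the cube covered by the three coordinate ranges."""
    t = axis_slice(times, *time_range)
    la = axis_slice(lats, *lat_range)
    lo = axis_slice(lons, *lon_range)
    return data[t, la, lo]  # basic slicing: a view, no copy of the full cube


sub = subcube(
    cube,
    (np.datetime64("2024-01-03"), np.datetime64("2024-01-05")),
    (0.0, 30.0),
    (-20.0, 20.0),
)
print(sub.shape)
```

In this toy version the "index" is simply a binary search over each sorted coordinate axis. A real store would also have to decide how the cube is chunked on disk, which is where block size and data layout come in.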
The TLH prototype can cater to multimodal data and generalizes to data federation across centers, making data accessible through a unified API endpoint. IBM has open-sourced TLH to promote open science principles and innovation in geospatial research, creating a community around TLH to drive advancements in AI for climate and sustainability. TLH also supports emerging Open Geospatial Consortium standards and can be installed on any cloud infrastructure with ease.
Looking ahead, work on geospatial data management is focusing on embeddings generated by large foundation models. These embeddings capture semantic information and can be shared between data centers to reduce storage needs as well as the energy and latency of data transfer. Projects like Embed2Scale aim to generate compressed embeddings from satellite images for efficient data sharing and downstream tasks. Neural compression and model-based image retrieval show promising results in reducing transfer latency and improving the accuracy of content-based image retrieval.
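As a rough illustration of content-based retrieval over embeddings, the sketch below ranks an archive of vectors by cosine similarity to a query vector. Everything in it is an assumption for demonstration purposes: the embeddings are random numbers rather than the output of a pretrained model, and the archive size, embedding dimension, and tile size are hypothetical. It does not reflect how Embed2Scale or TLH implement retrieval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: in practice these would be produced by a model encoder.
N_TILES, DIM = 1000, 128  # hypothetical archive size and embedding dimension
archive = rng.standard_normal((N_TILES, DIM)).astype(np.float32)


def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_k(query, vectors, k=5):
    """Return indices and cosine similarities of the k vectors closest to the query."""
    scores = normalize(vectors) @ normalize(query)
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Query: a slightly perturbed copy of tile 42, mimicking a similar image.
query = archive[42] + 0.05 * rng.standard_normal(DIM).astype(np.float32)
indices, scores = top_k(query, archive, k=5)
print(indices[0])

# Back-of-the-envelope size comparison under assumed sizes:
# a 224 x 224 tile with 6 bands stored as 16-bit integers vs. one float32 embedding.
TILE_BYTES = 224 * 224 * 6 * 2
EMBEDDING_BYTES = DIM * archive.itemsize
print(TILE_BYTES // EMBEDDING_BYTES)
```

Under these assumed sizes, one embedding is three orders of magnitude smaller than the tile it summarizes, which is the intuition behind exchanging embeddings between sites. How much task-relevant information survives that compression depends on the model and is not something this sketch measures.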
Overall, sharing foundation model embeddings is expected to democratize access to Earth observation data, especially in bandwidth-limited locations. These advances in AI and geospatial data management are crucial to addressing climate change and moving towards a more sustainable future.
Article Source
https://research.ibm.com/blog/ibm-geospatial-tensorlakehouse