NASA and IBM Collaborating to Strengthen Scientific Research Using INDUS Large Language Models




IBM and NASA have collaborated to develop INDUS, a suite of large language models (LLMs) intended to enhance scientific research. The models give researchers better access to large bodies of specialized knowledge and help them extract relevant information from a variety of data sources. The suite includes encoder models and sentence transformer models that support five science domains: Earth science, astrophysics, planetary science, heliophysics, and biological and physical sciences.

According to NASA, the INDUS encoder models were trained on a corpus of 60 billion tokens covering all five scientific domains. These encoders were then used to build sentence transformer models, which were fine-tuned on approximately 268 million text pairs. The IMPACT-IBM team designed the LLMs for retrieval-augmented generation and other language tasks, enabling INDUS to handle researcher queries, generate answers to questions, and retrieve relevant documents.
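To illustrate how a sentence transformer supports document retrieval, the sketch below ranks documents by the cosine similarity between a query embedding and each document's embedding. It is a generic illustration of dense retrieval, not INDUS's actual pipeline: the three-dimensional vectors are hypothetical toy values standing in for the embeddings a sentence encoder would produce, and the function and document names are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain these
# by running each document through a sentence encoder.
docs = {
    "solar_wind_paper": [0.9, 0.1, 0.0],
    "exoplanet_survey": [0.1, 0.9, 0.2],
    "crop_yield_report": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a researcher's question about the solar wind.
query = [0.8, 0.2, 0.1]

for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

In a retrieval-augmented generation setup, the top-ranked documents from a step like this would be passed to a generative model as context for answering the query.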

The team trained the INDUS models on curated scientific corpora drawn from a range of data sources, and the models are available on the Hugging Face platform. Kaylin Bugbee, team lead for NASA’s Science Discovery Engine, highlighted the benefits of INDUS for current applications. She noted that large language models are rapidly transforming the search experience and that the Science Discovery Engine has integrated INDUS into its search interface. Initial results show that INDUS has improved the accuracy and relevance of search results.

In conclusion, the collaboration between IBM and NASA has produced the INDUS suite, which aims to improve scientific research by giving researchers better access to specialized knowledge and helping them extract relevant information from diverse data sources. The models support five key science domains and were designed for retrieval-augmented generation and other language tasks. INDUS has already shown promising results in improving search for NASA’s Science Discovery Engine, suggesting the value these models could bring to scientific research.

Article Source
https://executivebiz.com/2024/06/nasa-ibm-seek-to-enhance-scientific-research-with-indus-large-language-models/