IBM’s research and development unit has published a blog post on domain-specific generative AI (gen AI) for Industry 4.0, focusing on a community-driven approach to developing language models. The post introduces a technique called LAB for training enterprise-oriented large language models (LLMs) using taxonomy-driven synthetic data generation, and IBM has launched the open source project InstructLab with Red Hat to invite collaboration from the Industry 4.0 community. The project offers tools for generating synthetic data for chatbot tasks and helps integrate new knowledge into the base model without overwriting what the model has already learned. IBM’s watsonx gen AI platform and Granite foundation model series are used alongside other AI models in the cloud, with the aim of democratizing LLM development.
The goal is to improve LLMs in less time and at a lower cost than traditional training methods. The LAB technique is designed to overcome common LLM training challenges by using taxonomy-guided synthetic data generation. Companies can adapt and fine-tune models with their proprietary data to teach them the language of their business, resulting in more personalized models trained on private data. The initiative also aims to address the complexities of governance, including tracking hallucinations, biases, and drift in LLMs, which must be periodically updated by specialist engineers.
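To make the idea of taxonomy-guided synthetic data generation more concrete, here is a minimal sketch in Python. It is an illustration only and does not reproduce the InstructLab code or the exact LAB procedure: the `TaxonomyNode` class, the `build_generation_prompts` helper, the prompt wording, and the manufacturing examples are all hypothetical. The assumption is that a taxonomy is a tree whose leaves hold a few human-written seed examples, and that each leaf is turned into prompts a teacher model could use to produce new training examples.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node in a hypothetical skills/knowledge taxonomy."""
    name: str
    seed_examples: list[str] = field(default_factory=list)
    children: list["TaxonomyNode"] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Yield (path, leaf) pairs for every leaf of the taxonomy tree."""
    path = prefix + (node.name,)
    if not node.children:
        yield path, node
        return
    for child in node.children:
        yield from leaf_paths(child, path)


def build_generation_prompts(root, per_leaf=2):
    """Turn each leaf's seed examples into prompts for a teacher model.

    Assumption: the real pipeline would send these prompts to a model
    and filter the answers; this sketch stops at building the prompts.
    """
    prompts = []
    for path, leaf in leaf_paths(root):
        if not leaf.seed_examples:
            continue  # no seed examples to imitate for this leaf
        seeds = "\n".join(f"- {s}" for s in leaf.seed_examples)
        for i in range(per_leaf):
            prompts.append(
                f"Topic: {' > '.join(path)}\n"
                f"Example questions:\n{seeds}\n"
                f"Write new question #{i + 1} in the same style."
            )
    return prompts


# Hypothetical Industry 4.0 taxonomy with two leaves.
taxonomy = TaxonomyNode("manufacturing", children=[
    TaxonomyNode("maintenance", seed_examples=[
        "How often should a conveyor belt be inspected?",
    ]),
    TaxonomyNode("quality", children=[
        TaxonomyNode("inspection", seed_examples=[
            "What defects does a visual inspection catch?",
        ]),
    ]),
])

prompts = build_generation_prompts(taxonomy)
```

Because the prompts are derived from the tree rather than sampled at random, every branch of the taxonomy receives the same number of prompts, which is one plausible reading of how a taxonomy helps a company steer what the model is taught.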
The blog emphasizes the potential economic impact of generative AI in Industry 4.0, citing statistics that it could add $4 trillion to the economy, although only 10 percent of companies are currently implementing generative AI solutions. InstructLab seeks to bring developers together around the LAB technique to overcome these challenges and enable models to be developed collaboratively and in the open. By pooling R&D efforts in this initial phase, the aim is to increase the adoption rate of generative AI solutions and, with it, their economic impact.
IBM Research conducted internal benchmark testing of the LAB approach on Granite LLMs and reported strong results for consistency, accuracy, and engagement. Merlinite, an open source LLM built on Mistral 7B, also achieved strong scores with the InstructLab method. IBM and Red Hat have released selected Granite language and code models under the open source Apache 2.0 license. The focus is on fostering an open innovation ecosystem around AI that gives customers more flexible and better models, by offering a combination of third-party models, select open source models, proprietary domain-specific models, and IBM's open source InstructLab code and language models, licensed by Red Hat.
Article Source
https://www.rcrwireless.com/20240620/ai-ml/industrial-ai/ibm-and-red-hat-eye-community-driven-gen-ai-for-industry-4-0