IBM Breaks Ground by Open-Sourcing Granite Code Base: A Milestone in AI Development

14 May, 2024

Open-sourcing large language models (LLMs) has long been difficult, complicated by licensing questions and proprietary concerns. While some companies have purportedly taken steps in this direction, IBM has now released a family of models under a fully open-source license. The result of years of effort, IBM's Granite Code Base is a set of LLMs pretrained on data from publicly available sources such as GitHub repositories and natural language datasets, which IBM curated to avoid copyright and other legal entanglements. The models, trained on 3 to 4 terabytes of code data and natural language datasets, are licensed under the Apache 2.0 license, which permits both research and commercial use. That permission for commercial use, in particular, sets IBM apart from its predecessors in the LLM arena.

Ruchir Puri, chief scientist at IBM Research, says these LLMs have transformative potential for generative AI in software development. IBM is focusing on targeted business use cases, and the Granite models are tailored specifically for programming tasks across various domains. They are already integrated into IBM Watsonx Code Assistant (WCA) products, such as WCA for Ansible Lightspeed and WCA for IBM Z, where they support software modernization and automation. The models are also accessible through platforms like IBM and Red Hat's InstructLab, so developers worldwide can build on them.

Beyond the models' technical capabilities, IBM's commitment to transparency and ethical data practices distinguishes its offering. By curating clean datasets, filtered to remove hate speech and profanity, IBM aims to give developers models that are more suitable to build on.