ARTIFICIAL INTELLIGENCE

What is the Role of Data in Generative AI?

The Silicon Review
24 May, 2024

In 2024, generative artificial intelligence has been with us for a decade. Yet it feels like we’re barely scratching the surface of what it can do, and of what it may be allowed to do. Generative AI works by learning the patterns and structures of the data it is trained on and using those to generate new data with similar characteristics. So, at a very basic level, data is intrinsic to the whole concept. If the source data is not up to snuff, the resulting content won’t be either; to borrow a much older computing phrase, ‘garbage in, garbage out’. However, as computing power grows ever more massive (and ever more accessible), harnessing data for GenAI opens new frontiers to explore. So, let’s take a look.
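To make the pattern-learning idea concrete, here is a deliberately tiny sketch. It is not how modern GenAI packages are built; it is a simple word-chain model, chosen only because it fits in a few lines. The function names (`train`, `generate`) and the sample sentence are made up for illustration.

```python
import random
from collections import defaultdict

def train(text):
    """Record, for every word in the training text, which words follow it."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def generate(model, start, length, seed=0):
    """Produce up to `length` words by repeatedly picking a word that
    followed the current word somewhere in the training text."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        if not followers:  # the word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Illustrative training data (an assumption, not taken from any real dataset).
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train(corpus)
print(generate(model, "the", 8))
```

Every pair of neighbouring words in the output also appears somewhere in the training sentence, which is the point: the model can only recombine what it was given. Feed it nonsense and it will generate nonsense.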

The basics

Generative AI is often discussed in terms of two approaches: foundation models, such as large language models (LLMs), and retrieval-augmented generation (RAG). The difference between the two is fairly simple. Foundation models are trained on massive amounts of data, and the model then draws on what it learned in training to respond to the question it’s asked. Chatbots are a good example of foundation models: there are only so many relevant questions an individual can pose to a company’s representative, so the model is less likely to encounter something it has no reference for.

RAG works by searching in real time, querying a database for information relevant to the request as it comes in. For DALL-E, the famous AI image generator, producing an image of Caleb Williams throwing a debut game touchdown in a Chicago Bears jersey on the opening day of the 2024 season may not be possible, as it’s built on a foundational model. That package may struggle until the next batch of training data is input. By contrast (Bears fans hope), a RAG model could do that as soon as the relevant data is added to the database it draws from.
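The retrieval step described above can be sketched in a few lines. This is a toy under stated assumptions, not a production RAG pipeline: real systems typically search with vector embeddings and then pass the retrieved text to a language model, whereas this sketch ranks documents by simple word overlap and stops at building the prompt. The function names (`tokenize`, `retrieve`, `build_prompt`) are invented for illustration, and the second database entry is a made-up placeholder record.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(query, documents):
    """Attach the retrieved context to the question, ready for a model."""
    context = retrieve(query, documents)
    if not context:
        return f"Question: {query}\nContext: (nothing relevant found)"
    return f"Question: {query}\nContext: " + " ".join(context)

database = ["The Chicago Bears play their home games at Soldier Field."]
query = "Who threw a touchdown for the Bears on opening day?"

before = build_prompt(query, database)

# New information arrives: add it to the database. No retraining happens.
database.append(
    "Hypothetical record: a Bears quarterback threw a touchdown on opening day."
)
after = build_prompt(query, database)

print(before)
print(after)
```

The design point is that the model itself never changes between the two calls. Only the database does, which is why a RAG setup can pick up fresh information as soon as it is stored.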

Applications

Arguably the world’s foremost consulting company, McKinsey & Company has been bullish about GenAI’s impact on the real-world economy. It has predicted an annual revenue gain of around $3.5tn across 63 use cases. However, as we’ve reported, packages like ChatGPT and Google Bard tend to fall over when facets like legal responsibility and creative imagination need to be considered. The need for human oversight, particularly where legal responsibility is concerned, is still paramount.

When utilizing others’ creativity, however, GenAI is making a compelling case. 2024 saw the release of <em>Eno</em>. The documentary about musician Brian Eno is driven by a RAG model that pulls from a large archive of filmed performances and interviews, as well as around 500 hours of footage shot specifically for the project. <em>Eno</em> promises to be a different movie each time it’s shown, with 52 quintillion versions theoretically possible.

Challenges

In theory, if one had enough computing power, the entire internet could be used as a source for GenAI to pull from. However, as mentioned earlier, the legalities around that concept are currently muddy. The Generative AI Copyright Disclosure Act was introduced to Congress by Rep. Adam Schiff in April 2024. Supported by creative bodies such as the Recording Industry Association of America and the Writers Guild of America, the bill concerns companies’ usage of copyrighted data to train their AI systems.

While the bill has not yet passed through Congress, it proposes that penalties for falling foul of it should start at $5,000 per violation. Should it pass, it can be safely assumed that foundational models will take longer to assemble and update, as their training data is checked for copyright, and RAG databases will need to be meticulously maintained to make sure they’re free of infringing data.

Data is the keystone and lifeblood of generative AI. However, we’re not yet at the stage where the age-old sci-fi trope of machines becoming ‘sentient’ is a reality. A Harvard Business Review survey in the latter part of 2023 found that only 37% of responding organizations agreed that they had ‘the right data foundation’ for GenAI. As databases get more powerful and legalities are ironed out, expect GenAI’s impact to keep growing.
