Transparency is sorely lacking amid growing AI interest
Transparency is still lacking around how foundation models are trained, and this gap can lead to increasing tension with users as more organizations look to adopt artificial intelligence (AI).
In Asia-Pacific, excluding China, IDC projects that spending on AI will grow 28.9% from $25.5 billion in 2022 to $90.7 billion by 2027. The research firm estimates that 81% of this spending will be directed toward predictive and interpretative AI applications.
So while there is much hype around generative AI, this segment will account for just 19% of the region's AI expenditure, said Chris Marshall, an IDC Asia-Pacific VP. Speaking at the Intel AI Summit held in Singapore this week, Marshall said the research highlights a market that needs an approach to AI broader than generative AI alone.
IDC noted, however, that 84% of Asia-Pacific organizations believe that tapping generative AI models will offer a significant competitive edge for their business. These enterprises hope to achieve gains in operational efficiencies and employee productivity, improve customer satisfaction, and develop new business models, the research firm added.
IDC also expects the majority of organizations in the region to increase edge IT spending in 2024, with 75% of enterprise data projected to be generated and processed at the edge, outside traditional data centers and the cloud, by 2025.
“To truly bring AI everywhere, the technologies used must provide accessibility, flexibility, and transparency to individuals, industries, and society at large,” Alexis Crowell, Intel’s Asia-Pacific Japan CTO, said in a statement. “As we witness increasing growth in AI investments, the next few years will be critical for markets to build out their AI maturity foundation in a responsible and thoughtful manner.”
Industry players and governments have often touted the importance of building trust and transparency in AI, and of ensuring consumers know AI systems are “fair, explainable, and safe.” When ZDNET asked if there was currently sufficient transparency around how open large language models (LLMs) and foundation models were trained, however, Crowell said: “No, not enough.”
She pointed to a study by researchers from Stanford University, MIT, and Princeton who assessed the transparency of 10 major foundation models, in which the top-scoring platform managed a score of only 54%. “That’s a failing mark,” she said during a media briefing at the summit.
The mean score came in at just 37%, according to the study, which assessed the models based on 100 indicators, including processes involved in building the model, such as information about training data, the model’s architecture and risks, and policies that govern its use. The top scorer with 54% was Meta’s Llama 2, followed by BigScience’s Bloomz at 53%, and OpenAI’s GPT-4 at 48%.
“No major foundation model developer is close to providing adequate transparency, revealing a fundamental lack of transparency in the AI industry,” the researchers noted.
Transparency is necessary
Crowell expressed hope that this situation might change with the availability of benchmarks and organizations monitoring AI developments. She added that lawsuits, such as those brought by The New York Times against OpenAI and Microsoft, could help bring further legal clarity.
There should be governance frameworks similar to data protection laws, such as Europe’s GDPR (General Data Protection Regulation), so users know how their data is being used, she noted. Businesses need to make purchasing decisions based on how their data is captured and where it goes, she said, adding that growing tension from users demanding more transparency might fuel industry action.
As it is, 54% of AI users do not trust the data used to train AI systems, per a recent Salesforce survey, which polled almost 6,000 knowledge workers across the US, the UK, Ireland, Australia, France, Germany, India, Singapore, and Switzerland.
Contrary to common belief, accuracy does not have to come at the expense of transparency, Crowell said, citing a research report led by Boston Consulting Group. The report looked at how black- and white-box AI models performed on almost 100 benchmark classification datasets, including pricing, medical diagnosis, bankruptcy prediction, and purchasing behavior. For nearly 70% of the datasets, black-box and white-box models produced similarly accurate results.
“In other words, more often than not, there was no tradeoff between accuracy and explainability,” the report said. “A more explainable model could be used without sacrificing accuracy.”
Getting full transparency, though, remains challenging, Marshall said. He noted that discussions about AI explainability were once lively but have since died down, because the issue is difficult to address.
Organizations behind major foundation models may not be willing to be forthcoming about their training data for fear of being sued, according to Laurence Liew, director of AI innovation at AI Singapore (AISG). Being selective about training data can also affect AI accuracy rates, he added. Citing the potential issues with using all publicly available data, Liew explained that AISG chose not to use certain datasets for its own LLM initiative, SEA-LION (Southeast Asian Languages in One Network).
As a result, the open-source model is not as accurate as some major LLMs on the market today, he said. “It’s a fine balance,” he noted: achieving a high accuracy rate would mean adopting an open approach to using any data available, while choosing the “ethical” path and not touching certain datasets means accepting a lower accuracy rate than those achieved by commercial players.
While Singapore has chosen a high ethical bar with SEA-LION, it is still often challenged by users who call for tapping more datasets to improve the LLM’s accuracy, Liew said.
A group of authors and publishers in Singapore last month expressed concerns about the possibility their work may be used to train SEA-LION. Among their grievances is the apparent lack of commitment to “pay fair compensation” for the use of their writings. They also noted the need for clarity and explicit acknowledgement that the country’s intellectual property and copyright laws, and existing contractual arrangements, will be upheld in creating and training LLMs.
Being transparent about open source
Such recognition should also extend into open-source frameworks on which AI applications may be developed, according to Red Hat CEO Matt Hicks.
Models are trained on large volumes of data provided by copyright holders, and using these AI systems responsibly means adhering to the licenses that govern that data, Hicks said during a virtual media briefing this week following Red Hat Summit 2024.
This is pertinent for open-source models, which may carry varying license types, including copyleft licenses such as the GPL and permissive licenses such as Apache.
He underscored the importance of transparency and of taking responsibility for understanding the data behind the models and for handling the outputs the models generate. Ensuring models are protected against malicious exploits is necessary for both the safety and security of AI architectures.
Red Hat is looking to help its customers with such efforts through a host of tools, including Red Hat Enterprise Linux AI (RHEL AI), which it unveiled at the summit. The product comprises four components, including the open Granite language and code models from the InstructLab community, which are supported and indemnified by Red Hat.
The approach addresses challenges organizations often face in their AI deployment, including managing the application and model lifecycle, the open-source vendor said.
“[RHEL AI] creates a foundation model platform for bringing open source-licensed GenAI models into the enterprise,” Red Hat said. “With InstructLab alignment tools, Granite models, and RHEL AI, Red Hat aims to apply the benefits of true open-source projects — freely accessible and reusable, transparent, and open to contributions — to GenAI in an effort to remove these obstacles.”