DARPA suggests turning old C code automatically into Rust – using AI, of course

To accelerate the transition to memory safe programming languages, the US Defense Advanced Research Projects Agency (DARPA) is driving the development of TRACTOR, a programmatic code conversion vehicle.

The term stands for TRanslating All C TO Rust. It’s a DARPA project that aims to develop machine-learning tools that can automate the conversion of legacy C code into Rust.

The reason to do so is memory safety. Memory safety bugs, such as buffer overflows, account for the majority of major vulnerabilities in large codebases. And DARPA’s hope is that AI models can help with the programming language translation, in order to make software more secure.
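
As a rough illustration of what memory safety means for a buffer read (this sketch is not taken from DARPA's materials, and the function name `read_byte` is hypothetical), safe Rust checks the index against the buffer length instead of reading whatever memory happens to sit past the end:

```rust
/// Hypothetical helper: returns the byte at `index`,
/// or `None` when `index` is past the end of the buffer.
/// The equivalent unchecked C read, `buf[index]`, would be
/// undefined behavior for an out-of-range index.
fn read_byte(buf: &[u8], index: usize) -> Option<u8> {
    // `get` performs the bounds check; `copied` turns Option<&u8> into Option<u8>.
    buf.get(index).copied()
}
```

With a three-byte buffer, `read_byte(&[10, 20, 30], 1)` yields `Some(20)`, while an index of 3 yields `None` rather than an out-of-bounds read.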

“You can go to any of the LLM websites, start chatting with one of the AI chatbots, and all you need to say is ‘here’s some C code, please translate it to safe idiomatic Rust code,’ cut, paste, and something comes out, and it’s often very good, but not always,” said Dan Wallach, DARPA program manager for TRACTOR, in a statement.

“The research challenge is to dramatically improve the automated translation from C to Rust, particularly for program constructs with the most relevance.”

For the past few years, tech giants including Google and Microsoft have been publicizing the problems caused by memory safety bugs and promoting the use of languages other than C and C++ that don’t require such manual memory management.

The private-sector messaging has got the attention of the public sector, home to lots of legacy code, and has helped lead the White House and the US Cybersecurity and Infrastructure Security Agency (CISA) to encourage the use of memory safe programming languages – principally Rust, but also C#, Go, Java, Python, and Swift.

Those involved with the oversight of C and C++ have pushed back, arguing that proper adherence to ISO standards and diligent application of testing tools can achieve comparable results without reinventing everything in Rust.

But DARPA’s characterization of the situation suggests the verdict on C and C++ has already been rendered.

“After more than two decades of grappling with memory safety issues in C and C++, the software engineering community has reached a consensus,” the research agency said, pointing to the Office of the National Cyber Director’s call to do more to make software more secure. “Relying on bug-finding tools is not enough.”

Rust, which had its initial stable release in 2015, more than forty years after the debut of C, has memory safety baked in while also being suitable for low-level, performance-sensitive systems programming.

The programming language’s characteristics and popularity have led to initiatives such as Prossimo – the non-profit Internet Security Research Group’s effort to rewrite critical libraries and code, including the Network Time Protocol (NTP) daemon, in Rust (ntpd-rs) – as a way to reduce security risks.

“The large amount of C code running in today’s internet infrastructure makes the use of translation tools attractive,” Josh Aas, executive director of the Prossimo project, told The Register on Thursday.

“We’ve experimented with that, such as in our recent translation of a C-based AV1 implementation to Rust. The current generation of tools still require quite a bit of manual work to make the results correct and idiomatic, but we’re hopeful that with further investments we can make them significantly more efficient.”

Peter Morales, CEO of Code Metal, a company that just raised $16.5 million to focus on transpiling code for edge hardware, told The Register the DARPA project is promising and well-timed.

“I think [TRACTOR] is very sound in terms of the viability of getting there and I think it will have a pretty big impact in the cybersecurity space where memory safety is already a pretty big conversation,” he said.

Asked about DARPA’s suggestion that the software community has reached a consensus about the need to address memory safety, Morales wasn’t ready to write off C and C++ completely.

“I think all languages are about trade-offs, but certainly at the kernel-level it makes sense to move part of the code to Rust,” he said.

As to the possibility of automatic code conversion, Morales said, “It’s definitely a DARPA-hard problem.” The number of edge cases that come up when trying to formulate rules for converting statements in different languages is daunting, he said.

Wallach, who’s overseeing the TRACTOR project, told The Register the goal is to achieve a high level of automation, which will require overcoming some tricky technical challenges.

“For example, LLMs can give surprisingly good answers when you ask them to translate code, but they also can hallucinate incorrect answers,” he explained. “Another challenge is that C allows code to do things with pointers, including arithmetic, which Rust forbids. Bridging that gap requires more than just transliterating from C to Rust.”
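
To make the pointer point concrete, here is a minimal sketch, not drawn from TRACTOR itself, with hypothetical function names. A C loop such as `for (p = buf; p < buf + len; p++) total += *p;` can be transliterated into Rust almost line for line, but only by using raw pointers inside `unsafe` blocks, which sets aside the compiler's safety checks. The idiomatic version expresses the same computation over a slice with no `unsafe` at all:

```rust
/// Near-literal transliteration of the C pointer loop.
/// It compiles, but dereferencing raw pointers requires `unsafe`,
/// so the compiler no longer verifies the memory accesses.
fn sum_transliterated(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut p = values.as_ptr();
    // SAFETY: this is the one-past-the-end pointer of the same slice.
    let end = unsafe { p.add(values.len()) };
    while p < end {
        // SAFETY: `p` lies in [start, end), so it points at a valid element.
        unsafe {
            total += *p;
            p = p.add(1);
        }
    }
    total
}

/// Idiomatic rewrite: the slice carries its own length and the
/// iterator visits each element, so no pointer arithmetic is needed.
fn sum_idiomatic(values: &[i32]) -> i32 {
    values.iter().sum()
}
```

Both functions return 10 for the input `[1, 2, 3, 4]`. The difference is that only the second keeps the guarantees that motivate the move to Rust, which is why a useful translator has to recover the intent behind the pointer code and not just its syntax.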

Asked whether DARPA has any particular codebases in mind for conversion, Wallach said, “I’d point to the large world of open source code, and just as well, all the code used across the defense industrial base. I don’t have any specific plans, although some things like the Linux kernel are explicitly out of scope, because they’ve got technical issues where Rust wouldn’t fit.”

DARPA will hold an event on August 26, 2024, for those planning to submit proposals for the TRACTOR project; it can be attended in person or remotely. Attendees must register by August 19. ®
