Boffins carve up C so code can be converted to Rust

Computer scientists affiliated with France’s Inria and Microsoft have devised a way to automatically turn a subset of C code into safe Rust code, in an effort to meet the growing demand for memory safety.

The C programming language was created in the early 1970s and has been used to build numerous critical systems, applications, and libraries. The Linux kernel, for example, is written mostly in C.

But C, like its extension C++, is not designed for memory safety. It features manual memory management, which is more efficient and flexible than automated memory management (e.g., garbage collection), but also more prone to memory errors.

Memory safety errors like out-of-bounds reads and writes and use-after-free represent the majority of software vulnerabilities in numerous recent studies. In Google’s Android operating system, they accounted for 76 percent of vulnerabilities in 2019, a figure that the Chocolate Factory projected would fall to 24 percent by the end of 2024 through the use of Rust and Safe Coding practices.

Rust code can be written to be either memory safe or unsafe, at the discretion of the developer. While C and C++ code can be made more memory safe through diligence, static analysis, and testing, neither programming language offers memory safety guarantees out of the box.

For the past few years, there’s been an industry and government-backed effort to reduce the use of C and C++ code and increase the use of memory safe programming languages like Rust, Go, Python, and Java (though these may link to unsafe libraries). As the Internet Security Research Group’s (ISRG) Prossimo Project puts it: “Using C and C++ is bad for society, bad for your reputation, and it’s bad for your customers.”

Not everyone would go quite that far. Many C and C++ programmers would rather find ways to keep using their favored tools without jumping on the Rust bandwagon, and even Google, among the more vocal Rust evangelists, has acknowledged that C and C++ code will be around for a long time.

So there’s been quite a bit of effort to develop ways to make C and C++ more memory safe and to develop automated code conversion mechanisms like DARPA’s TRACTOR program.

Efforts to make C memory safe like TrapC and Fil-C come with trade-offs. TrapC, still in development, focuses on a subset of the language. Fil-C currently imposes a performance hit and lacks full Application Binary Interface (ABI) compatibility.

In a preprint paper titled “Compiling C to Safe Rust, Formalized,” authors Aymeric Fromherz (Inria) and Jonathan Protzenko (Microsoft) provide an alternative to automatic C-to-Rust conversion schemes that produce unsafe Rust. Their focus is on providing a conversion path for industrial-grade code that’s been formally verified.

“C allows programmers to be creative with aliasing, low-level casts, memory management, and data representation,” the authors explain. “Expressing those patterns in Rust requires opting out of many static guarantees, in order to allow unchecked aliasing, casts between representations (a.k.a. ‘transmutation’ in Rust lingo), and so on, which is done through the unsafe feature of Rust. But doing so obliterates the benefits of Rust!”

So Fromherz and Protzenko have developed a subset of C, dubbed “Mini-C,” which avoids common C patterns and features like pointer arithmetic and implicit mutability that just can’t be translated directly to safe Rust.

Implementing Mini-C using the ​​KaRaMeL compiler framework, the boffins say their approach produces safe Rust code, with the caveat that some refactoring may be required.

“Instead of automatically translating C in its full generality to unsafe Rust and attempting to make the resulting code safer, we rather target a data-oriented, applicative subset of C,” they explain. “Our translation process is therefore semi-active: users may have to perform minimal adjustments to the source C program to fit the supported language subset; once in this subset, our approach then automatically produces valid, safe Rust code.”

They tested the conversion process on HACL* (High-Assurance Crypto Library), a verified cryptographic library consisting of 80,000 lines of C code, and on EverParse, a verified formatters and serializers library consisting of 1,400 lines of C.

The HACL* conversion required minimal code changes and the EverParse conversion worked without source alteration. And the results worked well, they claim – the Rust code exhibited the same performance profile as the original C code, despite the addition of fat pointers and runtime bounds checks.

This work is now helping to make a variety of dependent applications a bit more secure. The authors say their Rust-compiled HACL* has been packaged within the libcrux cryptographic library, parts of which have been added to Mozilla’s NSS and OpenSSH. ®

READ MORE HERE