How AI Is Deciphering Lost Scrolls From The Roman Empire
Researchers are using cutting-edge AI models to “read” ancient scrolls superheated by the eruption of Mount Vesuvius in 79, which covered much of the Bay of Naples in ash—including the now-famous towns of Pompeii and Herculaneum. Though the work to decode the scrolls began centuries before the artificial intelligence revolution emerged, myriad new technologies are making that work easier and faster than ever before.
As a term, “AI” is often as unwieldy as the technology itself, and thrown around in sweeping terms. What does it actually mean for AI to decode what has eluded humans for centuries? We spoke with experts working on the algorithms and models that are deciphering and cataloguing the classics to find out.
The disappearance and rediscovery of the scrolls
Nearly 2,000 years ago, the Gulf of Naples was rocked by the cataclysmic eruption of Mt. Vesuvius, which buried Pompeii and Herculaneum in ash. The towns were wiped off the map for over 1,500 years.
Flash forward to 1750, when workers digging a well discover marble flooring under the soil. Further excavations reveal a buried villa containing nearly 2,000 carbonized scrolls and charred papyrus fragments. At first, the scrolls are mistaken for fishing nets and charred logs; many are discarded or perhaps burned as torches. Eventually one of the scrolls is dropped and breaks, revealing the true nature of the blackened cylinders. According to the Getty Museum, the scrolls from the villa, now known as the Villa dei Papiri, constitute the only surviving library from the classical world.
Like the frescoes and casts of human remains in Pompeii and Herculaneum, the scrolls are extremely fragile, to the point of making them practically inscrutable. Successive attempts to painstakingly unwrap the scrolls caused many to fragment and disintegrate, losing the information so miraculously encased in them to time.
But among the scrolls that have been read are writings of the Greek philosopher Philodemus of Gadara, leading some researchers to believe the villa belonged to his patron—and father-in-law to Julius Caesar—Lucius Calpurnius Piso Caesoninus.
Today, over 300 unopened scrolls remain, mercifully spared the early, crude attempts at revealing their contents.
The Vesuvius Challenge: Modern technology means we don’t have to pulverize the papyri
The Vesuvius Challenge was launched in March 2023. It’s a project challenging members of the public to use AI to identify characters, and ultimately words, hidden in the Herculaneum scrolls. The first word found and translated from one of the unopened papyrus scrolls (“purple”) was announced in October 2023. The finder of the word won $40,000 for his efforts, as part of the $1,000,000 paid out last year to people working on the lost library.
Machine learning and computer vision are the two types of artificial intelligence used in the challenge’s virtual unwrapping method. Machine learning uses data and algorithms to allow AI systems to imitate human learning, enabling them to become more accurate over time. Computer vision is exactly what it sounds like: a field of research that enables computers to identify objects and people in images, and ultimately to interpret what they’re seeing.
“The new computer vision techniques aimed at virtually unwrapping the unopened Herculaneum papyri are providing new hope for Herculaneum papyrology, enabling the reading of rolls that were last read almost two thousand years ago before the eruption of Mount Vesuvius,” said Federica Nicolardi, a papyrologist at the University of Naples Federico II and member of the Vesuvius Challenge’s papyrology team, in an email to Gizmodo.
A team including some of the Vesuvius Challenge members gave the technology a trial run in 2015 using a scroll from En-Gedi; that work involved taking a three-dimensional, volumetric scan of the scroll, revealing its 3D structure. Then, computer software made sense of each layer wrapped within the scroll and the brighter pixels in the scan that represent ink still left on the surface. Finally, the scroll was virtually “unwrapped” and the digital version of the document was laid out in a readable way.
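To make those three steps concrete, here is a minimal toy sketch in Python. It is not the software used on the En-Gedi scroll or in the Vesuvius Challenge: the tiny synthetic scan, the function names, and the brightness threshold are all assumptions made up for illustration, and a real scan is a 3D volume with many tightly wound layers rather than a single curved line.

```python
# Toy illustration of "virtual unwrapping". Every name and number is hypothetical.

LAYER_DEPTH = [1, 1, 2, 2, 3, 3, 2, 2]   # depth of the curled layer at each position x
INKED = {2, 3, 6}                        # positions along the layer that carry ink
SCAN_DEPTH = 5

def make_toy_scan():
    """Step 1 stand-in: fake cross-section where scan[z][x] is brightness."""
    scan = [[0.0] * len(LAYER_DEPTH) for _ in range(SCAN_DEPTH)]
    for x, z in enumerate(LAYER_DEPTH):
        scan[z][x] = 0.9 if x in INKED else 0.5   # ink shows up brighter
    return scan

def segment_layer(scan):
    """Step 2: for each x, take the brightest depth as the papyrus surface."""
    width = len(scan[0])
    return [max(range(len(scan)), key=lambda z: scan[z][x]) for x in range(width)]

def flatten(scan, surface):
    """Step 3: read brightness along the curved surface to get a flat strip."""
    return [scan[z][x] for x, z in enumerate(surface)]

def detect_ink(strip, threshold=0.7):
    """Mark positions brighter than the assumed threshold as ink."""
    return [value > threshold for value in strip]

scan = make_toy_scan()
surface = segment_layer(scan)
strip = flatten(scan, surface)
ink = detect_ink(strip)
print("".join("#" if is_ink else "." for is_ink in ink))   # prints ..##..#.
```

The point of the sketch is the order of operations: find the surface first, then read brightness along it, then lay the result out flat.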
The Vesuvius Challenge’s 2024 goal is for 90% of the team’s scanned scrolls to be read. There are cash prizes for deciphering the first letters in certain scrolls as well as a larger prize for automated segmentation of one of the scrolls. If the scrolls are deciphered, it will be the first time they have been read since they were buried in ash.
Why do researchers need AI to read the scrolls?
“The big problem in working with ancient texts is the state of preservation of these texts is often fragmentary,” said Thea Sommerschield, a classicist at the University of Nottingham who is not a member of the Vesuvius Challenge, in a call with Gizmodo. “Machine learning is extremely good at identifying patterns, let’s say textual patterns, and harnessing those to carry out certain tasks.”
In the classics, AI is speeding up and scaling up processes previously painstakingly done by humans. In the case of the Herculaneum papyri, those tasks come in a few forms.
“The contestants figured out how to identify regions within the closed scroll that probably were ink and then they incrementally built up a label set that allowed them to elicit the ink using a convolutional neural network, and then ultimately a transformer-style network,” said Brent Seales, a computer scientist at the University of Kentucky and principal investigator of the Educe Lab, in a phone call with Gizmodo.
Simply put, a convolutional neural network is a type of deep learning model that slides small learned filters across an image, or a 3D scan, to pick out local patterns such as edges and textures. Convolutional neural networks are especially useful for classification and other computer vision tasks, hence their utility in handling the faint vestiges of ink on carbonized papyrus.
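The core operation behind a convolutional neural network, the convolution itself, can be sketched in a few lines of plain Python. The image, the hand-written filter, and the function name `convolve2d` below are invented for this example; a real network learns many such filters from labeled data and stacks them in layers, which this sketch does not attempt.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the filter responses.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x5 image: one bright vertical stroke down the middle.
image = [
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
]

# Hand-written filter that responds to a bright column between two dark ones.
vertical_stroke = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

response = convolve2d(image, vertical_stroke)
for row in response:
    print(row)   # prints [-27, 54, -27] twice
```

The response peaks where the filter sits directly over the stroke, which is the sense in which a filter "detects" a local pattern.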
“You can think about the approach as kind of a pointillist approach,” Seales said. “We’re looking at very small sub-volumes on the surface, and we’re making a decision about whether that small piece is ink or not.”
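One way to picture that pointillist approach is to step a small window across the surface and make an independent ink-or-not call for each window. In the sketch below, `looks_like_ink` is only a stand-in for a trained network; its mean-brightness rule, the patch size, and the toy surface values are assumptions, not the contestants' method, and the real decisions are made on 3D sub-volumes rather than flat 2D patches.

```python
def looks_like_ink(patch):
    """Placeholder for a trained model: call a patch 'ink' if its mean value is high."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5   # assumed threshold

def ink_map(surface, patch_size=2):
    """Make one ink / not-ink decision per non-overlapping patch of the surface."""
    decisions = []
    for r in range(0, len(surface) - patch_size + 1, patch_size):
        row = []
        for c in range(0, len(surface[0]) - patch_size + 1, patch_size):
            patch = [line[c:c + patch_size] for line in surface[r:r + patch_size]]
            row.append(looks_like_ink(patch))
        decisions.append(row)
    return decisions

# Hypothetical 4x6 surface of values between 0 and 1.
surface = [
    [0.1, 0.2, 0.9, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.7, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.8, 0.9, 0.1, 0.2],
    [0.2, 0.1, 0.9, 0.7, 0.0, 0.1],
]

for row in ink_map(surface):
    print("".join("#" if ink else "." for ink in row))   # prints .#. twice
```

Note what the code never does: it does not decide which letter the dots form. It only produces the dots.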
Transformers are a newer AI technology that enables models to handle huge strings of text and multiple streams of data more effectively. Such “multi-modal” AI systems are what make it possible for AI to generate images from text inputs, or combine computer vision with natural language processing to read an image of a handwritten letter. (If you didn’t know, the ‘T’ in “ChatGPT” stands for Transformer.)
“Transformers are the state of the art in computer science right now because of their unparalleled ability to capture context,” Sommerschield said, which is “useful in restoring ancient fragmentary texts” as well as dating them and predicting where they were written.
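The mechanism transformers use to capture context is called attention: each token's representation is rebuilt as a weighted mix of every token in the sequence, with more weight on the ones it most resembles. The following is a stripped-down sketch of scaled dot-product self-attention, assuming made-up two-number token vectors and skipping the learned query, key, and value projections that a real transformer applies first. It is not the architecture of any model mentioned in this article.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = `vectors`."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        mixed = [sum(w * vec[d] for w, vec in zip(weights, vectors))
                 for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [
    [1.0, 0.0],   # hypothetical vector for token A
    [0.9, 0.1],   # token B, similar to A
    [0.0, 1.0],   # token C, unlike A
]

outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

In the printed weights, token A leans more on the similar token B than on the dissimilar token C, which is the "context" being captured, in miniature.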
Computer vision isn’t the only AI field at work in the classics
The Vesuvius Challenge is just one approach researchers are taking to deploy AI in the study of ancient texts.
In 2019, Sommerschield and her project co-lead Yannis Assael, a research scientist at Google DeepMind, developed Pythia, a then state-of-the-art neural network designed to restore ancient Greek texts. Pythia did that by recovering missing characters from damaged texts; it had a character error rate of 30.1%, compared with the 57.3% error rate of human epigraphists.
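For a sense of what a character error rate measures, one common definition divides the number of single-character edits needed to turn a prediction into the correct text by the length of the correct text. The sketch below uses that definition with an invented example; it is an assumption that this matches the exact evaluation protocol used for Pythia, and the function names are hypothetical.

```python
def edit_distance(predicted, reference):
    """Levenshtein distance: fewest single-character edits between two strings."""
    previous = list(range(len(reference) + 1))
    for i, p_char in enumerate(predicted, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            cost = 0 if p_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # drop a character from the prediction
                current[j - 1] + 1,      # add a missing character
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]

def character_error_rate(predicted, reference):
    """Edits needed, as a fraction of the reference length."""
    return edit_distance(predicted, reference) / len(reference)

# Invented example: a 10-letter name restored with two wrong letters.
reference = "PHILODEMUS"
predicted = "PHILADEMOS"
print(character_error_rate(predicted, reference))   # prints 0.2
```

On this measure, lower is better: an error rate of 30.1% means roughly three in ten restored characters needed correcting.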
Since then, Sommerschield and Assael’s team has published the more powerful transformer-based Ithaca model, which uses neural networks to restore and attribute ancient texts. As the team wrote in their work, Ithaca is “designed to assist and expand the historian’s workflow.” The model alone achieved 62% accuracy restoring damaged texts, the team found, but historians’ accuracy using Ithaca jumped from 25% to 72%. Ithaca and models like it “can unlock the cooperative potential between artificial intelligence and historians,” the team wrote.
In a 2024 paper in Computational Linguistics, their team published a sweeping survey of research on ancient texts using machine learning. They found growing momentum for that research, from digitization, restoration and attribution work to linguistic analysis, textual criticism, and translation.
However, the researchers also identified hurdles to overcome. Their data showed that languages, histories, and geographies are unevenly represented in existing research using machine learning on ancient texts. As you might guess, Ancient Greek and Latin texts were represented much more heavily than other languages and scripts, including cuneiform, Old Korean, and the Indus script. Ensuring that all cultures are represented as researchers deploy machine learning on ancient texts is the work of human researchers, not of the models themselves.
Keeping humans in the loop
Amid the hubbub about the Vesuvius Challenge, it’s easy to forget a key fact: AI itself is not reading the scrolls. That’s not to diminish the work of the team; if anything, it emphasizes it. The researchers are not leaning on AI where it doesn’t make sense to, or where doing so could yield inaccurate results about the scrolls’ contents.
“The AI framework is not making a decision about a complete letter form,” Seales said. It is simply highlighting where it perceives ink in the scrolls, which “reduces the possibility of hallucination.” In other words, it keeps the team’s model from mistaking an Eta for a Theta, scrambling the meaning encased in the papyrus.
“It’s the human who sees how all of those individual ink decisions line up and whether they make sense as writing or not,” he added.
“The moment that you start applying these technologies to ancient languages, you critically realize their drawbacks, their potential,” Sommerschield said. “The answer is just you need to keep the human in the loop.”
There’s a lot of work still to be done
Earlier this month, Sommerschield and Assael organized the Machine Learning for Ancient Languages (ML4AL) Workshop to encourage collaboration and support the momentum of research in the field.
“You need the experts, or the students, or the practitioners, or the museum communities, or the general public to be involved, to benefit, to use it, to troubleshoot it, to break it, to try to really get the best out of it,” Sommerschield added.
For the Vesuvius Challenge, the next step is to build out a workflow for segmenting and scanning the scrolls at scale so that they can be read efficiently. There are about 300 extant scrolls for them to work on, and the documents need to be transported (with conservators as handlers) to a particle accelerator in England to be scanned. All told, the cost to scan all the scrolls today would be $30 million.
As for your burning question—what can we actually learn from these documents found in the shadow of Vesuvius? Nicolardi told Gizmodo that “we expect to find more philosophical works that can shed light on Greek philosophy, particularly books by Epicurus and his disciples, whose texts are completely lost outside of the library of the Villa dei Papiri.”
And that’s not all. About 1,100 scrolls were recovered from the Villa dei Papiri in 1752 and 1754, according to the Getty Museum. But the villa site is not completely excavated, and according to the project website, “it is a near-certainty” that more scrolls remain buried. Excavation is costly, though the team has plenty of scrolls to sift through before that moment comes along.
The scrolls are just one piece of this puzzle, though. The task at hand is to use AI to better understand the ancient world, and that means revisiting the documents familiar to us, too. While it’s exciting to imagine reading what hasn’t been read for two millennia, AI has implications across the classics. Sometimes, being able to take stock of something in a new way is just as useful as seeing it for the first time.