The Voynich Manuscript: Why Four Centuries of Cryptographers Have Failed

In a humidity-controlled vault in the Beinecke Rare Book and Manuscript Library at Yale, call number MS 408, lies a 240-page codex that has defeated every professional cryptographer who has attempted to read it since it surfaced in European collections in the early seventeenth century. Carbon-dated by the University of Arizona in 2009 to between 1404 and 1438, written in a script no other surviving document uses, illustrated with plants that do not match any known botanical tradition and astrological diagrams that resist identification, the Voynich Manuscript is the most famous undeciphered text in the world. It is also, despite a steady annual flow of sensational “solution” announcements, essentially no closer to being read than it was a century ago.


What the manuscript actually is

The physical object is a quarto-sized vellum codex of approximately 240 pages, some of which fold out to larger sizes. Its text runs to roughly 170,000 glyphs drawn from an alphabet of around 25 to 30 distinct characters (estimates vary depending on how researchers treat ligatures and variants). The manuscript divides thematically into sections that scholars conventionally label Herbal, Astronomical, Balneological (bathing scenes with nude figures), Cosmological, Pharmaceutical, and Recipes. None of those labels is more than an inference from the illustrations: nobody has read a word of the actual text.

The carbon dating established that the vellum was prepared in the early fifteenth century. The ink, analyzed in a separate 2009 study by McCrone Associates, is consistent with the period and shows no signs of modern forgery. Whoever wrote the Voynich almost certainly did so roughly six centuries ago.

The chain of ownership

The earliest documented owner was the Prague alchemist Georg Baresch in the early seventeenth century, who sent copied samples to the Jesuit polymath Athanasius Kircher in Rome in the hope that he could decode them. Kircher could not. After Baresch’s death the manuscript passed to his friend Johannes Marcus Marci, who sent it to Kircher; Marci’s covering letter, dated 1665 and found tucked inside the manuscript itself, repeats a possibly apocryphal earlier provenance tracing the codex to Emperor Rudolf II, who supposedly paid 600 ducats for it, a very large sum at the time. The manuscript then remained in Jesuit hands, first at the Collegio Romano and later at the Villa Mondragone near Frascati, where the Polish-American antiquarian Wilfrid Voynich purchased it in 1912; it has carried his name ever since. Yale acquired it in 1969 as a donation from H. P. Kraus.

Why the professional codebreakers have failed

The twentieth century produced the most formidable cryptanalysts in history, and several of them turned their attention to the Voynich. William Friedman, who led the US Army team that broke the Japanese PURPLE cipher, spent decades on it with groups of colleagues and concluded only that the text displayed statistical properties inconsistent with a simple substitution cipher. John Tiltman, a senior British cryptographer best known for the first break into the German Lorenz teleprinter cipher, reached the same conclusion from a different angle. Elizebeth Friedman, Prescott Currier, and various NSA analysts in the postwar period added detail without adding a solution.

What these analyses established is a cluster of statistical features that make the Voynich text unusual in ways a straightforward cipher does not explain:

The text follows Zipf’s law: the frequency distribution of its “words” matches natural languages rather than random gibberish.
It exhibits a word-like structure, with what appear to be prefixes, roots, and suffixes.
Letters within words cluster in non-random positions, consistent with the phonotactic constraints of a real language.
Yet the text offers few obvious candidates for short function words like “the” or “of”, and word repetition within lines, including the same word written two or three times in a row, is far higher than in any known natural language.
Currier’s analysis identified at least two statistically distinct “languages” of the same script (now called Currier A and Currier B), which he associated with different scribal hands.

In other words, the text looks enough like a natural language to resist dismissal as nonsense, but differs from every natural language in enough ways to resist identification. This is the specific property that has made the Voynich a graveyard of decipherment attempts.
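
Two of the features listed above, the Zipf-like frequency distribution and the unusual repetition of words, are simple to measure once the text is transliterated. The sketch below assumes the text is available as whitespace-separated words in a transliteration alphabet such as EVA; the function names are hypothetical, the corpus used to exercise them is synthetic rather than Voynich data, and a single log-log regression is a crude stand-in for the more careful statistics used in the literature.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law predicts a slope close to -1 for natural-language text.
    """
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def adjacent_repeats(line_words):
    """Count positions where a word is immediately followed by itself."""
    return sum(1 for a, b in zip(line_words, line_words[1:]) if a == b)


# Synthetic check, not Voynich data: word number r occurs about 1000/r times,
# an idealized Zipfian corpus, so the fitted slope should come out near -1.
synthetic = [f"w{r}" for r in range(1, 51) for _ in range(1000 // r)]
print(round(zipf_slope(synthetic), 2))

# Invented EVA-style line with one word written three times in a row.
print(adjacent_repeats("qokedy qokedy qokedy dal chedy".split()))  # 2
```

Run on a real transliteration, the same two functions would give a first impression of how close the text sits to natural-language behaviour; serious work would also compare against shuffled and artificially generated baselines.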

What modern computational linguistics has added

The last fifteen years have seen a shift from manual cryptanalysis to computational approaches, and while no complete decipherment has resulted, some findings have narrowed the field of possibilities.

A 2013 paper by Marcelo Montemurro and Damian Zanette in PLOS One applied information-theoretic measures and concluded that the Voynich text carries a meaningful linguistic signal consistent with natural language structure, a result widely read as strong evidence against the “elaborate hoax with no content” hypothesis most recently championed by Gordon Rugg in 2004. A 2016 paper by Bradley Hauer and Greg Kondrak at the University of Alberta, which drew wide press coverage in 2018, used computational comparison to propose that the underlying language might be Hebrew with the letters of each word rearranged; most Voynich specialists treated the claim sceptically.
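
The core intuition behind measures of this kind can be sketched in a few lines. This is a simplified illustration in the spirit of Montemurro and Zanette’s approach, not a reproduction of their method: it assumes the text is a list of transliterated words, splits it into equal segments, and asks whether a given word is more concentrated in some segments than it would be in a randomly shuffled copy of the same text. Words that cluster this way behave like topic keywords, which is what one expects from text that is about something. The function names, segment count, shuffle count, and toy corpus are all hypothetical choices.

```python
import math
import random


def segment_entropy(words, target, parts):
    """Shannon entropy (bits) of how `target` is spread over equal segments.

    Any words left over after dividing into `parts` equal segments are ignored.
    """
    size = len(words) // parts
    counts = [words[i * size:(i + 1) * size].count(target) for i in range(parts)]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def clustering_score(words, target, parts=8, shuffles=50, seed=0):
    """Frequency-weighted gap between shuffled and observed entropy.

    Positive: the word is more concentrated than chance predicts (keyword-like).
    Negative: it is spread more evenly than chance predicts.
    """
    rng = random.Random(seed)
    observed = segment_entropy(words, target, parts)
    shuffled = list(words)
    baseline = 0.0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        baseline += segment_entropy(shuffled, target, parts)
    baseline /= shuffles
    frequency = words.count(target) / len(words)
    return frequency * (baseline - observed)


# Hypothetical toy corpus: eight 100-word segments. "herb" appears only in the
# first segment; "ol" appears exactly ten times in every segment.
toy = []
for seg in range(8):
    toy.extend(["ol"] * 10 + (["herb"] * 10 if seg == 0 else ["x"] * 10) + ["x"] * 80)

print(clustering_score(toy, "herb"))  # positive: confined to one segment
print(clustering_score(toy, "ol"))    # negative: spread perfectly evenly
```

Applied to a whole transliteration, a text with real content should show a set of words with clearly positive scores that line up with its thematic sections, while a text generated with no regard to topic would not be expected to.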

More recently, researchers have applied large language models and neural network techniques to the problem. None have produced a decipherment that withstands scrutiny. The repeating pattern is instructive: a new computational method identifies suggestive statistical similarities to language X, a press release circulates, specialists examine the proposed decoding of actual passages, and the proposed decoding turns out to produce nonsensical text when applied consistently rather than cherry-picked.

The “it is a hoax” hypothesis

One minority view, most prominently argued by the information scientist Gordon Rugg in a 2004 Cryptologia paper, holds that the Voynich could be an elaborate Renaissance hoax generated by a mechanical table of syllables – a technique Rugg demonstrated could produce text with Voynich-like statistical properties in a plausible timeframe. The hoax hypothesis is not mainstream, but it has never been conclusively disproved either. What argues against it is the internal consistency of the text’s linguistic structure across 240 pages, and the expense of preparing that much vellum for a prank.

The “natural language, unknown encoding” hypothesis

The majority view among Voynich specialists is that the text encodes a real message in an unknown language or a real language under an unusual encoding system. Candidates proposed over the years include Latin written in a custom alphabet, a heavily abbreviated form of medieval Latin or Italian, a Turkic language, Hebrew, proto-Romance dialects, and various constructed languages in the tradition of Ramon Llull’s combinatorial wheels. Each candidate has a partisan or two. None has produced a consistent translation of the manuscript’s actual content.

Two decade-defining decoding claims, examined

Every few years a claim surfaces that generates enough media attention to be worth separating from the mass of smaller proposals. Two in particular, Gordon Rugg’s 2004 hoax hypothesis and Gerard Cheshire’s 2019 proto-Romance proposal, illustrate how differently such claims can fare in the press and in the specialist literature.

Gordon Rugg and the Cardan grille hypothesis (2004)

Gordon Rugg, then at Keele University, published a paper in Cryptologia in 2004 arguing that the Voynich could have been generated with a Renaissance-era device known as a Cardan grille, used here as a text generator: a stencil with holes cut in it that, when slid across a table of syllables, produces plausible-looking words. Rugg tested the idea by building such a grille and showing that its output reproduced several statistical properties of the Voynich text, including the Zipf-like frequency distribution and the characteristic internal word structure.
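
The mechanism can be approximated in a few lines of code. This is a simplified sketch, not a reconstruction of Rugg’s actual apparatus: the syllable table, the hole positions, and the wrap-around at the bottom of the table are hypothetical choices made for illustration. What it shows is structural: each placement of the grille reads one cell from a prefix column, one from a middle column, and one from a suffix column, so every output word inherits the same internal structure, and because one table and one grille can yield only a limited set of words, repetition comes for free.

```python
import random

# Hypothetical three-column syllable table (prefix, middle, suffix). The
# contents are illustrative EVA-style fragments, not Rugg's actual table.
# Empty cells let some words come out shorter.
TABLE = [
    ("qo", "k", "edy"),
    ("ch", "", "aiin"),
    ("o", "t", "ol"),
    ("sh", "ke", "dy"),
    ("", "che", "ar"),
    ("d", "o", "y"),
    ("qo", "t", "eey"),
    ("ok", "", "ain"),
]

# Hypothetical grille: for each column, how many rows below the grille's
# top edge the hole in that column sits.
HOLES = (0, 2, 5)


def grille_word(table, holes, row):
    """Read one word through the grille placed with its top edge at `row`.

    Rows wrap around the bottom of the table (a simplification).
    """
    return "".join(
        table[(row + offset) % len(table)][column]
        for column, offset in enumerate(holes)
    )


def generate_text(table, holes, n_words, seed=0):
    """Produce meaningless 'words' by placing the grille at random rows."""
    rng = random.Random(seed)
    return [grille_word(table, holes, rng.randrange(len(table))) for _ in range(n_words)]


print(grille_word(TABLE, HOLES, 0))  # qoty
print(" ".join(generate_text(TABLE, HOLES, 12)))
```

A realistic experiment would use a much larger table and several grilles, then compare the output’s statistics with the Voynich text; how closely the two match depends heavily on how the table is designed, which is one reason the hypothesis remains hard to settle either way.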

The strengths of Rugg’s case are that his mechanism is historically plausible (the Cardan grille was known in sixteenth-century Europe) and that his reproduction of Voynich-like statistics from a nonsense generator was genuine. The weaknesses are that Rugg did not reproduce every statistical feature, particularly the differences between Currier A and Currier B and the consistent distribution of word endings across sections of the manuscript, and that his hypothesis requires an exceptionally dedicated hoaxer willing to prepare 240 pages of expensive vellum for a prank whose profit motive, if any, has left no trace. The 2009 radiocarbon result adds a further difficulty: it places the vellum more than a century before Girolamo Cardano described the grille in 1550, so the hypothesis now also needs either an earlier, undocumented use of the technique or vellum stored for decades before use. Specialists treat the Rugg case as plausible but unproven, a standing challenge rather than a settled account.

Gerard Cheshire and the proto-Romance proposal (2019)

In May 2019, Gerard Cheshire, a research associate at the University of Bristol, published a paper in the journal Romance Studies claiming to have decoded the Voynich as a “proto-Romance” language used at the court of Castello Aragonese in the fifteenth century. The University of Bristol issued a press release; the story ran in major outlets including the BBC, The Times, and The Guardian. Within days, medievalists and Romance linguists had published rebuttals. Lisa Fagin Davis, executive director of the Medieval Academy of America, wrote a widely cited critique noting that Cheshire’s method allowed him to translate any Voynich word into any Romance-adjacent word by flexible substitution, producing sentences that were grammatically incoherent in any known Romance language and semantically disconnected from the accompanying illustrations.

The University of Bristol retracted its press release within a week, an unusually public walk-back. The episode is routinely cited in the Voynich literature as a case study in how decoding claims that satisfy one or two linguistic criteria can nonetheless fail the basic test of consistent, meaningful translation across multiple passages. Cheshire has continued to publish elaborations of his hypothesis; the specialist consensus has not shifted.

What the physical manuscript has told us

The most important addition to Voynich scholarship in the last two decades has come not from cryptanalysis but from materials science. The 2009 radiocarbon dating at the University of Arizona’s Accelerator Mass Spectrometry Laboratory, led by Greg Hodgins, sampled four separate folios and returned a 95% confidence interval of 1404 to 1438 for the vellum’s preparation. Taken together with the seventeenth-century letters that already describe the manuscript, this was a decisive result: it eliminated the long-standing hypothesis that Wilfrid Voynich himself had forged the manuscript in the early twentieth century, and it ruled out the even older speculation that Roger Bacon had written it in the thirteenth century. The manuscript is authentically fifteenth-century.

The ink analysis conducted by McCrone Associates in 2009, working with the Beinecke, found that the iron-gall ink used throughout the text is chemically consistent with the early fifteenth century and shows no evidence of modern synthesis. The pigments used in the illustrations – atacamite, red lead, azurite, ultramarine – are also period-consistent. Crucially, the ink chemistry varies slightly between what Currier identified as the two hands, supporting his conclusion that multiple scribes worked on the manuscript.

Paleographic analysis of the script itself has been slower to produce results, partly because the script has no known cognates. What researchers including René Zandbergen have established is that the writing is fluent: the characters are formed with the confident, consistent speed of a practiced hand, not the hesitant stroke patterns of someone inventing a script on the fly. Whoever wrote the Voynich had been writing in this alphabet for long enough that it had become automatic. That observation, by itself, rules out the most casual versions of the hoax hypothesis.

The “sensation” cycle

The Voynich generates a steady supply of headlines promising a breakthrough. In the past decade alone, major newspapers have reported solutions involving Hebrew anagrams, a proto-Romance “sola busca” dialect, a Turkic origin, an Aztec botanical codex transferred to Europe, and most recently, various AI-assisted decodings. The pattern is consistent: initial coverage is credulous, specialist review is sceptical, and within a few months the claim recedes without quite being retracted.

Scientific American, Nature, and the BBC have all run thoughtful post-mortems on these cycles, and the underlying structural issue is worth naming: a claim of partial decipherment is hard to disprove because the person making it can always point to passages that fit the interpretation. A claim of full decipherment, where every page reads consistently under the same decoding rules, is what specialists actually require. No such claim has ever been produced.
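
The difference between a cherry-picked partial decipherment and the consistent decoding specialists require can be made concrete. In the sketch below, a proposed decoding is reduced to one fixed glyph-to-letter key, and every word in every passage is decoded with that same key. The key, the lexicon, and the passages are hypothetical, each character is treated as a single glyph for simplicity, and a dictionary hit rate is only a first filter: real review also asks whether the decoded words form grammatical sentences that fit the illustrations.

```python
def decode_word(word, key):
    """Decode one word with a fixed glyph-to-letter key.

    Glyphs missing from the key become '?', so they never match the lexicon.
    """
    return "".join(key.get(glyph, "?") for glyph in word)


def hit_rate(words, key, lexicon):
    """Fraction of words that decode to an entry in the lexicon."""
    if not words:
        return 0.0
    return sum(decode_word(w, key) in lexicon for w in words) / len(words)


def consistency_report(passages, key, lexicon):
    """Hit rate for each passage and for the whole corpus under one key."""
    per_passage = {
        name: hit_rate(text.split(), key, lexicon) for name, text in passages.items()
    }
    all_words = [w for text in passages.values() for w in text.split()]
    return per_passage, hit_rate(all_words, key, lexicon)


# Toy data: the key, lexicon, and passages are invented for illustration.
KEY = {"a": "s", "b": "o", "c": "l"}
LEXICON = {"sol", "so"}
PASSAGES = {
    "showcase": "abc ab",            # the passage a claimant might quote
    "elsewhere": "cba bca ccc abc",  # the rest of the text, same key
}

per_passage, overall = consistency_report(PASSAGES, KEY, LEXICON)
print(per_passage)  # {'showcase': 1.0, 'elsewhere': 0.25}
print(overall)      # 0.5
```

The showcase passage decodes perfectly while the corpus as a whole does not, which is the shape of most failed claims: the score that matters is the one computed over every page with the key held fixed.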

What the illustrations tell us (and do not)

The manuscript’s botanical illustrations are often cited as potential clues. Various researchers have attempted to match them to real plants; in most cases the matches are unconvincing. A 2014 paper by Arthur Tucker and Rexford Talbert argued that some illustrations resembled Mesoamerican plants, supporting a New World origin hypothesis. The carbon dating to 1404-1438 complicates this considerably: the vellum predates Columbus’s first voyage in 1492 by at least fifty-four years.

The astrological diagrams include a zodiac but with some unusual features. The balneological section, showing nude female figures in what appear to be interconnected baths or pipes, has been variously interpreted as anatomical, alchemical, or symbolic. None of these interpretations has been confirmed by the text because the text has not been read.

What we can honestly say in 2026

Six centuries after the vellum was prepared, the Voynich Manuscript’s status is essentially this: we know when it was made (early fifteenth century), we know its text carries some form of linguistic structure, and we do not know what it says. Every confident claim to the contrary has, so far, failed to survive specialist review. The manuscript remains available for free high-resolution study via the Beinecke digital collection, and its mystery is partly a function of how open it has remained to amateur and professional investigation alike.

The most likely outcome, in the view of most working cryptanalysts, is that the Voynich will eventually be read either through the discovery of a parallel text (a second document in the same script, or a contemporary key) or through advances in computational analysis applied with appropriate discipline. The less likely but not impossible outcome is that it remains unread indefinitely, as Linear A and the Indus script remain to this day. What is effectively certain is that no lone amateur working without access to the specialist literature will produce the solution, though many will continue to claim they have.

The plant illustrations and the New World hypothesis

The botanical section of the manuscript, which takes up roughly 130 of the 240 pages, has generated some of the longest-running and least-settled arguments in Voynich scholarship. Roughly 113 plants are illustrated. Most have never been confidently identified with any real species. Several resemble plants only loosely; others appear to combine features of multiple plants in a single drawing, which has been variously interpreted as symbolic, mnemonic, or simply imaginative.

The 2014 paper by Arthur Tucker, a botanist then at Delaware State University, and Rexford Talbert, an information technology specialist, argued that at least six of the Voynich plants were identifiable as Mesoamerican species, including one interpretation of a New World agave, another of a cactus consistent with Opuntia, and a third of a sunflower (Helianthus annuus) native to North and Central America. Their conclusion, published in the journal HerbalGram, was that the manuscript was of sixteenth-century colonial origin, produced in New Spain and transmitted back to Europe.

The radiocarbon dating to 1404-1438 is the fundamental problem with this chronology, since Columbus’s first voyage was in 1492 and the Spanish conquest of Mexico began in 1519. The Tucker-Talbert response has been to argue that the carbon dating only establishes when the vellum was prepared, not when the text was written on it, and that vellum could have been stored for decades before use. This is technically possible but historically unusual. A text written in New Spain could not predate 1519, so even the latest date in the radiocarbon interval, 1438, would require the vellum to sit unused for at least eighty-one years; the premium on prepared vellum was high, and it was rarely stockpiled for more than a few years. Most Voynich specialists therefore continue to treat the New World hypothesis as interesting but unsupported by the weight of physical evidence.

What remains instructive about the Tucker-Talbert exchange is the methodological point it surfaces. Partial identification of a handful of plants, from a corpus of over a hundred, does not constitute decipherment or even consistent botanical sourcing. The same partiality has attended every attempt to match Voynich illustrations to real-world plants: a few suggestive matches against a much larger set that resists identification.

For further reading

The essential scholarly overview remains René Zandbergen’s voynich.nu, an exhaustive and sceptical reference site. Journalistic treatments worth consulting include the 2021 feature in The Guardian and the coverage at Smithsonian Magazine, both of which treat the regular decipherment announcements with appropriate caution. The original Montemurro and Zanette paper remains open-access on the PLOS website for readers interested in the technical case for linguistic structure.

For broader context on the recurring failure of Voynich decipherment claims and what it tells us about the psychology of pattern-matching, our piece on why unsolved mysteries generate false solutions approaches the same problem from the other direction.

The Voynich does not belong to any single discipline, which is part of why no single discipline has solved it. It sits at the intersection of paleography, cryptography, linguistics, and the history of science – and probably will remain there until someone, working somewhere unexpected, notices what everyone else missed.