Entropy, Speech and the Contents of Wittgenstein’s Pocket

From The Philosopher, Volume CXIII No. 2 Autumn 2025

Boltzmann, the physicist who captured entropy, and Wittgenstein, the philosopher who grasped how it shapes language


By Elizabeth Rohwer


In my golden years, I changed tack—from a speech processing engineer, preoccupied with the computational extraction of meaning from speech, to an armchair philosopher, obsessed with explaining Wittgenstein’s philosophy of logic and language from a modern, 21st-century perspective. My goal is to expose the unusual yet profound connection between the 20th-century philosophical debate about how language works—redefined by Wittgenstein’s Tractatus Logico-Philosophicus—and the 19th-century scientific discoveries about how steam engines work.

It was in the 19th century that steam engineers learned how to control entropy—familiar to us from the second law of thermodynamics. By minimizing heat loss, they were able to increase the efficiency of their engines. These entropy-controlling techniques sent steam engines roaring and rolling across the world, driving the Industrial Revolution. A century and a half later, the clever manipulation of entropy is once again revolutionizing progress, this time in the domain of artificial intelligence—enabling us to build more efficient, or, as I will later explain, less ignorant bots. The critical link between these two seemingly unrelated realms is revealed in the strategies for cracking the mysteries of how we understand speech—a field that has been central to my own journey.

While entropy has traditionally been described as a measure of disorder or randomness in a system, a more modern interpretation sees it as the ignorance of the subject—a view that is gaining acceptance. Entropy has baffled scientists since 1824, when the French engineer Sadi Carnot, aiming to improve the efficiency of steam engines, discovered that an ideal engine could never be built. Something, he realised, always stood in the way of fully converting internal heat into external work. A fraction of the energy was always missing, as if stealthily stolen in accordance with an iron rule imposed by nature.

In 1872, the Austrian theoretical physicist Ludwig Boltzmann devised a formula that mathematically captured the offender, quantifying how much energy would be inadvertently lost. Years later, the implications of his statistical calculation triggered a paradigm shift in physics, establishing probability as central to our understanding of the natural world. No wonder, today, Boltzmann’s formula is engraved in gold on his tombstone, symbolizing the prevailing probabilistic worldview—one that embraces the irreversibility of time as a fundamental feature of reality. Speech, too, unfolds along this one-way arrow: what is said cannot be unsaid. Meaning, like energy, disperses into the world—shaped by uncertainty, carried forward by time, and never quite taken back.
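
The engraved formula itself is short enough to quote here:

    S = k \log W

where W counts the microscopic arrangements of a system that remain indistinguishable to us at the macroscopic level, and k is the constant that now bears Boltzmann’s name. The more arrangements stay hidden from view, the greater the entropy: already a hint that the quantity says as much about our ignorance as about the system itself.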

Seven years before Boltzmann gave mathematical expression to this elusive phenomenon, the German physicist Rudolf Clausius had given the wrongdoer a name: entropy. The term implies change, with the Greek root trope meaning transformation, and the prefix en signaling a link to energy. Yet despite its etymological ties to energy and transformation, entropy came to signify something far more unsettling. While energy evokes action and potential, entropy has come to imply loss and disorder—and, more recently, lack and uncertainty, or the ignorance of the subject. It carries negative connotations not because of what it is, but because of what it withholds from our grasp.

It defies our intuition, shaped by daily encounters with real, tangible things—things we can describe, debate, talk about, agree upon, or reject. Entropy is none of these. It is something missing—elusive and intangible—yet, strangely, it can be measured and controlled. This gives rise to a deep philosophical paradox: how can something absent exert perceptible influence? How can absence be quantified and put to good use? That is precisely what steam engineers once did—and what computer scientists are still doing today, in their attempts to tame the same inexplicable force.

This tension between presence and absence, between what can be measured, digitized, and brought under our control, and what escapes articulation, lies at the heart of entropy’s enigma—and is echoed, strikingly, in Wittgenstein’s work. Entropy manifests itself in the conversion of internal heat into external work, and in the articulation of internal thought into external speech. Wittgenstein seemed to recognize this connection. His insistence that the most important things cannot be said, only shown, parallels the way entropy resists direct description—operating beneath the surface, shaping outcomes without ever fully revealing itself.

However, today we understand why it behaves this way. Entropy is not a mathematical function in the classical sense, but a function of a probability distribution—a concept that is harder to grasp and far less intuitively transparent. Its implications and philosophical significance only became clear at the turn of the 21st century.
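
In modern notation, and as Shannon would later make explicit, the entropy of a probability distribution p over a set of possible outcomes is written

    H(p) = -\sum_i p_i \log p_i

Its argument is an entire distribution rather than a single observable quantity. It is largest when every outcome is equally likely, that is, when we are maximally ignorant, and it falls to zero when one outcome is certain.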

In language, entropy reveals itself in the logically ordered sequences of the words we choose to communicate meaning: positively, through the order imposed by the rules of grammar; and negatively, through the constraints those same rules imperceptibly foist on us by keeping countless word combinations out of our everyday language use. These two modes of manifestation define the twin limits—positive and negative—of what can be meaningfully said. Thus, entropy affects meaning by, paradoxically, both permitting and excluding. And because entropy itself is expressed through probabilities, it becomes clear why probability plays a foundational role in the formation of language and in how we learn and use it.
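
A toy illustration, invented purely for this essay and not drawn from any real system, may make this concrete: count which words actually follow a given word in a small sample of text, and the entropy of that next-word distribution measures how much freedom grammar and habit leave us at that point.

    from collections import Counter
    from math import log2

    # A miniature, invented corpus: grammar and habit allow some word pairs
    # and quietly exclude countless others ("the on", "the sat", ...).
    corpus = ("the cat sat on the mat the cat sat on the chair "
              "the dog sat on the mat").split()

    # Count which words actually follow "the" in this sample.
    followers = Counter(b for a, b in zip(corpus, corpus[1:]) if a == "the")
    total = sum(followers.values())

    # Shannon entropy of the next-word distribution after "the", in bits.
    entropy = -sum((n / total) * log2(n / total) for n in followers.values())

    print(followers)          # Counter({'cat': 2, 'mat': 2, 'chair': 1, 'dog': 1})
    print(round(entropy, 2))  # about 1.92 bits, well below the entropy of an
                              # unconstrained choice over the whole vocabulary

Scaled up from six word pairs to billions, this is the same bookkeeping that lets a statistical model of language exploit what grammar excludes.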

This brings us to Wittgenstein. The idea of duality—i.e., that the limits of language, and of thought, are doubly defined, from the positive side and from the negative side—can be glimpsed in the Tractatus. So too can the role that probability plays in establishing those limits. It is no coincidence that the longest section of Wittgenstein’s book—section five—contains his treatment of probability theory. The significance of this has often gone unrecognised, contributing to the persistent difficulty in interpreting the work.

Wittgenstein initially called the rules that organise language logical syntax but later broadened the notion to grammar—including the constraints imposed on our actions in everyday life, where countless moves are simply impossible to make. Thus, the rules and exclusions of grammar are the organising two-way force behind our forms of life and language games, two terms that Wittgenstein uses in his later philosophy. Through these, entropy renders our daily existence both predictable—by establishing rules—and open to sudden turns and surprises, leaving space for the expression of our individualities and capacities for free will in all their multiplicity.

However, it took me a lifetime to grasp that the constraints imposed by grammar carry a subjective dimension—something perhaps more readily noticed by a non-native speaker like myself. They depend on language proficiency and on how well one is informed—or, conversely, how ignorant one is—about the topic one is trying to articulate or the activity one seeks to undertake.

My first encounter with the entropy puzzle goes back to a time when access to a mainframe computer was a rare privilege, punchcard machines were the standard, and the algorithms of modern AI had yet to be discovered. As a visiting PhD student at the Institute of Mathematics of the Siberian branch of the then Soviet Academy of Sciences, I was researching the role of entropy in speech recognition. We students were granted precious computer time during the quiet hours before dawn. Many a morning, before sunrise, I walked through a snowy Siberian forest from my hotel to the Computer Center, contemplating the power of entropy. By taking entropy into account, we managed to improve the performance of our speech recognition system; we extracted information that had previously seemed absent. And that was deeply unsettling.

Decades later, these early techniques led to one of the first dictation systems, Dragon Dictate, which transcribed natural speech into text with enough accuracy to be commercially viable. Its creators gathered statistical data about language use—how often certain sounds appear in a word, which combinations of sounds and words occur frequently, and which are less likely to appear. This early large-scale use of probabilities produced a practical breakthrough: the successful manipulation of entropy in the linguistic domain. The mid-1990s also brought voice-driven telephony applications, automated call centers, and personal assistants—early signs that language, once the hallmark of human thought, could be statistically modeled, computationally predicted, and algorithmically shaped.

The mistaken belief that probability is a number reflecting an objective feature of the world (that the probability of a die roll simply is one-sixth, say, or that of a coin flip one-half), rather than a number marking a subjective boundary of a statistical distribution, lies at the root of many classical philosophical problems. That these philosophical problems “have in essentials been finally solved,” as Wittgenstein stated in the preface to the Tractatus, has not received the sustained attention it deserves. Why did Wittgenstein make that claim? And why did he include the theory of probability in his book? It is little wonder that the Tractatus has remained opaque: the probabilistic strand of logic and language that Wittgenstein was the first to explore has only recently been understood. Let me try to remedy that.
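
A small worked example, with the numbers chosen purely for illustration, may help. Before a die is thrown I assign each face the probability 1/6. If a trustworthy onlooker tells me only that the result was even, my assignment becomes 1/3 for each of two, four and six, and zero for the rest. Nothing about the die or the throw has changed; only my information has. The number, in other words, marks a boundary of my own ignorance, not a property etched into the die.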

The technological advances of the 21st century have enabled the use of huge amounts of statistical data and vastly more powerful computing techniques. Big Data, Deep Learning Neural Networks, Large Language Models and the prodigy ChatGPT—the first widely adopted AI chatbot—are all offspring of the entropy-harnessing algorithms that were bumping around in my head fifty years ago, while the Siberian snow crunched rhythmically under my feet and the sun slowly rose over the tops of the pines. Those algorithms became my passport to the West, where I came to understand first-hand how entropy shapes language.

The experience brought an added benefit: it helped me recognise the central role that modern probability theory plays in Wittgenstein’s philosophy—and how it shaped his thinking. Reading him through a probabilistic lens reveals a coherence and consistency in what had long seemed obscure, fragmented, or contradictory.

Part of the riddle of entropy is that it cannot be pinned down, pointed at, or put into words. “This is it. This thing here is entropy. It is such-and-such...” That kind of naming eludes us. Entropy resists being captured in language, and yet it quietly manifests itself—shaping both the acceptable outcomes of our individual actions and speech, and their subjective source: us. To my amazement, I discovered a metaphor, originating with Wittgenstein, that aligned with my understanding. So bear with me while I elaborate on it and expand it to capture the dual manifestation, objective and subjective, of an elusive force that, in both nature and language, is measured through probabilities.

In Some Remarks on Logical Form (1929)—the only academic paper Wittgenstein published during his lifetime—he writes:
“As I could describe the contents of my pocket by saying: ‘It contains a penny, a shilling, two keys, and nothing else.’ This ‘and nothing else’ is the supplementary statement which completes the description.”
Entropy can be envisaged as the padding that holds things together, making their container whole and complete. It’s like the bubble-wrap we use to fill the empty space in a parcel to protect its contents. With it, the parcel is more likely to arrive in good condition; without it, the parcel may arrive damaged—the result is worse. Hence Clausius’ deft choice of a name, which associates entropy with change.

But here’s the rub: once something is given a name (blame Clausius) and once it is measured (blame Boltzmann), we naturally feel compelled to describe it (blame the philosophers). After all, that’s what we do with things that have names—we try to say what they are. But entropy resists this. We cannot describe it the way we describe tangible things, because its presence is concealed. It has no place in the world of small change, coins, and keys. Instead, it completes the list by standing as a generalisation of all the innumerable things not present in Wittgenstein’s pocket. In doing so, it serves as a silent reminder of the possibilities excluded from the list of countable things.

Trained as an aeronautical engineer, Wittgenstein was well-versed in thermodynamics and had a keen intuition about the role entropy plays in Boltzmann’s statistical mechanics. He acknowledged Boltzmann’s influence on his thinking, but the way entropy informs that thinking casts new light on the Tractatus’s final, enigmatic statement: “What we cannot speak about, we must pass over in silence.” What we can speak about—bubble-wrap, pockets, parcels—are concepts shaped by our repetitive language use. They emerge, over time, from our everyday experiences with things we can describe.

And yet, it is tempting to press beyond what can be clearly said—to see whether the ‘bubble-wrap’ conceals something else, another coin, a missing key, or perhaps a philosophically compelling turn of phrase. But venturing into that space armed with familiar words rarely yields clarity; more often, it leads to philosophical blather dressed up as insight. Hence Wittgenstein’s warning in his 1919 letter to his prospective publisher, Ludwig von Ficker:
“My work consists of two parts: the one that is here, and of everything which I have not written. And precisely this second part is the important one. For the Ethical is delimited from within, as it were, by my book, and I am convinced that, strictly speaking, it can ONLY be delimited in this way. In brief, I think: All of that which many are blathering today, I have defined in my book by remaining silent about it.” 
(Wittgenstein’s emphasis.) 
 
Wittgenstein’s seemingly entropy-inspired description of the Tractatus, and his peculiar emphasis on the book’s ethical point, begin to make sense in light of modern understandings of entropy—spurred by the work of the American mathematician Claude Shannon in the late 1940s. Until then, the dominant interpretation stemmed from Boltzmann’s statistical thermodynamics, where entropy was understood as a measure of disorder in the distribution of gas molecules within a container. But in his effort to improve the transmission of signals over communication channels, Shannon independently rediscovered Boltzmann’s formula and, to his surprise, found that it measured not just physical disorder but something new and just as elusive: the absence of information—teeming with the unknown possibilities that haunt every act of communication.

Metaphorically speaking, Shannon had uncovered the universality of bubble-wrap: it can pad any kind of parcel. In his case, entropy revealed its role in language communication. It cushions the information transmission channel between speakers and listeners as they exchange meaning through grammatically ordered strings of words. Speaking is not merely a daily occurrence—it is a necessity. Without our language games, the forms of life shared by speakers of the same language could not exist. And yet our linguistic exchanges are inherently riddled with uncertainty. Today, the probabilistic connection between entropy and meaning is well established. Meaning takes shape through countless repetitions; it crystallizes in the statistical patterns of language use.

But when Shannon asked the mathematician John von Neumann for advice on what to name his discovery, von Neumann replied:
“You should call it entropy for two reasons: first, because that is what the formula is called in statistical mechanics; but second, and more importantly, as nobody knows what entropy is, whenever you use the term you will always be at an advantage!”
At the time, von Neumann was exploring the role of entropy in quantum mechanics. Half a century later, the physicist Freeman Dyson put his finger on why entropy is so difficult to talk about. In his essay Why Is Maxwell’s Theory So Hard to Understand? (1999), he writes:
“All the concepts that appear in our language are classical [originate from our experiences with the physical objects that obey the laws of classical Newtonian mechanics]. Each of the interpretations of quantum mechanics is an attempt to describe quantum mechanics in a language that lacks the appropriate concepts. The battles between the rival interpretations continue unabated and no end is in sight.”
And so, it should come as no surprise that similar battles persist with the interpretations of the Tractatus. How could Wittgenstein scholars heed his warning—whereof one cannot speak, thereof one must be silent—when they are themselves grappling with the very limitations of language that he was struggling to delineate?

The important point, though, is that Shannon’s work on the missing information in a communication channel laid the foundations of information theory, which underpins the operation of modern computers. Within this field, Boltzmann’s entropy is known as information entropy. Much like Wittgenstein’s ‘and nothing else’, which completes the description of the contents of his pocket, and like the universal bubble-wrap cushioning all parcels, information entropy fills the empty, continuous space in which information resides.

However, the enigma of entropy was far from settled. In the 1960s, the American physicist E. T. Jaynes, regarded by some as one of the intellectual forefathers of modern AI, made a striking discovery: entropy depends on the subject. Remarkably—because the problem was so deeply entrenched and conceptually difficult—it would take Jaynes another forty years to demonstrate entropy’s anthropomorphic character.

Jaynes’ breakthrough cemented the insights of Boltzmann and Shannon by establishing the mathematical link between them. In his book Probability Theory: The Logic of Science (published posthumously in 2003), the role of the agent’s common sense—understood as plausible, or statistical, logic—takes center stage, shedding light on why Wittgenstein needed to include probability theory in the Tractatus, a book that aims to lay bare the logical structure of language and thought.

Today, entropy is understood more broadly as the uncertainty that permeates daily life, reflecting a lack of knowledge—or ignorance—on the part of the subject. An AI chatbot is trained on statistical data—samples of reality—until it becomes able to understand us and perform its task. The process mirrors our own learning: the bot reduces its ignorance gradually by tuning its internal structure through trial and error, one step at a time. Although we don’t fully understand how it modifies itself to improve, we are still able to train it. Entropy has taken on a new name, but its mystery persists.
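
What reducing its ignorance one step at a time amounts to can be sketched in a few lines. The tiny model and its numbers below are invented for illustration and stand in for systems many orders of magnitude larger: the training objective is typically the cross-entropy between the bot’s predicted distribution over next words and the distribution found in the data, and each step nudges the internal parameters so that this measure of ignorance falls.

    from math import exp, log

    # A toy "model": three adjustable numbers (logits) standing in for the
    # billions of parameters of a real system.
    logits = [0.0, 0.0, 0.0]
    data = [0.7, 0.2, 0.1]  # invented frequencies of three possible next words

    def softmax(z):
        m = max(z)
        e = [exp(v - m) for v in z]
        s = sum(e)
        return [v / s for v in e]

    def cross_entropy(p, q):
        # The model's ignorance about data distributed as p, in nats.
        return -sum(pi * log(qi) for pi, qi in zip(p, q))

    learning_rate = 0.5
    for step in range(200):
        predicted = softmax(logits)
        # For a softmax model the gradient of cross-entropy with respect to
        # the logits is (predicted - data); each step moves against it.
        logits = [z - learning_rate * (q - p)
                  for z, q, p in zip(logits, predicted, data)]

    predicted = softmax(logits)
    print([round(v, 2) for v in predicted])          # approaches [0.7, 0.2, 0.1]
    print(round(cross_entropy(data, predicted), 2))  # falls to about 0.80 nats

The residual 0.80 nats is the entropy of the data itself, the irreducible bubble-wrap that no amount of training can remove.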

Natural language exhibits entropy through the restrictions that its grammatical rules fail to impose on us. The looseness of grammar—its gaps, ambiguities, and moments of linguistic mischief that Wittgenstein called grammatical jokes—reveals entropy’s presence just as much as its formal constraints do. These unspoken possibilities subtly guide us toward what can be discovered, understood, and ultimately expressed. They entice us to keep searching for and unwrapping precious new, bubble-wrapped, God-given gifts.

As for me, remaining mindful of entropy’s intangible presence—and striving to minimize its impact by lessening my own ignorance—has led to better moves in life. In this light, Wittgenstein’s remark that the Tractatus has an ethical point resonates with the modern interpretation of entropy and with a view shared by many Wittgenstein scholars: that his book has a therapeutic effect. Even if we do not fully understand it, the Tractatus helps refine our ability to make more accurate predictions about the world—an endeavor which, like minimising entropy’s negative impact, ultimately benefits everyone.



See also the linked essay: ‘Common Sense and Communication’

About the author

Elizabeth Rohwer was born in Bulgaria, completed part of her Ph.D. in the former Soviet Union, undertook research at the Centre for Speech Technology at the University of Edinburgh in the UK, and served for numerous years as a consultant for telecommunications companies in the US, implementing speech applications. She is keen on elucidating the coherence of the entire body of Wittgenstein’s work, leveraging her professional background as an engineer and specialist in speech processing.

Address for correspondence: Elizabeth Rohwer, erohwer@san.rr.com
