Consider what made ChatGPT remarkable. Not just its ability to answer questions, but the underlying idea: train a model on a vast, unlabeled corpus of human language, let it learn the patterns, rhythms, and structures of how we communicate, and then — with relatively little additional effort — adapt that understanding to almost any language task you care to name, from legal analysis to code generation to medical summarization. The foundation is the same; only the application changes.
Now ask a different question: what if you could do the same thing with the brain?
This is the promise of brain foundation models — a class of AI that learns the patterns of neural activity the way large language models learn the patterns of text. At IDUN Technologies, we're not just watching this field develop from the sidelines. We're building on it.
The analogy to language models is closer than it might seem, so it's worth making it precise.
A large language model like GPT is trained on enormous quantities of raw text. It is not told what words mean, or how grammar works, or what a sentence is for. Instead, it learns by trying to predict what comes next — and in doing so, it extracts something genuinely useful: a deep, flexible representation of language structure that can be adapted to almost any downstream task.
Brain foundation models work on the same principle, but the input is neural data — most commonly electroencephalography (EEG) or functional magnetic resonance imaging (fMRI) recordings — and the "text" being modeled is the electrical activity of the human brain. The model is trained on large, heterogeneous datasets of brain signals, often spanning thousands of hours and hundreds of subjects, without any labels. It learns the temporal rhythms, spatial patterns, and state-dependent dynamics of neural activity through self-supervised objectives: masking portions of a signal and learning to reconstruct them from context, much like BERT masks words and learns to fill them in.
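The shape of that masked-reconstruction objective is easy to sketch. In the toy below, a synthetic 10 Hz sine stands in for an EEG trace and linear interpolation stands in for the learned predictor; every detail here is an illustrative assumption, not the training code of any real foundation model. The essential pattern: hide patches of the signal, predict them from the visible context, and score the model only on the hidden samples.

```python
import numpy as np

def mask_signal(x, mask_frac=0.3, patch=25, rng=None):
    """Zero out random contiguous patches of a 1-D signal.

    Returns the corrupted signal and a boolean mask marking the
    hidden samples (the reconstruction targets)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(x)
    n_patches = int(mask_frac * n / patch)
    mask = np.zeros(n, dtype=bool)
    for start in rng.choice(n - patch, size=n_patches, replace=False):
        mask[start:start + patch] = True
    corrupted = x.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def reconstruction_loss(pred, target, mask):
    """MSE computed only on the masked samples: the model is scored
    on what it could not see, as in masked language modelling."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy "EEG": a 10 Hz alpha-band sine sampled at 250 Hz for 4 seconds.
t = np.arange(0, 4, 1 / 250)
signal = np.sin(2 * np.pi * 10 * t)

corrupted, mask = mask_signal(signal)

# A trivial stand-in "model": linear interpolation from visible context.
visible = ~mask
pred = np.interp(t, t[visible], signal[visible])

loss = reconstruction_loss(pred, signal, mask)
```

A real foundation model replaces the interpolation step with a deep network and minimizes this loss over millions of such windows; the interesting part is what the network has to learn about neural dynamics in order to fill the gaps well.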
The result — after this pretraining phase — is a model that has internalized something general about how brains work. Not your brain specifically, and not in any one particular cognitive state, but brain activity as a general human phenomenon. It has encoded a prior: a set of learned expectations about what neural signals tend to look like, how they evolve over time, and what patterns tend to go together.
Then comes the fine-tuning step. And this is where the LLM analogy earns its keep.
Just as you can fine-tune a foundation language model for legal writing using a relatively small corpus of legal documents, you can fine-tune a brain foundation model for a specific downstream task — sleep stage classification, fatigue detection, emotion recognition — using a relatively small amount of labeled neural data for that task. The heavy lifting has already been done. The model already "speaks" the language of the brain; now you're just teaching it a particular dialect.
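That division of labor (a frozen backbone, a small trainable head) can be sketched in a few lines. Everything below is synthetic and hypothetical: a fixed random projection stands in for the pretrained encoder, and two sinusoids at different frequencies stand in for two cognitive states. The point is only the structure: the expensive part is never retrained; only a tiny classifier on top learns the task.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pretrained backbone: a *frozen* feature extractor.
# (In practice this would be the foundation model's encoder; here it
# is a fixed random projection, purely for illustration.)
W_frozen = 0.05 * rng.normal(size=(64, 250))

def backbone(x):
    return np.tanh(W_frozen @ x)  # one 64-d embedding per 1-s window

# Tiny labeled set: 40 one-second "epochs", two synthetic classes that
# differ in dominant frequency (a crude proxy for two brain states).
t = np.arange(250) / 250.0
y = np.array([i % 2 for i in range(40)])
X = np.stack([np.sin(2 * np.pi * (5 if label else 12) * t)
              + 0.3 * rng.normal(size=250)
              for label in y])

# Fine-tuning = training only a small head on frozen features.
feats = np.stack([backbone(x) for x in X])
w, b = np.zeros(64), 0.0
for _ in range(500):  # plain logistic regression via gradient descent
    logits = np.clip(feats @ w + b, -30, 30)
    p = 1 / (1 + np.exp(-logits))
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((feats @ w + b) > 0) == y)  # training accuracy
```

Because the backbone already produces informative features, the head needs very little labeled data, which is exactly why two subjects' worth of recordings can be enough to beat a hand-engineered baseline.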
Brains are not books. And EEG signals are not sentences.
Text is remarkably clean, as training data goes. Words are discrete. Sentences have grammar. The patterns that make language useful are stable across speakers, time periods, and contexts in ways that make large-scale pretraining tractable. Brain signals enjoy none of these luxuries.
EEG data is noisy — contaminated by movement artifacts, electrical interference, and the imperfect contact between electrode and skin. It is non-stationary, meaning the statistical properties of the signal shift over time, across sleep stages, cognitive states, and even moments of distraction. And it is profoundly personal. Your brain's characteristic patterns during light sleep look different from your neighbor's. The placement of electrodes changes what the signal looks like. The device you're using changes it further still.
This is what makes consumer-grade, wearable EEG especially challenging. A clinical EEG system uses dozens of precisely placed scalp electrodes, with conductive gel, applied by a trained technician. The IDUN Guardian, by contrast, uses a single differential channel recorded from inside the ear canal — comfortable enough to wear overnight, unobtrusive enough for daily use, but necessarily operating with far less spatial information than a lab setup. What you gain in usability, you sacrifice in signal richness.
The broader research community is actively building tools to close this gap. One noteworthy recent example is ZUNA, a 380-million-parameter model released in early 2026 by Zyphra that takes a different but complementary approach to the channel-count problem. Rather than fine-tuning a foundation model for a downstream classification task, ZUNA learns to reconstruct missing or corrupted EEG channels — effectively performing "super-resolution" on sparse electrode data. Given only the signals from a subset of electrodes and their physical positions on the scalp, it can generate plausible time series for channels that were never recorded.

Trained on approximately two million channel-hours spanning 208 public datasets, and using a novel four-dimensional positional encoding scheme that allows it to generalize to arbitrary electrode configurations, ZUNA substantially outperforms traditional spherical-spline interpolation — the standard geometric workaround used today — especially at high dropout rates. The model weights are released under an open Apache 2.0 license.

What's significant from a wearable perspective is the direction of travel: tools like ZUNA represent the field's recognition that consumer devices will always operate with constrained spatial coverage, and that learned priors about EEG structure can compensate for what geometry alone cannot. A note of caution, though: like all generative models, ZUNA can hallucinate signals that look plausible but are incorrect — reconstructed channels should be treated as imputed estimates rather than ground-truth measurements, particularly in clinical contexts.
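The core intuition behind learned channel imputation can be shown in a deliberately linear toy. To be clear about assumptions: this is not ZUNA's architecture (which is a large transformer), and the synthetic "EEG" below is invented. It only demonstrates the underlying property such models exploit: EEG channels are redundant, so a model that has seen how channels co-vary can predict a missing one better than simple averaging.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy multichannel "EEG": 8 channels that are noisy mixtures of two
# shared latent rhythms, so channels carry redundant information.
n_ch, n_t = 8, 1000
t = np.arange(n_t) / 250.0
latents = np.stack([np.sin(2 * np.pi * 10 * t),   # alpha-like rhythm
                    np.sin(2 * np.pi * 2 * t)])   # delta-like rhythm
mixing = rng.normal(size=(n_ch, 2))

def record():
    """One synthetic recording: mixed latents plus sensor noise."""
    return mixing @ latents + 0.1 * rng.normal(size=(n_ch, n_t))

train, test = record(), record()

drop = 3                                    # the "missing" channel
obs = [c for c in range(n_ch) if c != drop]

# "Learned prior": least-squares regression from observed channels to
# the missing one, fitted on a *separate* recording.
coef, *_ = np.linalg.lstsq(train[obs].T, train[drop], rcond=None)
imputed = coef @ test[obs]

# Naive geometric baseline: average the observed channels.
baseline = test[obs].mean(axis=0)

err_learned = np.mean((imputed - test[drop]) ** 2)
err_baseline = np.mean((baseline - test[drop]) ** 2)
```

Spherical-spline interpolation is considerably smarter than a plain average, but the comparison illustrates the same principle the ZUNA results point to: statistical structure learned from other recordings can recover what channel geometry alone cannot.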
Getting a foundation model to work well in this constrained setting is not a given. But if it can be done, the implications are significant: a wearable device that millions of people might actually wear becomes a platform for passive, continuous insight into mental and physiological states that were previously accessible only in specialized clinical environments.
Last year, we set out to test whether the promise of brain foundation models could be realized in the specific, demanding setting of consumer ear-EEG. We partnered with Sigma Nova — a French AI startup whose mission is to build specialized foundation models for scientific discovery — and with Thibaut Haslé, an EPFL master's student supervised jointly by IDUN, Sigma Nova, and EPFL's NeuroAI Lab.
The task was sleep stage classification. Sleep staging — automatically identifying which sleep stage a person is in at any given moment — is one of the most clinically valuable things a wearable could do. It is also one of the hardest. The standard reference, polysomnography (PSG), uses a full array of scalp electrodes alongside eye movement and muscle sensors to classify sleep into five stages: Wake, light sleep (N1), deeper light sleep (N2), deep slow-wave sleep (N3), and REM. Doing this from a single ear-mounted electrode is a fundamentally different problem.
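Concretely, staging means assigning one of those five labels to every 30-second window of signal, producing a hypnogram for the night. A minimal sketch of that framing follows; the 250 Hz sampling rate is an assumption for illustration, not the Guardian's specification.

```python
import numpy as np

FS = 250        # sampling rate in Hz (an assumption for this sketch)
EPOCH_S = 30    # sleep is scored in 30-second epochs by convention
STAGES = ["Wake", "N1", "N2", "N3", "REM"]

def to_epochs(x, fs=FS, epoch_s=EPOCH_S):
    """Split a 1-D recording into non-overlapping 30-s epochs,
    dropping any incomplete trailing segment."""
    samples = fs * epoch_s
    n = len(x) // samples
    return x[:n * samples].reshape(n, samples)

# One hour of toy single-channel signal -> 120 scoring epochs.
x = np.random.default_rng(0).normal(size=FS * 3600)
epochs = to_epochs(x)                       # shape (120, 7500)

# A classifier emits one stage label per epoch -- the hypnogram.
labels = np.zeros(len(epochs), dtype=int)   # placeholder: all "Wake"
hypnogram = [STAGES[i] for i in labels]
```

The single-channel classifier's entire job is to map each row of that epoch matrix to one of the five labels, without the eye and muscle channels that human scorers rely on.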
The approach was straightforward in concept, demanding in practice. Thibaut fine-tuned CBraMod — an open-source EEG foundation model pretrained on more than 9,000 hours of EEG data — on single-channel ear-EEG recordings from the IDUN Guardian. The model had learned general temporal representations of brain activity; the task was to adapt those representations, using a modest amount of labeled data from real IDUN Guardian users, to the specific challenge of identifying sleep stages from an ear signal.
The results were striking.
The foundation model surpassed the state-of-the-art feature-based baseline after fine-tuning on data from just two subjects. With 45 to 52 subjects, performance stabilized at a Cohen's κ of approximately 0.60 to 0.67 — a standard measure of agreement that corrects for chance — with the best run reaching κ = 0.727. For context, the mean inter-scorer agreement between two human experts working from full clinical polysomnography is around κ = 0.81. That the best runs came within roughly 0.1 of that benchmark, from a single ear channel, is a meaningful result.
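For readers unfamiliar with the metric, Cohen's κ rescales raw agreement so that 0 means chance-level and 1 means perfect agreement between two label sequences. A minimal sketch of the computation:

```python
import numpy as np

def cohens_kappa(a, b, n_classes=5):
    """Agreement between two label sequences, corrected for chance:

        kappa = (p_o - p_e) / (1 - p_e)

    where p_o is the observed agreement rate and p_e the agreement
    expected from the two scorers' marginal label frequencies alone."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    p_e = sum(np.mean(a == k) * np.mean(b == k) for k in range(n_classes))
    return (p_o - p_e) / (1 - p_e)

# Two scorers labelling four epochs (0=Wake, 1=N1, 2=N2, ...):
print(cohens_kappa([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.75 raw agreement -> 0.6
print(cohens_kappa([0, 1, 2, 3], [0, 1, 2, 3]))  # perfect agreement -> 1.0
```

The chance correction is why κ is the standard yardstick in sleep staging: with five imbalanced classes, raw percentage agreement flatters any classifier that over-predicts the common stages.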
Two findings in particular stood out. First, subject diversity mattered more than data volume: adding more distinct users to the training set produced clearer gains than collecting more recordings from the same individuals, with diminishing returns beyond roughly 2,300 minutes per subject. Second, the model ran on device in approximately 8 to 9 milliseconds per 30-second epoch, with a memory footprint of about 42.5 MB — well within the constraints of consumer hardware.
This proof of concept has since been integrated into IDUN's product. The work demonstrated something we suspected but needed to verify: that a general brain foundation model, adapted with modest labeled data from our specific hardware, can achieve meaningful real-world performance. The intelligence didn't come from a massive, task-specific dataset. It came from a model that already understood neural signals — and needed only to be pointed in the right direction.
The sleep staging collaboration answered a foundational question. Now we're asking the next one.
Building on the success of that first project, IDUN and Sigma Nova are again working together in 2026 — this time on a new application domain with significant real-world implications. The goal is to adapt the same foundational approach to detect changes in cognitive state that matter in high-stakes, real-time settings.
We're not ready to share full details yet, but the direction is clear: brain foundation models capable of running on our ear-worn hardware, delivering low-latency insights that could make a difference in safety-critical environments where cognitive impairment — whether from fatigue, distraction, or diminished alertness — carries serious consequences.
The underlying logic is the same as with sleep staging. We don't need to train a bespoke model from scratch for each new application. The same pretrained backbone that learned the temporal dynamics of neural activity during sleep carries knowledge that transfers to other cognitive state detection tasks. A new head, a new fine-tuning dataset, a new task — but the same foundation.
There's a broader principle at work here, and it's one we think about a lot at IDUN.
Hardware is a necessary foundation, but it's not where long-term value compounds. What compounds is data and intelligence. Every user who wears the IDUN Guardian overnight contributes to a richer understanding of what brain signals look like across diverse populations — different sleep architectures, different ear anatomies, different environments. That diversity, when combined with the right AI, becomes a moat.
Brain foundation models are the mechanism that turns this data advantage into application advantage. The same temporal attention backbone that stages sleep can, with appropriate adaptation, monitor fatigue, assess cognitive workload, or detect early markers of stress. Each new task reuses the learned foundation. Each new cohort of users expands the shared representation's robustness.
This is what we mean when we describe IDUN as "the smartwatch for the brain." A smartwatch is not just a step counter or a heart rate monitor — it's a platform that becomes more useful as more applications are built on top of it. The IDUN Guardian, powered by brain foundation models, is designed to work the same way.
When linguists talk about fluency, they mean something more than knowing vocabulary and grammar. They mean the ability to navigate the full expressive range of a language — to pick up on nuance, respond to context, adapt to register. A truly fluent speaker doesn't translate word-by-word from their native tongue; they think in the language.
Brain foundation models are becoming fluent in neural signals. They don't process each EEG recording as a collection of isolated features. They understand temporal context, learn what precedes and follows different brain states, and build representations that generalize across people and devices. They are beginning — in a limited but meaningful sense — to think in the language of the brain.
At IDUN, our job is to build the hardware that captures this signal reliably in the real world, the data infrastructure to scale it responsibly, and the AI layer to turn it into something useful. The collaboration with Sigma Nova is one concrete step in that direction.
We're just getting started.