
Mapping Language

When we began building In Other Words, we needed something unprecedented: a comprehensive map of how words connect in the human mind—not just synonyms, but the full web of associations that make language rich with meaning. That vision led to the creation of the Linguabase, a linguistic database unlike any other.

1.1M Headwords
60M Relationships
<7 Degrees of Separation

More than synonyms

A close-up of an open page from a traditional thesaurus

Paper thesauri show only a fraction of English words, omitting concrete objects to save space. They cannot represent the countless connections that exist in our minds.

Traditional thesauri primarily offer synonyms for abstract concepts, emotions, and qualities. They typically exclude concrete objects and encyclopedic terms for practical reasons—physical space limitations and assumed user needs. When was the last time you needed a synonym for “apple”?

The Linguabase breaks this mold by including concrete objects, proper nouns, and encyclopedic terms alongside traditional thesaurus entries. But more importantly, it maps relationships beyond synonyms and antonyms.

This approach yields an average of 60 semantically connected words for each headword, covering multiple senses and contextual usages. These rich connections provide the foundation for In Other Words, enabling players to navigate through language's hidden pathways.

One word, many meanings

English is filled with words that carry multiple meanings, and capturing these was crucial for creating a complete semantic map. We identified three primary types:

Double Meanings

Words with entirely different definitions: “bass” (sound/fish), “tear” (eye/rip)

Related Meanings

Connected definitions: “head” as body part, leadership role, or ship’s bow

Contextual Flavors

“Hiking” as nature experience vs. physical exercise

These multi-sense words create semantic bridges between seemingly unrelated concepts. Words like “ground” can connect earth, coffee, and electrical circuits in a single conceptual leap.
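One way to picture such a bridge is to store each headword's associations under separate senses, then pair words drawn from different senses. The sketch below is only an illustration: the sense labels, the neighbor words, and the `bridged_pairs` helper are hypothetical and are not taken from the Linguabase.

```python
# Hypothetical toy data: sense labels and neighbor words are illustrative
# and are not taken from the Linguabase itself.
SENSES = {
    "ground": {
        "earth": ["soil", "terrain"],
        "coffee": ["grind", "espresso"],
        "electrical": ["circuit", "wire"],
    },
}

def bridged_pairs(headword, senses=SENSES):
    """List (word_a, word_b) pairs drawn from two different senses of headword."""
    by_sense = senses.get(headword, {})
    labels = sorted(by_sense)
    pairs = []
    # Visit every unordered pair of senses exactly once.
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            for a in by_sense[first]:
                for b in by_sense[second]:
                    pairs.append((a, b))
    return pairs

pairs = bridged_pairs("ground")
print(len(pairs))  # 3 sense pairs, each contributing 2 x 2 word pairs
```

Each returned pair, such as "soil" and "circuit", is connected only because both words sit next to a different sense of "ground".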

Four pillars of knowledge

A visualization of the four knowledge sources combining into the Linguabase

To map language’s vast terrain, we needed diverse inputs—human expertise, statistical patterns, categorization systems, and AI augmentation—just as understanding any complex system requires multiple perspectives.

The Linguabase came to life by combining four distinct knowledge sources into an integrated scoring system:

1. Topics from books

We created over 10,000 word groups based on Library of Congress categories, which organize millions of books into hierarchical subject areas. This approach provided domain-specific vocabulary across all fields of knowledge, from ichthyology and vulcanology to epistemology and numismatics.

2. Multiple human-curated inputs

We drew from over 70 lexicographic resources, from Wiktionary and WordNet to specialized thesauri like the NASA Thesaurus and the National Library of Medicine's UMLS Metathesaurus. Each relationship's weight increased when it appeared across multiple sources.

General Sources

  • Wiktionary
  • WordNet
  • Roget's Thesaurus
  • Moby Thesaurus II

Specialized Sources

  • Getty Art & Architecture Thesaurus
  • NASA Thesaurus
  • UMLS Metathesaurus
  • AGROVOC Thesaurus
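The idea that a relationship gains weight when several sources agree can be sketched as a simple count of attesting sources. This is a minimal illustration under stated assumptions: the actual Linguabase scoring formula is not described here, and the source names, word pairs, and `score_relationships` function are hypothetical.

```python
from collections import Counter

def score_relationships(sources):
    """Count how many sources attest each (headword, related_word) pair."""
    weights = Counter()
    for pairs in sources.values():
        # A source counts at most once per pair, even if it repeats the pair.
        for pair in set(pairs):
            weights[pair] += 1
    return weights

# Hypothetical inputs standing in for real lexicographic resources.
sources = {
    "source_a": [("sugar", "sweet"), ("sugar", "cane")],
    "source_b": [("sugar", "sweet"), ("sugar", "sweet")],
    "source_c": [("sugar", "sweet"), ("sugar", "glucose")],
}

weights = score_relationships(sources)
for pair, weight in weights.most_common():
    print(pair, weight)
```

Here the pair attested by all three sources ranks first. A production system would likely weight sources unequally, which this sketch does not attempt.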

3. Topic modeling

To capture broader associations, we applied Latent Dirichlet Allocation to extensive English language collections, identifying approximately 8 abstract topics per term. This computation-intensive analysis required over 200,000 supercomputer hours on the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE)—work that would have taken decades on a single machine but was completed in less than a day through massive parallel processing.
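The scale of that figure can be checked with quick arithmetic. The sketch below assumes the 200,000 hours are core-hours and that the work parallelizes perfectly. Both are simplifying assumptions, not details from the original project.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = HOURS_PER_DAY * 365

# Figure quoted above, treated here as core-hours (assumption).
compute_hours = 200_000

# Time for one core working nonstop.
single_core_years = compute_hours / HOURS_PER_YEAR

# Cores needed to finish within 24 hours, assuming perfect scaling.
cores_for_one_day = compute_hours / HOURS_PER_DAY

print(f"{single_core_years:.1f} years on a single core")
print(f"{cores_for_one_day:.0f} cores to finish in one day")
```

Under these assumptions the job amounts to a little over two decades of single-core time, or several thousand cores running for a day.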

4. Large language model enhancement

The breakthrough came in 2023 with GPT-4, which we used to expand our database—not for real-time generation, but for comprehensively mapping relationships. This technology excelled at:

The result was a complete semantic network with nearly a hundred million cross-links between individual words.

Connecting the dots

As we developed the Linguabase, we discovered something fascinating: virtually any two words in English can reach each other through a chain of meaningful connections, typically in seven steps or fewer. This linguistic “small world” phenomenon mirrors the “six degrees of separation” idea from social networks, popularized by the “Six Degrees of Kevin Bacon” game.

The Linguabase supports our In Other Words game by providing the underlying semantic connections that make word-path puzzles possible. Its weighted relationships and multi-sense representations create a rich landscape for players to explore.

sugar → sweet → harmony → peace
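A path like the one above can be found with a breadth-first search over a word graph. The following is a minimal sketch on a tiny hand-made graph. The `GRAPH` data and the `shortest_path` function are hypothetical, and the search ignores the relationship weights that the Linguabase uses.

```python
from collections import deque

# Hypothetical toy graph; real Linguabase links are weighted and far denser.
GRAPH = {
    "sugar": ["sweet", "cane"],
    "sweet": ["sugar", "harmony", "candy"],
    "harmony": ["sweet", "peace", "music"],
    "peace": ["harmony"],
    "cane": ["sugar"],
    "candy": ["sweet"],
    "music": ["harmony"],
}

def shortest_path(graph, start, goal):
    """Return a shortest chain of words from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # word -> word it was reached from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = word
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = [neighbor]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

path = shortest_path(GRAPH, "sugar", "peace")
print(" → ".join(path))  # sugar → sweet → harmony → peace
```

The number of steps in the returned path, here three, is the "degrees of separation" between the two words in this toy graph.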

For details on how this linguistic database became a daily puzzle game, visit our Behind the Words page.

Your personal word explorer

Beyond the daily puzzle, the Linguabase also powers a comprehensive reference system within the app. This extends traditional thesaurus functionality by:

This reference mode serves as a visual thesaurus for writing, brainstorming, and discovering new expressions—all available offline and without the privacy concerns of sending queries to a language model.

The heart of our game

The Linguabase represents a unique fusion of traditional lexicography with modern AI enhancement. By combining multiple knowledge sources and augmenting them with advanced language models, we've created a resource that's:

In Other Words transforms this linguistic data into tangible paths of discovery. What began as a vision to map the full spectrum of word relationships has become a new kind of word puzzle that celebrates language’s inherent interconnectedness.

READ MORE: See a more detailed backgrounder about the Linguabase

Curious how we transformed this linguistic database into a daily word game? Read Behind the Words to discover how we found the perfect game mechanic, balanced the difficulty, and engineered the experience.
