Why your Japanese furigana app does not look like a Japanese book

Morphological vs orthographic / kanji-unit driven: two routes for Japanese furigana

Many Japanese learners first meet furigana and assume it simply means putting kana readings above kanji. That is not wrong, but it quickly creates confusion.

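Take the sentence 昨日、学校へ行った。 One automatic tool may produce 昨日(きのう)、学校(がっこう)へ行った(いった)。 A children's book, textbook, or carefully typeset publication may prefer 昨日(きのう)、学(がっ)校(こう)へ行(い)った。 (Here each reading is shown in parentheses directly after the base text it annotates.)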

Both help you read Japanese, but they answer different questions. The first asks: what does this sentence sound like? The second asks: how should these printed kanji be recognized by the reader?

Think of ruby as two separate decisions

In Japanese typography, furigana is usually discussed as ruby. In horizontal writing it is normally placed above the base text, and in vertical writing it is normally placed on the right; W3C also notes that ruby is often about half the size of the base text by default. [1]

But choosing what receives ruby and deciding how that ruby is attached are different decisions. Should every kanji receive help, or only difficult kanji? Should the reading attach to one kanji, each kanji in a compound, or a whole word?

JLREQ describes three important relationships: mono-ruby, jukugo-ruby, and group-ruby. Mono-ruby maps one base character to one ruby string. Jukugo-ruby keeps the compound as a unit while still preserving internal kanji-reading relationships. Group-ruby maps several base characters to one ruby string. [2]
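
As a rough illustration of the three relationships, the sketch below models each one as a list of (base, reading) pairs and renders it as HTML ruby markup. The helper name `ruby_html` is hypothetical, and putting several base/reading pairs inside one `<ruby>` element is only one possible way to keep a jukugo together in markup.

```python
from html import escape


def ruby_html(pairs):
    """Render one <ruby> element holding one or more (base, reading) pairs."""
    inner = "".join(f"{escape(base)}<rt>{escape(reading)}</rt>" for base, reading in pairs)
    return f"<ruby>{inner}</ruby>"


# Mono-ruby: one base character maps to one ruby string; okurigana stays plain.
mono = ruby_html([("行", "い")]) + "った"

# Jukugo-ruby: the compound stays one unit, but each kanji keeps its own segment.
jukugo = ruby_html([("学", "がっ"), ("校", "こう")])

# Group-ruby: several base characters share a single ruby string.
group = ruby_html([("今日", "きょう")])

for markup in (mono, jukugo, group):
    print(markup)
```

The only structural difference between the jukugo and group cases is whether the reading has been divided among the base characters, which is exactly the decision the rest of this article is about.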

Route one: morphological

The morphological route starts from language analysis. It segments a sentence into words or morphemes, then assigns readings, parts of speech, lemmas, and inflectional information.

Tools such as MeCab segment Japanese text and output fields such as surface form, part of speech, base form, reading, and pronunciation. [3] UniDic, a dictionary explicitly designed for morphological analysis, uses short units with layers such as lexeme, word form, written form, and pronunciation form. [4]

For 学校へ行った, this route naturally gives 学校(がっこう)へ行った(いった). It cares that the word 学校 is read がっこう, and that the inflected form 行った is read いった.

This is excellent for automation: learning apps, browser extensions, text-to-speech, search indexes, dictionaries, and NLP systems all benefit from context-aware word readings. It can distinguish 生物(せいぶつ) in a biology context from 生物(なまもの) when talking about raw food.
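
A minimal sketch of the morphological route might look like the following. It does not call a real analyzer: `TOKENS` is a hand-written stand-in for analyzer output (surface form plus a katakana reading), and a real tool may segment 行った differently. The function names are hypothetical.

```python
def kata_to_hira(text):
    """Shift katakana (U+30A1..U+30F6) to hiragana; leave other characters alone."""
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text)


def has_kanji(text):
    """True if the text contains a CJK unified ideograph or the iteration mark."""
    return any("\u4e00" <= c <= "\u9fff" or c == "々" for c in text)


def word_level_ruby(tokens):
    """Attach the whole reading to every token that contains kanji."""
    out = []
    for surface, reading in tokens:
        if has_kanji(surface):
            out.append(f"{surface}({kata_to_hira(reading)})")
        else:
            out.append(surface)
    return "".join(out)


# Hand-written stand-in for analyzer output: (surface, katakana reading).
TOKENS = [
    ("昨日", "キノウ"),
    ("、", "、"),
    ("学校", "ガッコウ"),
    ("へ", "ヘ"),
    ("行った", "イッタ"),
    ("。", "。"),
]

print(word_level_ruby(TOKENS))
```

Note that the reading is attached to the whole token, okurigana included, which is the behavior the next section calls a blind spot.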

The blind spot: word reading is not always printed ruby

The word form 行った is read いった, but publication-style ruby often writes 行(い)った. The った is already visible in the base text as okurigana, so it does not need to be repeated above the kanji.

Likewise, 学校(がっこう) is enough for pronunciation, but a kanji-learning context may prefer 学(がっ)校(こう). Here がっ is not the isolated reading がく; it is the reading segment that belongs to 学 inside the compound after sound change.

For words such as 今日(きょう), 大人(おとな), and 小豆(あずき), a strict character-by-character split is not useful. A good orthographic system treats these as group readings.
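
Returning to the 行った case, one simple way to keep okurigana out of the ruby is to trim the hiragana tail that the surface form and the reading share. The sketch below assumes the reading is already in hiragana and only handles trailing okurigana; words with kana between kanji (such as 取り引き) would need more work. The function names are hypothetical.

```python
def is_hiragana(ch):
    """True for characters in the basic hiragana range."""
    return "\u3041" <= ch <= "\u3096"


def split_okurigana(surface, reading):
    """Split off the trailing hiragana shared by surface and reading.

    Returns (stem, stem_reading, okurigana). At least one character is
    always left in both the stem and its reading.
    """
    limit = min(len(surface), len(reading)) - 1
    n = 0
    while n < limit and is_hiragana(surface[-1 - n]) and surface[-1 - n] == reading[-1 - n]:
        n += 1
    if n == 0:
        return surface, reading, ""
    return surface[:-n], reading[:-n], surface[-n:]


def annotate(surface, reading):
    """Render ruby in parentheses after the stem, leaving okurigana bare."""
    stem, stem_reading, okurigana = split_okurigana(surface, reading)
    return f"{stem}({stem_reading}){okurigana}"


print(annotate("行った", "いった"))
print(annotate("食べる", "たべる"))
print(annotate("学校", "がっこう"))
```

A word with no okurigana, such as 学校, passes through unchanged, so this step can run on every token before any per-kanji splitting is attempted.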

Route two: orthographic / kanji-unit driven

The orthographic route starts from the written page. Where are the kanji? Which compounds are they part of? Which characters need help for the intended reader? If ruby is added, should it be mono-ruby, jukugo-ruby, or group-ruby?

JLREQ also distinguishes full ruby, where all kanji receive ruby, from partial ruby, where only selected kanji receive it, sometimes only on first occurrence. [2] This is why printed Japanese can look so deliberate: it supports pronunciation while also preserving the structure of the written word.
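
The full versus partial choice can be sketched as a filter over already-attached ruby. In the example below, `known_kanji` and `first_only` are hypothetical parameters standing in for an editorial policy; JLREQ describes the practice, not this function.

```python
def is_kanji(ch):
    """True for CJK unified ideographs in the basic block."""
    return "\u4e00" <= ch <= "\u9fff"


def select_ruby(units, known_kanji=frozenset(), first_only=False):
    """Decide which (base, reading) units keep their ruby.

    units       -- list of (base, reading); reading is "" for plain text
    known_kanji -- kanji the intended reader is assumed to know already
    first_only  -- if True, annotate each base only on its first occurrence
    """
    seen = set()
    result = []
    for base, reading in units:
        needs = bool(reading) and any(is_kanji(c) and c not in known_kanji for c in base)
        if needs and first_only and base in seen:
            needs = False
        if needs:
            seen.add(base)
        result.append((base, reading if needs else ""))
    return result


UNITS = [("学校", "がっこう"), ("へ", ""), ("行", "い"), ("った", ""), ("。", ""),
         ("学校", "がっこう"), ("は", ""), ("楽", "たの"), ("しい", ""), ("。", "")]

full = select_ruby(UNITS)                               # full ruby
partial = select_ruby(UNITS, known_kanji={"学", "校"})   # partial ruby
first = select_ruby(UNITS, first_only=True)             # first occurrence only
```

With these inputs, `full` keeps every reading, `partial` drops the ruby on 学校 but keeps 行 and 楽, and `first` annotates 学校 only the first time it appears.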

Jukugo-ruby and group-ruby

Jukugo-ruby is the easily overlooked middle form. 学(がっ)校(こう) is not just two isolated ruby annotations. The compound is read as one word, but each kanji still receives a meaningful segment.

Group-ruby takes over when the word reading cannot be naturally distributed: 今日(きょう), 明日(あした), 大人(おとな), 一昨日(おととい). The Joyo Kanji table (常用漢字表) lists many ateji and jukujikun readings as word-level readings in an appendix; the table itself is a guide for modern writing rather than a complete list of every use. [5]
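
One way to decide between jukugo-ruby and group-ruby is to try to divide the word reading among the kanji and fall back to group-ruby when no division works. The sketch below assumes a small hand-written table, `KANJI_READINGS`, that lists only the on-readings (including the sound-changed form がっ) needed for these examples. A real system would need a much larger table and review.

```python
# Hand-written, illustrative table: kanji -> candidate reading segments.
KANJI_READINGS = {
    "学": ["がく", "がっ"],
    "校": ["こう"],
    "今": ["こん"],
    "日": ["にち", "じつ"],
    "大": ["だい", "たい"],
    "人": ["じん", "にん"],
}


def distribute(word, reading):
    """Try to split the reading across the kanji; return pairs, or None on failure."""
    if not word:
        return [] if not reading else None
    head, rest = word[0], word[1:]
    for candidate in KANJI_READINGS.get(head, ()):
        if reading.startswith(candidate):
            tail = distribute(rest, reading[len(candidate):])
            if tail is not None:
                return [(head, candidate)] + tail
    return None


def attach(word, reading):
    """Jukugo-ruby when the reading distributes cleanly, group-ruby otherwise."""
    pairs = distribute(word, reading)
    return pairs if pairs is not None else [(word, reading)]


print(attach("学校", "がっこう"))   # distributes per kanji
print(attach("今日", "きょう"))     # falls back to group-ruby
print(attach("今日", "こんにち"))   # same spelling, different reading, distributes
```

The 今日 pair shows why the decision depends on the reading and not just on the spelling: こんにち divides cleanly, while きょう has to stay a whole-word reading.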

Differences

| Dimension | Morphological route | Orthographic / kanji-unit route |
| --- | --- | --- |
| Main question | How is this sentence or word read? | Which printed kanji should receive which reading? |
| Basic unit | Morpheme, word, short unit, inflected form | Base text, kanji, compound, whole word |
| Typical output | 学校(がっこう), 行った(いった) | 学(がっ)校(こう), 行(い)った |
| Strength | Automation, context, TTS, vocabulary learning | Printed clarity, kanji learning, textbooks, children's books |
| Weakness | May not show which kanji carries which sound | Harder to automate and often needs rules or review |

Why textbooks and children's books prefer the written route

Textbooks and children's books are not only helping children pronounce text; they are also teaching kanji. Japanese school materials assign kanji by grade, and official materials discuss how readings are distributed across school stages according to development, burden, daily use, and semantic understanding. [6] [7]

So 学校(がっこう) may be enough for reading aloud, but 学(がっ)校(こう) better teaches the relationship between the written compound and its sound. 行った(いった) is enough for pronunciation, but 行(い)った better shows how kanji and okurigana work together.

Four examples

学校. Morphological: 学校(がっこう). Orthographic: 学(がっ)校(こう). The first is good for vocabulary; the second is good for kanji and publication-style reading.

行った. Morphological: 行った(いった). Orthographic: 行(い)った. The okurigana remains visible in the base text.

今日. Both routes often give 今日(きょう), because forcing a per-character split, such as putting きょ on 今, would teach the wrong intuition.

生物. Context matters. 生物を研究する is usually 生物(せいぶつ); 生物を冷蔵庫に入れる is 生物(なまもの). A written-route system may show 生(せい)物(ぶつ) for the first reading, but keep 生物(なまもの) as a group reading for the second.

Which is better?

If you only need to read a sentence aloud smoothly, the morphological route is direct and useful. If you are learning kanji, reading children's books, doing close reading, or making teaching material, the orthographic route is more faithful to the printed page.

The best teaching often uses both layers: teach vocabulary as 学校 = がっこう, teach reading as 学(がっ)校(こう), and explain that 今日(きょう) is a whole-word reading.

For apps and learning tools: use a two-stage hybrid

A high-quality furigana system should first use morphological analysis to know how the word is read, then use publication-style rules to decide how that reading should attach back to the printed characters. The difficult second step is no longer just NLP; it mixes linguistics, kanji education, Japanese typography, and reader design.
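
The two stages can be sketched end to end as follows. Stage one is faked with a hand-written token list, `TOKENS`, in place of a real morphological analyzer. Stage two trims trailing okurigana and then tries to distribute the remaining reading over the kanji using a small illustrative table, `READINGS`. All names here are hypothetical, and the rules are far simpler than what publication-quality output would need.

```python
import re

KANJI = re.compile(r"[\u4e00-\u9fff々]")

# Stage 1 stand-in: (surface, hiragana reading); "" means no ruby needed.
TOKENS = [("昨日", "きのう"), ("、", ""), ("学校", "がっこう"),
          ("へ", ""), ("行った", "いった"), ("。", "")]

# Illustrative per-kanji reading segments, hand-written for this example.
READINGS = {"学": ["がく", "がっ"], "校": ["こう"], "行": ["い", "こう", "ぎょう"]}


def split_reading(stem, reading):
    """Distribute the reading over the kanji in stem, or return None."""
    if not stem:
        return [] if not reading else None
    for candidate in READINGS.get(stem[0], ()):
        if reading.startswith(candidate):
            tail = split_reading(stem[1:], reading[len(candidate):])
            if tail is not None:
                return [(stem[0], candidate)] + tail
    return None


def attach(surface, reading):
    """Stage 2: decide how a word reading attaches to the printed characters."""
    if not reading:
        return surface
    # Trim the kana tail shared by surface and reading (okurigana).
    limit = min(len(surface), len(reading)) - 1
    n = 0
    while n < limit and not KANJI.match(surface[-1 - n]) and surface[-1 - n] == reading[-1 - n]:
        n += 1
    stem, stem_reading = (surface[:-n], reading[:-n]) if n else (surface, reading)
    okurigana = surface[-n:] if n else ""
    # Prefer per-kanji segments; fall back to group-ruby over the stem.
    pairs = split_reading(stem, stem_reading)
    if pairs is None:
        pairs = [(stem, stem_reading)]
    return "".join(f"{base}({ruby})" for base, ruby in pairs) + okurigana


def furigana(tokens):
    return "".join(attach(surface, reading) for surface, reading in tokens)


print(furigana(TOKENS))
```

In this sketch 昨日 stays a group reading simply because the table has no entry for 昨, which mirrors the idea that group-ruby is the safe fallback whenever a clean division is not available.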

Two pairs of glasses

The morphological route is an auditory pair of glasses. It sees Japanese as a stream of sound and asks: how should this sentence be pronounced?

The orthographic route is a written pair of glasses. It sees kanji, kana, compounds, lines, and layout, and asks: how can this page help the reader read and learn the writing?

OCAT note. In OCAT, open Settings - Experimental Options to choose between orthographic / kanji-unit driven furigana and morphological furigana. The default is the orthographic mode. OCAT may be one of the few Japanese learning apps that support this publication-style route.

References

  1. Rules for Simple Placement of Japanese Ruby
  2. Requirements for Japanese Text Layout 日本語組版処理の要件
  3. MeCab: Yet Another Part-of-Speech and Morphological Analyzer
  4. UniDic glossary
  5. 文化庁 常用漢字表
  6. 学年別漢字配当表
  7. 文部科学省 音訓の小・中・高等学校段階別割り振り表
