Why your Japanese furigana app does not look like a Japanese book
Many Japanese learners first meet furigana and assume it simply means putting kana readings above kanji. That is not wrong, but it quickly creates confusion.
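For the same sentence, 昨日、学校へ行った。, one automatic tool may produce 昨日(きのう)、学校(がっこう)へ行った(いった)。, with each reading shown here in parentheses after the text it annotates. A children's book, textbook, or carefully typeset publication may prefer 昨日(きのう)、学(がっ)校(こう)へ行(い)った。.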
Both help you read Japanese, but they answer different questions. The first asks: what does this sentence sound like? The second asks: how should these printed kanji be recognized by the reader?
Think of ruby as two separate decisions
In Japanese typography, furigana is usually discussed as ruby. In horizontal writing it is normally placed above the base text, and in vertical writing it is normally placed on the right; W3C also notes that ruby is often about half the size of the base text by default. [1]
But choosing what receives ruby and deciding how that ruby is attached are different decisions. Should every kanji receive help, or only difficult kanji? Should the reading attach to one kanji, each kanji in a compound, or a whole word?
JLREQ describes three important relationships: mono-ruby, jukugo-ruby, and group-ruby. Mono-ruby maps one base character to one ruby string. Jukugo-ruby keeps the compound as a unit while still preserving internal kanji-reading relationships. Group-ruby maps several base characters to one ruby string. [2]
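The three relationships can be pictured as a small data model. The following is only a sketch: the names `RubyPair`, `RubyWord`, and `to_html` are made up for illustration, and the `kind` field is plain metadata that does not reproduce JLREQ's layout rules. It uses the standard HTML `ruby` and `rt` elements for output.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class RubyPair:
    base: str     # one or more base characters
    reading: str  # kana placed over (or beside) that base


@dataclass(frozen=True)
class RubyWord:
    kind: str                    # "mono", "jukugo", or "group"
    pairs: tuple[RubyPair, ...]  # one pair per annotated unit


def to_html(word: RubyWord) -> str:
    """Render one annotated word as a single HTML <ruby> element."""
    body = "".join(
        f"{escape(p.base)}<rt>{escape(p.reading)}</rt>" for p in word.pairs
    )
    return f"<ruby>{body}</ruby>"


# Mono-ruby: one base character, one ruby string.
iku = RubyWord("mono", (RubyPair("行", "い"),))
# Jukugo-ruby: one compound, but each kanji keeps its own reading segment.
gakkou = RubyWord("jukugo", (RubyPair("学", "がっ"), RubyPair("校", "こう")))
# Group-ruby: the reading belongs to the whole word.
kyou = RubyWord("group", (RubyPair("今日", "きょう"),))

print(to_html(gakkou))  # <ruby>学<rt>がっ</rt>校<rt>こう</rt></ruby>
print(to_html(kyou))    # <ruby>今日<rt>きょう</rt></ruby>
```

In this model, jukugo-ruby and group-ruby differ only in how many pairs the word carries, which is exactly the attachment decision described above.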
Route one: morphological
The morphological route starts from language analysis. It segments a sentence into words or morphemes, then assigns readings, parts of speech, lemmas, and inflectional information.
Tools such as MeCab segment Japanese and output fields such as surface form, part of speech, base form, reading, and pronunciation; UniDic is also explicitly designed for morphological analysis and uses short units with layers such as lexeme, word form, written form, and pronunciation form. [3] [4]
For 学校へ行った, this route naturally gives 学校(がっこう)へ行った(いった). It cares that the word 学校 is read がっこう, and that the inflected form 行った is read いった.
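To make the morphological route concrete, here is a minimal sketch that parses analyzer output into tokens with hiragana readings. It does not call MeCab; the `SAMPLE` text is hard-coded in the tab-plus-comma layout that MeCab typically prints with an IPADIC-style dictionary. That layout and the exact feature values are assumptions, and UniDic orders its fields differently. The names `Token`, `kata_to_hira`, and `parse` are illustrative.

```python
from typing import NamedTuple

# Hard-coded sample in an IPADIC-style layout (an assumption, not live output):
# surface<TAB>pos,pos1,pos2,pos3,conj-type,conj-form,base form,reading,pronunciation
SAMPLE = """\
学校\t名詞,一般,*,*,*,*,学校,ガッコウ,ガッコー
へ\t助詞,格助詞,一般,*,*,*,へ,ヘ,エ
行っ\t動詞,自立,*,*,五段・カ行促音便,連用タ接続,行く,イッ,イッ
た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS
"""


class Token(NamedTuple):
    surface: str
    pos: str
    lemma: str
    reading: str  # converted to hiragana


def kata_to_hira(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to the matching hiragana."""
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text
    )


def parse(output: str) -> list[Token]:
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        fields = feature_str.split(",")
        # Fall back to the surface form when a field is missing.
        lemma = fields[6] if len(fields) > 6 else surface
        reading = kata_to_hira(fields[7]) if len(fields) > 7 else surface
        tokens.append(Token(surface, fields[0], lemma, reading))
    return tokens


tokens = parse(SAMPLE)
print("".join(t.reading for t in tokens))  # がっこうへいった
```

Note that an analyzer may split 行った into 行っ and た, so a furigana tool usually has to merge such pieces before it can show いった over the whole inflected form.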
This is excellent for automation: learning apps, browser extensions, text-to-speech, search indexes, dictionaries, and NLP systems all benefit from context-aware word readings. It can distinguish 生物(せいぶつ) in a biology context from 生物(なまもの) when talking about raw food.
The blind spot: word reading is not always printed ruby
The word form 行った is read いった, but publication-style ruby usually prints 行(い)った. The った is already visible in the base text, so it does not need to be repeated above the kanji.
Likewise, 学校(がっこう) is enough for pronunciation, but a kanji-learning context may prefer 学(がっ)校(こう). Here がっ is not the isolated reading がく; it is the reading segment that belongs to 学 inside the compound after sound change.
For words such as 今日(きょう), 大人(おとな), and 小豆(あずき), forcing a character-by-character split is not useful. A good orthographic system admits that these are group readings.
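The okurigana point can be sketched in a few lines: peel off kana that appear both at the edge of the surface form and at the same edge of the reading, and put ruby only on what is left. This is a simplified illustration with made-up function names (`trim_okurigana`, `format_ruby`); it does not handle kana in the middle of a word, and callers are expected to skip words that contain no kanji.

```python
def is_kana(ch: str) -> bool:
    """True for hiragana, katakana, and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def trim_okurigana(surface: str, reading: str) -> tuple[str, str, str, str]:
    """Return (leading kana, core, core reading, trailing kana)."""
    start, end = 0, len(surface)
    r_start, r_end = 0, len(reading)
    # Peel matching kana off the end, always leaving something for the core.
    while (end > start + 1 and r_end > r_start + 1
           and is_kana(surface[end - 1])
           and surface[end - 1] == reading[r_end - 1]):
        end -= 1
        r_end -= 1
    # Peel matching kana off the front (for example the お in お茶).
    while (start < end - 1 and r_start < r_end - 1
           and is_kana(surface[start])
           and surface[start] == reading[r_start]):
        start += 1
        r_start += 1
    return surface[:start], surface[start:end], reading[r_start:r_end], surface[end:]


def format_ruby(surface: str, reading: str) -> str:
    """Plain-text rendering: the reading follows its base in parentheses."""
    lead, core, core_reading, trail = trim_okurigana(surface, reading)
    return f"{lead}{core}({core_reading}){trail}"


print(format_ruby("行った", "いった"))  # 行(い)った
print(format_ruby("食べる", "たべる"))  # 食(た)べる
```

The guard conditions keep at least one base character and one reading character for the core, so a word never ends up with an empty annotation.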
Route two: orthographic / kanji-unit driven
The orthographic route starts from the written page. Where are the kanji? Which compounds are they part of? Which characters need help for the intended reader? If ruby is added, should it be mono-ruby, jukugo-ruby, or group-ruby?
JLREQ also distinguishes full ruby, where all kanji receive ruby, from partial ruby, where only selected kanji receive it, sometimes only on first occurrence. [2] This is why printed Japanese can look so deliberate: it supports pronunciation while also preserving the structure of the written word.
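The full versus partial choice is a policy that can be kept separate from the readings themselves. A minimal sketch, assuming a hypothetical `known_kanji` set that stands in for whatever the intended reader has already learned (the function name and parameters are illustrative, and the kanji test covers only the basic CJK block):

```python
def choose_ruby_targets(words, known_kanji, first_occurrence_only=True):
    """Return one boolean per word: True means 'print ruby on this word'."""
    seen = set()
    decisions = []
    for word in words:
        # Basic CJK Unified Ideographs only; marks such as 々 are not covered.
        kanji = {ch for ch in word if "\u4e00" <= ch <= "\u9fff"}
        needs_ruby = bool(kanji - known_kanji)
        if first_occurrence_only and word in seen:
            needs_ruby = False
        decisions.append(needs_ruby)
        if needs_ruby:
            seen.add(word)
    return decisions


words = ["学校", "へ", "行った", "学校"]

# Partial ruby: the reader knows 行, and repeats are left bare.
print(choose_ruby_targets(words, known_kanji={"行"}))
# [True, False, False, False]

# Full ruby: nothing is assumed known, and every occurrence is annotated.
print(choose_ruby_targets(words, known_kanji=set(), first_occurrence_only=False))
# [True, False, True, True]
```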
Jukugo-ruby and group-ruby
Jukugo-ruby is the easily overlooked middle form. 学(がっ)校(こう) is not just two isolated ruby annotations. The compound is read as one word, but each kanji still receives a meaningful segment.
Group-ruby takes over when the word reading cannot be naturally distributed: 今日, 明日, 大人, 一昨日. The Japanese Joyo Kanji framework also treats many ateji and jukujikun readings as word-level readings in an appendix, and it is a guide for modern writing rather than a complete list of every use. [5]
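One way to decide between the two forms is to try distributing the word reading over the kanji and fall back to group-ruby when no clean split exists. The sketch below assumes a tiny hand-written `CANDIDATES` table; listing がっ next to がく hard-codes the sound change, whereas a real system would need proper rules or a dictionary for that. All names here are illustrative.

```python
# Toy per-kanji reading table (an assumption for illustration only).
CANDIDATES = {
    "学": ["がく", "がっ"],
    "校": ["こう"],
    "今": ["こん", "いま"],
    "日": ["にち", "じつ", "ひ", "び", "か"],
    "大": ["だい", "たい", "おお"],
    "人": ["じん", "にん", "ひと"],
}


def split_reading(word, reading):
    """Return per-kanji (base, reading) pairs, or None if no clean split exists."""
    if not word:
        return [] if not reading else None
    head, rest = word[0], word[1:]
    for candidate in CANDIDATES.get(head, []):
        if reading.startswith(candidate):
            tail = split_reading(rest, reading[len(candidate):])
            if tail is not None:
                return [(head, candidate)] + tail
    return None


def annotate(word, reading):
    """Prefer jukugo-ruby; fall back to group-ruby when the split fails."""
    pairs = split_reading(word, reading)
    if pairs is None:
        return "group", [(word, reading)]
    return "jukugo", pairs


print(annotate("学校", "がっこう"))  # ('jukugo', [('学', 'がっ'), ('校', 'こう')])
print(annotate("今日", "きょう"))    # ('group', [('今日', 'きょう')])
```

The fallback is the important design choice: when the table cannot explain the reading, the word stays whole instead of receiving an invented split.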
Differences
| Dimension | Morphological route | Orthographic / kanji-unit route |
|---|---|---|
| Main question | How is this sentence or word read? | Which printed kanji should receive which reading? |
| Basic unit | Morpheme, word, short unit, inflected form | Base text, kanji, compound, whole word |
| Typical output | 学校(がっこう), 行った(いった) | 学(がっ)校(こう), 行(い)った |
| Strength | Automation, context, TTS, vocabulary learning | Printed clarity, kanji learning, textbooks, children's books |
| Weakness | May not show which kanji carries which sound | Harder to automate and often needs rules or review |
Why textbooks and children's books prefer the written route
They are not only helping children pronounce text; they are also teaching kanji. Japanese school materials assign kanji by grade, and official materials discuss how readings are distributed across school stages according to development, burden, daily use, and semantic understanding. [6] [7]
So 学校(がっこう) may be enough for reading aloud, but 学(がっ)校(こう) better teaches the relationship between the written compound and its sound. 行った(いった) is enough for pronunciation, but 行(い)った better shows how kanji and okurigana work together.
Four examples
学校. Morphological: 学校(がっこう). Orthographic: 学(がっ)校(こう). The first is good for vocabulary; the second is good for kanji and publication-style reading.
行った. Morphological: 行った(いった). Orthographic: 行(い)った. The okurigana remains visible in the base text.
今日. Both routes often give 今日(きょう), because forcing きょう to split across 今 and 日 would teach the wrong intuition.
生物. Context matters. 生物を研究する is usually 生物(せいぶつ); 生物を冷蔵庫に入れる is 生物(なまもの). A written-route system may show 生(せい)物(ぶつ) for the first reading, but keep 生物(なまもの) as a group reading for the second.
Which is better?
If you only need to read a sentence aloud smoothly, the morphological route is direct and useful. If you are learning kanji, reading children's books, doing close reading, or making teaching material, the orthographic route is more faithful to the printed page.
The best teaching often uses both layers: teach vocabulary as 学校 = がっこう, teach reading as 学(がっ)校(こう), and explain that 今日(きょう) is a whole-word reading.
For apps and learning tools: use a two-stage hybrid
A high-quality furigana system should first use morphological analysis to know how the word is read, then use publication-style rules to decide how that reading should attach back to the printed characters. The difficult second step is no longer just NLP; it mixes linguistics, kanji education, Japanese typography, and reader design.
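The two stages can be sketched end to end. In this simplified illustration, stage one is replaced by a hard-coded list of (surface, reading) pairs standing in for analyzer output, and stage two uses a toy `KANJI_READINGS` table; both are assumptions, as are the function names and the two mode strings. Readings are rendered in parentheses after their base text.

```python
# Stage 1 stand-in: word readings as a morphological analyzer might supply them.
WORDS = [("昨日", "きのう"), ("、", "、"), ("学校", "がっこう"),
         ("へ", "へ"), ("行った", "いった"), ("。", "。")]

# Toy per-kanji table for stage 2 (illustrative, far from complete).
KANJI_READINGS = {
    "学": ["がく", "がっ"], "校": ["こう"], "行": ["い", "こう", "ぎょう"],
    "昨": ["さく"], "日": ["にち", "じつ", "ひ", "び", "か"],
}


def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"


def split_reading(word, reading):
    """Distribute a reading over the kanji, or return None if that fails."""
    if not word:
        return [] if not reading else None
    for candidate in KANJI_READINGS.get(word[0], []):
        if reading.startswith(candidate):
            tail = split_reading(word[1:], reading[len(candidate):])
            if tail is not None:
                return [(word[0], candidate)] + tail
    return None


def attach(surface, reading, mode):
    """Stage 2: decide how the word reading attaches to the printed characters."""
    if not any(is_kanji(c) for c in surface):
        return surface                   # kana and punctuation get no ruby
    if mode == "morphological":
        return f"{surface}({reading})"   # whole word, whole reading
    # Orthographic mode: leave trailing okurigana in the base text.
    end, r_end = len(surface), len(reading)
    while (end > 1 and r_end > 1 and not is_kanji(surface[end - 1])
           and surface[end - 1] == reading[r_end - 1]):
        end -= 1
        r_end -= 1
    core, core_reading = surface[:end], reading[:r_end]
    pairs = split_reading(core, core_reading)
    if pairs is None:
        pairs = [(core, core_reading)]   # group-ruby fallback
    return "".join(f"{b}({r})" for b, r in pairs) + surface[end:]


def render(words, mode):
    return "".join(attach(s, r, mode) for s, r in words)


print(render(WORDS, "morphological"))  # 昨日(きのう)、学校(がっこう)へ行った(いった)。
print(render(WORDS, "orthographic"))   # 昨日(きのう)、学(がっ)校(こう)へ行(い)った。
```

Keeping the mode as a parameter of stage two means the same analyzer output can feed either style, which is what makes a user-facing switch between the two routes cheap to offer.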
Two pairs of glasses
The morphological route is an auditory pair of glasses. It sees Japanese as a stream of sound and asks: how should this sentence be pronounced?
The orthographic route is a written pair of glasses. It sees kanji, kana, compounds, lines, and layout, and asks: how can this page help the reader read and learn the writing?
OCAT note. In OCAT, open Settings - Experimental Options to choose between orthographic / kanji-unit driven furigana and morphological furigana. The default is the orthographic mode. OCAT may be one of the few Japanese learning apps that support this publication-style route.
