Translation - Disaster Clippy Docs

Core rule

The system treats English as the core language for the primary embedding and indexing layer. That keeps retrieval behavior more uniform across mixed-source collections and avoids fragmenting the main search space into disconnected language silos.

What gets preserved

Using English as the normalized index language does not mean other languages are discarded. The system aims to keep multiple text layers whenever possible:

original-language text
translated English text
lineage back to the original source, or to timed segments where relevant
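
The layers listed above could be modeled as a single record that carries the original text, the optional English text, and a pointer back to the source. This is a minimal sketch, not the project's actual schema: the names `Lineage`, `LayeredText`, and `index_text`, and all field names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lineage:
    """Pointer back to where a piece of text came from (hypothetical shape)."""
    source_id: str                          # e.g. a document or video identifier
    start_seconds: Optional[float] = None   # set only for timed segments
    end_seconds: Optional[float] = None


@dataclass
class LayeredText:
    """One unit of content with its preserved text layers (hypothetical shape)."""
    original_text: str
    original_language: str                  # language tag such as "es" or "en"
    lineage: Lineage
    english_text: Optional[str] = None      # None until a translation exists

    def index_text(self) -> str:
        """Return the text that would feed the English-normalized index."""
        if self.original_language == "en":
            # English sources need no translation layer.
            return self.original_text
        if self.english_text is None:
            raise ValueError("no English layer available yet")
        return self.english_text


record = LayeredText(
    original_text="Hierva el agua durante un minuto.",
    original_language="es",
    lineage=Lineage(source_id="guide-001"),
    english_text="Boil the water for one minute.",
)
```

In this sketch the original text is never overwritten; the English layer sits beside it, so either one can be shown to a user or inspected later.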

Why normalize to English

English is the current bridge language for the main search stack. A shared normalized embedding layer makes the retrieval side simpler and more consistent while still allowing multilingual inputs and outputs.

How language packs fit in

Language packs are how the system moves between the original language and the normalized English layer. The intended pipeline is:

Acquire text in the original language
Preserve that original text as a durable layer
Translate into English when needed
Chunk and index from the English-normalized text
Continue exposing the original language alongside it where useful
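
The pipeline steps above can be sketched as one ingest function. This is an illustration under stated assumptions: the `Translator` callable stands in for whatever a language pack provides, `toy_translate` is a lookup table rather than a real translation backend, and word-count chunking is a placeholder for the system's actual chunking strategy. All names here are hypothetical.

```python
from typing import Callable, Dict, List

# Assumed interface: (text, source_language) -> English text.
Translator = Callable[[str, str], str]


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into chunks of at most max_words words (placeholder strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def ingest(original_text: str, language: str, translate: Translator,
           max_words: int = 50) -> Dict[str, object]:
    """Run the acquire, preserve, translate, chunk, expose sequence on one text."""
    # Translate only when the source is not already English.
    if language == "en":
        english_text = original_text
    else:
        english_text = translate(original_text, language)

    # Chunks for the index are always derived from the English-normalized text.
    index_chunks = chunk_text(english_text, max_words)

    # The original text is returned untouched alongside the English layers.
    return {
        "original_text": original_text,
        "original_language": language,
        "english_text": english_text,
        "index_chunks": index_chunks,
    }


def toy_translate(text: str, language: str) -> str:
    """Stand-in translator backed by a fixed lookup table."""
    table = {("es", "Busque refugio."): "Seek shelter."}
    return table[(language, text)]


result = ingest("Busque refugio.", "es", toy_translate)
```

Passing the translator in as a parameter keeps the sketch independent of any particular language pack; a real pack would supply its own callable with the same shape.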

What this means for users

A user should be able to work with English and non-English material in the same system. The main retrieval layer may be English-centered, but the content itself should remain multilingual and inspectable.

What this means for video

The same principle applies to transcripts. A video can have an original-language transcript, an English translated transcript, and English index chunks derived from that translated layer while still preserving the original source text.
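
For transcripts, one way to keep the layers connected is to carry timestamps through from each segment to the index chunks built from it. The sketch below assumes a transcript arrives as timed segments that already have both an original and an English text; `TranscriptSegment`, `IndexChunk`, `build_chunks`, and grouping by a fixed number of segments are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptSegment:
    """One timed segment with both text layers (hypothetical shape)."""
    start_seconds: float
    end_seconds: float
    original_text: str
    english_text: str


@dataclass
class IndexChunk:
    """English index chunk that keeps its time range and original text."""
    english_text: str
    start_seconds: float
    end_seconds: float
    original_text: str


def build_chunks(segments: List[TranscriptSegment], per_chunk: int = 2) -> List[IndexChunk]:
    """Group consecutive segments into chunks, preserving the time span of each group."""
    chunks: List[IndexChunk] = []
    for i in range(0, len(segments), per_chunk):
        group = segments[i:i + per_chunk]
        chunks.append(IndexChunk(
            english_text=" ".join(s.english_text for s in group),
            start_seconds=group[0].start_seconds,
            end_seconds=group[-1].end_seconds,
            original_text=" ".join(s.original_text for s in group),
        ))
    return chunks


segments = [
    TranscriptSegment(0.0, 3.5, "Busque refugio.", "Seek shelter."),
    TranscriptSegment(3.5, 7.5, "Hierva el agua.", "Boil the water."),
    TranscriptSegment(7.5, 10.0, "Mantenga la calma.", "Stay calm."),
]
chunks = build_chunks(segments)
```

Because each chunk records its start and end times, a search hit on the English text can be traced back to the matching stretch of the video and to the original-language wording.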

Last updated: March 2026