Hyphenation, a perspective on the RenderX approach

Nikolai Grigoriev

A good hyphenation file should mark permitted break points and disable undesirable ones. Do you mean that your patterns produce spurious hyphenations that don't occur in TeX only because its line-breaking algorithm always finds a better alternative?

Liang's algorithm uses priorities in hyphenation patterns, and these priorities are certainly respected. XEP's hyphenator finds all breaks permitted by Liang's algorithm, subject to the additional constraints from hyphenation-{push|remain}-character-count, and only those breaks.
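
To make "priorities are respected" concrete, here is a minimal Python sketch of Liang-style pattern matching. It is an illustration only, not XEP's code: the function names are made up, the pattern list is a small hand-picked set that happens to cover one word, and the `remain` and `push` arguments merely stand in for hyphenation-remain-character-count and hyphenation-push-character-count.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap priorities."""
    letters = re.sub(r"\d", "", pattern)
    points = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            points[pos] = int(ch)   # priority of the gap before letter `pos`
        else:
            pos += 1
    return letters, points

def break_points(word, patterns, remain=2, push=2):
    """Return indices i such that word[:i] | word[i:] is a permitted break."""
    padded = "." + word.lower() + "."          # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)
    table = dict(parse_pattern(p) for p in patterns)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            points = table.get(padded[start:end])
            if points:
                # The highest priority wins in every gap.
                for offset, value in enumerate(points):
                    levels[start + offset] = max(levels[start + offset], value)
    # Odd priority permits a break, even priority forbids it.
    # `remain` and `push` (assumed names) trim breaks too close to either end.
    return [i for i in range(remain, len(word) - push + 1)
            if levels[i + 1] % 2 == 1]

def show(word, points):
    """Render the word with hyphens at the given break indices."""
    edges = [0] + points + [len(word)]
    return "-".join(word[a:b] for a, b in zip(edges, edges[1:]))

# Illustrative pattern subset, chosen by hand for this single word.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

points = break_points("hyphenation", PATTERNS)
print(show("hyphenation", points))   # hy-phen-ation
```

Raising `remain` to 3 in this sketch drops the break after "hy" and leaves only "hyphen-ation", which is the kind of extra filtering the character-count constraints add on top of the patterns.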

What distinguishes XEP from TeX is the line-breaking algorithm that triggers hyphenation. We don't use global optimization; our approach considers only single lines. It is at this point that we drop all priorities: all hyphenation points permitted by Liang's patterns are considered equivalent. My impression is that FOP does the same, though I may be wrong.

Line breaking is not part of Liang's algorithm, and the pattern-processing algorithm in XEP is exactly Liang's, with no omissions.

Anyhow, XEP does not hyphenate until it actually has to: if it can get by with slight adjustments to inter-character or inter-word spacing, it does. (You may notice that XEP almost never hyphenates long lines; and when it has to, it tends to split words in the middle.) So, for ordinary text, the penalty for treating all hyphenation points as equivalent is negligible. (I'd like to stress once again that we don't produce hyphenations that are not permitted by the patterns.)
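
The line-at-a-time behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not XEP's implementation: every character is one unit wide, `min_fill` is an invented threshold standing in for "the spacing adjustment can absorb the slack", `breaks_for` is a hypothetical callback returning the permitted break indices for a word, and picking the latest break that fits is my own choice among the equally ranked points.

```python
def break_lines(words, width, breaks_for, min_fill=0.8):
    """Greedy single-line breaking; hyphenation is a last resort."""
    lines, current = [], ""
    queue = list(words)
    while queue:
        word = queue.pop(0)
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            current = candidate
            continue
        # The word does not fit. If the line is already full enough,
        # assume spacing adjustment handles the rest: no hyphenation.
        if current and len(current) >= min_fill * width:
            lines.append(current)
            current = ""
            queue.insert(0, word)
            continue
        # Otherwise hyphenate. All permitted points are treated as equal,
        # so simply take the latest one that still fits (an assumption).
        used = len(current) + 1 if current else 0
        fitting = [b for b in breaks_for(word) if used + b + 1 <= width]
        if fitting:
            b = max(fitting)
            head = word[:b] + "-"
            lines.append(current + " " + head if current else head)
            current = ""
            queue.insert(0, word[b:])   # the tail starts the next line
        elif current:
            lines.append(current)       # give up and end the line short
            current = ""
            queue.insert(0, word)
        else:
            lines.append(word)          # unbreakable word: let it overflow
    if current:
        lines.append(current)
    return lines

# Hypothetical break table: only one word has permitted break points.
BREAKS = {"hyphenation": [2, 6]}
lookup = lambda w: BREAKS.get(w, [])

print(break_lines("the hyphenation algorithm".split(), 12, lookup))
print(break_lines("a good pattern hyphenation".split(), 16, lookup))
```

In the first call the line "the" is far too short to justify, so the next word is split; in the second the line "a good pattern" is already nearly full, so "hyphenation" moves to the next line intact. No break outside the list returned by `breaks_for` is ever used.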

> I can't understand how such a simplified algorithm could perform well,
> unless it discards most of the valid hyphenation points. Have you tested
> XEP's hyphenation algorithm with a bunch of hard-to-hyphenate words?
> (For example, several words with tricky combinations of consonants
> and vowels around? ).

Certainly, there are some words that don't hyphenate well (e.g. "names-pace" for English :-)). You have to put them into \hyphenation {}.
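
For illustration, an exception list of this kind can be modelled as a lookup that overrides the patterns. The sketch below assumes TeX-style entries such as "name-space"; the function names are invented, `pattern_breaks` is a hypothetical stand-in for the pattern-based hyphenator, and nothing here describes how XEP actually stores exceptions.

```python
def parse_exceptions(entries):
    """Turn entries like 'name-space' into {'namespace': [4]}."""
    table = {}
    for entry in entries:
        points, seen = [], 0
        for ch in entry:
            if ch == "-":
                points.append(seen)   # break allowed after `seen` letters
            else:
                seen += 1
        table[entry.replace("-", "").lower()] = points
    return table

def breaks_with_exceptions(word, exceptions, pattern_breaks):
    """Exceptions take precedence; other words fall through to the patterns."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return pattern_breaks(word)

exceptions = parse_exceptions(["name-space"])
# Pretend the patterns would have produced the bad break "names-pace".
bad_patterns = lambda w: [5]
print(breaks_with_exceptions("namespace", exceptions, bad_patterns))   # [4]
```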