Conditional entropy as a measure of linguistic remoteness between related languages
Jens Moberg, Charlotte Gooskens, John Nerbonne
University of Groningen
The Scandinavian languages are so alike that their speakers often communicate with one another while each uses their own language, a practice Haugen dubbed 'semi-communication'.
The success of semi-communication depends on the languages involved and is, moreover, asymmetric: Swedish is easier for a Dane to understand than Danish is for a Swede. We model the success of semi-communication through the conditional entropy of the phoneme mapping in corresponding words.
Semantically corresponding words were taken from frequency lists and aligned, and the conditional entropy of the phoneme mapping in the aligned word pairs was then calculated. This tells us how difficult it is to predict a phoneme in the native language given the corresponding phoneme in the foreign language. We also examine the conditional entropy of selected word classes, such as function versus content words and native versus loan words.
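
As an illustration of the quantity described above, the following minimal sketch estimates the conditional entropy H(native | foreign) in bits from a list of aligned phoneme pairs, using relative frequencies as probability estimates. The function name `conditional_entropy` and the toy pairs are hypothetical and are not taken from the authors' data or implementation; the sketch also assumes the alignment has already been done and that any gaps are represented as ordinary symbols.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Estimate H(native | foreign) in bits.

    `pairs` is an iterable of (foreign_phoneme, native_phoneme) tuples
    taken from already-aligned word pairs (hypothetical input format).
    """
    pairs = list(pairs)
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)                      # counts of (foreign, native)
    foreign = Counter(f for f, _ in pairs)      # counts of foreign phonemes
    h = 0.0
    for (f, _n), count in joint.items():
        p_joint = count / total                 # estimate of p(foreign, native)
        p_cond = count / foreign[f]             # estimate of p(native | foreign)
        h -= p_joint * log2(p_cond)
    return h


# Invented toy alignment: foreign "a" maps to native "a" or "e" equally often,
# while foreign "k" always maps to native "k".
toy_pairs = [("a", "a"), ("a", "a"), ("a", "e"), ("a", "e"), ("k", "k"), ("k", "k")]

h_forward = conditional_entropy(toy_pairs)
# Swapping the roles of the two languages gives the other direction.
h_reverse = conditional_entropy([(n, f) for f, n in toy_pairs])
print(h_forward, h_reverse)
```

In this toy example the two directions give different values, because a one-to-many mapping in one direction is a many-to-one mapping in the other. This is the kind of asymmetry the measure is meant to capture.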