

Chinese character classification

Excerpt from a 1436 primer on Chinese characters
The earliest known Chinese texts, in the oracle bone script, display a fully developed writing system, with little difference in functionality from modern characters. It is assumed that the early stages of the development of characters were dominated by pictograms, which depicted objects directly, and ideograms, in which meaning was expressed iconically. The demands of writing full language, including words which had no easy pictographic or iconic representation, forced an expansion of this system, presumably through the use of rebus.
The presumed methods of forming characters were first classified c. 100 AD by the Chinese linguist Xu Shen (許慎), whose etymological dictionary Shuowen Jiezi (说文解字 / 說文解字) divides the script into six categories, the liùshū (六书 / 六書). While the categories and classification are occasionally problematic and arguably fail to reflect the complete nature of the Chinese writing system, this account has been perpetuated by its long history and pervasive use.
By Xu Shen's estimate, about four percent of Chinese characters are derived directly from individual pictograms, though in most cases the resemblance to an object is no longer clear. Others were derived as ideograms; as compound ideograms, where two ideograms are combined to give a third reading; and as rebuses. But most characters were devised as phono-semantic compounds, with one element to indicate the general category of meaning and the other to suggest the pronunciation. Again, in many cases the suggested sound is no longer accurate. Today all characters function as logograms, and are not actually used pictographically or ideographically.
Contrary to popular belief, pictograms make up only a small portion of Chinese characters. While characters in this class derive from pictures, they have been standardized, simplified, and stylized to make them easier to write, and their derivation is therefore not always obvious. Examples include 日 rì for "sun", 月 yuè for "moon", and 木 mù for "tree"....
There is no concrete number for the proportion of modern characters that are pictographic in nature; however, Xu Shen (c. 100 AD) estimated that 4% of characters fell into this category.
Ideograms, also called simple indicatives or simple ideographs, either modify existing pictographs iconically or are direct iconic illustrations. For instance, marking the blade of 刀 dāo, a pictogram for "knife", yields the ideogram 刃 rèn for "blade". Direct examples include 上 shàng "up" and 下 xià "down". This category is small.
Ideogrammic compounds, a name translated literally as logical aggregates or associative compounds, symbolically combine pictograms or ideograms to create a third character. For instance, doubling the pictogram 木 mù "tree" produces 林 lín "grove", while tripling it produces 森 sēn "forest". (Note that 林 and 森 share the same reconstructed Old Chinese final *-ǐǝm; see below.) Similarly, combining 日 rì "sun" and 月 yuè "moon", the two natural sources of light, makes 明 míng "bright". Other commonly cited examples include 休 xiū "rest", composed of the pictograms 人 rén "person" and 木 mù "tree", and 好 hǎo "good", composed of the pictograms 女 nǚ "woman" and 子 zǐ "son/child".
Xu Shen estimated that 13% of characters fall into this category.
Some scholars flatly reject the existence of this category, arguing that the failure of modern attempts to identify a phonetic in a compound is due simply to overlooking ancient "secondary readings", which were lost over time. For example, the character 安 ān "peace", a combination of "roof" 宀 and "woman" 女, is commonly cited as an ideogrammic compound, purportedly motivated by a meaning such as "all is peaceful with the woman at home". However, there is evidence that 女 was once a polyphone with a secondary reading of *an, as may be gleaned from the set 妟 yàn "tranquil", 奻 nuán "to quarrel", and 姦 jiān "licentious". Supporting this reasoning is the fact that modern interpretations often neglect archaic forms that were in use when the characters were created.
These arguments notwithstanding, there are some characters that do appear to genuinely belong to this category. It is doubtful that secondary readings can be found for many cases, and the characters 林, 森, 明, 休, and 好 are all attested in oracle bone script, with the same components as the modern forms.
By far the most numerous category is that of the phono-semantic compounds, also called semantic-phonetic compounds or pictophonetic compounds. These characters are composed of two parts: a semantic component, drawn from a limited set of pictographs (often graphically simplified), which suggests the general meaning of the character; and a phonetic component, an existing character pronounced approximately like the new target word.
Examples are 河 hé "river", 湖 hú "lake", 流 liú "stream", 沖 chōng "riptide" (or "flush"), and 滑 huá "slippery". All these characters have on the left a radical of three short strokes (氵), which is a simplified pictograph for a river, indicating that the character has a semantic connection with water; the right-hand side in each case is a phonetic indicator. For example, in the case of 沖 chōng (Old Chinese /druŋ/), the phonetic indicator is 中 zhōng (Old Chinese /truŋ/), which by itself means "middle". Here the pronunciation of the character differs slightly from that of its phonetic indicator; such divergence means that the composition of these characters can sometimes seem arbitrary today. The choice of radical may also seem arbitrary in some cases; for example, the radical of 貓 māo "cat" is 豸 zhì, originally a pictograph for worms, but in characters like this one it indicates an animal of any kind.
Xu Shen (c. 100 AD) placed approximately 82% of characters into this category, while in the Kangxi Dictionary (1716 AD) the number is closer to 90%, due to the extremely productive use of this technique to extend the Chinese vocabulary.
This method is still sometimes used to form new characters. For example, 钚 bù "plutonium" is the metal radical 金 jīn plus the phonetic component 不 bù, described in Chinese as "不 gives sound, 金 gives meaning". Many Chinese names of elements in the periodic table and many other chemistry-related characters were formed this way.
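The two-part structure described above can be illustrated with a small Python sketch. It records a semantic and a phonetic component for two of the characters discussed (沖 and 钚) and checks how closely the modern Mandarin reading of each still matches that of its phonetic component. The class name `PhonoSemantic`, the helper functions, and the match labels are hypothetical and introduced only for illustration; the "final" is approximated crudely as everything from the first vowel onward, which is an assumption rather than a proper phonological analysis.

```python
import unicodedata
from dataclasses import dataclass

VOWELS = set("aeiou")


def strip_tone(pinyin: str) -> str:
    """Drop tone marks by decomposing to NFD and removing combining marks.

    Caveat of this sketch: every combining mark is removed, so 'ü' collapses to 'u'.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def final_of(pinyin: str) -> str:
    """Return the toneless syllable from its first vowel onward (a rough 'final')."""
    plain = strip_tone(pinyin)
    for i, ch in enumerate(plain):
        if ch in VOWELS:
            return plain[i:]
    return plain


@dataclass(frozen=True)
class PhonoSemantic:
    """Hypothetical record for one phono-semantic compound."""
    char: str              # the compound character
    reading: str           # its modern Mandarin reading (pinyin)
    semantic: str          # component hinting at the meaning
    phonetic: str          # component hinting at the sound
    phonetic_reading: str  # modern reading of the phonetic component

    def sound_match(self) -> str:
        """Classify how well the phonetic component still predicts the reading."""
        own = unicodedata.normalize("NFC", self.reading)
        hint = unicodedata.normalize("NFC", self.phonetic_reading)
        if own == hint:
            return "exact"
        if strip_tone(own) == strip_tone(hint):
            return "same syllable, different tone"
        if final_of(own) == final_of(hint):
            return "same final only"
        return "no longer similar"


# Components and readings as given in the text above.
EXAMPLES = [
    PhonoSemantic("沖", "chōng", "氵", "中", "zhōng"),
    PhonoSemantic("钚", "bù", "金", "不", "bù"),
]

for c in EXAMPLES:
    print(f"{c.char} = {c.semantic} (meaning) + {c.phonetic} (sound): {c.sound_match()}")
```

In this sketch 钚 is classified as an exact match, while 沖 shares only its final with 中, mirroring the observation that the suggested sound is often no longer accurate.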
Derivative cognates are characters that originally represented the same meaning but have bifurcated through orthographic and often semantic drift. For instance, 考 kǎo "to verify" and 老 lǎo "old" were once the same character, meaning "elderly person", but split into two separate words. Characters of this category are rare, so in modern systems this group is often omitted or combined with others.
Rebuses, also called borrowings or phonetic loan characters, are existing characters used to represent an unrelated word with a similar pronunciation. Sometimes the old meaning is then lost completely: 自 zì has lost its original meaning of "nose" and now exclusively means "oneself", and 萬 wàn, which originally meant "scorpion", is now used only in the sense of "ten thousand".