Chinese time! 🕵️‍♀️
I had no idea what to use to analyze Chinese characters, since Chinese is so different from English, so I asked ChatGPT. It gave me three choices.
After looking these up, only HanziJS made sense to me, even though it's a bit outdated (last updated 5 years ago). The documentation is very clear, though, so I decided to go with it. To my surprise, it doesn't only count word frequency; it can also decompose Chinese characters, which is exactly what I want!
https://github.com/nieldlr/hanzi
Chinese characters have several structures, and a common one is left-right. Take a look at these:
睁 打 吃 休 沙
They all consist of a left part and a right part. The left part is usually associated with meaning and is called the radical: 目 relates to eyes, 扌 to hand movements, 口 to the mouth, 亻 to people. The right part usually indicates pronunciation and serves as the phonetic component.
The radical I want to explore is 女, which means “woman”. I read an article a while back about how it appears in many words with negative meanings, such as “jealousy” (嫉妒), “whoring” (嫖娼), “demon/monster” (妖), and “rape/adultery” (奸). It also appears in neutral and positive words such as sisters (姐妹), marriage (婚姻), good (好), and wonderful (妙).
Ideally, I'd like to extract all the characters with the 女 radical in a passage and analyze their meaning and relationship to the context, but for now I'm just trying to complete the first step: finding all the characters with the 女 radical and counting how often they occur.
After installing the lib, I used this function so that I could decompose multiple characters at once.
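var hanzi = require('hanzi');
hanzi.start(); // load the dictionary data before calling anything else
var decomposition = hanzi.decomposeMany('爱橄黃');
console.log(decomposition);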
{ '爱':
   { character: '爱',
     components1: [ 'No glyph available', '友' ],
     components2: [ '爫', '冖', '𠂇', '又' ],
     components3: [ '爫', '冖', '𠂇', '㇇', '㇏' ] },
  '橄':
   { character: '橄',
     components1: [ '木', '敢' ],
     components2: [ '木', 'No glyph available', '耳', '⺙' ],
     components3: [ '一', '丨', '八', '匚', '二', '丨', '二', '丿', '一', '乂' ] },
  '黃':
   { character: '黃',
     components1: [ '廿', 'No glyph available' ],
     components2: [ '黃' ],
     components3: [ '卄', '一', '一', '二', '丨', '凵', '八' ] } }
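In the output above, components1 is the first-level split, so a left-right character built on 女 should list 女 there. Here is a minimal sketch of how the check and the frequency count could look. It is not the exact code I ran: `countRadical`, `SAMPLE`, and `fakeDecompose` are made-up names, the lookup table is a tiny hand-written stand-in for the library so the snippet runs on its own, and skipping the bare 女 character is just one possible choice.

```javascript
// Sketch only. `decompose` is passed in so the logic can be tried without HanziJS.
// Assumption: it returns an object with a `components1` array, as in the output above.

const HAN = /\p{Script=Han}/u; // matches a single Han character

function countRadical(text, radical, decompose) {
  const counts = new Map();
  for (const ch of text) { // for...of walks the string by code point
    // Skip punctuation, Latin text, and the bare radical itself (a choice, not a rule).
    if (!HAN.test(ch) || ch === radical) continue;
    const result = decompose(ch);
    const parts = (result && result.components1) || [];
    if (parts.includes(radical)) {
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical stand-in for the library: a hand-written first-level table.
const SAMPLE = {
  '嫉': ['女', '疾'],
  '妒': ['女', '户'],
  '好': ['女', '子'],
  '男': ['田', '力'],
};
const fakeDecompose = (ch) => ({ character: ch, components1: SAMPLE[ch] || [ch] });

const counts = countRadical('女人的嫉妒,男人的嫉妒', '女', fakeDecompose);
console.log([...counts.entries()]); // 嫉 and 妒 are each counted twice
```

With the real library, the third argument would presumably be something like `(ch) => hanzi.decompose(ch)` after calling `hanzi.start()`, assuming `decompose` returns the same per-character shape that `decomposeMany` shows above. Placeholder entries such as 'No glyph available' simply never match the radical.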
Since I only need components1, I rewrote the code, and here’s what I got:
input
女人追求关系,男人追求占有。—小仓千加子一语道破。女人的嫉妒指向夺去男人的其他女人,而男人的嫉妒则指向了背叛自己的女人。因为女人的背叛是对男人所有权的侵犯,建立在占有一个女人的基础上而得以维系的男人的自我,会因此面临崩溃的危机。对于女人,嫉妒是以其他女人为对手围绕男人展开的竞争;而对于男人,嫉妒则是维护自尊和自我确认的争斗。(Women seek relationships; men seek possession. Chikako Ogura puts it in a nutshell. A woman's jealousy is directed at the other woman who takes her man away, while a man's jealousy is directed at the woman who betrayed him. Because a woman's betrayal is a violation of a man's ownership, the man's ego, which is sustained by possessing a woman, faces the danger of collapse. For a woman, jealousy is a competition over a man with other women as rivals; for a man, jealousy is a struggle to defend his self-esteem and self-affirmation.)
— Disgust against Women (The Feeling of Disgust against Females in Japan) Chizuko Ueno
output