Takafumi Suzuki earned a BA in Literature, and an MA and a Ph.D. in Interdisciplinary Information Studies from the University of Tokyo. He is currently an associate professor at the Department of Media and Communications, Faculty of Sociology, Toyo University.
Mai Hosoya earned a BA in Sociology from Toyo University. She is currently working in the public sector.
This study analyzes popular songs composed by Japanese female
singer-songwriters. Popular songs are a good representation
of modern culture and society. Songs by female
singer-songwriters account for a large portion of the
current Japanese hit charts and particularly play an
important role in understanding the Japanese language and
communication style. In this study, we applied new methods
of computational stylistics to the lyrics of the songs. The
results clearly show differences in the characteristics of
10 female singer-songwriters, and we found that the
Popular songs by Japanese female singer-songwriters: a computational stylistics approach
When listening to music people often think
If we turn our eyes to the situation in modern Japan, there
are in fact not only communities on social networks relating
to
Singer-songwriters are singers who compose their own lyrics
and music, and they form one of the major genres in the hit
charts in modern Japan
There have been some studies that tried to investigate the
characteristics of language and communication styles from
popular songs, although their methods mainly involved
qualitative analyses of source documents or interviews with
artists and eminent personalities in these fields. Some
studies stressed the importance of investigating the lyrical
characteristics of popular songs, but in fact they focused
only on specific songs or singers for their analyses
Against this backdrop, this study uses computational stylistics methods to analyze popular Japanese songs composed by Japanese female singer-songwriters over the past 30 years. From the viewpoint of computational stylistics, the task has four characteristics: (a) the number of tokens is quite small, (b) several factors (e.g., authors and eras) affect the textual characteristics, (c) the content as well as the style can affect the textual characteristics, and (d) we wanted to obtain linguistically and sociologically meaningful findings rather than merely improve the classification method. We deliberately selected features and methods appropriate for these purposes so that the study would serve as a case study for computational stylistics. We also attempted to provide useful knowledge for understanding current Japanese language and communication styles.
We selected 116 songs composed by 10 singer-songwriters
(those who wrote more than 90% of their songs) that
appeared on the Oricon annual top 100 single hit chart
Previous authorship attribution studies showed that
function words or character-based n-grams provide a more
robust classification performance
Because several factors affect the textual characteristics, we first applied kernel PCA to the matrix in order to examine the factors affecting the lyrical characteristics of the songs. We selected the Gaussian kernel with parameter sigma = 0.1, which gave the most interpretable results after we tried several kernels and parameters. Previous studies used principal component analysis, correspondence analysis, or factor analysis for these tasks; however, kernel PCA allows kernels and parameters to be chosen more flexibly and is thus more suitable for preliminary and exploratory research such as this study.
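The kernel PCA step can be illustrated with a minimal sketch. The Gaussian kernel and sigma = 0.1 come from the text; everything else is an assumption made for illustration: the kernel is parameterised as exp(-sigma * squared distance), the rows of `toy_rows` are invented relative frequencies rather than data from this study, and power iteration stands in for a full eigendecomposition, so only the first component is extracted.

```python
import math
import random

def gaussian_kernel_matrix(rows, sigma):
    """K[a][b] = exp(-sigma * ||x_a - x_b||^2) (assumed parameterisation)."""
    n = len(rows)
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            sq_dist = sum((p - q) ** 2 for p, q in zip(rows[a], rows[b]))
            K[a][b] = math.exp(-sigma * sq_dist)
    return K

def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    grand_mean = sum(row_mean) / n
    return [[K[a][b] - row_mean[a] - row_mean[b] + grand_mean
             for b in range(n)] for a in range(n)]

def leading_component(Kc, iterations=500, seed=0):
    """Power iteration: leading eigenvalue and unit eigenvector of Kc."""
    rng = random.Random(seed)
    n = len(Kc)
    v = [rng.random() - 0.5 for _ in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(Kc[a][b] * v[b] for b in range(n)) for a in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue, v

# Hypothetical text-feature rows: two groups of three "songs" each.
toy_rows = [
    [0.10, 0.02, 0.00], [0.12, 0.01, 0.00], [0.11, 0.03, 0.01],
    [0.00, 0.02, 0.15], [0.01, 0.01, 0.14], [0.00, 0.03, 0.16],
]
K_centered = center_kernel(gaussian_kernel_matrix(toy_rows, sigma=0.1))
eigenvalue, vector = leading_component(K_centered)
# Score of each training text on the first kernel principal component.
scores = [math.sqrt(eigenvalue) * x for x in vector]
```

In this toy setting the first three scores share one sign and the last three the other, which is the kind of grouping by singer that the scatter plots are meant to reveal.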
Second, we applied random forests machine learning
methods
We analyzed the results of the classification to find the singers who had special individual lyrical characteristics (high classification performance) and those who had common lyrical characteristics (low classification performance). In addition, we extracted the important features for classification in order to find the special distinguishing lyrical characteristic of every singer-songwriter.
Random forests are an improved method of bagging
We first sampled i cases at random, with replacement, from
the original text-feature matrix $M_{i,j}$ to make a
bootstrap sample, and we extracted random subsets of
$\sqrt{j}$ variables from the bootstrap sample to make a
sample for constructing an unpruned decision tree. To split
the nodes, we used the Gini index.
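The Gini index formula itself does not survive in this text. As a hedged illustration, the sketch below uses the standard Gini impurity, 1 minus the sum of squared class proportions, together with the bootstrap and feature-subset sampling described above. The function names and toy labels are hypothetical, and the authors' exact formulation may differ.

```python
import math
import random
from collections import Counter

def bootstrap_rows(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, rng):
    """Pick floor(sqrt(j)) distinct feature indices, at least one."""
    k = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))

def gini_impurity(labels):
    """Standard Gini impurity: 1 - sum_k p_k^2 over class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left_labels, right_labels):
    """Size-weighted impurity of a candidate split; lower is better."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) * gini_impurity(left_labels)
            + len(right_labels) * gini_impurity(right_labels)) / n

rng = random.Random(0)
row_sample = bootstrap_rows(10, rng)          # indices into a 10-text matrix
feature_sample = random_feature_subset(16, rng)  # 4 of 16 features

mixed = gini_impurity(["singer_A", "singer_A", "singer_B", "singer_B"])
pure_split = split_impurity(["singer_A", "singer_A"], ["singer_B", "singer_B"])
```

A node containing two classes in equal proportion has impurity 0.5, and a split that separates them perfectly has weighted impurity 0, which is why such a split would be preferred when growing the tree.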
An important characteristic of random forests is that they
return a variable importance score for each feature.
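The definition of variable importance is lost in this text. One common accuracy-based definition, which may or may not be the one used in this study, measures how much classification accuracy drops when the values of a single feature are randomly permuted. The sketch below shows that idea with a hypothetical rule-based classifier (`toy_predict`) and invented data; a real random forest would compute the drop on out-of-bag cases for each tree and average it.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, feature, rng):
    """Accuracy drop after shuffling one feature column across rows."""
    baseline = accuracy(predict, rows, labels)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(rows, column)]
    return baseline - accuracy(predict, permuted, labels)

def toy_predict(row):
    """Hypothetical classifier that looks only at feature 0."""
    return "A" if row[0] > 0.5 else "B"

rows = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3],
        [0.2, 0.9], [0.1, 0.2], [0.3, 0.6]]
labels = ["A", "A", "A", "B", "B", "B"]

rng = random.Random(0)
importance = [permutation_importance(toy_predict, rows, labels, f, rng)
              for f in range(2)]
```

Because the toy classifier ignores feature 1, permuting that feature never changes a prediction and its importance is exactly zero, whereas permuting feature 0 can only lower the accuracy.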
Table 1 shows the year in which each selected female singer-songwriter made her debut and her age at that time. The debut years and debut ages were taken from the official websites of the respective singer-songwriters. First, we found that the debut age ranged from the late teens to the early 20s, with an average of 20.7. Because of the high profile of Hikaru Utada, the general impression is that the debut age has fallen in recent years. However, these results indicate that the debut age has barely changed.
Next, if we look at the data by decade, there were three
singer-songwriters in the 1970s who regularly had hit
songs, in the 1990s there were five, and in the 2000s
there were two. In contrast, there were no
singer-songwriters in the 1980s that matched our
selection criteria. In the 1980s, instead of
singer-songwriters, female singers referred to as idols
frequented the popular chart. Around that time, popular
music programs were established on TV and were beginning
to take off and, rather than singer-songwriters who pour
forth their beliefs and feelings in their lyrics, it was
the attractive, generally well-received idols who
attracted sponsors that were popular
Table 2 shows the number of texts and the means, standard deviations, and coefficients of variation of the number of tokens for the 10 singer-songwriters. The number of texts ranges from 7 to 20, and the mean number of tokens ranges from 155 to 273. The table shows that Utada and Nakajima have a large number of songs in our dataset, whereas Hirose, Takeuchi, and Shina have a small number. It also shows that Hirose generally prefers short songs, and that the song lengths of YUI and Utada vary widely. As explained later, this may be because they prefer English words.
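For readers unfamiliar with the coefficient of variation, it is the standard deviation divided by the mean, which makes the spread of song lengths comparable across singers with different average lengths. A minimal sketch follows; the token counts are invented for illustration, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used.

```python
import statistics

def token_summary(token_counts):
    """Number of texts, mean, SD, and coefficient of variation of lengths."""
    mean = statistics.mean(token_counts)
    sd = statistics.stdev(token_counts)  # sample SD; an assumption
    return {
        "n_texts": len(token_counts),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,
    }

# Hypothetical token counts for three songs by one singer.
summary = token_summary([150, 200, 250])
```

For these invented counts the mean is 200 tokens, the standard deviation is 50, and the coefficient of variation is 0.25.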
Figures 1 and 2 show three-dimensional scatter plots of the first three principal components obtained by kernel PCA. The results show that songs by the same singer are grouped together; for example, aiko's songs are grouped toward the top, Utada's and YUI's songs toward the right, and Nakajima's songs toward the left. Compared with the factors concerning the singer, the factors concerning the era are not clearly observed in the kernel PCA; this was also true when the release years were used as labels.
The grouping results and a qualitative analysis of the songs show that the first principal component includes a loanword factor, because Utada's and YUI's songs contain many such words, and that the third principal component includes a pronoun factor, because aiko's songs contain many special first-person pronouns. However, these three axes are composed of many words, which makes them difficult to label.
Table 3 shows the confusion matrix produced by random forests. The songs by aiko, Nakajima, and Utada have high classification performance, whereas those by Hirose, Ohguro, and Matsutoya have low classification performance. These results suggest that the songs by the first three singer-songwriters have distinctive individual characteristics among Japanese female singer-songwriters, while those by the last three share common characteristics. Admittedly, the number of texts and the number of tokens can affect classification performance. However, aiko, for example, is not unusual in either respect; thus, the results also reflect the special lyrical characteristics of the singers.
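Reading per-singer classification performance off a confusion matrix can be sketched as follows. The labels and predictions are hypothetical placeholders, not results from this study, and per-class recall (correct predictions divided by the number of true texts for that singer) is assumed here as the performance measure.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_recall(matrix, classes):
    """Share of each true class that was classified correctly."""
    recall = {}
    for i, c in enumerate(classes):
        total = sum(matrix[i])
        recall[c] = matrix[i][i] / total if total else 0.0
    return recall

classes = ["singer_A", "singer_B"]
true_labels = ["singer_A"] * 4 + ["singer_B"] * 4
predicted = ["singer_A", "singer_A", "singer_A", "singer_B",
             "singer_A", "singer_A", "singer_B", "singer_B"]

matrix = confusion_matrix(true_labels, predicted, classes)
recall = per_class_recall(matrix, classes)
```

In this invented example singer_A is recovered for three of four texts and singer_B for two of four, so singer_A would be read as the more distinctive of the two.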
Table 4 shows the top 20 important features in the classification experiment, as calculated by random forests. They include many pronouns, final particles, and auxiliary verbs, which indicates that these words are particularly important for discriminating among the songs by the 10 Japanese female singer-songwriters.
Next, we conducted a classification experiment using the
lyrics of the three singer-songwriters who made their
debut in the 1970s and the lyrics of the seven
singer-songwriters who have made their debut since the
1990s. According to the kernel PCA, differences based on
singer are more significant than those based on the
simple era labels. Therefore, era classification was
conducted according to the debut year of the singer
rather than by decade. For example, in the 1990s there
are of course compositions written by Miyuki Nakajima,
but these are treated as Miyuki Nakajima's compositions.
Table 5 shows the confusion matrix and Table 6 shows the top
20 important features in the experiment. We found that a
number of trends can be identified in Table 5 because there
are many songs from the new era, but a moderate
classification was obtained from the confusion matrix
results. Many function words, such as wa and wo, and many
personal pronouns, such as watashi (I), are included in
Table 6 as well as in Table 4. As we shall see later, the
other words of high importance in the singer-songwriter
classification, those that contribute significantly to the
classification, also achieve high values in the
classification according to era.
Table 7 shows the top 20 important features for discriminating the 10 singer-songwriters. Below, we discuss the results in this table together with the results of the qualitative analysis. In particular, for Kohmi Hirose, Yumi Matsutoya, and Maki Ohguro, for whom classification performance was poor, the analysis was supplemented mainly by a qualitative study of their lyrics.
The first person pronoun
was the most important
feature, although
is usually used in
spoken Japanese. Pronouns have strong discriminant power
in authorship attribution
,
, and
appeared at 7, 8
and 14 ranks, and we inferred that they show her special
lyrical characteristic. This is apparent in her use of
the following lyrics for example,
赤く染まる指先や頬を (akaku somaru yubisaki ya hoho wo; your blushing fingertips and cheeks)
(スター [star])
唇かんで指で触ってあなたとのキス 確かめてたら (kuchibiru kande yubi de sawatte anata tono kisu tashikame te tara; bite my lip, touch it with my finger, when I go over kissing you in my mind)
頬は熱くなってたまに悲しくもなった (hoho wa atsuku natte tamani kanashiku mo natta; my cheeks burned hot and now and again I felt sad)
三角の耳した天使は恋のため息聞いて目を丸くしたあたしを指さし (sankaku no mimi shita tenshi wa koi no tameiki kiite me wo maruku shita atashi wo yubisashi; the angel with the triangular ears heard a sigh of love and pointed at me, my eyes wide)
tsume (nails), kami (hair), yubi (fingers), mune (chest), mimi (ears), and te (hands), are apparent in such lyrics as,
あたしの髪が揺れる距離の息づかいや きつく握り返してくれた手はさらに 消えなくなるのにね (atashi no kami ga yureru kyori no ikidsukai ya kitsuku nigirikaeshite kureta te wa sarani kienaku naru nonine; your breath and the hand that gripped me tightly does not fade with a shake of my hair)
深爪したことも (fukadsume shita koto mo; and a nail cut too close)
Many content words are ranked in the top ranked words.
Contrary to function words, content words were known to
serve as noises for better classification between
authors [love
and
appear as one of the
important words. It is thought that the word love
ranked top because there are many phrases that contain
love
applied to sections where the lyrics are
repeated, such as Fall in Love
私だけに White Love Song歌ってほしいの (watashi dake ni White Love Song utatte hoshii no; I want you to sing a white love song just for me)
and ゲレンデがとけるほど恋したい [Gerende ga tokeru hodo koi shitai; I want to love you in a way that will melt the ski slopes]
ずっと Eternal Love (zutto Eternal Love; forever, eternal love)
winter in Japan, but not a single word relating to winter featured among the top-ranked words, an unexpected result.
The particle
that represents the object is ranked in the top ranked
words. Examples of lyrics that use
include,
夢をくれし君の眼差しが肩を抱く (yume wo kureshi kimi no manazashi ga kata wo daku; the protective gaze of the one who dreams of me wraps around my shoulders)
appears in the top rankings. Picking out several lyrics as they appear reveals lyrical themes concerning love. For example, ai (love)
愛をくれし君のなつかしき声がする (ai wo kureshi kimi no natsukashiki koe ga suru; I hear the longed-for voice of the one who loves me)
私を愛したことを後悔はしていないかしら (watashi wo aishita koto wo koukai wa shite inai kashira; I wonder if you regret loving me)
Hiragana words appeared at every rank except 10, 15, and 16,
and many of them were function words, whereas the lists for
the other singer-songwriters included more content words. We
inferred that her special lyrical characteristics rest on
function words rather than content words. Function words
have strong discriminant power in authorship attribution
Compared to the other artists, there are few
characteristic words and also the classification
performance is poor. We conclude that either Maki Ohguro
is an artist with few characteristic words or she is an
artist with an extensive vocabulary who does not use
similar expressions. However, it should be noted that
the word
is included only in Ohguro's important
words. Words that represent time, such as
,
and
, are not
commonly heard from the other artists. Specifically,
phrases such as,
過去を責めてもあなたは帰らない (kako wo semetemo anata wa kaeranai; even if I blame the past, you do not return)
Arabic numerals such as 1 and 2 appeared at ranks 1 and 4,
although kanji can be used to express the same meaning in
Japanese. In addition, katakana in Japanese
(
) appeared at 7 rank and chatter
expressions such as
and
appeared at 14 and 15
ranks. We inferred that these expressions make her songs
moderate and accessible, and can represent her special
lyrical characteristics. Examples of her lyrics include
,
もう1つ食べたいわ (mou hitotsu tabetai wa; I want to eat one more),
もう1杯飲みたいわ (mou ippai nomitai wa; I want to drink one more)
1人はとてもめんどうだから (hitori wa totemo mendou dakara; It is tedious being alone)
and 夏の終わりに2人で抜けだしたこの公園で見つけた (natsu no owari ni futari de nukedashita kono kouen de mitsuketa; we snuck out at the end of summer and met at this park)
1番に君が好きだよ強くいられる (ichiban ni kimi ga suki dayo tsuyoku irareru; I love you more than anything and it makes me strong)
written in koto (thing)
in toki (when)
好きなトキ出かけて好きなトキ甘えて (sukina toki dekakete sukina toki amaete; going out when I want to, spoiling myself when I want to)
Kanji appeared at ranks 2, 5, 6, 7, 11, 12, 15, and 17. Even
function words such as
and
were
used in
(
as representing the same meaning, but Shina often ventures to use the kanji. For example, Otsuka writes, 居 (i; presence)
いつだってそこにいてあげるんだ (itsu datte soko ni ite agerunda; I'll always be there for you)
一番愛しいあなたの声迄掠れさせて居たのだろう (ichiban itoshii anata no koe made kasuresasete ita no darou; turned even your voice, the thing I love most, hoarse)
ずっと繋がれて居たいわ (zutto tsunagarete itai wa; I want to stay connected to you forever)
返して貰うまでもない筈 (kaeshite morau mademo nai hazu; you shouldn't even go so far as to get it back)
ありあまる富 [Ariamaru tomi; excessive wealth]
もっと中迄入って (motto naka made haitte; get deeper inside)
and 本能 [honnou; instinct]
此処に居て (koko ni ite; stay here)
Content words appeared at ranks 2, 4, 5, 6, 12, 13, 16, 19, and 20; we therefore inferred that, as with Kohmi Hirose, this lyrical characteristic of her songs led to the high error rates shown in Table 3. Among the 10 singer-songwriters, Takeuchi and Nakajima belonged to both the 1970s and 1990s eras; their lyrical characteristics were considerably different. In Takeuchi's songs,
私だって命がけの恋に憧れることはある (watashi datte inochigake no koi ni akogareru koto wa aru; I yearn for a desperate love)
and 手放した恋を今あなたも悔やんでるなら (tebanashita koi wo ima anata mo kuyanderu nara; if you also regret throwing our love away)
電話ぐらいくれてもいいのに (denwa gurai kuretemo ii noni; just a phone call would do)
にぎわう街の音がかすかに聞こえる (nigiwau machi no oto ga kasukani kikoeru; I hear the faint sound of the bustling city)
The second person pronoun
was the most important
distinguishing characteristic of Utada's songs, while
first person pronouns were more important in the other
nine singers. In the case of other artists, words that
represent the first person, such as
and
appear in the top rankings more often than
words that represent the second person, such as
and
, and it is only in
Utada's lyrics that second-person pronouns outrank
first person pronouns. Specific examples of her lyrics
with
include,
いいじゃないかキャンバスは君のもの (ii ja nai ka kyanbasu wa kimi no mono; that'll do, right? The canvas is yours)
君という光が私をみつける (kimi to iu hikari ga watashi wo mitsukeru; a light called you has found me)
and もっと君に近づきたいよ (motto kimi ni chikazukitai yo; I want to get closer to you)
そっと君に手を伸ばすよ (sotto kimi ni te wo nobasu yo; reaching out for you)
近づきたいよ君の理想に (chikazukitai yo kimi no risou ni; I want to get closer to your ideal)
I and
baby appeared at ranks 2, 3, 6, 7, and 16, and we inferred that they represent her special lyrical characteristics. We infer that she uses many imported words because she was born in New York.
Chatter expressions such as
,
,
,
, as well as question marks,
appeared at ranks 1, 2, 4, 7, and 18. YUI moved to Tokyo
from Fukuoka at the age of 17 and is the youngest artist
in this analysis. We infer that her lyrics contain
Japanese expressions of the new generation. Although YUI
is commonly known to prefer English words in her lyrics
appeared in YUI's important words, but
not those of any of the other artists. The appearance of
words that represent periods of time within a day, such
as
and
and
and
,
clearly demonstrates a world view in terms of time in
the lyrics. Specifically, in lyrics such as
なみだいろ声が聞こえない夜は (namida iro koe ga kikoenai yoru wa; the color of tears on nights when I can't hear your voice)
わすれちゃいそうな夜の真ん中 (wasurechai souna yoru no mannaka; in the middle of the night I begin to forget that)
In the section above, we analyzed the characteristics of individual singer-songwriters. The common point that emerged from this analysis is that love is invariably an important subject for women regardless of the era. Many of the listeners of songs written and sung by women such as Yumi Matsutoya and Mariya Takeuchi in earlier decades and aiko and Ai Otsuka in recent decades are young women, and love is a common theme and an important feature of communication for this age group.
However, the style used to talk about love differs by era
and singer. For example, singers like Mariya Takeuchi and
Yumi Matsutoya use many words related less to linguistic
expression and more to lyrical content, imagery, and
feelings, expressing their individuality through such
content and imagery. In contrast, modern singers assert a
stronger linguistic and stylistic individuality: aiko with
her concrete representations of the body, Utada with
imported words, YUI with colloquialisms, Otsuka with
numerals,
Naturally, with this change there is also a corresponding
change in the listeners. Since 2000, communication on the
internet and by cell phone has increased and people's
conditions for communication and the style of language
people use when communicating continue to change
significantly. The arrival of the internet is producing a
new Japanese language that is neither spoken nor written
language
According to the findings of our research, the most
significant change in the lyrics of current
singer-songwriters can be conceptualized as
in
in
With regard to emojis and emoticons, these general changes
in Japanese are written ones. However, written lyrics also
have an important role: they must be sung. Thus, we assume
that in the dynamic era of Japanese and its communication
styles, singer-songwriters must be sensitive in their
overall approach to writing lyrics, in which the meanings of
the words and their characters must effectively transmit
personal feelings or emotions. As mentioned in the
introduction, karaoke is a popular activity among young
people, in which the
visual effects of the lyrics
even in
English.
In summary, this study used computational stylistics methods to analyze popular Japanese songs composed by Japanese female singer-songwriters over the past 30 years. Our texts had the various characteristics mentioned in the Introduction; thus, our use of kernel PCA and random forests provides a case study for computational stylistics. We also provided empirical knowledge for understanding the Japanese language and its communication styles.
The results of this study showed that although love is a
common theme in the lyrics of female singer-songwriters, the
manner of expressing love varies significantly depending on
the era and the individual singer. The subtle topic of love
certainly differs for every singer-songwriter as well as
every song. However, we found many content-independent
characteristics, such as the selection of sung
, and The action of
. However, in contemporary Japanese
society, in which karaoke is popular and communication
styles have changed, the popular songs are
The shifts in modern Japanese culture can be investigated further by analyzing each genre separately and by analyzing lyrics in conjunction with the non-lyric musical elements of popular music. The results of this study could also be strengthened by applying content analysis to the lyrics and by using a larger data set. We intend to work on these points in the future.