Linguistic information in word embeddings

Department of Nordic Studies and Linguistics (NorS)

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Ali Basirat
Marc Tang

We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent, respectively, grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, which are based on a single-layer feed-forward neural network, classifying semantic features required significantly fewer neurons than classifying grammatical features. However, semantic features also produced higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction posed difficulties for the model, even though the number of neurons was tuned almost to its maximum.
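
The kind of measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a network with one hidden layer, tanh activations, and plain gradient descent, and it uses synthetic Gaussian vectors in place of real Swedish word embeddings. The names `train_classifier` and `mean_entropy` and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(probs, eps=1e-12):
    """Mean Shannon entropy, in bits, over the rows of a probability matrix."""
    logp = np.log2(np.clip(probs, eps, 1.0))
    return float(-(probs * logp).sum(axis=1).mean())

def train_classifier(X, y, n_hidden, n_classes=2, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer network; return a predict_proba function."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        P = softmax(H @ W2 + b2)
        # Gradients of the mean cross-entropy loss.
        dZ2 = (P - Y) / n
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict_proba(X_new):
        return softmax(np.tanh(X_new @ W1 + b1) @ W2 + b2)

    return predict_proba

# Synthetic stand-in for embeddings: two Gaussian clusters, one per class.
# Real experiments would use embedding vectors of labelled Swedish nouns.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.5, 1.0, (200, 20)),
               rng.normal(-0.5, 1.0, (200, 20))])
y = np.array([0] * 200 + [1] * 200)

predict_proba = train_classifier(X, y, n_hidden=8)
probs = predict_proba(X)
accuracy = float((probs.argmax(axis=1) == y).mean())
entropy = mean_entropy(probs)
print(f"accuracy={accuracy:.3f}  mean output entropy={entropy:.3f} bits")
```

Varying `n_hidden` and comparing accuracy with mean output entropy across tasks mirrors the two quantities the abstract reports. A classifier can be accurate while still spreading probability mass across classes, which is why the two are measured separately.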

Original language	English
Title of host publication	Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers
Editors	Jaap van den Herik, Ana Paula Rocha
Number of pages	22
Place of Publication	Cham
Publisher	Springer Verlag
Publication date	2019
Pages	492-513
ISBN (Print)	9783030054526
DOIs	https://doi.org/10.1007/978-3-030-05453-3_23
Publication status	Published - 2019
Externally published	Yes
Event	10th International Conference on Agents and Artificial Intelligence, ICAART 2018 - Funchal, Madeira, Portugal. Duration: 16 Jan 2018 → 18 Jan 2018

Conference

Conference	10th International Conference on Agents and Artificial Intelligence, ICAART 2018
Country	Portugal
City	Funchal, Madeira
Period	16/01/2018 → 18/01/2018
Sponsor	Institute for Systems and Technologies of Information, Control and Communication (INSTICC)

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11352 LNAI
ISSN	0302-9743

Research areas

Neural network, Nominal classification, Swedish, Word embedding

ID: 366048642