Linguistic information in word embeddings

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

We study the presence of linguistically motivated information in word embeddings generated with statistical methods. The nominal aspects of uter/neuter, common/proper, and count/mass in Swedish are selected to represent, respectively, grammatical, semantic, and mixed types of nominal categories within languages. Our results indicate that typical grammatical and semantic features are easily captured by word embeddings. In our experiments, based on a single-layer feed-forward neural network, classifying semantic features required significantly fewer neurons than classifying grammatical features. However, semantic features also generated higher entropy in the classification output despite their high accuracy. Furthermore, the count/mass distinction posed difficulties for the model, even though the number of neurons was tuned almost to its maximum.
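The abstract's contrast between accuracy and entropy in the classification output can be illustrated with a minimal sketch. It assumes the classifier ends in a softmax over two classes and that entropy is Shannon entropy of that output distribution; the record does not state the exact formulation used in the paper, and the scores below are hypothetical values chosen for illustration, not results.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical output-layer scores for one noun under two binary classifiers.
confident = softmax([4.0, -4.0])  # sharply peaked distribution
hesitant = softmax([0.5, -0.5])   # same winning class, flatter distribution

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "class:", predict(probs), "entropy (bits):", round(entropy(probs), 3))
```

Both distributions pick the same class, so they would count equally toward accuracy, yet the flatter one carries more entropy. Averaging this quantity over a test set is one plausible way to obtain the kind of measure the abstract refers to.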

Original language: English
Title of host publication: Agents and Artificial Intelligence - 10th International Conference, ICAART 2018, Revised Selected Papers
Editors: Jaap van den Herik, Ana Paula Rocha
Number of pages: 22
Place of Publication: Cham
Publisher: Springer Verlag
Publication date: 2019
Pages: 492-513
ISBN (Print): 9783030054526
DOIs
Publication status: Published - 2019
Externally published: Yes
Event: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018 - Funchal, Madeira, Portugal
Duration: 16 Jan 2018 to 18 Jan 2018

Conference

Conference: 10th International Conference on Agents and Artificial Intelligence, ICAART 2018
Country: Portugal
City: Funchal, Madeira
Period: 16/01/2018 to 18/01/2018
Sponsor: Institute for Systems and Technologies of Information, Control and Communication (INSTICC)
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11352 LNAI
ISSN: 0302-9743

Bibliographical note

Publisher Copyright:
© Springer Nature Switzerland AG 2019.

Research areas

  • Neural network, Nominal classification, Swedish, Word embedding

ID: 366048642