First name: | Lejla
|
---|
Surname: | Metohajrová
|
---|
Title: | Entity Disambiguation using Embeddings
|
---|
Supervisor: | prof. Ing. Igor Farkaš, Dr.
|
---|
Year: | 2019
|
---|
Block: | INF
|
---|
Keywords: | entity disambiguation, entity embeddings, word embeddings, natural language processing, social media
|
---|
Abstract: | Short documents, such as posts on social media, lack the amount of context needed for deep learning. Modern systems for entity disambiguation (ED) on short documents often rely on annotated knowledge bases, extensive feature engineering, and simple machine learning models. We propose a novel approach to ED for short documents that leverages representation learning and statistical information about entities drawn from large amounts of unstructured data. Even our baseline model, which selects the top entity candidate using empirical probabilities estimated from large unstructured data, performs competitively with previous research, which is clear evidence that such information is useful for this task. By experimenting with current state-of-the-art approaches to ED for long documents, we develop a model that outperforms current systems on all available datasets for ED on short documents. Additionally, we publish a new dataset for ED on short documents, based on Twitter posts labeled in earlier research.
|
---|
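
The baseline mentioned in the abstract picks, for each mention, the entity candidate with the highest empirical probability. The abstract does not say how that probability is estimated, so the following is only a minimal sketch under the assumption that it comes from counting (mention, entity) pairs observed in a corpus. The function names `build_prior` and `disambiguate` and the toy pairs are hypothetical illustrations, not the thesis implementation or data.

```python
from collections import Counter, defaultdict


def build_prior(pairs):
    """Estimate empirical P(entity | mention) from observed (mention, entity) pairs.

    Assumption: pairs were harvested from some large corpus; the source
    does not specify the exact estimation procedure.
    """
    counts = defaultdict(Counter)
    for mention, entity in pairs:
        counts[mention.lower()][entity] += 1

    prior = {}
    for mention, entity_counts in counts.items():
        total = sum(entity_counts.values())
        prior[mention] = {e: c / total for e, c in entity_counts.items()}
    return prior


def disambiguate(mention, prior):
    """Return the most probable entity for a mention, or None if it is unseen."""
    candidates = prior.get(mention.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy, made-up observations used purely for illustration.
pairs = [
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_Hilton"),
    ("Jaguar", "Jaguar_Cars"),
]
prior = build_prior(pairs)
top_entity = disambiguate("paris", prior)
```

Such a prior ignores the surrounding text entirely, which is why it works as a baseline: any context-aware model should be compared against it.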