
Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision. (arXiv:2003.12218v1 [cs.CL])


(Submitted on 27 Mar 2020)

Abstract: We created the CORD-19-NER dataset, which provides comprehensive named
entity recognition (NER) annotations for the COVID-19 Open Research Dataset
Challenge (CORD-19) corpus (2020-03-13). CORD-19-NER covers 74 fine-grained
named entity types. It is generated automatically by combining the annotation
results from four sources: (1) a pre-trained spaCy NER model covering 18 general
entity types, (2) a pre-trained scispaCy NER model covering 18 biomedical entity
types, (3) a knowledge base (KB)-guided NER model covering 127 biomedical entity
types, built with our distantly supervised NER method, and (4) a seed-guided NER
model covering 8 new entity types specifically related to COVID-19 studies,
built with our weakly supervised NER method. We hope this dataset helps the
text-mining community build downstream applications and brings insights to
COVID-19 studies, on both the biomedical and the social side.
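The abstract does not say how overlapping annotations from the four sources are reconciled. The sketch below shows one possible approach, assuming a greedy, priority-based merge over character-offset spans: sources earlier in a priority list win conflicts, and within a source, longer spans are tried first. The source names ("seed", "kb", "scispacy", "spacy"), the entity labels, and the priority order are all hypothetical illustrations, not the authors' actual pipeline.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # entity type (hypothetical labels in the example below)
    source: str  # name of the annotator that produced the span (hypothetical)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open character ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_annotations(spans_by_source: dict[str, list[Span]],
                      priority: list[str]) -> list[Span]:
    """Greedy merge: spans from sources earlier in `priority` win overlaps.

    ASSUMPTION: the priority order and the longest-span-first rule are
    illustrative choices, not the method described in the paper.
    """
    accepted: list[Span] = []
    for source in priority:
        # Within one source, try longer spans first (sorted is stable on ties).
        candidates = sorted(spans_by_source.get(source, []),
                            key=lambda s: s.end - s.start, reverse=True)
        for span in candidates:
            # Keep the span only if it does not collide with anything kept so far.
            if not any(overlaps(span, kept) for kept in accepted):
                accepted.append(span)
    # Return spans in document order.
    return sorted(accepted, key=lambda s: s.start)


if __name__ == "__main__":
    text = "Remdesivir inhibits SARS-CoV-2 in Vero cells."
    spans = {
        "seed": [Span(20, 30, "CORONAVIRUS", "seed")],
        "kb": [Span(0, 10, "CHEMICAL", "kb"), Span(20, 30, "VIRUS", "kb")],
        "scispacy": [Span(34, 44, "CELL_LINE", "scispacy")],
        "spacy": [Span(34, 38, "ORG", "spacy")],
    }
    merged = merge_annotations(spans, ["seed", "kb", "scispacy", "spacy"])
    print([(text[s.start:s.end], s.label) for s in merged])
```

Run as a script, this prints `[('Remdesivir', 'CHEMICAL'), ('SARS-CoV-2', 'CORONAVIRUS'), ('Vero cells', 'CELL_LINE')]`: the COVID-specific "seed" label overrides the generic "kb" label for the virus, and the longer biomedical span beats the shorter general-domain one. A real system might instead allow nested spans or use confidence scores; a strict priority order is simply the easiest policy to reason about.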

Submission history

From: Xuan Wang
[v1] Fri, 27 Mar 2020 03:35:46 UTC (1,315 KB)

Source: http://arxiv.org/abs/2003.12218
