ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles
Masao Utiyama
Fri Nov 27 16:51:26 JST 2020
About
Please refer to readme.txt for details.
Download
Copyright
This dataset is distributed under the Creative Commons Attribution 4.0 International License.
Please cite this dataset as:
- Masao Utiyama. "ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles". (2019)
Where does this dataset come from?
All articles were extracted from
ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed18n0928.xml.gz
It is one of "The PubMed Baseline Repository and Daily Update
files." We hereby acknowledge NLM as the source of the data in a
clear and conspicuous manner, and we neither indicate nor imply
that NLM has endorsed this dataset. The data was downloaded
on 5 Oct 2018 and thus does not reflect the most current data
available from NLM.
The files in articles/*.txt are the PubMed articles from Nature
Communications published from 2014 to 2017. Nature Communications
articles are published open access under the Creative
Commons Attribution 4.0 International License.
cf. https://www.nature.com/ncomms/about/open-access
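
For readers who want to see roughly how records like these could be
selected from a PubMed baseline file, here is a minimal sketch. It is
not the script used to build this dataset. The function name
iter_journal_abstracts and its parameters are hypothetical, and it
assumes the usual PubMed XML layout (PubmedArticle, MedlineCitation/PMID,
Journal/Title, JournalIssue/PubDate/Year, Abstract/AbstractText) and
that the journal title matches "Nature communications" case-insensitively.

```python
import xml.etree.ElementTree as ET


def iter_journal_abstracts(source, journal="nature communications",
                           first_year=2014, last_year=2017):
    """Yield (pmid, year, abstract) for matching PubmedArticle records.

    `source` is a filename or a binary file object holding PubMed XML,
    e.g. gzip.open("pubmed18n0928.xml.gz", "rb").
    Hypothetical helper for illustration only.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        title = (elem.findtext(".//Journal/Title") or "").strip().lower()
        # Assumption: records without a plain <Year> (for example those
        # that only carry <MedlineDate>) are skipped.
        year_text = (elem.findtext(".//JournalIssue/PubDate/Year") or "").strip()
        # itertext() keeps text inside inline markup such as <i> or <sub>.
        parts = ["".join(node.itertext()).strip()
                 for node in elem.iterfind(".//Abstract/AbstractText")]
        abstract = " ".join(p for p in parts if p)
        if (title == journal and year_text.isdigit()
                and first_year <= int(year_text) <= last_year and abstract):
            pmid = elem.findtext("MedlineCitation/PMID")
            yield pmid, int(year_text), abstract
        elem.clear()  # release the parsed record; baseline files are large
```

The sketch streams the file with iterparse and clears each record after
use, so memory stays bounded even though a baseline file holds many
thousands of records.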