ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles

Masao Utiyama
Fri Nov 27 16:51:26 JST 2020

About

Please refer to readme.txt for details.

Download

Copyright

This dataset is distributed under the Creative Commons Attribution 4.0 International License. Please cite this dataset as:

Where does this dataset come from?

All articles were extracted from ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed18n0928.xml.gz, which is one of "The PubMed Baseline Repository and Daily Update files." We hereby acknowledge NLM as the source of the data in a clear and conspicuous manner, and we do not indicate or imply that NLM has endorsed this dataset. The data was downloaded on 5 October 2018 and thus does not reflect the most current data available from NLM. The files in articles/*.txt are the PubMed articles from Nature Communications ranging from 2014 to 2017. Nature Communications articles are published open access under the Creative Commons Attribution 4.0 International License; cf. https://www.nature.com/ncomms/about/open-access
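For readers who want to inspect the source file themselves, the following is a minimal sketch of how abstracts for one journal might be pulled out of a PubMed baseline file. It is not the procedure used to build this dataset, which is described only in readme.txt. The function names iter_journal_abstracts and extract_from_gzip are hypothetical, the element paths (MedlineCitation/Article/Journal/Title and so on) reflect the PubMed XML layout as assumed here, and matching on the journal title case-insensitively is also an assumption. No filtering by year is attempted.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_journal_abstracts(xml_stream, journal="Nature Communications"):
    """Yield (pmid, abstract) for each PubmedArticle whose journal title matches.

    xml_stream is a binary file-like object containing PubMed XML.
    The title comparison is case-insensitive (an assumption, see above).
    """
    wanted = journal.casefold()
    # iterparse streams the file, so the whole baseline need not fit in memory.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        title = elem.findtext("MedlineCitation/Article/Journal/Title", default="")
        if title.casefold() == wanted:
            pmid = elem.findtext("MedlineCitation/PMID", default="")
            parts = elem.findall("MedlineCitation/Article/Abstract/AbstractText")
            # An abstract may be split into several AbstractText sections,
            # each of which may contain inline markup; itertext flattens it.
            abstract = " ".join("".join(p.itertext()).strip() for p in parts)
            if abstract:
                yield pmid, abstract
        # Release the processed record before moving on.
        elem.clear()


def extract_from_gzip(path):
    """Collect all matching (pmid, abstract) pairs from a gzipped XML file."""
    with gzip.open(path, "rb") as stream:
        return list(iter_journal_abstracts(stream))
```

Calling extract_from_gzip("pubmed18n0928.xml.gz") on a local copy of the file would return a list of (PMID, abstract) pairs for the matching journal.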