ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles
Fri Nov 27 16:51:26 JST 2020
Please refer to readme.txt for details.
This dataset is distributed under the Creative Commons Attribution 4.0 International License.
Please cite this dataset as:
- Masao Utiyama. "ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles". (2019)
Where does this dataset come from?
All articles were extracted from
It is one of "The PubMed Baseline Repository and Daily Update
files." We hereby acknowledge NLM as the source of the data in a
clear and conspicuous manner, and we do not indicate or imply
that NLM has endorsed this dataset. The data was downloaded
on 5 Oct 2018 and thus does not reflect the most current data
available from NLM.
The files in articles/*.txt are the PubMed articles from Nature
Communications published from 2014 to 2017. Nature Communications
articles are published open access under the Creative
Commons Attribution 4.0 International License.