ParaNatCom --- Parallel English-Japanese abstract corpus made from Nature Communications articles

Masao Utiyama
Tue Dec 10 10:31:32 JST 2019
Fri Nov 27 16:46:21 JST 2020 (updated)

* Contents

readme.txt      This file
articles        Articles from Nature Communications taken from PubMed
abstracts       Abstract texts from articles
abstracts-ja-1  Japanese translations of abstracts
abstracts-ja-2  Japanese translations of abstracts
abstracts-ja-3  Japanese translations of abstracts

* About articles/*.txt

The filename of each article is its PMID.
For example, articles/29146894.txt is at
https://www.ncbi.nlm.nih.gov/pubmed/29146894

All articles were extracted from
ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed18n0928.xml.gz
It is one of "The PubMed Baseline Repository and Daily Update files."

We hereby acknowledge NLM as the source of the data in a clear and
conspicuous manner, and we do not indicate or imply that NLM has
endorsed this dataset.

This dataset was downloaded on 5 Oct 2018 and thus does not reflect
the most current data available from NLM.

In addition, as described in
ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/README.txt
---
These data are produced with a reasonable standard of care, but NLM
makes no warranties express or implied, including no warranty of
merchantability or fitness for particular purpose, regarding the
accuracy or completeness of the data. Users agree to hold NLM and the
U.S. Government harmless from any liability resulting from errors in
the data. NLM disclaims any liability for any consequences due to use,
misuse, or interpretation of information contained or not contained in
the data.
---

* Files in articles/*.txt

The files in articles/*.txt are the PubMed articles from Nature
Communications published from 2014 to 2017.

Nature Communications articles are published open access under the
CC BY license.
(Creative Commons Attribution 4.0 International License:
https://creativecommons.org/licenses/by/4.0/)
cf.
https://www.nature.com/ncomms/about/open-access

* About abstracts/*.txt

Each file is named after the PMID of its article.
The first line is extracted from .
The second line is empty.
The third line is extracted from .

* About abstracts-ja-1/*.txt, abstracts-ja-2/*.txt and abstracts-ja-3/*.txt

The content is a Japanese translation of the corresponding
abstracts/PMID.txt

The articles in abstracts-ja-1 were translated by a translation agency.
Those in abstracts-ja-2 were translated by another translation agency.
Those in abstracts-ja-3 were translated by yet another translation agency.

Note that about 70% of the files in abstracts/*.txt were translated in
abstracts-ja-2. The original files in abstracts-ja-3 were the same as
those in abstracts-ja-2.

* Copyright of this dataset

This dataset is distributed under the CC BY license.
Creative Commons Attribution 4.0 International License
https://creativecommons.org/licenses/by/4.0/

Please cite this dataset as:

Masao Utiyama. "ParaNatCom --- Parallel English-Japanese abstract
corpus made from Nature Communications articles". (2019)

* Acknowledgment

This work is partly supported by JSPS KAKENHI Grant Number 19H05660.
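
* Example: reading one parallel pair (illustrative sketch)

The following Python sketch is not part of the dataset tooling. It only
illustrates the layout described above: abstracts/PMID.txt has content on
its first and third lines with an empty second line, and the matching
translation sits at abstracts-ja-N/PMID.txt. The names AbstractRecord,
parse_abstract_text, load_pair and pubmed_url are hypothetical, and UTF-8
encoding of the files is an assumption. The translation file is returned
as raw text because its internal layout is not described here.

```python
from pathlib import Path
from typing import NamedTuple, Optional, Tuple


class AbstractRecord(NamedTuple):
    """Content of one abstracts/PMID.txt file (hypothetical container)."""
    pmid: str
    first_line: str   # line 1 of the file
    third_line: str   # line 3 of the file (line 2 is empty)


def parse_abstract_text(pmid: str, text: str) -> AbstractRecord:
    """Split the text of abstracts/PMID.txt into its first and third lines."""
    lines = text.splitlines()
    first = lines[0].strip() if len(lines) > 0 else ""
    third = lines[2].strip() if len(lines) > 2 else ""
    return AbstractRecord(pmid, first, third)


def load_pair(
    root: Path, pmid: str, ja_dir: str = "abstracts-ja-1"
) -> Optional[Tuple[AbstractRecord, str]]:
    """Return (English record, raw Japanese text), or None if either file is missing.

    Not every PMID has a translation in every abstracts-ja-N directory,
    so a missing file is treated as "no pair" rather than an error.
    """
    en_path = root / "abstracts" / f"{pmid}.txt"
    ja_path = root / ja_dir / f"{pmid}.txt"
    if not (en_path.is_file() and ja_path.is_file()):
        return None
    # Assumption: files are UTF-8 encoded.
    english = parse_abstract_text(pmid, en_path.read_text(encoding="utf-8"))
    japanese = ja_path.read_text(encoding="utf-8")
    return english, japanese


def pubmed_url(pmid: str) -> str:
    """Build the PubMed page URL for a PMID, following the pattern shown above."""
    return f"https://www.ncbi.nlm.nih.gov/pubmed/{pmid}"
```

For example, load_pair(Path("ParaNatCom"), "29146894", "abstracts-ja-2")
would return the pair for that PMID if both files exist under a
(hypothetical) ParaNatCom directory, and None otherwise.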