Alignment of Reuters Corpora

We aligned English sentences in RCV1 with Japanese sentences in RCV2, both of which are available from NIST. We have made the resulting alignment data publicly available in accordance with the following provision of the Permitted Uses:

"Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is not possible to reconstruct the information from these summaries."

Sample

Download

Format of sentence alignment data

How to cite this data

Please cite the following article rather than this web page, because the data was created using the method it describes:

Other Japanese-English (JE) corpora are available from the first author's web site.
Last updated: 2005/11/18