The EDR Corpus

The EDR Corpus is a set of corpus records (see the figure below). Each record consists of five parts: entry information, sentence constituent information, morphological information, syntactic information, and semantic information. The entry information consists of the head-sentence and the source information: the head-sentence is the example sentence itself, and the source information describes the text from which the example sentence was taken. The sentence constituent information gives, for each constituent element of the sentence, the result of looking that element up in the relevant Word Dictionary; its content is similar to the co-occurrence constituent information in the Co-occurrence Dictionary. The morphological information provides a morpheme sequence and a supplementary analysis: the former is the result of morphological segmentation, and the latter lists candidates for possible compounds and collocations. The syntactic information describes a syntactic tree based on a dependency structure. The semantic information provides the concept relation representation, which expresses relationships among concepts as a frame or a graph.
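
The five-part record structure described above can be pictured as a nested data structure. The following Python sketch is only an illustration of that layout, not the actual EDR file format: every class name, field name, relation label, and the example sentence are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EntryInfo:
    head_sentence: str  # the example sentence
    source: str         # description of the source text


@dataclass
class Constituent:
    surface: str                            # constituent element of the sentence
    dictionary_entry: Optional[str] = None  # hypothetical Word Dictionary lookup result


@dataclass
class MorphInfo:
    morphemes: List[str]                                     # morphological segmentation
    supplementary: List[str] = field(default_factory=list)   # compound/collocation candidates


@dataclass
class Dependency:
    dependent: int  # index of the dependent morpheme
    head: int       # index of the morpheme it depends on


@dataclass
class ConceptRelation:
    source_concept: str
    relation: str        # illustrative label, not taken from the EDR specification
    target_concept: str


@dataclass
class CorpusRecord:
    entry: EntryInfo
    constituents: List[Constituent]
    morphology: MorphInfo
    syntax: List[Dependency]          # dependency-based syntactic tree, as edges
    semantics: List[ConceptRelation]  # concept relations, as graph edges


def dependents_of(record: CorpusRecord, head: int) -> List[str]:
    """Return the morphemes that depend directly on the morpheme at index `head`."""
    return [
        record.morphology.morphemes[d.dependent]
        for d in record.syntax
        if d.head == head
    ]


# Made-up record for illustration only; it does not come from the EDR Corpus.
example = CorpusRecord(
    entry=EntryInfo(head_sentence="Dogs chase cats.", source="hypothetical example"),
    constituents=[Constituent("Dogs"), Constituent("chase"), Constituent("cats")],
    morphology=MorphInfo(morphemes=["Dogs", "chase", "cats"]),
    syntax=[Dependency(dependent=0, head=1), Dependency(dependent=2, head=1)],
    semantics=[
        ConceptRelation("chase", "agent", "dog"),
        ConceptRelation("chase", "object", "cat"),
    ],
)

print(dependents_of(example, 1))
```

Storing the syntactic tree and the concept relations as flat edge lists is one convenient choice for a sketch like this; a frame-style representation, which the description also allows, would instead group the relations under each concept.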