EDR ************************************************** EDR ELECTRONIC DICTIONARY

Chapter 1  The EDR Electronic Dictionary

The EDR Electronic Dictionary is a body of linguistic data developed in an attempt to combine the information found in conventional paper-based dictionaries, thesauri and corpora. Japanese and English are the target languages of the dictionary. The words treated in the dictionary include basic or commonly used words and technical terms from the field of information processing.

The EDR Electronic Dictionary is made up of the following subdictionaries: the Japanese Word Dictionary, the English Word Dictionary, the Concept Dictionary, the Japanese-English Bilingual Dictionary, the English-Japanese Bilingual Dictionary, the Japanese Co-occurrence Dictionary, the English Co-occurrence Dictionary, the Japanese Corpus, the English Corpus and the Technical Terms Dictionary. The Technical Terms Dictionary covers the field of information processing.

Each of the dictionaries shares the same basic design. That is, each dictionary is made up of its own records. The overall structure of the EDR Electronic Dictionary is given below.

===================Structure of the EDR Electronic Dictionary===================
Japanese Word Dictionary: (See 1.1.1, and Chapter 2)
    ...
English Word Dictionary: (See 1.1.2, and Chapter 3)
    ...
Concept Dictionary: (See 1.1.3, and Chapter 4)
    ...
    ...
    ...
Japanese-English Bilingual Dictionary: (See 1.1.4, and Chapter 5)
    ...
English-Japanese Bilingual Dictionary: (See 1.1.5, and Chapter 6)
    ...
Japanese Co-occurrence Dictionary: (See 1.1.6, and Chapter 7)
    ...
    ...
English Co-occurrence Dictionary: (See 1.1.7, and Chapter 8)
    ...
Japanese Corpus: (See 1.1.8, and Chapter 9)
    ...
English Corpus: (See 1.1.9, and Chapter 10)
    ...
Technical Terms Dictionary: (See 1.1.10, and Chapter 11)
================================================================================

1.1. The Structure of the EDR Electronic Dictionary

The subdictionaries that compose the EDR Electronic Dictionary are described briefly below.

1.1.1. The Japanese Word Dictionary

The Japanese Word Dictionary is composed of Japanese word records arranged in the order of the Japanese syllabary. Each record of the Japanese Word Dictionary is composed of the record number, headword information, grammatical information, semantic information, pragmatic and supplementary information and management information. The main role of the Japanese Word Dictionary is to describe the correspondence between a Japanese word and the concept represented by the word, and to provide the grammatical information for the word when it is used with the given meaning. Commonly used words are the subject of the Japanese Word Dictionary.

1.1.2. The English Word Dictionary

The English Word Dictionary is composed of English word records arranged alphabetically. Each record of the English Word Dictionary is composed of the record number, headword information, grammatical information, semantic information, pragmatic and supplementary information and management information. The main role of the English Word Dictionary is to describe the correspondence between an English word and the concept represented by the word, and to provide the grammatical information for the word when it is used with the given meaning. Commonly used words are the subject of the English Word Dictionary.

1.1.3. The Concept Dictionary

The purpose of the Concept Dictionary is to provide the concepts that are referred to in the Japanese and English Word Dictionaries, the Japanese-English and English-Japanese Bilingual Dictionaries, and the Japanese and English Co-occurrence Dictionaries. The Concept Dictionary is composed of three separate dictionaries: the Headconcept Dictionary, the Concept Classification Dictionary and the Concept Description Dictionary.
The Headconcept Dictionary gives a description of each concept in words, and the Concept Classification Dictionary contains a classification of concepts that have a super-sub relation. The Concept Description Dictionary provides all other information regarding the relations between concepts.

1.1.4. The Japanese-English Bilingual Dictionary

The Japanese-English Bilingual Dictionary is composed of records arranged in the order of the Japanese syllabary. Records of the Japanese-English Bilingual Dictionary are composed of the record number, headword information, grammatical information, semantic information, English correspondence information and management information. The main role of the Japanese-English Bilingual Dictionary is to provide an English correspondence word for each Japanese headword based on the meaning of the headword.

1.1.5. The English-Japanese Bilingual Dictionary

The English-Japanese Bilingual Dictionary is composed of bilingual records arranged alphabetically according to the headword. Records of the English-Japanese Bilingual Dictionary are composed of the record number, headword information, grammatical information, semantic information, Japanese correspondence information and management information. The main role of the English-Japanese Bilingual Dictionary is to provide a Japanese correspondence word for each English headword based on the meaning of the headword.

1.1.6. The Japanese Co-occurrence Dictionary

The Japanese Co-occurrence Dictionary is composed of headphrase notations arranged in the order of the Japanese syllabary. The phrases are abstracted portions of actual sentences contained in the EDR Japanese Corpus. The results of parsing these sentences indicate that the constituents of a sentence have a dependency structure; that is, the constituents have a governing-dependent relation. It is these constituents that form the headphrases of the Japanese Co-occurrence Dictionary.
Records in the Japanese Co-occurrence Dictionary are composed of the record number, headword information, co-occurrence constituent information, syntactic information, semantic information, co-occurrence situation information, and management information. The main role of the Japanese Co-occurrence Dictionary is to show actual examples of how autonomous words are appropriately combined, based on the co-occurrence situation information obtained from the Japanese Corpus.

The Dictionary of Selectional Restrictions For Japanese Verbs is a dictionary that provides various information regarding case for approximately 5,000 basic Japanese verbs. That is, for each concept of a verb, it describes the group of surface-level case particles that may co-occur with it, the types of deep-level case (concept relation labels) that correspond to each surface-level case, and the range of concepts that may fill each deep-level case. The main role of the Dictionary of Selectional Restrictions For Japanese Verbs is to help select, during semantic analysis, the most appropriate concept from among the several concepts of a noun that co-occurs with a verb.

1.1.7. The English Co-occurrence Dictionary

The English Co-occurrence Dictionary is composed of headphrase notations arranged alphabetically. The phrases are abstracted portions of actual sentences contained in the EDR English Corpus. The results of parsing these sentences indicate that the constituents of a sentence have a dependency structure; that is, the constituents have a governing-dependent relation. It is these constituents that form the headphrases of the English Co-occurrence Dictionary.
Records of the English Co-occurrence Dictionary are composed of the record number, headword information, co-occurrence constituent information, syntactic information, semantic information, co-occurrence situation information, and management information. The main role of the English Co-occurrence Dictionary is to show actual examples of how autonomous words are appropriately combined, based on the co-occurrence situation information obtained from the English Corpus.

1.1.8. The Japanese Corpus

The Japanese Corpus is composed of records arranged according to EUC (Extended Unix Code). The records of the Japanese Corpus are composed of the record number, sentence information, constituent information, morpheme information, syntactic information, semantic information and management information. The basic role of the Japanese Corpus is first to identify the constituents of sentences, and then to indicate how the constituents combine to form the semantic, syntactic and morphological structure of the sentence, using a large number of actual examples as the source data.

1.1.9. The English Corpus

The English Corpus is composed of records arranged alphabetically. The records of the English Corpus are composed of the record number, sentence information, constituent information, morpheme information, syntactic information, semantic information and management information. The basic role of the English Corpus is first to identify the constituents of sentences, and then to indicate how the constituents combine to form the semantic, syntactic and morphological structure of the sentence, using a large number of actual examples as the source data.

1.1.10. The Technical Terms Dictionary (Information processing)

The Technical Terms Dictionary contains technical terms in English and Japanese from the field of information processing.
The Technical Terms Dictionary is composed of the following subdictionaries: the Japanese Technical Terms Dictionary, the English Technical Terms Dictionary, the Japanese-English Bilingual Dictionary of Technical Terms, the English-Japanese Bilingual Dictionary of Technical Terms, the Headconcept Dictionary of Technical Terms, the Concept Classification of Technical Terms, the Japanese Technical Terms Co-occurrence Data, and the English Technical Terms Co-occurrence Data.

1.2. Explanation Format

A common explanation format is used in all chapters of this manual, and all chapters share the same structure. The structure of each chapter is given below.

==============================Chapter Structure=================================
:Dictionary name
:Description of record (See 1.2.1 for details of notation used in presentation of record.)
    . . .
    . . .
    . . .
:Format of records on CD-ROM shown using an extended Backus-Naur form. All information for a dictionary record is given on one line. (Note: In order to make the records easier to understand, the sample data of some chapters is shown in a format that is slightly different from the CD-ROM version. In such cases, the format is defined before the sample data is presented.)
:Actual examples from dictionary data
================================================================================

1.2.1. Description Format

A basic descriptive format has been selected to illustrate the contents of the dictionary. An attempt has been made to select a format that eliminates the possibility of misunderstanding or misinterpretation. Though the format is not rigidly formal, it is hoped that the format and its conventions will make the data of each of the subdictionaries easy to understand. A record is composed of a number of fields.
The correspondence between a field and its sub-fields is indicated by indentation, with the name or specification of each field given at its indentation level. The role of the field, or a description of the contents that compose the field, is given to the right of the indentation. This description method makes use of a portion of SGML (Standard Generalized Mark-up Language), but it is not SGML itself. An explanation of the conventions used in this descriptive format is given below.

(1) Fields and Sub-fields

Sub-fields that make up a field are indented to the right of the main field. The notation below indicates that a field is made up of two sub-fields.

(2) Repeating Fields and Sub-fields

Three dots in succession (...) indicate that a field and its sub-fields repeat. The notation:

    ...

has the following interpretation:

(3) Repeating Sub-fields

Three circles in succession (。。。) indicate that a sub-field (or sub-fields) repeats. The notation:

    。。。

has the following interpretation:

(4) Repeating Field

Three dots in succession on separate lines indicate that the field itself repeats. This case is used when the field has no sub-fields. The notation:

has the following interpretation:

(5) Explanation of Field Contents

A single colon (:) following a field indicates that an explanation of the field, or of the content of the field, follows. References to tables and section names may be included in parentheses following the explanation. The notation :bbb below is an explanation of what field aaa is.

    <aaa> :bbb

(6) Values of a Field

A specific value of a field (or content of the field) is indicated with the notation below. The notation bbb is an example value of what is contained in aaa.
    <aaa> bbb

(7) Detailed Explanation

The following notation is an extension of the Backus-Naur format and is used to indicate that a field (a portion of a record) is further detailed in the section noted by the number given in the angle brackets. Though the notation resembles the BNF notation for parameters and tag fields, it is not the same as the BNF parameter and tag field notation.

    *

1.2.2. Notes Regarding Sample Data in Manual

BNF

The Backus-Naur Form (BNF) has been used in the examples and explanations in this manual. The standard Backus-Naur Form includes: the representation of metavariables using < >; the division between the left and right sides indicated with ::=; and the use of '|' to indicate a selection between (among) metavariables. In addition to the basic format, the representation in this manual also includes the following:

1) 。。。 to indicate a recursive metavariable. The two expressions below represent the same thing.

    ::=。。。
    ::= |

2) () are metasymbols that indicate a recursive separator.

3) {} are metasymbols that indicate the grouping of parameters (when there is more than one). The grouped elements are treated as a single metavariable.

When the above-mentioned metasymbols are actually part of the notation of the record, the special meaning of the character is escaped by single quotes (') placed around the character.

Comments

Comments such as sample sentences, copyright indications or information, etc. are included in the Management Information field only.

Data Retrieval

The record number of each dictionary record can be used as the keyword when retrieving data.

Record Length

The entire content of a record is provided on a single line, with the tab as the field separator. When the record length exceeds 4 K, the content of the record is continued on the next line(s). The notation ¥CR is used to indicate that the record continues to the next line.
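As a rough illustration of the record layout just described, the following Python sketch joins continued lines into one logical record and splits the record into fields. It is a minimal sketch under stated assumptions, not an official reader for the CD-ROM data: the function names are hypothetical, and it assumes that the continuation marker appears at the very end of a continued physical line, written exactly as printed in this manual (¥CR). The actual byte sequence on the CD-ROM may differ depending on the character encoding.

```python
from typing import Iterable, Iterator, List

# Continuation marker as printed in this manual. ASSUMPTION: the marker is the
# last thing on a physical line that continues; the real byte sequence on the
# CD-ROM may differ depending on the character encoding.
CONTINUATION = "¥CR"


def join_continued(lines: Iterable[str], marker: str = CONTINUATION) -> Iterator[str]:
    """Yield one logical record for each group of continued physical lines."""
    buffer = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith(marker):
            # Drop the marker and keep collecting the same record.
            buffer += line[: -len(marker)]
        else:
            yield buffer + line
            buffer = ""
    if buffer:
        # Input ended on a continued line; return what was collected.
        yield buffer


def split_fields(record: str) -> List[str]:
    """Split a logical record on the tab character, the field separator."""
    return record.split("\t")


if __name__ == "__main__":
    # Invented sample lines, not taken from the dictionary data.
    sample = ["A\tB\tlong¥CR\n", "er\tC\n", "D\tE\n"]
    for record in join_continued(sample):
        print(split_fields(record))
```

Field values are returned exactly as they appear in the record, so any enclosing double quotes are still present after splitting.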
Spaces and Symbols in Variables

When a space or symbol (including periods, etc.) is included in a value, the entire field is enclosed in double quotes. Note the following examples:

    "180-degree"
    "take off"
    "/"
    " α "
    "A.D."
    "a science called archaeology"

There is one exception to the preceding use of double quotes. Double quotes are not given in the pronunciation field in the English Word Dictionary, where there is only one value and the field contains symbols.

Empty Fields

In order to indicate blank fields, double quotes ("") are used.

ASCII Code

Letters that can be expressed in ASCII are expressed in ASCII. Note the following examples:

    RGBモデル
    "アークランプという,2点に電流を流す灯"
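The quoting and empty-field conventions described above can be sketched in Python as follows. This is a minimal sketch, not part of the EDR distribution: the function name decode_field is hypothetical, and the sketch assumes that an enclosing pair of double quotes is the only quoting mechanism applied to a field value. The pronunciation field of the English Word Dictionary is the stated exception and carries no quotes, so its value passes through unchanged.

```python
from typing import Optional


def decode_field(raw: str) -> Optional[str]:
    """Interpret one raw field value from a record.

    ASSUMPTION: a value is either bare, wrapped in one pair of double quotes,
    or the two-character sequence "" that marks a blank field.
    """
    if raw == '""':
        # Blank field.
        return None
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Quoted value: strip the enclosing quotes, keep inner spaces as-is.
        return raw[1:-1]
    # Bare value (no spaces or symbols), returned unchanged.
    return raw


if __name__ == "__main__":
    # Values taken from the examples in this section.
    for raw in ['"take off"', '"A.D."', '" α "', '""', "RGBモデル"]:
        print(repr(raw), "->", repr(decode_field(raw)))
```

Returning None for a blank field keeps it distinguishable from a value that happens to be an empty string after unquoting; a caller that does not need the distinction can map None to "".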