EDR ****************************************** JAPANESE CO-OCCURRENCE DICTIONARY

Chapter 7  The Japanese Co-occurrence Dictionary

The Japanese Co-occurrence Dictionary is composed of headphrase notations arranged according to the Japanese syllabary. The phrases are abstracted portions of actual sentences contained in the Japanese Corpus. The results of the parsing analysis of these sentences indicate that the constituents of a sentence have a dependency structure; that is, the constituents stand in a governing-dependent relation. It is these constituents that form the headphrases of the Japanese Co-occurrence Dictionary.

Records in the Japanese Co-occurrence Dictionary are composed of the record number, headword information, co-occurrence constituent information, syntactic information, semantic information, co-occurrence situation information, and management information.

The main role of the Japanese Co-occurrence Dictionary is to show actual examples of how autonomous words are appropriately combined, based on the co-occurrence situation information obtained from the Japanese Corpus.

Included in the Japanese Co-occurrence Dictionary is another dictionary, the Dictionary of Selectional Restrictions For Japanese Verbs. This dictionary provides information relevant to surface-level and deep-level case particles for basic Japanese verbs.

============Structure of Japanese Co-occurrence Dictionary Records==============
   *1 :Record type and identifier number
      :Headword of the Co-occurrence Dictionary record (Section 7.1)
   *2 :Headphrase notation
      :Information regarding the list of morphemes that comprise the co-occurrence phrase (Section 7.2)
   *3 :Each morpheme that comprises the co-occurrence phrase
      :Information regarding the syntactic structure of the co-occurrence phrase (Section 7.3)
   *4 :Syntactic tree that shows the structure of the co-occurrence phrase
      :Information that shows the deep-level concept relations (Section 7.4)
   *5 :Semantic frame information that shows concept relations
      :Information that shows the co-occurrence situation from the Japanese Corpus (Section 7.5)
   *6 :Occurrence frequency in the Japanese Corpus (Section 7.5.1)
   *7 :Restructured sentence of the sentence occurring in the Japanese Corpus based on the co-occurrence relation (Section 7.5.2)
      :Information for dictionary development management
   *8 :Management information such as date of creation or record update

Structure of Record Number Portion of Co-occurrence Record *1
_____________________________
 ::=JCC
-----------------------------
 :Seven digit decimal number

Structure of Headphrase Portion of Co-occurrence Record *2
___________________________
 ::=
 ::=
 ::=
-----------------------
 :Character string notation of words that comprise the co-occurrence phrase
 :A co-occurrence relation label from Table 7-2

Structure of Sequence of Constituents Portion of Co-occurrence Record *3
______________________
 ::=。。。
 ::='{' '}'
 ::="" | |
 ::=
 ::=
 ::=
 ::=""
----------
 :Number given to constituent based on order of occurrence in the sentence
 :Notation of morphemes comprising co-occurrence phrase
 :Kana notation of morphemes comprising co-occurrence phrase
 :Part of speech of morphemes comprising co-occurrence phrase
 :Correspondence relation type between morpheme and
  concept
    :(0) regular concept and direct correspondence
    :(1) part of idiomatic phrase and does not directly correspond
 :Japanese headword that represents concept
 :English headword that represents concept
 :Concept explication in Japanese
 :Concept explication in English
 :Explanation of concept given when appropriate concept among concepts of the words in the Japanese Word Dictionary does not exist

Structure of Syntactic Sub-tree Portion of Co-occurrence Record *4
_____________________________
 ::=
 ::=/
 ::=//
 ::=/
--------------------------
 :Constituent number corresponding to governing morpheme
 :Word notation of governing morpheme
 :Constituent number corresponding to co-occurrence relation
 :A co-occurrence relation label from Table 7-2
 :Word notation corresponding to relation
 :Constituent number corresponding to dependent morpheme
 :Word notation of dependent morpheme

Structure of Semantic Sub-frame Portion of Co-occurrence Record *5
__________________________
 ::={""|}
 ::="" |//
 ::="" |//
---------
 :Constituent number corresponding to governing morpheme
 :Concept identifier of the governing morpheme
 :Word notation of governing morpheme
 :Relation label that indicates deep-level concept relation
 :Constituent number corresponding to dependent morpheme
 :Concept identifier of the dependent morpheme
 :Word notation of dependent morpheme

Structure of Frequency Portion of Co-occurrence Record *6
__________________________
 ::=;;;
----------------
 :Occurrence frequency of surface level co-occurrence relation
 :Occurrence frequency including deep level concept relation
 :Frequency of governing morpheme
 :Frequency of dependent morpheme

Structure of Example Sentence Portion of Co-occurrence Record *7
___________________
 ::='{'/'}'
 ::= (;) 。。。
 ::=

Structure of Management History Record Portion of Co-occurrence Record *8
_____________________________
 ::== | ;=
 ::=
 ::=
================================================================================

===============Example of Japanese Co-occurrence Dictionary Record==============
JCC7173641
昼食 を 食べ
{ 1 昼食 チュウショク 名詞 0 3bec74 lunch 昼食[チュウショク] "a meal eaten at noon" 昼の食事 }
{ 2 を ヲ 助詞 0 "" }
{ 3 食べ タベ 動詞 0 3bc6f0 "" 食べる[タベ・ル] "to eat something" 食物をとる }
3/食べ 2/を/を 1/昼食
3/3bc6f0/食べ object 1/3bec74/昼食
1;1;488;6
{00050003b57d-8-3/<昼食>を…(食べ)に帰る}
DATE="95/3/31"
================================================================================

================================================================================
JCC5321382
借り @rentai 本
{ 1 借り カリ 動詞 0 3cfdb4 borrow 借りる[カリ・ル] "to use a person's property after promising to give it back to the lender" 返す約束で,他人のものを使う }
{ 2 た タ 助動詞 0 "" }
{ 3 本 ホン 名詞 0 0e5097 volume 本[ホン] "Publications" 書籍 }
1/借り 2/@rentai/た 3/本
1/3cfdb4/借り 3/0e5097/本
5;1;562;231
{000600000067-18-15/<借り>た(本)}
DATE="95/3/31"
================================================================================

7.1. Headword Information

The headword information of the Japanese Co-occurrence Dictionary consists of the headphrase only. The headphrase is composed of the notations of the governing and dependent morphemes joined by a co-occurrence relation label (Section 7.3). The order of the governing and dependent morpheme notations follows the order in which the morphemes appear in the sentence: first the governing or dependent morpheme (whichever appeared first in the sentence), then the co-occurrence relation label, and finally the governing or dependent morpheme (whichever appeared last in the sentence).

There are five different types of dependency described in the Japanese Co-occurrence Dictionary.
They are: case relation; adnominal modification of predicate/adnoun and nominal; adnominal modification of substantives joined by the case particle 'no'; adverbial modification of predicate/adverbs and substantive; and the modifying relation between nominal classifiers and substantive.

7.2. Co-occurrence Constituent Information

The co-occurrence constituent information is composed of information relevant to each of the constituents that make up the co-occurrence headphrase. It includes the morpheme notation used in the actual example from which the phrase was extracted, the part of speech, the Kana notation, the idiom flag and the concept information. The morpheme (if any) corresponding to the co-occurrence relation label and the morphemes of both the governing and dependent elements are given in the co-occurrence constituent information.

The reading of the morpheme is written in Katakana and is given as the Kana notation. When the morpheme is a number, it is written in Arabic numerals.

The parts of speech used in the Japanese Word Dictionary are also used in the Japanese Co-occurrence Dictionary; however, they have been regrouped and redefined into fewer categories. The correspondence between the parts of speech used in the Japanese Co-occurrence Dictionary and those of the Japanese Word Dictionary is given in Table 7-1.

The idiom flag shows the type of correspondence relation between the morpheme and the concept that is given in the semantic information (Section 7.4). There are two correspondence relations: 1 and 0. If there is a direct correspondence between the concept and the morpheme indicated by the constituent number, the correspondence relation '0' is given. When the morpheme indicated by the constituent number corresponds to an idiomatic phrase or compound word which includes the indicated morpheme, the correspondence relation '1' is given.
In the example given at the beginning of this chapter, the correspondence relation '0' is given to indicate that there is a direct correspondence between the morpheme '昼食' and the concept provided in the concept information field.

The concept information contains the concept identifier, the Japanese and English headconcepts and the Japanese and English concept explications for the concepts that correspond to each of the morphemes of the co-occurrence headphrase. When the morpheme is a word that does not represent a concept, the concept information is left blank. In such cases, double quotes are given in the concept information field. If the concept corresponding to a morpheme does not exist for any word within the Japanese Word Dictionary, a supplemental explanation of the concept written in Japanese is given in the concept information field.

7.3. Syntactic Information

The syntactic information is a syntactic sub-tree that represents the surface co-occurrence relation(s) included in the co-occurrence headphrase. The co-occurrence relation is described by the combination of the governing morpheme, the co-occurrence relation label and the dependent morpheme.

There are four different types of co-occurrence relations described in the Japanese Co-occurrence Dictionary. The four different relations are: case relation, adnominal modification, adverbial modification, and modification of nominals by nominal classifiers. Adnominal modification by the case particle 'no', and adnominal modification by particles other than 'ni' and 'de' are described as case relations.

There are two different types of co-occurrence relation labels used in the Japanese Co-occurrence Dictionary. The first type of co-occurrence relation label corresponds exactly to the notation of function words such as particles.
That is, the particle or other function word itself is the co-occurrence relation label. The second type of co-occurrence relation label is a relation label code which indicates one of the following modifying relations: adnominal modification by the predicate, adverbial modification, or modification of nominals by nominal classifiers. For function words such as particle equivalents, the relation label is made up of the combination of particles. Table 7-2 shows the co-occurrence relation labels used in the Japanese Co-occurrence Dictionary.

Each of the morphemes in the syntactic sub-tree has a constituent number that indicates to which of the constituents in the co-occurrence constituent information it corresponds. However, the constituent number field is blank when the co-occurrence relation does not correspond to a surface morpheme. This is the case with verbal modification by adverbs: since there is no surface morpheme indicating the relation between the adverb and the verb, the field is left blank.

7.4. Semantic Information

The semantic information of the co-occurrence record provides the deep-level semantic relations that correspond to the headphrase. Each of the semantic relations is expressed by a concept relation label in a semantic sub-frame. The concept relation labels are the same as the concept relation labels used in the Concept Dictionary.

The semantic information is given for the concept relation of the headphrase as a semantic sub-frame (see the structure marked *5 at the beginning of this chapter). A semantic sub-frame contains the governing and dependent morpheme numbers, the concept identifiers from the Concept Dictionary of the concepts corresponding to the morphemes, and the governing and dependent word notations.
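
The sub-tree and sub-frame fields can be read mechanically. The following Python sketch is illustrative only: it parses the syntactic and semantic fields of the first example record at the beginning of this chapter. The class and function names are hypothetical, and the code assumes that the three parts of each field are separated by whitespace, that no word notation contains a slash or a space, and that a blank field appears as an empty string or as double quotes. None of these details is confirmed by the record structures above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyntacticSubTree:
    governor_no: int
    governor_word: str
    relation_no: Optional[int]  # None when no surface morpheme carries the relation
    relation_label: str
    relation_word: str
    dependent_no: int
    dependent_word: str


@dataclass
class SemanticSubFrame:
    governor_no: int
    governor_concept: str
    governor_word: str
    relation_label: str
    dependent_no: int
    dependent_concept: str
    dependent_word: str


def _blank(token: str) -> bool:
    # Assumption: a blank field is written as an empty string or as "".
    return token in ("", '""')


def parse_syntactic_sub_tree(field: str) -> SyntacticSubTree:
    """Parse e.g. '3/食べ 2/を/を 1/昼食' (governor, relation, dependent)."""
    governor, relation, dependent = field.split()
    gov_no, gov_word = governor.split("/")
    rel_no, rel_label, rel_word = relation.split("/")
    dep_no, dep_word = dependent.split("/")
    return SyntacticSubTree(
        int(gov_no), gov_word,
        None if _blank(rel_no) else int(rel_no), rel_label, rel_word,
        int(dep_no), dep_word,
    )


def parse_semantic_sub_frame(field: str) -> Optional[SemanticSubFrame]:
    """Parse e.g. '3/3bc6f0/食べ object 1/3bec74/昼食'; None if left blank."""
    tokens = field.split()
    if len(tokens) != 3 or any(_blank(t) for t in tokens):
        return None  # the record carries no complete semantic information
    governor, label, dependent = tokens
    gov_no, gov_concept, gov_word = governor.split("/")
    dep_no, dep_concept, dep_word = dependent.split("/")
    return SemanticSubFrame(int(gov_no), gov_concept, gov_word, label,
                            int(dep_no), dep_concept, dep_word)


tree = parse_syntactic_sub_tree("3/食べ 2/を/を 1/昼食")
frame = parse_semantic_sub_frame("3/3bc6f0/食べ object 1/3bec74/昼食")
print(tree.relation_label, frame.relation_label)  # prints: を object
```

Keeping the constituent numbers as integers makes it straightforward to look up the matching entries in the co-occurrence constituent information (Section 7.2).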

In cases where the Japanese Co-occurrence Dictionary record does not contain semantic information, the governing concept constituent, the concept relation label, and the dependent concept constituent fields are left blank. A blank field is indicated by double quotes.

7.5. Co-occurrence Situation Information

The co-occurrence situation information describes how often, and in what context, the co-occurrence record appears in the Japanese Corpus. It is composed of the frequency of the co-occurrence relation and a portion of the actual example sentence from which the co-occurrence relation was taken.

7.5.1. Frequency

The co-occurrence frequency is a number which indicates the number of times a co-occurrence relation appears in the Japanese Corpus. There are three different types of co-occurrence frequency descriptions: surface level co-occurrence frequency, co-occurrence constituent frequency, and co-occurrence entry frequency.

The surface level co-occurrence frequency indicates how often the surface level co-occurrence relation of the headphrase appears. The co-occurrence entry frequency indicates how often the deep level concept relation corresponding to the syntactic and semantic information of the co-occurrence record occurred in the Japanese Corpus. Finally, the co-occurrence constituent frequency indicates how often the morphemes of the governing and dependent elements of the co-occurrence phrase appear in the Japanese Corpus. The co-occurrence constituent frequencies are counted over constituents which have the same notation, Kana notation, part of speech and concept.

In the first example record at the beginning of this chapter, the frequency field 1;1;488;6 therefore gives a surface level frequency of 1, a frequency including the deep level concept relation of 1, a governing morpheme frequency of 488, and a dependent morpheme frequency of 6.

7.5.2. Example Sentence

The example sentences that make up the second type of information covered in the co-occurrence situation information are reconstructions of the actual sentences from which the co-occurrence relation has been taken.
With the co-occurrence relation as its base, the morphemes included in the co-occurrence phrase as well as the neighboring constituents are extracted and used as the example fragment. The example sentence is provided as reference information in order to see how the surrounding morphemes (of negation, aspect, causation, etc.) influence the formation of the co-occurrence relation.

In the example sentences of the Japanese Co-occurrence Dictionary, the words that occur just after the predicate and belong to the predicate of the governing side are written following the morphemes of the headphrase. The portion corresponding to the governing morpheme is enclosed in parentheses (), and the portion corresponding to the dependent morpheme is enclosed in angle brackets <>. When constituents occur between the elements of the governing and modifying morphemes, the constituents are omitted and an ellipsis mark (...) is used. For instance, the first example record at the beginning of this chapter gives <昼食>を…(食べ)に帰る, in which 昼食 is the dependent morpheme, 食べ is the governing morpheme, and the ellipsis stands for the omitted intervening constituents.

7.6. Management Information

The management information of the Japanese Co-occurrence Dictionary contains the management history record. The management history record provides information such as date of creation or record update.

7.7. Dictionary of Selectional Restrictions for Japanese Verbs

The Dictionary of Selectional Restrictions for Japanese Verbs is a dictionary that provides various information regarding case for basic Japanese verbs. That is, it describes, for each concept of a verb, the group of surface-level case particles that may co-occur with it, the types of deep-level case (concept relation label) that correspond to the surface-level case, and the range of possible concepts that may fill the deep-level case.

The main role of the Dictionary of Selectional Restrictions For Japanese Verbs is to help semantic analysis select the most appropriate concept from among the concepts of a noun that co-occurs with a verb.

===========Structure of Records in the Dictionary of Selectional Restrictions For Japanese Verbs==============================
          *1 :Record type and identifier number
             :Sentence showing co-occurrence pattern (Section 7.7.1)
             :Sequence of constituents showing sentence pattern structure (Section 7.7.2)
   。。。 *2 :Constituents that make up the sentence pattern
             :Example sentence that fits sentence pattern (Section 7.7.3)
             :Groups of deep-level case and surface-level case particles that may co-occur with verb (Section 7.7.4)
   。。。 *3 :Deep-level and surface-level case particles that may occur with verb
             :Semantic information (composed of groups of possible concepts) for verb and case filler (Section 7.7.5)
   。。。 *4 :Semantic information for each verb and each case filler
             :Information for dictionary development management
   ...    *5 :Management information such as date of creation or record update

Structure of Record Number Portion of Record *1
______________________
 ::=JCP
------------
 :Seven digit decimal number

Structure of Constituent Information of Record *2
______________________
 ::=
--------------
 :Number given to constituent based on order of occurrence in the sentence
 :Notation of morpheme comprising sentence pattern
 :Part of speech of morpheme comprising sentence pattern
 :Semantic Information of morpheme comprising sentence pattern

Structure of Syntactic Information Portion of Record *3
______________________
 ::=
------------
 :Concept relation label that shows deep-level case
 :Surface level case particle

Structure of Semantic Information Portion of Record *4
______________________
 ::=
 ::=(;)。。。
 ::=(;)。。。
------------
 :Concept relation label that indicates case
 :Concept identifier that indicates case filler
 :Concept explication that describes case filler

Structure of Management Information Portion of Record
*5
______________________
 ::=(;)。。。
 ::==
 ::=
 ::=
================================================================================

===========Example of Record From Dictionary of Selectional Restrictions For Japanese Verbs==============================
JCP0012345
:にあしらう
1 2 が agent 3 4 を object 5 6 に goal 7 あしらう Verb 0e3036取り合わせる
:/家元/が/松/の/根元/に/菊/を/あしら/う/
agent が object を goal に
act 0e3036 取り合わせる
agent 30f6b0;30f746 人間;組織
object 30f6ae;444b1a 具体物;具体的あるいは抽象的生産物
goal 30f6ae;444b1a;3aa938 具体物;具体的あるいは抽象的生産物;場所
DATE="95/3/31"
================================================================================

7.7.1. Sentence Pattern Headword

The sentence pattern headword of the Dictionary of Selectional Restrictions For Japanese Verbs expresses the co-occurrence pattern corresponding to a single concept of a verb in the form of a sentence. A word marker is inserted in the place of the case filler (usually a noun) that precedes each of the case particles. The following format is used to express the word markers: ,, ....,

The co-occurrence patterns in the Dictionary of Selectional Restrictions For Japanese Verbs are based on those case particles which are obligatory and not on case particles that are optional or that can be optionally omitted. Obligatory cases are the surface-case particles registered in the Japanese Word Dictionary. Optional or optionally dropped cases are not registered. For information regarding surface-level case, refer to the explanation in the Japanese Word Dictionary.

7.7.2. Syntactic Constituent Information

The syntactic constituent information contains information regarding each morpheme that comprises the sentence pattern headword described in Section 7.7.1. Each morpheme is described with grammatical information and semantic information. If the constituent is a verb, the word marker 'Verb' is given for the grammatical information.
The semantic information given for a verb is the concept identifier and concept explication of the concept that expresses the word sense of the verb. If the constituent is a particle, the concept relation label that indicates the deep-level case is given as the semantic information. For the definition and type of concept relation labels, refer to the explanation given in the Concept Dictionary.

7.7.3. Example Sentence

The example sentence given in each record is the example sentence that best fits the sentence pattern headword. The example sentence is described in a format that shows the constituents of the sentence segmented morphologically.

7.7.4. Syntactic Structure Information

The syntactic structure information provides information regarding the correspondence between the surface-level and deep-level cases of the sentence pattern headword. The correspondence between the deep-level case and the surface-level case is described in a case frame format. The syntactic information is the pair composed of the concept relation label that expresses the deep-level case, and the case particle that expresses the surface-level case.

7.7.5. Semantic Information

The semantic information gives information for the verb being described as well as for each of the case fillers in the case frame described in the syntactic information. For verbs, the concept identifier and the concept explication of the concept that expresses the word sense of the verb are given. The information given for each of the case particles is the concept relation label that indicates the type of case, and the concept group (from the concept classification) that expresses the range of possible concepts for the case filler.

For example, the sub-concepts 30f6b0 'human' and 30f746 'organization' are given as part of the semantic constituent information, [agent 30f6b0 'human';30f746 'organization'].
The possible concepts that may be selected for the agent case filler for the verb concept are sub-concepts of either 30f6b0 'human' or 30f746 'organization'. The semicolon (;) indicates that the sub-concepts of either one of the two concepts listed are possible concepts for the agent case filler.

The semantic constituent information [object 30f6ae 'concrete object'-30f6b0 'human'] has the following interpretation: the concept selected as the object case filler for the verb concept must be a sub-concept of 30f6ae 'concrete object', but it may not be a sub-concept of 30f6b0 'human'. The minus symbol (-) preceding the concept identifier 30f6b0 indicates that the concepts of that concept group are not possible candidates for the case filler.

However, there may be cases in which a sub-concept of a concept group marked with a minus (-) symbol is actually a possible candidate for the case filler. For such cases, a plus symbol (+) is used. The plus symbol indicates that the sub-concept(s) of the minus-marked group may in fact be selected as a possible case filler.

The possible filler concept groups are described by the combination of concepts in the concept classification. The selection of the concepts (from among the concept classification) as possible candidates for the case filler was carried out by human coders only. The data in the Dictionary of Selectional Restrictions For Japanese Verbs was not extracted from either the Japanese Co-occurrence Dictionary or the Japanese Corpus.

7.7.6. Management Information

The management information of the Dictionary of Selectional Restrictions For Japanese Verbs contains the management history record. The management history record provides information such as date of creation or record update.

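
The semicolon, minus and plus notation of Section 7.7.5 can be read as a small membership test over the concept classification. The Python sketch below is one possible reading, not the procedure used to build or apply the dictionary. It assumes that a restriction is written as concept identifiers joined by ';', '-' and '+' without the explications, that a plus-marked concept is written after the minus-marked one in the same string (the written form of a plus-marked entry is not shown in this chapter), that each concept has a single parent, and that the nearest marked ancestor decides. The identifiers beginning with toy_ and all parent links are invented for illustration; only 30f6b0, 30f746 and 30f6ae come from this chapter.

```python
import re
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Toy child -> parent links (hypothetical; not taken from the Concept Dictionary).
PARENT: Dict[str, str] = {
    "30f6b0": "30f6ae",        # 'human' under 'concrete object' (assumed)
    "toy_teacher": "30f6b0",   # invented sub-concept of 'human'
    "toy_patient": "30f6b0",   # invented sub-concept of 'human'
    "toy_flower": "30f6ae",    # invented sub-concept of 'concrete object'
    "toy_school": "30f746",    # invented sub-concept of 'organization'
}


def parse_restriction(expr: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split a restriction into (allowed, minus-marked, plus-marked) groups."""
    allowed: Set[str] = set()
    minus: Set[str] = set()
    plus: Set[str] = set()
    target = {"": allowed, ";": allowed, "-": minus, "+": plus}
    for sign, concept in re.findall(r"([;+-]?)([0-9A-Za-z_]+)", expr):
        target[sign].add(concept)
    return allowed, minus, plus


def ancestors(concept: Optional[str]) -> Iterator[str]:
    """Yield the concept itself, then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def is_possible_filler(expr: str, concept: str) -> bool:
    """Decide by the nearest marked concept on the path to the root."""
    allowed, minus, plus = parse_restriction(expr)
    for node in ancestors(concept):
        if node in plus:
            return True
        if node in minus:
            return False
        if node in allowed:
            return True
    return False


def select_concepts(expr: str, candidates: List[str]) -> List[str]:
    """Keep only the candidate noun concepts that satisfy the restriction."""
    return [c for c in candidates if is_possible_filler(expr, c)]


agent_ok = select_concepts("30f6b0;30f746",
                           ["toy_teacher", "toy_school", "toy_flower"])
object_ok = select_concepts("30f6ae-30f6b0+toy_patient",
                            ["toy_teacher", "toy_patient", "toy_flower"])
print(agent_ok)   # ['toy_teacher', 'toy_school']
print(object_ok)  # ['toy_patient', 'toy_flower']
```

With a real concept classification in place of the toy links, the same filter could be applied to the candidate concepts of a noun that co-occurs with a verb, which is the use described in Section 7.7.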
7.a Tables

7-1 Part of Speech Assignments
7-2 Co-occurrence Relation Labels

Table 7-1 Part of Speech Assignments in the Japanese Co-Occurrence Dictionary
_____________________________________________________________________________
Part of Speech     Corresponding Part of Speech            Code
                   from Japanese Word Dictionary
_____________________________________________________________________________
Noun               Common Noun                             JN1
                   Proper Noun                             JN2
                   Numeral                                 JN3
                   Temporal Noun                           JN4
Verb               Verb                                    JVE
Adjective          Adjective                               JAJ
Adjectival Noun    Adjectival Noun                         JAM
Adverb             Adverb                                  JD1
                   Epistemic Adverb                        JD2
Affix              Counter Suffix                          JN6
_____________________________________________________________________________

Table 7-2 Co-occurrence Relation Labels
_______________________________________________________________________________
Co-occurrence    Explanation of Co-occurrence Phrases and Example Phrases
Relation Label
_______________________________________________________________________________
φ                Direct modification without a relation label
                 (Noun, adverb modifying a predicate or nominal)
                 [情報,φ,提供] /<情報>(提供)を開始した。/
                     (... began providing information.)
                 [単に,φ,提供] /<単に>…(提供)するだけでなく/
                     (not only providing ..., but)
が               Dependency relation with が ga
                 [会長,が,あいさつ] /<会長>が(あいさつ)した。/
                     (Chairman gave an address.)
の               Dependency relation with の no
                 [庭,の,花] /<庭>の(花)が/
                     (Flowers in the garden ...)
を               Dependency relation with を o
                 [波紋,を,投げかけ] /<波紋>を(投げかけ)た。/
                     (... created a sensation.)
に               Dependency relation with に ni
                 [健康,に,い] /<健康>に(い)い/
                     (good for health)
で               Dependency relation with で de
                 [自宅,で,過ご] /<自宅>で(過ご)す/
                     (to spend one's time at home)
には             Dependency relation with には ni-wa
                 [寒さ,には,弱] /<寒さ>には(弱)い。/
                     (to be susceptible to the cold)
への             Dependency relation with への e-no
                 [未来,への,展望] /<未来>への(展望)を/
                     (a prospect of the future)
でも             Dependency relation with でも de-mo
                 [大学生,でも,解け] /<大学生>でも(解け)ない/
                     (even a college student cannot solve)
について         Dependency relation with について ni-tsu-i-te
                 [弊害,について,質問] /<弊害>について(質問)する/
                     (to ask a question about evil influence)
は wa, へ e, から ka-ra, まで ma-de, において ni-o-i-te, etc.
                 (Combination of particles, particle equivalents, auxiliary
                 verbs, auxiliary verb equivalents, verbal auxiliaries)
@rentai          Predicate modifying nominal
                 (Adjective, adjectival noun or verb modifying a noun)
                 [提案, @rentai, 拒否] /[拒否]できない[提案]を/
                     (a proposal that cannot be rejected)
@renyou          Predicate modifying another predicate
                 (Adjective or adjectival noun modifying a verb or
                 adjectival noun)
                 [おこな,@renyou,頻繁] /[頻繁]に[おこな]われる/
                     ((something) that is often performed)
@unit            A noun and counter suffix set
                 [貨車,@unit,台] /5[台]の[貨車]/
                     (five freight cars)
_______________________________________________________________________________
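
The labels in Table 7-2 fall into three groups: φ for direct modification, codes beginning with @, and labels that are the function words themselves. As a rough illustration, the Python sketch below splits a headphrase such as those in the example records into its three parts and classifies the label. The names Headphrase, label_kind and parse_headphrase and the three kind strings are hypothetical, and the code assumes that a headphrase always consists of exactly three whitespace-separated parts and that direct modification is written with the character φ.

```python
from typing import NamedTuple


class Headphrase(NamedTuple):
    first: str   # morpheme that appears first in the sentence
    label: str   # co-occurrence relation label
    last: str    # morpheme that appears last in the sentence
    kind: str    # "direct", "code" or "function_word" (hypothetical names)


def label_kind(label: str) -> str:
    """Classify a co-occurrence relation label following Table 7-2."""
    if label == "φ":
        return "direct"         # no relation label on the surface
    if label.startswith("@"):
        return "code"           # @rentai, @renyou, @unit
    return "function_word"      # が, の, を, には, について, ...


def parse_headphrase(headphrase: str) -> Headphrase:
    """Split a headphrase into (first morpheme, label, last morpheme)."""
    first, label, last = headphrase.split()
    return Headphrase(first, label, last, label_kind(label))


print(parse_headphrase("昼食 を 食べ").kind)      # function_word
print(parse_headphrase("借り @rentai 本").kind)   # code
```

Which of the two morphemes is the governing one cannot be read from the headphrase alone, because the order follows the sentence (Section 7.1); that information comes from the syntactic sub-tree.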