EDR ******************************************* ENGLISH CO-OCCURRENCE DICTIONARY

Chapter 8 The English Co-occurrence Dictionary

The English Co-occurrence Dictionary consists of headphrases arranged alphabetically by notation. The headphrases are portions abstracted from actual sentences in the English Corpus. Parsing analysis of these sentences shows that their constituents have a dependency structure; that is, the constituents stand in a governing-dependent relation. These constituents form the headphrases of the English Co-occurrence Dictionary.

Each record of the English Co-occurrence Dictionary consists of the record number, headword information, co-occurrence constituent information, syntactic information, semantic information, co-occurrence situation information, and management information. The main role of the English Co-occurrence Dictionary is to show actual examples of how autonomous words are appropriately combined, based on the co-occurrence situation information obtained from the English Corpus.

==============Structure of English Co-occurrence Dictionary Records=============

  *1  Record type and identifier number

Headword of the Co-occurrence Dictionary record (Section 8.1)
  *2  Co-occurrence phrase notation

Information regarding the list of morphemes that comprise the co-occurrence phrase (Section 8.2)
  *3  Each morpheme that comprises the co-occurrence phrase

Information regarding the syntactic structure of the co-occurrence phrase (Section 8.3)
  *4  Syntactic tree that shows the structure of the co-occurrence phrase

Information on the deep-level concept relations (Section 8.4)
  *5  Semantic frame that shows concept relations

Information that shows the co-occurrence situation in the English Corpus (Section 8.5)
  *6  Occurrence frequency in the English Corpus (Section 8.5.1)
  *7  Sentence from the English Corpus, restructured on the basis of the co-occurrence relation (Section 8.5.2)

Information for dictionary development management
  *8  Management information such as date of creation or record update


Structure of Record Number Portion of Co-occurrence Record *1
______________________________________________________________
  ECC + seven-digit decimal number


Structure of Headphrase Portion of Co-occurrence Record *2
______________________________________________________________
  word notation   co-occurrence relation label   word notation
  (e.g. eaten @d-object lunch)
--------------------------------------------------------------
  Word notation:   character string notation of a word that comprises the co-occurrence phrase
  Relation label:  a co-occurrence relation label from Table 8-2


Structure of Sequence of Constituents Portion of Co-occurrence Record *3
______________________________________________________________
  A sequence of constituents, each enclosed in braces '{' '}'.
  Concept information that does not apply is given as "" (double quotes).
--------------------------------------------------------------
  Constituent number:         number given to the constituent based on its order of occurrence in the sentence
  Notation:                   notation of the morpheme comprising the co-occurrence phrase
  Base form:                  base form of the morpheme comprising the co-occurrence phrase
  Part of speech:             part of speech of the morpheme comprising the co-occurrence phrase
  Correspondence type:        correspondence relation type between morpheme and concept
                                (0) regular concept and direct correspondence
                                (1) part of an idiomatic phrase; does not directly correspond
  Concept identifier:         identifier of the concept corresponding to the morpheme
  Japanese headword:          Japanese headword that represents the concept
  English headword:           English headword that represents the concept
  Japanese explication:       concept explication in Japanese
  English explication:        concept explication in English
  Supplementary explanation:  explanation of the concept, given when no appropriate concept
                              exists among the concepts of the word in the English Word Dictionary


Structure of Syntactic Sub-tree Portion of Co-occurrence Record *4
______________________________________________________________
  governing element   relation element   dependent element

  governing element = number/notation
  relation element  = number/label/notation
  dependent element = number/notation
--------------------------------------------------------------
  Governing element:  constituent number corresponding to the governing morpheme /
                      word notation of the governing morpheme
  Relation element:   constituent number corresponding to the co-occurrence relation /
                      a co-occurrence relation label from Table 8-2 /
                      word notation corresponding to the relation
  Dependent element:  constituent number corresponding to the dependent morpheme /
                      word notation of the dependent morpheme


Structure of Semantic Sub-frame Portion of Co-occurrence Record *5
______________________________________________________________
  Either "" or:  governing element   relation label   dependent element

  governing element = "" | number/concept identifier/notation
  dependent element = "" | number/concept identifier/notation
--------------------------------------------------------------
  Governing element:  constituent number corresponding to the governing morpheme /
                      concept identifier of the governing morpheme /
                      word notation of the governing morpheme
  Relation label:     relation label that indicates the deep-level concept relation
  Dependent element:  constituent number corresponding to the dependent morpheme /
                      concept identifier of the dependent morpheme /
                      word notation of the dependent morpheme


Structure of Frequency Portion of Co-occurrence Record *6
______________________________________________________________
  four counts separated by semicolons (e.g. 3;2;173;65)
--------------------------------------------------------------
  1st:  occurrence frequency of the surface-level co-occurrence relation
  2nd:  occurrence frequency including the deep-level concept relation
  3rd:  frequency of the governing morpheme
  4th:  frequency of the dependent morpheme


Structure of Example Sentence Portion of Co-occurrence Record *7
______________________________________________________________
  '{' number '/' example fragment '}'   (e.g. {003000002264/ have you (eaten) })


Structure of Management History Record Portion of Co-occurrence Record *8
______________________________________________________________
  one or more items of the form name=value, separated by semicolons
  (e.g. DATE="95/3/31")
================================================================================


===============Example of English Co-occurrence Dictionary Record===============

ECC157145
eaten @d-object lunch
{ 1 eaten eat VERB 0 3bc6f0 eat 食べる[タベ・ル] "to eat something" 食物をとる }
{ 2 lunch lunch NOUN 0 3bec74 lunch 昼食[チュウショク] "a meal eaten at noon" 昼の食事 }
1/eaten 2/@d-object/"" 2/lunch
1/3bc6f0/eaten object 1/3bec74/lunch
3;2;173;65
{003000002264/ have you (eaten) }
DATE="95/3/31"
================================================================================


8.1. Headword Information

The headword information of the English Co-occurrence Dictionary consists of the headphrase only.
The headphrase is composed of the notations of the governing and dependent morphemes joined by a co-occurrence relation label (Section 8.3). The order of its parts is determined by the order in which the morphemes appear in the sentence: first the governing or dependent morpheme, whichever appears first in the sentence; then the co-occurrence relation label; and finally the other morpheme, which appears later in the sentence.

The syntactic relations described in the English Co-occurrence Dictionary include the following: noun subject and verb or adjective predicate; verb and the noun it governs; noun subject and subject complement; objective complement and direct object of a verb; noun and an adjective or verb modifying it; adverbial modification; compound noun; noun and nominal classifier; co-modification of the same element by adjectives; and co-modification of the same element by adverbs.


8.2. Co-occurrence Constituent Information

The co-occurrence constituent information describes each of the constituents that make up the co-occurrence headphrase. It includes the morpheme notation used in the actual example from which the phrase was extracted, the part of speech, the stem form of the word, the idiom flag, and concept information. Constituent information is given for the morphemes of both the governing and dependent elements, and for the morpheme (if any) corresponding to the co-occurrence relation label. The stem form of the morpheme is given in the stem form notation field. A morpheme that is a number is written in Arabic numerals. If the form of a pronoun changes according to agreement or case, each form is treated as a separate word.
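The ordering rule of Section 8.1 can be stated in a few lines of code, using the constituent numbers of Section 8.2 (which follow the order of occurrence in the sentence) to decide which morpheme is written first. This is a minimal sketch, not EDR software: the `Constituent` type and `build_headphrase` function are hypothetical names, it assumes the three parts are separated by single spaces as in record ECC157145, and the constituent numbers in the second call are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    """Hypothetical minimal view of one morpheme of a co-occurrence phrase."""
    number: int    # constituent number: order of occurrence in the sentence
    notation: str  # notation of the morpheme, e.g. "eaten"


def build_headphrase(governing: Constituent, label: str, dependent: Constituent) -> str:
    """Join two morphemes and a relation label in sentence order.

    Whichever of the governing and dependent morphemes occurred first is
    written first; the co-occurrence relation label always sits in the middle.
    """
    first, last = sorted((governing, dependent), key=lambda c: c.number)
    return f"{first.notation} {label} {last.notation}"


# Record ECC157145: governing 'eaten' (1) precedes dependent 'lunch' (2).
print(build_headphrase(Constituent(1, "eaten"), "@d-object", Constituent(2, "lunch")))
# -> eaten @d-object lunch

# A subject usually precedes its verb, so the dependent is written first.
print(build_headphrase(Constituent(3, "stud"), "@subject", Constituent(1, "planner")))
# -> planner @subject stud
```

Because the order depends only on position, the headphrase alone does not tell a reader which morpheme governs; that information is carried by the syntactic sub-tree (Section 8.3).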
Example of Constituent Information
___________________________________________________
Notation of Uninflected     Stem Form     Part of Speech
Portion of Headword
___________________________________________________
book                        book          NOUN
giv                         give          VERB
gave                        give          VERB
I                           I             PRON
my                          my            PRON
should                      should        AUX
___________________________________________________

The parts of speech used in the English Word Dictionary are also used in the English Co-occurrence Dictionary, but they have been regrouped and redefined into fewer categories. Table 8-1 gives the correspondence between the parts of speech used in the English Co-occurrence Dictionary and those used in the English Word Dictionary.

The idiom flag shows the type of correspondence between the morpheme and the concept given in the semantic information (see Section 8.4). There are two values: 0 and 1. If the concept corresponds directly to the morpheme indicated by the constituent number, the value '0' is given. If the morpheme indicated by the constituent number is part of an idiomatic phrase or compound word, and the concept corresponds to that phrase or word as a whole, the value '1' is given. In the example at the beginning of this chapter, the value '0' indicates that the morpheme 'eaten' corresponds directly to the concept given in the concept information field.

The concept information contains the concept identifier, the Japanese and English headconcepts, and the Japanese and English concept explications of the concept corresponding to each morpheme of the co-occurrence headphrase. When the morpheme is a word that does not represent a concept, the concept information is left blank; in such cases, double quotes ("") are given in the concept information field.
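Because the explications are quoted strings that contain spaces, a braced constituent from the *3 portion cannot be split on whitespace alone. The sketch below uses Python's standard `shlex.split`, which keeps "quoted text" together and turns `""` into an empty string. It is an illustration under stated assumptions, not a reference parser: `ConstituentInfo` and `parse_constituent` are hypothetical names, the leading five fields are taken in the order seen in record ECC157145, and a blank concept is assumed to appear as one or more `""` tokens.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ConstituentInfo:
    number: int            # constituent number (order of occurrence)
    notation: str          # notation as it occurred, e.g. "eaten"
    base_form: str         # stem form, e.g. "eat"
    pos: str               # part of speech (Table 8-1), e.g. "VERB"
    idiom_flag: int        # 0 = direct correspondence, 1 = part of idiom/compound
    concept_fields: list[str] = field(default_factory=list)  # rest, in record order

    @property
    def represents_concept(self) -> bool:
        # Assumption: a blank concept is written as "" and parses to ''.
        return any(self.concept_fields)


def parse_constituent(block: str) -> ConstituentInfo:
    """Parse one braced constituent such as '{ 1 eaten eat VERB 0 ... }'."""
    # Whitespace split that keeps "quoted text" intact; "" becomes ''.
    tokens = shlex.split(block)
    if len(tokens) < 7 or tokens[0] != "{" or tokens[-1] != "}":
        raise ValueError(f"not a braced constituent: {block!r}")
    number, notation, base_form, pos, flag, *concept = tokens[1:-1]
    if flag not in ("0", "1"):
        raise ValueError(f"unexpected idiom flag: {flag!r}")
    return ConstituentInfo(int(number), notation, base_form, pos, int(flag), concept)


c = parse_constituent('{ 1 eaten eat VERB 0 3bc6f0 eat 食べる[タベ・ル] "to eat something" 食物をとる }')
print(c.base_form, c.pos, c.idiom_flag)  # eat VERB 0
print(c.concept_fields[0])               # 3bc6f0
```

`shlex` also treats single quotes as quoting characters, so notations containing apostrophes (for example, contractions) would need a different tokenizer.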
If the concept corresponding to a morpheme does not exist among the concepts of that word in the English Word Dictionary, a supplementary explanation of the concept, written in English, is given in the concept information field.


8.3. Syntactic Information

The syntactic information is a syntactic sub-tree that represents the surface co-occurrence relation(s) contained in the co-occurrence headphrase. Each co-occurrence relation is described by the combination of the governing morpheme, the co-occurrence relation label, and the dependent morpheme.

Four types of co-occurrence relation are described in the English Co-occurrence Dictionary: governing, modifying, determining (of nouns by noun classifiers), and co-modifying (between constituents that modify the same element). In the governing and modifying relations, the syntactic head is the governing element. In the determining relation, the noun determined by the classifier is the governing element. In the co-modifying relation, the modifier closest to the modified morpheme is the governing element. The subject-verb relation is treated as a governing relation.

Two types of co-occurrence relation label are used in the English Co-occurrence Dictionary. The first type is the notation of a function word such as a preposition: the preposition or other function word itself serves as the co-occurrence relation label. The second type is a relation label code that describes syntactic roles such as 'subject' and 'object', modifying relations, and determining relations by nominal classifiers.
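The two kinds of label can be told apart mechanically: every relation label code in Table 8-2 begins with "@" and may carry a parenthesised verb-form qualifier, while a function-word label is the word itself. The sketch below classifies labels on that basis and splits a syntactic sub-tree (*4) into its governing, relation, and dependent elements. The names `RelationLabel`, `classify_label`, and `parse_subtree` are hypothetical, and the code assumes that only the dependent notation may contain spaces (as "West Coast" does in Table 8-2).

```python
import re
from dataclasses import dataclass
from typing import Optional

# "@name" optionally followed by "(qualifier)", e.g. "@d-object(to_be_done)".
LABEL_CODE = re.compile(r"@(?P<name>[a-z-]+)(?:\((?P<form>[a-z_]+)\))?")


@dataclass
class RelationLabel:
    text: str
    is_code: bool               # True for "@..." codes, False for function words
    name: Optional[str] = None  # code name without "@", e.g. "d-object"
    form: Optional[str] = None  # qualifier inside parentheses, if any


def classify_label(label: str) -> RelationLabel:
    m = LABEL_CODE.fullmatch(label)
    if m:
        return RelationLabel(label, True, m["name"], m["form"])
    if label.startswith("@"):
        raise ValueError(f"malformed relation code: {label!r}")
    return RelationLabel(label, False)  # the function word itself, e.g. "into"


def _number(text: str) -> Optional[int]:
    # A blank constituent number may be empty or written as "".
    return int(text) if text.isdigit() else None


def _notation(text: str) -> str:
    return "" if text == '""' else text


def parse_subtree(subtree: str):
    """Split e.g. '1/eaten 2/@d-object/"" 2/lunch' into its three elements."""
    gov, rel, dep = subtree.split(" ", 2)   # dependent keeps any inner spaces
    g_num, g_word = gov.split("/", 1)
    r_num, r_label, r_word = rel.split("/", 2)
    d_num, d_word = dep.split("/", 1)
    return (
        (_number(g_num), _notation(g_word)),
        (_number(r_num), classify_label(r_label), _notation(r_word)),
        (_number(d_num), _notation(d_word)),
    )


gov, rel, dep = parse_subtree('1/eaten 2/@d-object/"" 2/lunch')
print(gov, rel[1].name, dep)  # (1, 'eaten') d-object (2, 'lunch')
```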
Although the format of the co-occurrence constituent information field allows preposition equivalents and other function words of more than one word to be described, only single-word prepositions and single-word function words are used as co-occurrence relation labels in the current English Co-occurrence Dictionary. Table 8-2 shows the co-occurrence relation labels used in the English Co-occurrence Dictionary.

Each morpheme in the syntactic sub-tree has a constituent number that indicates which constituent in the co-occurrence constituent information it corresponds to. However, the constituent number field is blank when the co-occurrence relation does not correspond to a surface morpheme. This is the case, for example, with verbal modification by adverbs: since no surface morpheme indicates the relation between the adverb and the verb, the field is left blank.


8.4. Semantic Information

The semantic information of the co-occurrence record gives the deep-level semantic relations that correspond to the headphrase. Each semantic relation is expressed by a concept relation label in a semantic sub-frame. The concept relation labels are the same as those used in the Concept Dictionary.

The semantic information for the concept relation of the headphrase is given as follows. A semantic sub-frame contains the governing and dependent constituent numbers, the Concept Dictionary identifiers of the concepts corresponding to the morphemes, and the governing and dependent word notations. When an English Co-occurrence Dictionary record contains no semantic information, the governing concept constituent, concept relation label, and dependent concept constituent fields are left blank. A blank field is indicated by double quotes ("").


8.5. Co-occurrence Situation Information

The co-occurrence situation information describes how often, and in what situations, the co-occurrence relation of a record occurs in the English Corpus. It consists of the frequency of the co-occurrence relation and a portion of the actual example sentence from which the co-occurrence relation was taken.


8.5.1. Frequency

The co-occurrence frequency indicates the number of times a co-occurrence relation appears in the English Corpus. There are three types of co-occurrence frequency: surface-level co-occurrence frequency, co-occurrence constituent frequency, and co-occurrence entry frequency.

The surface-level co-occurrence frequency indicates how often the surface-level co-occurrence relation of the headphrase appears. The co-occurrence entry frequency indicates how often the deep-level concept relation corresponding to the syntactic and semantic information of the record occurs in the English Corpus. Finally, the co-occurrence constituent frequency indicates how often the governing and dependent morphemes of the co-occurrence phrase each appear in the English Corpus. Constituent frequencies are counted over constituents that have the same notation, stem form, part of speech, and concept.


8.5.2. Example Sentence

The example sentence, the second type of co-occurrence situation information, is a reconstruction of the actual sentence from which the co-occurrence relation was taken. Taking the co-occurrence relation as the base, the morphemes of the co-occurrence phrase and their neighboring constituents are extracted and used as the example fragment.
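The semantic sub-frame (*5, Section 8.4) and the frequency portion (*6, Section 8.5.1) have simple layouts that can be split mechanically. The sketch below reads them from the values in record ECC157145; the names `Frequencies`, `SubFrame`, `parse_frequencies`, and `parse_subframe` are hypothetical, the four counts are taken in the order listed for *6, and a sub-frame with no semantic information is assumed to be written either as a single `""` or as `""` in each position.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class Frequencies(NamedTuple):
    surface: int    # surface-level co-occurrence relation
    entry: int      # including the deep-level concept relation
    governing: int  # governing morpheme (constituent frequency)
    dependent: int  # dependent morpheme (constituent frequency)


def parse_frequencies(text: str) -> Frequencies:
    parts = text.strip().split(";")
    if len(parts) != 4:
        raise ValueError(f"expected four ';'-separated counts: {text!r}")
    return Frequencies(*(int(p) for p in parts))


# (constituent number, concept identifier, word notation), or None when blank
Element = Optional[tuple[int, str, str]]


@dataclass
class SubFrame:
    governing: Element
    relation: str       # concept relation label, "" when blank
    dependent: Element


def _element(text: str) -> Element:
    if text == '""':
        return None
    number, concept_id, notation = text.split("/", 2)
    return int(number), concept_id, notation


def parse_subframe(text: str) -> Optional[SubFrame]:
    text = text.strip()
    if text == '""':
        return None  # the record carries no semantic information
    gov, relation, dep = text.split(" ", 2)
    return SubFrame(_element(gov), "" if relation == '""' else relation, _element(dep))


freq = parse_frequencies("3;2;173;65")
print(freq.surface, freq.entry)                # 3 2
frame = parse_subframe("1/3bc6f0/eaten object 1/3bec74/lunch")
print(frame.relation, frame.governing)         # object (1, '3bc6f0', 'eaten')
```

Keeping the counts in a named tuple preserves the field order of *6 while letting later code refer to `surface` or `governing` by name.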
The example sentence is provided as reference information, to show how surrounding morphemes (of negation, aspect, causation, etc.) influence the formation of the co-occurrence relation. In the example sentences of the English Co-occurrence Dictionary, the suffix endings of inflecting words and the auxiliary verbs of the governing predicate are written after the morphemes of the headphrase. The portion corresponding to the governing morpheme is enclosed in parentheses (), and the portion corresponding to the dependent morpheme is enclosed in angle brackets <>. When constituents occur between the governing and dependent morphemes, they are omitted and replaced by an ellipsis mark (...). If the headword is a noun, any articles accompanying the noun in the original sentence are also shown. An asterisk (*) indicates that a character string, such as an adjective, occurs between the article and the noun.


8.6. Management Information

The management information of the English Co-occurrence Dictionary contains the management history record, which gives information such as the date of creation or record update.
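The markers of Section 8.5.2 make the example sentence portion (*7) easy to take apart: the text in parentheses is the governing morpheme, the text in angle brackets is the dependent morpheme, and "..." or "*" flag omitted material. The sketch below is a hypothetical helper (`ExampleFragment`, `parse_example`), not part of the dictionary distribution; it assumes the layout `{number/fragment}` seen in record ECC157145 and treats the leading number only as an opaque reference.

```python
import re
from dataclasses import dataclass
from typing import Optional

# '{' leading number '/' fragment '}', e.g. {003000002264/ have you (eaten) }
EXAMPLE = re.compile(r"\{(?P<ref>[^/]*)/(?P<fragment>.*)\}", re.DOTALL)


@dataclass
class ExampleFragment:
    ref: str                  # number before the slash (treated as opaque)
    fragment: str             # fragment text with its markers left in place
    governing: Optional[str]  # text inside ( )
    dependent: Optional[str]  # text inside < >
    has_ellipsis: bool        # intervening constituents replaced by "..."
    has_asterisk: bool        # "*" marks a string between article and noun

    @property
    def plain_text(self) -> str:
        # Drop the markers so endings written after them rejoin the stem.
        return re.sub(r"[()<>]", "", self.fragment)


def parse_example(text: str) -> ExampleFragment:
    m = EXAMPLE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an example sentence field: {text!r}")
    fragment = m["fragment"].strip()
    gov = re.search(r"\(([^)]*)\)", fragment)
    dep = re.search(r"<([^>]*)>", fragment)
    return ExampleFragment(
        ref=m["ref"],
        fragment=fragment,
        governing=gov.group(1) if gov else None,
        dependent=dep.group(1) if dep else None,
        has_ellipsis="..." in fragment,
        has_asterisk="*" in fragment,
    )


ex = parse_example("{003000002264/ have you (eaten) }")
print(ex.ref, ex.governing, ex.dependent)  # 003000002264 eaten None
```

The dependent comes back as `None` here because the example field as reproduced in this chapter carries no `<...>` segment.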
8.a Tables

  8-1  Part of Speech Assignments
  8-2  Co-occurrence Relation Labels


Table 8-1 Part of Speech Assignments in the English Co-occurrence Dictionary
_______________________________________________________________________________
Part of   Explanation              Corresponding Part of Speech         Code
Speech                             from English Word Dictionary
_______________________________________________________________________________
NOUN      Noun                     Common Noun                          EN1
                                   Proper Noun                          EN2
                                   Cardinal Number                      EN3
                                   Ordinal Number                       EN4
                                   Classifier                           EN5
                                   Indefinite Pronoun                   EP4
PRON      Personal Pronoun         Personal Pronoun                     EP1
                                   Demonstrative Pronoun                EP3
DEMO      Demonstrative Pronoun    Demonstrative Pronoun                EP3
INDEF     Indefinite Pronoun       Indefinite Pronoun                   EP4
VERB      Verb                     Verb                                 EVE
ADJ       Adjective                Adjective                            EAJ
                                   Indefinite Determiner                ET2
ADV       Adverb                   Common Adverb                        ED5
UNIT      Unit                     Unit                                 EUN
PTCL      Adverbial Particle       Adverbial Particle                   ED3
_______________________________________________________________________________


Table 8-2 Co-occurrence Relation Labels
_______________________________________________________________________________
Co-occurrence               Explanation of Co-occurrence Phrases
Relation Label              and Example Phrases
_______________________________________________________________________________
                            (Verb + prepositional phrase, noun + prepositional phrase)
into                          [advancement, into, politics]   /(advancement) into /
in                            [enter, in, triumph]   /he (enter)ed its capital in /
on                            [observer, on, West Coast]   /(observer)s on the /
at, with, for, about, ...     ...

@subject                    Subject and verb
                              "stud/@subject/planner"   /s might (stud)y the matter/
@d-object                   Verb and direct object (noun phrase)
                              "achiev/@d-object/reliability"   /teams (achiev)e better /
@d-object(to)               Verb and direct object (to-infinitive)
                              "refus/@d-object(to)/reconsider"   /he had (refus)ed to reconsider/
@d-object(to_be_done)       Verb and direct object (to be + past participle)
                              "lik/@d-object(to_be_done)/dominat"   /they (lik)e to be ed/
@d-object(to_be_doing)      Verb and direct object (to be -ing)
                              "pretend/@d-object(to_be_doing)/farm"   /he hadn't (pretend)ed to be ing/
@d-object(ing)              Verb and direct object (-ing)
                              "start/@d-object(ing)/hit"   /a team will (start) ting/
@d-object(bare)             Verb and direct object (bare infinitive)
                              "help/@d-object(bare)/explain"   /this may (help) /
@i-object                   Verb and indirect object (noun phrase)
                              "caus/@i-object/us"   /a person has (caus)ed hardship/
@s-complement               Verb and subject complement (noun/adjective phrase)
                              "become/@s-complement/oppressive"   /a dictatorship (become) /
@s-complement(to)           Verb and subject complement (to-infinitive phrase)
                              "hesitat/@s-complement(to)/prosecut"   /local police have (hesitat)ed to e/
@s-complement(to_be_done)   Verb and subject complement (to be + past participle)
                              "seem/@s-complement(to_be_done)/underestimat"   /she (seem)ed to be ed/
@s-complement(to_be_doing)  Verb and subject complement (to be -ing)
                              "appear/@s-complement(to_be_doing)/spread"   /(appear)s to be ing/
@s-complement(ing)          Verb and subject complement (-ing)
                              "keep/@s-complement(ing)/com"   /the waves may (keep) ing/
@s-complement(pp)           Verb and subject complement (past participle)
                              "become/@s-complement(pp)/recogniz"   /the plan has (become) ed/
@o-complement               Verb and objective complement (noun/adjective phrase)
                              "made/@o-complement/eas"   /(made) it ier/
@o-complement(to)           Verb and objective complement (to-infinitive phrase)
                              "permit/@o-complement(to)/seek"   /Philadelphia (permit)ted him to /
@o-complement(to_be_done)   Verb and objective complement (to be + past participle)
                              "allow/@o-complement(to_be_done)/publish"   /he would not (allow) them to be ed/
@o-complement(to_be_doing)  Verb and objective complement (to be -ing)
                              "caus/@o-complement(to_be_doing)/analyz"   /(caus)ed him to be ing/
@o-complement(ing)          Verb and objective complement (-ing)
                              "watch/@o-complement(ing)/handl"   /I (watch) Castro ing/
@o-complement(bare)         Verb and objective complement (bare infinitive)
                              "watch/@o-complement(bare)/sprout"   /(watch)ed them /
@o-complement(pp)           Verb and objective complement (past participle)
                              "keep/@o-complement(pp)/press"   /they (keep) their feet ed/
@passive-subj               Passive verb and subject
                              "accept/@passive-subj/profession"   /this (profession) was ed/
@passive-object             Passive verb and object
                              "told/@passive-object/story"   /we're (told) a /
@passive-complement         Passive verb and complement
                              "consider/@passive-complement/important"   /lemon is (consider)ed /
@passive-complement(to)     Passive verb and complement (to-infinitive phrase)
                              "authoriz/@passive-complement(to)/adopt"   /the town was (authoriz)ed to /
@passive-complement         Passive verb and complement (to be + past participle)
  (to_be_done)                "expect/@passive-complement(to_be_done)/report"   /it is (expect)ed to be ed/
@passive-complement         Passive verb and complement (to be -ing)
  (to_be_doing)               "found/@passive-complement(to_be_doing)/wear"   /he was (found) to be ing/
@passive-by                 Passive verb and agentive subject
                              "translat/@passive-by/people"   /this viewpoint has been (translat)ed by /
@pre-modifier               Modified and modifying (pre-modifier)
                            Noun and adjective
                              "help/@pre-modifier/great"   /(great)est /
                            Verb and adverb
                              "follow/@pre-modifier/thereupon"   / (follow)ed/
@post-modifier              Modified and modifying (post-modifier)
                            Noun and adjective
                              "bond/@post-modifier/repayable"   /(bond) /
                            Verb and adverb or adverb-equivalent noun
                              "speak/@post-modifier/softly"   /(speak)ing /
                              "talk/@post-modifier/today"   /(talk) /
@unit                       Noun and classifier
                              "sheet/@unit/paper"   /a of (paper)/
@pred-subj                  Subject and adjective or adverbial particle (in predicate)
                            (Subject is the dependent word.)
                              "conservative/@pred-subj/he"   / was (conservative)/
                            Subject and noun (in predicate) (Subject is the dependent word.)
                              "traged/@pred-subj/twist"   /such would be a (traged)y/
@composite                  Noun and modifying noun (First noun is the dependent word.)
                              "area/@composite/parking"   /(parking) s/
@co-modifier                Consecutive/multiple modifiers (First modifier is the dependent word.)
                            Adjective + adjective
                              "political/@co-modifier/unstable"   / (political) situation/
                            Adverb + adverb
                              "heavily/@co-modifier/here"   /it rained (here) /
_______________________________________________________________________________
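Table 8-2 can also be read as a small grammar: a code name, optionally followed by one of a few verb-form qualifiers, with each code admitting only certain qualifiers. The sketch below encodes the combinations listed in the table so that labels read from records can be checked and described; `TABLE_8_2`, `FORMS`, and `describe_code` are hypothetical names, and the function-word labels (into, in, on, ...) are deliberately left out because the table does not list them exhaustively.

```python
import re
from typing import Optional

# Relation codes from Table 8-2 and the qualifiers listed for each
# (None = the plain code without a parenthesised qualifier).
TABLE_8_2: dict[str, set[Optional[str]]] = {
    "subject": {None},
    "d-object": {None, "to", "to_be_done", "to_be_doing", "ing", "bare"},
    "i-object": {None},
    "s-complement": {None, "to", "to_be_done", "to_be_doing", "ing", "pp"},
    "o-complement": {None, "to", "to_be_done", "to_be_doing", "ing", "bare", "pp"},
    "passive-subj": {None},
    "passive-object": {None},
    "passive-complement": {None, "to", "to_be_done", "to_be_doing"},
    "passive-by": {None},
    "pre-modifier": {None},
    "post-modifier": {None},
    "unit": {None},
    "pred-subj": {None},
    "composite": {None},
    "co-modifier": {None},
}

# Meaning of each qualifier, as given in the explanation column of the table.
FORMS = {
    "to": "to-infinitive",
    "to_be_done": "to be + past participle",
    "to_be_doing": "to be -ing",
    "ing": "-ing",
    "bare": "bare infinitive",
    "pp": "past participle",
}

CODE = re.compile(r"@(?P<name>[a-z-]+)(?:\((?P<form>[a-z_]+)\))?")


def describe_code(label: str) -> str:
    """Describe an '@' relation code, rejecting combinations not in Table 8-2."""
    m = CODE.fullmatch(label)
    if m is None or m["name"] not in TABLE_8_2:
        raise ValueError(f"not a relation code from Table 8-2: {label!r}")
    name, form = m["name"], m["form"]
    if form not in TABLE_8_2[name]:
        raise ValueError(f"{label!r}: qualifier not listed for @{name}")
    return name if form is None else f"{name} ({FORMS[form]})"


print(describe_code("@d-object(to_be_done)"))  # d-object (to be + past participle)
print(describe_code("@unit"))                  # unit
```

Encoding the allowed qualifiers per code, rather than accepting any qualifier after any code, makes a label such as `@d-object(pp)` (not listed in the table) fail loudly instead of passing silently.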