EDR ************************************************************* ENGLISH CORPUS Chapter_10 The_English_Corpus The English Corpus is composed of records arranged alphabetically. The records of the English Corpus are composed of the record number, sentence information, constituent information, morphological information, syntactic information, semantic information and management information. The basic role of the English Corpus is first to identify the sentence constituents of sentences, and then to indicate how the constituents combine to form the morphological, syntactic and semantic structure of the sentence using a large number of actual examples as the source data. ========================Structure of English Corpus Records===================== :Record type and identifier number :(Section 10.1) :Management number for sentence :Name of source from which text is taken :Notation of example sentence :(Section 10.2) :*1 :(Section 10.3) :*2 :(Section 10.4) :*3 :(Section 10.5, and Tables 10-2 - 10-10) :*4 :Information for dictionary development :Management information such as date of creation or record update *1 ::= ¡£¡£¡£ ::= ::= | | ------------------------------------------ :Morpheme number or compound number (Section 10.2.1) :Notation of morpheme or compound word (Section 10.2.2) :Stem form corresponding to notation (Section 10.2.3) :(Section 10.2.5) :(Section 10.2.4 and Table 10-1) :Number that uniquely identifies the concept :Explanation of concept that is given when an appropriate concept is not available among the concepts of the 10-1 EDR ************************************************************* ENGLISH CORPUS words from the English Word Dictionary :Number of the compound word *2 Sequence of Morphemes ::= ¡£¡£¡£/ ::=/: ::='{'¡£¡£¡£/'}' ::= ::=/{ | }¡£¡£¡£/ ::=*/ *3 Syntactic Tree ::= | ::='('')' ::='('¡£¡£¡£')' |'(' ¡£¡£¡£')' | '('Formation Label>¡£¡£¡£')' ::=
| | ----------------------- :Terminal Node :Non-Terminal Node :Index that indicates node is a leaf :Indicates one of four possible formation types (Section 10.4.2)
:Main (most central) Sub-concept node *4 Semantic Frame ::=[¡£¡£¡£] ::=[] | [ Relation Slot Name>¡£¡£¡£] | [¡£¡£¡£] ::= | main | which | attribute | S-attribute ::= : : | ::= | ::= | | ----------------------- :Concept relation label for expressing facts (reality) and events (phenomena) :Attribute label assigned to each constituent :Attribute label assigned to sentence :Number used when concept is added 10-2 EDR ************************************************************* ENGLISH CORPUS ================================================================================ =========================Example of English Corpus Records====================== 0020000026cd Japan Times The United States has singled out the Kansai airport project as an example of the closed nature of the Japanese construction market. 1 the the ART 2dc2f4 2 / / BLNK 2dc2ed 3 United/States United/States NOUN "W the United States of America" 4 / / BLNK 2dc2ed 5 has have AUX 2dc2fd 6 / / BLNK 2dc2ed 7 singl single VT I#1 8 ed ed SUF 2dc2ed 9 / / BLNK 2dc2ed 10 out out PTCL I#1 11 / / BLNK 2dc2ed 12 the the ART 2dc2f4 13 / / BLNK 2dc2ed 14 Kansai Kansai NOUN I#2 15 / / BLNK 2dc2ed 16 airport airport NOUN I#2 17 / / BLNK 2dc2ed 18 project project NOUN I#2 19 / / BLNK 2dc2ed 20 as as PREP 2dc2ef 21 / / BLNK 2dc2ed 22 an an ART 2dc2f3 23 / / BLNK 2dc2ed 24 example example NOUN 0bb9e3 25 / / BLNK 2dc2ed 26 of of PREP 2dc2ef 27 / / BLNK 2dc2ed 28 the the ART 2dc2f4 29 / / BLNK 2dc2ed 30 closed closed ADJ 0b3e1a 31 / / BLNK 2dc2ed 32 nature nature NOUN 0ca429 33 / / BLNK 2dc2ed 34 of of PREP 2dc2ef 35 / / BLNK 2dc2ed 36 the the ART 2dc2f4 37 / / BLNK 2dc2ed 38 Japanese Japanese ADJ 0a8dcf 39 / / BLNK 2dc2ed 40 construction construction NOUN 0b5134 10-3 EDR ************************************************************* ENGLISH CORPUS 41 / / BLNK 2dc2ed 42 market market NOUN 0c7d6d 43 . . 
PUNC 2dc2e5 I#1 singl/out single/out VT choose, pick, one person or thing fromamong several for special comment, treatment etc. I#2 Kansai/airport/project Kansai/airport/project NOUN a project to build an international airport in the Osaka Bay /1:the/2: /3:United States/4: /5:has/6: /7:singl/8:ed/9: /10:out/11: /12:the /13: /14:Kansai/15: /16:airport/17: /18:project/19: /20:as/21: /22:an/23: /24:example/25: /26:of/27: /28:the/29: /30:closed/31: /32:nature/33: /34:of /35: /36:the/37: /38:Japanese/39: /40:construction/41: /42:market/43:./ {I#1//7:singl/10:out/I#2//14:Kansai/16:airport/18:project//} (S (t (M (S (S (t (W 1 "The")) (W 2 " ")) (t (S (t (W 3 "United States")) (W 4 " ")))) (t (M (t (M (t (S (S (t (W 5 "has")) (W 6 " ")) (t (S (t (W 7 "singled out") (I 1 "singled out" (S (t (S (t (W 7 "singl")) (W 8 "ed"))) (W 9 " ")) (W 10 "out"))) (W 11 " "))))) (S (S (t (W 12 "the")) (W 13 " ")) (t (S (t (W 14 "Kansai airport project") (I 2 "Kansai airport project" (S (t (W 14 "Kansai")) (W 15 " ")) (S (t (W 16 "airport")) (W 17 " ")) (W 18 "project")))) (W 19 " ")))))) (S (S (t (W 20 "as")) (W 21 " ")) (t (M (t (S (S (t (W 22 "an")) (W 23 " ")) (t (S (t (W 24 "example)) (W 25 " ")))) (S (S (t (W 26 "of)) (W 27 " ")) (t (M (t (S (S (t (W 28 "the")) (W 29 " ")) (t (M (S (t (W 30 "closed")) (W 31 " ")) (t (S (t (W 32 "nature")) 10-4 EDR ************************************************************* ENGLISH CORPUS (W 33 " "))))))) (S (S (t (W 34 "of")) (W 35 " ")) (t (S (S (t (W 36 "the")) (W 37 " ")) (t (M (S (t (W 38 "Japanese")) (W 39 " ")) (t (M (S (t (W 40 "construction")) (W 41 " ")) (t (W 42 "market"))))))))))))))))))) (W 43 ".")) [ [main I#1:singled out:"=Z to choose, pick, one person or thing from among several for special comment, treatment etc."] [agent 3:United States:"=W the United States of America"] [object I#2:Kansai airport project:"=Z a project to build an international airport in the Osaka Bay"] [modifier [ [main 24:example:0bb9e3] [modifier [ [main 
32:nature:0ca429] [modifier 30:closed:0b3e1a] [modifier [ [main 42:market:0c7d6d] [modifier 38:Japanese:0a8dcf] [modifier 40:construction:0b5143]]]]]]] [attribute already end]] ================================================================================ 10.1. Sentence Information The sentence information of the English Corpus record includes information about the sentence itself. It is composed of the sentence, text number, and source information. The text number is a sentence management number that is given to all of the sentences registered in the EDR Text Base. The sentences stored in the English Corpus have been extracted from source texts contained in the EDR Text Base. The text number is provided in order to be able to extract the context (source) from the EDR Text Base. The source information contains the name of the text source. The sentence is the notation of the example sentence itself. 10.2. Constituent Information The constituent information indicates what constituents the sentence is composed of and gives the information relevant to the morphemes or compound words of the sentence. A morpheme is defined as the smallest linguistic unit that combines with other morphemes to make up a sentence. In general terms, morphemes refer to words, prefixes and suffixes. Idiomatic or set phrases fall into the category of compound words. Included in Constituent Information are: constituent number, notation, stem form, part of speech and selected concept. 10-5 EDR ************************************************************* ENGLISH CORPUS 10.2.1. Constituent Number The constituent number is the number of the morpheme or compound word. The number given to the morpheme or compound word corresponds to the order in which it appears in the sentence. That is, the first constituent of the sentence is given constituent number '1', the second constituent of the sentence is given constituent number '2', etc. 
Each word that makes up a compound word is treated as a single constituent and given its own constituent number. At the end of the constituent information block, however, the compound words are regrouped and given a new constituent number prefixed with 'I#'. This number also corresponds to the order in which the compound word appears in the sentence: if there are two compound words, the first is given the constituent number I#1, and the compound word occurring next in the sentence is given the constituent number I#2.

A compound word is a word composed of two or more words that together express a single concept. The concepts of the individual words, taken in isolation, cannot express the concept represented by the compound word. In addition, the constituents that make up the compound word do not show independent syntactic functions. Idiomatic phrases are also considered compound words, as are words that, when affixed, point to a concept different from the concept of the unaffixed word.

The following types of words are not considered compound words: a phrase consisting of a word and attributive modifier(s) whose concept is not changed by the attribute concept; a combination of words with different notations but whose concepts are the same; combinations of words with different concepts, such as 'organization name + position name' or 'place name + place name', in which the constituents may be replaced with other words, making the number of combinations too large to register in the dictionary.

The morphemes and compound words in the English Corpus correspond to the word entries in the English Word Dictionary. All idiomatic or set phrases and compound words in the English Word Dictionary are further divided into morphemes at the constituent level.
Example of Compound Words (Including Idiomatic Phrases) ___________________________________________________________ ______Type_________________________Sequence________________ Compound word average| |value Narita| |Airport national| |flower tak|e| |out by| |chance music| |education by| |way| |of Idiomatic Phrase mak|e| |use| |of ___________________mak|e|_|up|_|one's|_|mind_______________ 10-6 EDR ************************************************************* ENGLISH CORPUS Note: the division between morphemes is indicated by a '|' symbol. 10.2.2. Notation The notation is composed of the character string of the morpheme or compound word. 10.2.3. Stem Form Notation To supplement the morpheme and the compound word notation, the stem form notation is also given. The stem form notation for inflecting words is given with its suffix ending. For words that don't inflect, the stem form notation is the same as the morpheme notation. Example of Stem Form Notation ___________________________________________________________________________ Type Notation Stem Comment (Explanation) Form No- _____________________________tation________________________________________ Regularly In- turn turn flecting Verb Regularly In- hik hike Suffix ending 'e' is added flecting Verb Regularly In- stud study Suffix ending flecting Verb Regularly In- d die Suffix ending flecting Verb Irregularly In- saw see flecting Verb Irregularly In- gone go flecting Verb Irregularly In- wrote write flecting Verb Irregularly In- flew fly flecting Verb Irregularly In- had have flecting Verb Auxiliary Verb can (could) Word itself is used when the could corresponding meaning does not ex- ist in stem form Auxiliary Verb shall (should) Word itself is used when the should corresponding meaning does not ex- ist in stem form Adverb soon soon Adverb more more For cases in which comparative de- gree of adjective or adverb is formed 10-7 EDR ************************************************************* ENGLISH CORPUS 
Adjective more many or For cases in which comparative de- much gree of noun is formed Preposition of of Conjunction and and Pronoun___________they_______they__________________________________________ For the word form of the morphemes of inflecting words, see the explanation in the English Word Dictionary. Specifically, see Table 3-10 'English Verb Inflection', Table 3-11 'English Noun Inflection', Table 3-12 'English Adjective Inflection', and Table 3-13 'English Adverb Inflection'. 10.2.4. Part of Speech A part of speech assignment is given to all morphemes and compound words. The twenty four parts of speech used in the English Corpus are: noun pronoun demonstrative word of negation question word intransitive verb transitive verb verb Be-verb adjective adverb adverbial particle conjunction prefix suffix article auxiliary verb preposition interjection blank space punctuation symbol symbol unit number Abbreviated names for the aforementioned parts of speech are used in the English Corpus. The abbreviated names are shown in Table 10-1. The parts of speech used in the English Corpus are not as detailed as the parts of speech used in the English Word Dictionary. 10.2.5. Selected Concept The selected concept is provided in order to show what meaning has been used or what kind of concept is being indicated by the morpheme or compound 10-8 EDR ************************************************************* ENGLISH CORPUS word. The selected concept is either a concept identifier, a supplemental explanation of the concept or a compound word number. The concept identifier is given when the morpheme corresponds to a regular concept. Of the concepts that correspond to the morpheme or to the compound word, the one concept that most appropriately corresponds to the meaning as it has been used in the original text is the selected concept. 
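The three kinds of selected concept described above can be told apart mechanically. The following is a minimal sketch, not part of the EDR specification. It assumes that concept identifiers are six lowercase hexadecimal digits and that compound word numbers look like I#1, as in the example record of this chapter; the function name `classify_selected_concept` is hypothetical.

```python
import re

# Assumption: concept identifiers are six lowercase hexadecimal digits,
# as in the example record of this chapter (e.g. 2dc2f4).
CONCEPT_ID = re.compile(r"[0-9a-f]{6}")
# Assumption: compound word numbers are written I#1, I#2, ...
COMPOUND_NO = re.compile(r"I#\d+")


def classify_selected_concept(field: str) -> str:
    """Return which of the three kinds of selected concept a field holds."""
    value = field.strip().strip('"')
    if COMPOUND_NO.fullmatch(value):
        return "compound word number"
    if CONCEPT_ID.fullmatch(value):
        return "concept identifier"
    # Anything else is treated as a supplemental explanation of the concept.
    return "supplemental explanation"


print(classify_selected_concept("2dc2f4"))
print(classify_selected_concept("I#1"))
print(classify_selected_concept('"W the United States of America"'))
```

Treating everything that is neither an identifier nor a compound word number as a supplemental explanation keeps the sketch short; a stricter reader could check the explanation format as well.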
The supplemental explanation of the concept is given in cases where an appropriate concept among the words in the English Word Dictionary corresponding to the morpheme or compound word is not available. There are two different formats in which the supplemental explanations of the concept are given: Z format and W format. The character string notation of the explanation comprises the Z format while the W format is composed of a list of words that are synonymous and/or similar to the meaning of the specified concept. The supplemental explanation of the concept given in a record may be either in W format, Z format or a combination of the two types. When both Z format and W format are used in the same record a slash (/) is used to indicate the division between the two. The Z format supplements the insufficient concept by explaining or describing the concept. In the Z format, the relation between the original word and the specified concept is first described. This is then followed by the concept explication. The headconcept in these cases is precede by a special headconcept symbol, 'c#'. Notation Indicating Relation Between Word and Specified Concept of the Supplemental Explanation of the Concept (Z Format) ___________________________________________________________________________ Symbol Relation Between Original Word and _____________Specified_Concept_____________________________________________ '=' Synonymous Specified concept is the same in meaning as the concept of the word used in the sentence. '<' Narrower in Meaning Specified concept is included in the meaning of the concept of the word used in the sentence. In such cases, the specified concept adds a more detailed meaning to the concept of the word. (Pomeranian < dog) '%' Similar in Meaning Specified concept is not the same as the concept of the word in the sentence but closely resembles it. 
(Raccoon % Raccoon dog/BADGER)
'>' Broader in Meaning   Meaning of the concept of the word used in the sentence is included in the meaning of the specified concept. In such cases, the meaning of the word adds a more detailed meaning to the specified concept. (This is not usually used.)
___________________________________________________________________________

Example of Supplemental Explanation of Concept (Z Format)
_____________________________________________________________
1   purit   purity   NOUN   =Z state or quality of being pure
1   y       y        SUF    2dc2d9
_____________________________________________________________

The W format supplements the insufficient concept by giving one or more words (a group of words) or compound words. The W format includes the notation and the part of speech of the given word(s) or compound word(s). The part of speech may be omitted, however, when the part of speech of the original word and that of the given word(s) or compound word(s) are the same. When the part of speech is given, it is placed to the right of the word(s) or compound word(s). In the examples below the part of speech has been left out because it is the same.

Example of Supplemental Explanation of Concept (W Format)
___________________________________________________________________________
1   United States   United States   NOUN   =W the United States of America
2                                   UNIT   =W yen mark
___________________________________________________________________________

When the morpheme is included in a compound word, the compound word number is given. The compound word number is the number given to the compound word as it occurs in the sentence and is expressed in the format 'I#'.

10.3. Morphological Information

The morphological information shows the segmentation of the sentence into morphemes. The constituents of the morpheme sequence are shown first. The compound word sequence(s) is provided after the morpheme sequence.

10.4. Syntactic Information

The syntactic information shows the syntactic structure of the sentence, that is, in what way the syntactic constituents combine to make up the sentence. The syntactic tree used to show the syntactic information is a parsing tree based on dependency structure and is given in list form.

10.4.1. Structure of Syntactic Tree

The nodes that make up the syntactic tree are leaves and intermediate nodes. The main node is assigned to one of the intermediate nodes or leaves that constitute the same intermediate node.

A leaf is the terminal node that corresponds to each of the morphemes. The leaf is described by a leaf identifier (W). The leaf identifier 'W' precedes the morpheme.

An intermediate node is a non-terminal node that groups leaves with syntactic relations. The intermediate nodes describe the main node and the various formation types. In the description of compound words, the entirety of a compound word is first described in the leaf format as a single leaf. Then the internal structure of the leaf is described as intermediate nodes.

The main node is the node among the group of sub-nodes that is regarded as central.

10.4.2. Formation Types

The term 'formation' as used in this manual refers to the way in which sub-nodes join or are grouped together. There are four different types of formation, each indicated by a different formation label. The types of formation and the corresponding formation labels are as follows: modification ('M'), synthesis ('S'), number formation ('N'), and compound word formation ('I'). The label indicating the type of formation is given at the head of each of the intermediate nodes.

The first type of formation, modification, is composed of the modifying constituent and the constituent that is modified. Included in modification are constituents that show a dependency relation, and also modification by adjectives and adverbs.
The main node of this type of formation is the modified constituent.

Examples of Formation by Modification
--------------------------------------------------
Type                        Example
--------------------------------------------------
Modification by Adjective   (M (W 1 "closed") (t (W 2 "nature")))
Modification by Adverb      (M (t (W 1 "go")) (W 2 "tomorrow"))
--------------------------------------------------

The synthesis formation is created when several sentence constituents are grouped to form a single sentence constituent. This type of formation covers constituents grouped through subordination and those grouped through coordination.

Subordination is a formation composed of an autonomous word that has a concept and one or more subordinating words that do not have a concept. The main node of the formation is the autonomous word.

Examples of Synthesis (Subordination)
--------------------------------------------------
Type                             Example
--------------------------------------------------
Stem of Word and Suffix Ending   (S (t (W 1 "bloom")) (W 2 "s"))
Noun and Preposition             (S (W 1 "in") (t (W 2 "front")))
Article and Noun                 (S (W 1 "an") (t (W 2 "example")))
Prefix                           (S (W 1 "semi") (t (W 2 "conscious")))
Punctuation Mark                 (S (t (W 1 "however")) (W 2 ","))
--------------------------------------------------

Coordination is a formation type in which a coordinating conjunction groups a noun (noun phrase) that has a concept with another noun (noun phrase) that also has a concept. The grouped elements form a single noun phrase. The main node of the formation is the word that occurs closest to the end of the sentence/phrase.

Examples of Synthesis (Coordination)
--------------------------------------------------
(S (W 1 "A") (W 2 "and") (t (W 3 "B")))
--------------------------------------------------

The compound word formation is the type of formation used to make compound words.
This includes idiomatic phrases and set phrases. In the compound word formation, the entirety of the compound word is first described as a single leaf, and then the internal structure of the compound word is described.

Example of Compound Word Formations
--------------------------------------------------
Example (1)
(W 7 "singled out")
(I 1 "singled out"
   (S (t (S (t (W 7 "singl")) (W 8 "ed"))) (W 9 " "))
   (W 10 "out"))

Example (2)
(W 14 "Kansai airport project")
(I 2 "Kansai airport project"
   (S (t (W 14 "Kansai")) (W 15 " "))
   (S (t (W 16 "airport")) (W 17 " "))
   (W 18 "project"))
--------------------------------------------------

The number formation is the type of formation used to combine several numbers into a single formation. The number formation is formed in the same way as the synthesis formation. The value is preceded by the formation label N.

Example of Number Formations
--------------------------------------------------
(N -3 (W 1 "-") (W 2 "3"))
--------------------------------------------------

10.4.3. How Intermediate Nodes Are Formed

When there are multiple modifying constituents, in principle the syntactic tree is described so that the nodes do not cross each other. This is done by describing the innermost node first, then the next innermost node, and so on until all the nodes are described. In the first example below, the first formation described is 'Plays pingpong' and the second formation described is 'Taro plays'. In the second example below, 'Taro plays' is described first. The second formation described is 'Pingpong, plays'.
Example of Formation of Intermediate Node
--------------------------------------------------
Character String Notation   Syntactic Tree
--------------------------------------------------
"Taro plays pingpong"       (M (S (t (W 1 "Taro")) (W 2 " "))
                               (t (M (t (S (t (S (t (W 3 "play")) (W 4 "s"))) (W 5 " ")))
                                     (W 6 "pingpong"))))
"Pingpong, Taro plays"      (M (S (t (W 1 "Pingpong")) (W 2 ",") (W 3 " "))
                               (t (M (S (t (W 4 "Taro")) (W 5 " "))
                                     (t (S (t (W 6 "play")) (W 7 "s"))))))
--------------------------------------------------

In cases where several consecutive nodes should be unified, and it is determined that it makes no significant difference from which node the formation is made, three or more nodes are either unified at the same time, or are first made into subordinating formations and then into coordinating formations.

Example of Three or More Nodes Unified At the Same Time
--------------------------------------------------
Type of Formation          Syntactic Tree
--------------------------------------------------
Unified At Same Time       (S (t (W 1 "Pingpong")) (W 2 ",") (W 3 " "))
Not Unified At Same Time   (S (t (S (t (W 1 "Pingpong")) (W 2 ","))) (W 3 " "))
--------------------------------------------------

For adjectives that take case particles, and for some other parts of speech, the nodes of the syntactic tree may cross. In order to avoid such crossings, the syntactic tree is described with the order of the leaves changed. Since information regarding the position of each of the constituents is contained in the S Format, it is possible to restore the word order of the original sentence. The leaf order of the changed leaves is determined by the first leaf that is described from the main node with the smallest morpheme number. The phrase 'describe applicable rules to people' becomes 'describe rules applicable to people' in order to avoid crossing.
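Restoring the original word order from a reordered tree can be sketched as follows. This is an illustrative sketch only, under two assumptions: every leaf has the form (W number "text"), and a compound word is represented by its single summary leaf, so compound word formation nodes ('I') are skipped. The function names `parse_tree`, `leaves` and `restore_word_order` are hypothetical and not part of the corpus specification.

```python
import re

# Tokens: parentheses, double-quoted strings (may contain blanks), bare atoms.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_tree(text):
    """Parse a list-form syntactic tree into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            # Atom: strip the quotes from a quoted string, keep labels as-is.
            return tok[1:-1] if tok.startswith('"') else tok
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing parenthesis
        return node

    return read()


def leaves(node):
    """Yield (morpheme number, text) for each W leaf, skipping I nodes."""
    if not isinstance(node, list) or not node:
        return
    if node[0] == "W":
        yield int(node[1]), node[2]
    elif node[0] != "I":
        for child in node[1:]:
            yield from leaves(child)


def restore_word_order(tree_text):
    """Join the leaf texts in ascending order of morpheme number."""
    return "".join(text for _, text in sorted(leaves(parse_tree(tree_text))))


# The reordered tree given in this section for the phrase
# 'describe applicable rules to people'.
TREE = '''
(M (t (S (t (S (t (W 3 "describ")) (W 4 "e"))) (W 5 " ")))
   (M (t (S (t (S (t (W 8 "rule")) (W 9 "s"))) (W 10 " ")))
      (M (t (S (t (W 6 "applicable")) (W 7 " ")))
         (S (S (t (W 11 "to")) (W 12 " ")) (t (W 13 "people"))))))
'''
print(restore_word_order(TREE))  # describe applicable rules to people
```

Sorting by morpheme number is sufficient here because the number records the position of each constituent in the original sentence, independently of where the leaf sits in the tree.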
Example of Change in Leaf Order
--------------------------------------------------
Example: describe applicable rules to people

(M (t (S (t (S (t (W 3 "describ")) (W 4 "e"))) (W 5 " ")))
   (M (t (S (t (S (t (W 8 "rule")) (W 9 "s"))) (W 10 " ")))
      (M (t (S (t (W 6 "applicable")) (W 7 " ")))
         (S (S (t (W 11 "to")) (W 12 " ")) (t (W 13 "people"))))))
--------------------------------------------------

10.5. Semantic Information

The semantic information gives information regarding the semantic structure of the sentence. It shows how the concepts of the words used in the sentence are joined to form the overall content structure of the sentence. This is provided by way of a 'concept relation representation'. The concept relation representation is given in a frame format.

10.5.1. Structure of Semantic Information

The semantic information is a semantic frame in which the relations between the concept of the predicate and the other concepts in the sentence are listed. The first slot of the frame is the concept that makes up the predicate of the sentence. The first slot indicates the main concept and is given the slot name 'main'.

The remaining slots of the semantic frame are composed of the other concepts. Each slot shows the content of the concepts and gives the semantic relation it has to the main concept. The slots of the frame following the main slot indicate what attribute (modifying or determining) the concept of the slot has in relation to the main concept.

Example of Semantic Frame (Model)
--------------------------------------------------
Example Sentence: He may have written a letter.
[ [main 7:write:0e910d]
  [agent 1:he:2dc304]
  [object 11:letter:3d0797]
  [attribute already end]
  [S-attribute seem]]
--------------------------------------------------

The slot names used in the semantic frame are 'main', 'attribute', 'S-attribute', 'which' and the concept relation labels that make up Table 10-2. The 'attribute', 'S-attribute' and 'which' slots are special slot names.

The attribute slot labels (Table 10-3 - 10-7) are applicable to individual constituents of the sentence and indicate information such as the viewpoint of the speaker. The S-attribute labels (Table 10-8) are applicable to the entire sentence and indicate information such as the viewpoint of the speaker.

The which slot is used to describe an embedded sentence that modifies a noun. The concept of the noun being modified is given the slot name 'main' and the embedded sentence is then described in the which slot. An example of the notation used in the which slot is given below.

Example of Semantic Frame (which Slot)
--------------------------------------------------
Example: ...a letter he wrote

[ [main 3:letter:3d0797]
  [which [ [main 7:write:0e910d]
           [agent 5:he:2dc304]
           [object 3:letter:3d0797]
           [attribute already end]]]]
--------------------------------------------------

10.5.2. Concept Relation Representation in the English Corpus

The concept relation representation describes the semantic structure of a sentence written in natural language. The concept relation representation is composed of the concept that corresponds to the words in the sentence and a concept relation label.

The concept relation representations in the English Corpus provide information for the following:
1) facts (reality) and events (phenomena);
2) the viewpoint of the speaker;
3) the intention of the utterance, and the speaker's feeling, or judgment regarding the content of the utterance;
4) the structure of a piece of writing.
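The frame notation used for the concept relation representation can be read with a small recursive parser. The sketch below is illustrative only and is not a reference implementation: it assumes that a slot value is either a nested frame, a blank-separated list of attribute labels (for 'attribute' and 'S-attribute'), or a 'constituent number:word:concept' triple whose concept may be a quoted supplemental explanation. The name `parse_frame` and the returned tuple layout are hypothetical.

```python
def parse_frame(text):
    """Parse a semantic frame into a list of (slot name, value) pairs."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def expect(ch):
        nonlocal pos
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def frame():
        expect("[")
        slots = []
        skip_ws()
        while pos < len(text) and text[pos] == "[":
            slots.append(slot())
            skip_ws()
        expect("]")
        return slots

    def slot():
        nonlocal pos
        expect("[")
        start = pos
        while pos < len(text) and not text[pos].isspace() and text[pos] not in "[]":
            pos += 1
        name = text[start:pos]
        skip_ws()
        if pos < len(text) and text[pos] == "[":
            value = frame()  # nested frame, e.g. in a 'which' slot
            skip_ws()
        else:
            start = pos
            in_quotes = False
            # Read up to the closing bracket, ignoring brackets inside quotes.
            while pos < len(text) and (in_quotes or text[pos] != "]"):
                if text[pos] == '"':
                    in_quotes = not in_quotes
                pos += 1
            raw = text[start:pos].strip()
            if name in ("attribute", "S-attribute"):
                value = raw.split()  # list of attribute labels
            else:
                number, word, concept = raw.split(":", 2)
                value = (number, word, concept.strip('"'))
        expect("]")
        return name, value

    skip_ws()
    return frame()


FRAME = """[ [main 7:write:0e910d] [agent 1:he:2dc304] [object 11:letter:3d0797]
  [attribute already end] [S-attribute seem]]"""
for name, value in parse_frame(FRAME):
    print(name, value)
```

A list of pairs is used rather than a dictionary because a frame may contain the same slot name more than once, as with the repeated 'modifier' slots in the example record.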
Facts and phenomena expressed in a sentence are described by a combination of the concept, the concept relation label, and the concept attribute label.

The concept relation label is a label that expresses facts and phenomena. The relation labels include 'agent', 'object', 'implement' and other labels that indicate a relation between event concepts and thing concepts. Also included are event concept labels such as 'condition' and 'sequence', and pseudo-relation labels such as 'possessor'.

The concept attribute label is a label that indicates the authenticity of an event, the amount of something, or a similar fact. The concept attribute label 'not' is used to indicate negation. For the full list of concept attribute labels refer to Table 10-3.

Information in sentences indicating the viewpoint of the speaker towards events or facts is described in either the constituent attribute slot (attribute) or the sentence attribute slot (S-attribute). Indicators of time (present, past, future), aspect, mood, etc., are described in one of these two slots.

The difference between the constituent attribute slot and the sentence attribute slot is in the scope of the described attribute. The attribute labels of the constituent attribute slot indicate the attributes of each of the individual constituent concepts that make up the sentence. This includes tense, aspect and emphasis. The attributes of the sentence attribute slot indicate attributes regarding the sentence as a whole. This includes sentence types such as command, question, and conjecture.

Tense is described in the constituent attribute slot with the following attribute labels (Table 10-4): past, present, future.

Aspect, which indicates the viewpoint of the speaker regarding the progress and condition of an event or fact from a particular point in time, is described in the constituent attribute slot. Tables 10-5 and 10-6 show the aspect attribute labels.
If information regarding the speaker's thoughts or feelings towards the facts or events described in the sentence, or the intention of the sentence, is indicated by auxiliary verbs, by re-wording, by the style of the sentence, etc., an attribute label indicating the speaker's state and attitude when the sentence was produced is given (Tables 10-7 - 10-8). For sentences that give a subjective description of the speaker's intention and judgment, a special headconcept code is given (Table 10-9).

10.5.3. Addition of Concepts

When a concept of the sentence is not represented by the words of the sentence, and the lack of that concept prevents the creation of the concept relation representation, the omitted concept must be added. Concepts are added by selecting the headconcept thought to be the most appropriate from Table 10-10. If an appropriate headconcept does not exist, the code 'c#nil', meaning 'a concept of some kind', is used in the description.

When a concept is added, an added concept number is assigned and that number is used as the constituent number. The added concept number is indicated in the following format: '@'.

10.6. Management Information

The management information of the English Corpus contains the management history record. The management history record provides information such as the date of creation or of record update.
10.a Tables

  10-1   Part of Speech Assignments
  10-2   Relation Labels For The Representation of Facts or Occurrences
  10-3   Labels Used To Represent Attributes of Facts or Occurrences
  10-4   Labels Used To Indicate Time Reference
  10-5   Labels Used To Indicate Aspect
  10-6   Aspect in English
  10-7   Constituent Labels Used To Indicate Speaker's Intention
  10-8   Sentence Attribute Labels Used To Indicate Speaker's Intention or
         Decision
  10-9   Special Headconcepts Used To Indicate Intention, Feeling or Judgment
         of An Utterance
  10-10  Other Special Concepts and Headconcepts


Table 10-1 Part of Speech Assignments in the English Corpus
________________________________________________________________________________
English Corpus                Correspondence With English Word Dictionary
Part of Speech        Name    Part of Speech              Code    Example
________________________________________________________________________________
Noun                  NOUN    Common Noun                 EN1     book
                              Proper Noun                 EN2     Tokyo
                              Cardinal Number             EN3     one, two
                              Ordinal Number              EN4     first
                              Noun Classifier             EN5     piece of
                              Indefinite Pronoun          EP4     some
Pronoun               PRON    Personal Pronoun            EP1     I, my, me, mine
                              Demonstrative Pronoun       EP3     this, that
Demonstrative         DEMO    Demonstrative Pronoun       EP3     this, that
                              Demonstrative Determiner    ET1     this, that
                      INDEF   Indefinite Pronoun          EP4     some
                              Indefinite Determiner       ET2     any
Question Words        WH      Interrogative Pronoun       EP2     who, what
                              Relative Pronoun            EP5     who, whose, that
                              Relative Adverb             ED1     whenever
                              Interrogative Adverb        ED2     how
Intransitive Verb     VI      Verb                        EVE     run
Transitive Verb       VT      Verb                        EVE     get
Verb                  VERB    Verb                        EVE     run
Be-verb               BE      Be-verb                     EBE     am, are, is
Adjective             ADJ     Adjective                   EAJ     beautiful
                              Indefinite Determiner       ET2     any
Adverb                ADV     Adverbial Particle          ED3     off, up
                              Common Adverb               ED5     very
Adverbial Particle    PTCL    Adverbial Particle          ED3     off, up
Conjunction           CONJ    Subordinating Conjunction   EC1     whether
                              Coordinating Conjunction    EC2     and, but
                              Adverbial Particle          ED4     in case
Prefix                PF      Prefix                      EPF
Suffix                SUF     Noun Suffix                 EEN
                              Verb Suffix                 EEV
                              Adjective Suffix            EEA
                              Adverb Suffix               EED
Article               ART     Article                     EAR     a, an, the
Auxiliary Verb        AUX     Auxiliary Verb              EAV     will, must
Preposition           PREP    Preposition                 EPR     in, on, at
Interjection          ITJ     Interjection                EIT     ah, oh
Blank Space           BLNK    Blank Space                 ESY
Punctuation Mark      PUNC    Symbol                      ESY     , ; .
Symbol                SYM     Symbol                      ESY     A, B, C, a, b, c
Unit                  UNIT    Unit                        EUN     cm, kg
Numeral               NUM     Number                      NUM     1990
Error in Notation(*)  AP      MP
________________________________________________________________________________
(*) Note: Code is given when the word notation of a morpheme is incorrect.


Table 10-2 Relation Labels for the Representation of Facts or Occurrences
________________________________________________________________________________
Relation Label
________________________________________________________________________________
agent         That which acts on its own volition and is the subject that
              brings about an action
              Ex. The father eats.
                  [ [main 3:eat:3bc6f0]
                    [agent 1:father:0e7c00]]

object        That which is affected by an action or change
              Ex. (He/She) eats an apple.
                  [ [main 3:eat:3bc6f0]
                    [object 1:apple:3bd8db]]

a-object      That which has a particular attribute
              Ex. A tomato is red.
                  [ [main 3:red:0e29cb]
                    [a-object 1:tomato:3bc118]]

implement     That which is used in a voluntary action, such as tools or
              other implements
              Ex. (I) cut with a knife.
                  [ [main 3:cut:0ecff7]
                    [implement 1:knife:3c4e7d]]

material      That which is used to make up something
              Ex. (He/she) makes butter from milk.
                  [ [main 5:make:0fe812]
                    [object 3:butter:3be1c7]
                    [material 1:milk:3c03b7]]

source        Location from which an event or occurrence begins
              Ex. (I) come from Kyoto.
                  [ [main 3:come:3d144c]
                    [source 1:Kyoto:0ecb69]]

goal          Location at which an event or occurrence ends
              Ex. (I) go to Tokyo.
                  [ [main 3:go:1e84a2]
                    [goal 1:Tokyo:0ffee1]]

place         Place (physical location) at which something occurs
              Ex. (I) play in the room.
                  [ [main 3:play:3cf67f]
                    [place 1:room:1080e6]]

scene         Place (abstract location) at which something occurs
              Ex. (I) act in a drama.
                  [ [main 3:act:3cf94e]
                    [scene 1:drama:1013ed]]

basis         That which is used as the standard of comparison
              Ex. Roses are more beautiful than tulips.
                  [ [main @1::c#more]
                    [object [ [main 5:beautiful:1e84c3]
                              [a-object 1:rose:0f6013]]]
                    [basis [ [main @2:beautiful:1e84c3]
                             [a-object 3:tulip:3c2801]]]]

manner        Way in which an action or change occurs
              Ex. (I) speak slowly.
                  [ [main 2:speak:3ce6b9]
                    [manner 1:slowly:0f81ac]]
              Ex. (I) watch for 3 hours.
                  [ [main 3:watch:1e8643]
                    [manner [ [main 2:hour:0f6fe4]
                              [number 1:３:"＝Ｎ ３"]]]]

time          Time at which something occurs
              Ex. (I) wake up at 8 o'clock.
                  [ [main 4:wake up:3cfbdf]
                    [time [ [main 2:o'clock:0f6f06]
                            [modifier 1:８:"＝Ｎ ８"]]]]

time-from     Time at which something begins
              Ex. (I) work from 9 o'clock.
                  [ [main 4:work:0e2799]
                    [time-from [ [main 2:o'clock:0f6f06]
                                 [modifier 1:９:"＝Ｎ ９"]]]]

time-to       Time at which something ends
              Ex. (I) work until 9 o'clock.
                  [ [main 4:work:0e2799]
                    [time-to [ [main 2:o'clock:0f6f06]
                               [modifier 1:９:"＝Ｎ ９"]]]]

quantity      Amount (quantity) of a thing, action, or change
              Ex. (There are) 3 kgs of apples.
                  [ [main 4:apple:3bd8db]
                    [quantity [ [main 2:kg:3c0285]
                                [number 1:３:"＝Ｎ ３"]]]]
              Ex. (I) lost 3 kgs.
                  [ [main 3:lose:3c049e]
                    [quantity [ [main 2:kg:3c0285]
                                [number 1:３:"＝Ｎ ３"]]]]

modifier      Modification
              Ex. the book on the desk
                  [ [main 5:book:0e5097]
                    [modifier [ [main 3:on:0e5797]
                                [modifier 1:desk:3d05cf]]]]

number        Number
              Ex. 3 kgs
                  [ [main 2:kg:3c0285]
                    [number 1:3:"＝Ｎ 3"]]

and           Coordination between concepts
              Ex. (I) go to Rome and Naples.
                  [ [main 5:go:1e84a2]
                    [goal [ [main 3:Naples:1efc5a]
                            [and 1:Rome:10e979]
                            [attribute focus]]]]
              Ex. The mountains are beautiful and the water is clear.
                  [ [main [ [main 8:clear:0f8f10]
                            [a-object 6:water:3bd634]]]
                    [and [ [main 3:beautiful:1e84c3]
                           [a-object 1:mountain:3ce994]]]]

or            Selection between concepts
              Ex. (I will) go to Rome or Naples.
                  [ [main 5:go:1e84a2]
                    [goal [ [main 3:Naples:1efc5a]
                            [or 1:Rome:10e979]
                            [attribute focus]]]]
              Ex. (I will) go to school or go to the library.
                  [ [main [ [main 8:go:0f8f10]
                            [goal 6:library:100648]]]
                    [or [ [main 3:go:0f8f10]
                          [goal 1:school:3cf8b1]]]]

condition     Condition of an occurrence or fact
              Ex. It rained so (I) went home.
                  [ [main [ [main 9:went:0e8e45]
                            [goal 7:home:0e5cdb]]]
                    [condition 1:rain:3bba1f]]

purpose       Purpose or reason for an action or occurrence
              Ex. (I) go to see a movie.
                  [ [main 5:go:1e84a2]
                    [purpose [ [main 3:see:1e8646]
                               [object 1:movie:3be65c]]]]

cooccurrence  Simultaneous occurrence of events or actions
              Ex. (I) cried while I was going home.
                  [ [main 7:cry:0f4cf1]
                    [cooccurrence [ [main 3:go:0e8e45]
                                    [goal 1:home:0e5cdb]]]]

sequence      Sequential occurrence of events or actions
              Ex. (I) went to the library and borrowed a book.
                  [ [main 8:borrow:0e97a9]
                    [object 6:book:0e5097]
                    [sequence [ [main 3:go:0f8f10]
                                [goal 1:library:100648]]]]
________________________________________________________________________________
Pseudorelation Labels
________________________________________________________________________________
possessor     Possession or ownership
              Ex. (my) father's book
                  [ [main 3:book:0e5097]
                    [possessor 1:father:0e7c00]]

beneficiary   Beneficiary (receiver) of an action or occurrence
              Ex. (I) buy a book for my father.
                  [ [main 3:buy:1e84f1]
                    [beneficiary 1:father:0e7c00]]

unit          Unit
              Ex. (This costs) 500 yen per dozen.
                  [ [main 5:yen:0e6912]
                    [number 4:５００:"＝Ｎ 500"]
                    [unit [ [main 2:dozen:3bf083]
                            [number 1:１:"＝Ｎ 1"]]]]

from-to       Range of items specified
              Ex. the cities from Osaka to Tokyo
                  [ [main 6:cities:3cfc38]
                    [modifier [ [main 3:Tokyo:0ffee3]
                                [from-to 1:Osaka:0e7107]]]]
________________________________________________________________________________


Table 10-3 Labels Used to Represent Attributes of Facts or Occurrences
________________________________________________________________________________
not           Negation of state or condition
              Ex. don't work
                  [ [main 4:work:0e2799]
                    [attribute not]]

generic       General (non-specific)
              Ex. like apples
                  [ [main 1:like:3cee21]
                    [a-object [ [main 4:apples:3bd8db]
                                [attribute generic]]]]

all           Quantification (totality)
              Ex. all apples
                  [ [main 3:apples:3bd8db]
                    [attribute all]]

some          Quantification (partial)
              Ex. some apples
                  [ [main 3:apples:3bd8db]
                    [attribute some]]

each          Quantification (numeration)
              Ex. each apple
                  [ [main 3:apple:3bd8db]
                    [attribute each]]

this          Demonstration (near)
              Ex. this apple
                  [ [main 3:apple:3bd8db]
                    [attribute this]]

that          Demonstration (far)
              Ex. that apple
                  [ [main 3:apple:3bd8db]
                    [attribute that]]

specific      Specific instance
________________________________________________________________________________


Table 10-4 Labels Used to Indicate References Made to Time
________________________________________________________________________________
Time Attribute Label
________________________________________________________________________________
past          Time made reference to is the past
present       Time made reference to is the present
future        Time made reference to is the future
________________________________________________________________________________


Table 10-5 Labels Used to Indicate Aspect
________________________________________________________________________________
Aspect Labels #1
________________________________________________________________________________
begin         Indicates an event or action starts
end           Indicates an event or action has ended
progress      Indicates an event or action is not yet completed
continue      Indicates a repetitious action/movement continues
state         Indicates the result (resulting state) of an action is
              continuous
________________________________________________________________________________
Aspect Labels #2
________________________________________________________________________________
yet           Indicates a pending state (the event has neither started nor
              ended)
already       Indicates a state/event has either started or ended
soon          Indicates a state/event will either start or end soon
just          Indicates a state/event has just started or ended
________________________________________________________________________________
complete      Indicates the completion of an event
________________________________________________________________________________
come          Indicates an event approaches the speaker's point of reference
go            Indicates an event gets further away from the speaker's point
              of reference
________________________________________________________________________________


Table 10-6 Aspect in English
________________________________________________________________________________
Pre-Inchoative Aspect       be about to, be going to ~ soon
              Ex. Is about to sweep
                  [ [main 7:sweep:3ce70c]
                    [attribute begin yet soon]]

Inchoative Aspect           begin to
              Ex. Begins to sweep
                  [ [main 6:sweep:3ce70c]
                    [attribute begin]]

Ingressive Aspect           just began ~ing
              Ex. Just began sweeping
                  [ [main 5:sweep:3ce70c]
                    [attribute begin just already]]

Durative Aspect (Continuous Aspect)
                            be ~ing (present participle)
              Ex. Is sweeping
                  [ [main 3:sweep:3ce70c]
                    [attribute continue]]

Pre-Terminative Aspect      be going to finish ~ing (present participle)
              Ex. Is going to finish sweeping
                  [ [main 10:sweep:3ce70c]
                    [attribute soon end yet]]

Terminative Aspect (Egressive Aspect)
                            have V (past participle)
              Ex. Has swept
                  [ [main 3:sweep:3ce70c]
                    [attribute already end]]

Effective Aspect            have already finished ~ing (present participle)
              Ex. Have already finished sweeping
                  [ [main 8:sweep:3ce70c]
                    [attribute complete]]

Post-Terminative Aspect     have just V (past participle)
              Ex. Have just swept
                  [ [main 5:sweep:3ce70c]
                    [attribute just end already]]

Resultative Aspect          have V (past participle)
              Ex. Have swept
                  [ [main 3:sweep:3ce70c]
                    [attribute state]]
________________________________________________________________________________


Table 10-7 Constituent Attribute Labels Used to Indicate Speaker's Intention
________________________________________________________________________________
emphasis            Emphasis
focus               Focus
topic               Theme or subject
wh                  Indecision
________________________________________________________________________________


Table 10-8 Sentence Attribute Labels Used to Indicate Speaker's Intention or
           Decision
________________________________________________________________________________
Labels Indicating Speaker's Request or Consent
________________________________________________________________________________
imperative          Commands (expressed by the imperative form of the verb)
grant               Permission (may, can)
consent             Consent (may, can)
grant-not           Prohibition
advise              Advisory (expressed by 'would be better')
recommend           Recommendation (would be better)
invite              Invitation (let us + verb)
require-agreement   Confirmation (~, isn't it?)
________________________________________________________________________________
Labels Indicating Sentence Style or Speaker's Manner of Speaking
________________________________________________________________________________
polite              Polite request (Would you ~)
respect             Respect (~, sir)
________________________________________________________________________________
Labels Indicating Speaker's Decision Towards Facts Expressed in Concept
Relation Representation
________________________________________________________________________________
should              Obligation (should, ought to, had better)
sufficiency         Sufficiency (have only to)
duty                Duty or obligation (must, have to, be to)
interrogation       Question (~?)
conclude            Conclusion (I am sure)
sure                Conclusion drawn from facts (actually, really, no doubt,
                    must)
maybe               Possibility (may, maybe, probably, might, ought, appear)
seem                Conjecture or supposition (seem, look, sound, appear)
rumor               Hearsay (I hear that)
appearance          Condition and comparison (look, seem)
be-sorry            Regret
natural-result      Natural or expected conclusion
natural-thing       Ideal shape or form
if                  Supposition of something unknown (~, isn't it?)
thought             Supposition of the opposite of fact/reality (subjunctive
                    form)
reality             Fact or taken as fact (because, because of)
________________________________________________________________________________
Labels Indicating Speaker's Intention of Speaker's Utterance
________________________________________________________________________________
exclamation         Admiration/disbelief (~!)
pity                Regret (it's a pity that, unfortunately)
blame               Criticism
unexpected          Not anticipated (have + verb (past participle),
                    unexpectedly)
underestimate       Underestimation (be nothing but)
________________________________________________________________________________


Table 10-9 Special Headconcepts Used to Indicate Intention, Feeling or
           Judgment of an Utterance
________________________________________________________________________________
Labels of Decision
________________________________________________________________________________
c#ability             Ability (based on innate capability) (can, be able to)
c#difficulty          Difficulty (be difficult to)
c#easiness            Easiness (be easy to)
c#excess              Excessiveness (over~, too)
c#need                Necessity (need, necessary, need to)
c#only                Distinctiveness
c#possibility         Possibility (based on situation or circumstance)
                      (can, be possible)
c#shortage            Insufficiency (unsatisfactorily)
c#tendency            Tendency (high frequency of occurrence) (tend to,
                      be apt to)
________________________________________________________________________________
Labels of Subjective Feelings or Intention
________________________________________________________________________________
c#unwill-duty         Recognition (situation in which one cannot avoid doing
                      something one finds disagreeable) (cannot help ~ing)
c#voluntary           Volitional (having an almost uncontrollably strong
                      desire to realize an action) (cannot stop ~ing)
c#want                Desire (having a desire for the realization of an
                      action) (want to-infinitive)
c#will                Inclination or intention (having the intention to
                      realize an action) (will, would, be going to)
________________________________________________________________________________
Labels of Past Actions or Events
________________________________________________________________________________
c#experience          Experience (event occurred at a point in time before
                      the utterance) (present perfect)
c#failure             Failure (attempt at something was not realized)
                      (fail to)
c#missing-a-chance    Lost opportunity (something one intends to do remains
                      unrealized due to lack of opportunity)
                      (miss a chance to)
________________________________________________________________________________
Labels of Future Actions or Events
________________________________________________________________________________
c#effort              Effort (one will put forth effort to bring about the
                      realization of an action)
                      Note: realization is unclear
                      (make efforts to)
c#schedule            Schedule (one has a plan to realize an action at a
                      point in time after the utterance) (be to, be going
                      to, shall/will be ~ing)
c#try                 Attempt (one will attempt an action)
                      Note: eventual realization is unclear
                      (try to)
________________________________________________________________________________
Labels of Order/Command and Request
________________________________________________________________________________
c#causative           Causation (make)
c#give-benefit        Benefit (actions of the speaker benefit others)
c#receive-benefit     Benefit (actions of others benefit the speaker; action
                      of others directed by request of the speaker)
c#request             Polite command (would, would like, I want you to)
________________________________________________________________________________


Table 10-10 Other Special Concepts and Headconcepts
________________________________________________________________________________
Labels Corresponding to Pronouns, etc.
________________________________________________________________________________
Substitute Concept Label                                              Referent
________________________________________________________________________________
c#I              First person singular                                Person
c#we             First person plural                                  Person
c#you-s          Second person singular                               Person
c#you-p          Second person plural                                 Person
c#you            Second person (number not specified)                 Person
c#she            Third person (female gender) singular                Person
c#he             Third person (male gender) singular                  Person
c#they           Third person plural                                  Person
c#it             Thing/matter                                         Thing
________________________________________________________________________________
Labels for Specifying/Restricting Concepts
________________________________________________________________________________
c#all            all
c#some           some
c#each           each
________________________________________________________________________________
Special Labels
________________________________________________________________________________
c#statement      Special headconcept notation that indicates a sentence and
                 provides information about the structure of a piece of
                 writing
c#nil            Special notation used to represent a blank headconcept
________________________________________________________________________________
Labels Used For Representing Comparison
________________________________________________________________________________
c#equal          Equality
c#just-as        Similarity
c#least          Least
c#less           Less (does not include cases in which the numerical value
                 is the same; indicates insufficiency)
c#more           More (does not include cases in which the numerical value
                 is the same)
c#most           Most
________________________________________________________________________________
Labels Used For Representing Numerical Values
________________________________________________________________________________
c#no-more        No more (includes cases in which the value is the same)
c#no-less        No less (includes cases in which the value is the same)
c#about          Approximation
________________________________________________________________________________
Labels Given To Classifiers
________________________________________________________________________________
c#frequency      Frequency
c#ordinal        Order
c#tuple          Group
________________________________________________________________________________
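
The bracketed examples in Tables 10-2, 10-3 and 10-6 follow a regular pattern: a frame is a bracketed list of slots, and each slot holds a label followed by either a plain value or a nested frame. The following is a minimal sketch of a reader for that pattern, inferred from the printed examples only; it is not an official EDR tool. The names parse_frame, parse_atom and Atom are hypothetical, and the sketch assumes that a constituent number beginning with '@' marks an added concept, as in the '@1' and '@2' entries of the 'basis' example in Table 10-2.

```python
from typing import List, NamedTuple, Optional, Tuple, Union

# A slot is (label, value); the value is a plain string or a nested frame.
Slot = Tuple[str, Union[str, list]]


class Atom(NamedTuple):
    """A 'number:word:concept' value split into its parts (hypothetical type)."""
    number: str
    word: str
    concept: str
    added: bool  # True when the number starts with '@' (assumed convention)


def _skip_ws(s: str, i: int) -> int:
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def _parse_frame(s: str, i: int) -> Tuple[List[Slot], int]:
    """Parse '[ slot slot ... ]' starting at s[i]; return (slots, next index)."""
    if i >= len(s) or s[i] != "[":
        raise ValueError(f"expected '[' at position {i}")
    i = _skip_ws(s, i + 1)
    slots: List[Slot] = []
    while i < len(s) and s[i] == "[":
        slot, i = _parse_slot(s, i)
        slots.append(slot)
        i = _skip_ws(s, i)
    if i >= len(s) or s[i] != "]":
        raise ValueError(f"expected ']' at position {i}")
    return slots, i + 1


def _parse_slot(s: str, i: int) -> Tuple[Slot, int]:
    """Parse '[label value]' or '[label [frame]]' starting at s[i]."""
    i = _skip_ws(s, i + 1)
    j = i
    while j < len(s) and not s[j].isspace() and s[j] not in "[]":
        j += 1
    label = s[i:j]
    if not label:
        raise ValueError(f"missing slot label at position {i}")
    i = _skip_ws(s, j)
    value: Union[str, list]
    if i < len(s) and s[i] == "[":
        value, i = _parse_frame(s, i)
        i = _skip_ws(s, i)
    else:
        # Plain values may contain spaces ('4:wake up:3cfbdf'), so read to a bracket.
        j = i
        while j < len(s) and s[j] not in "[]":
            j += 1
        value, i = s[i:j].strip(), j
    if i >= len(s) or s[i] != "]":
        raise ValueError(f"expected ']' at position {i}")
    return (label, value), i + 1


def parse_frame(text: str) -> List[Slot]:
    """Parse one complete frame and reject trailing text."""
    slots, i = _parse_frame(text, _skip_ws(text, 0))
    if _skip_ws(text, i) != len(text):
        raise ValueError("unexpected text after the closing bracket")
    return slots


def parse_atom(value: str) -> Optional[Atom]:
    """Split 'number:word:concept'; return None for other values (e.g. attributes)."""
    parts = value.split(":", 2)
    if len(parts) != 3:
        return None
    number, word, concept = parts
    return Atom(number, word, concept, number.startswith("@"))


# Example taken from the 'agent' entry of Table 10-2.
eats = parse_frame("[ [main 3:eat:3bc6f0] [agent 1:father:0e7c00]]")
```

Because the parser insists on balanced brackets, it can also serve as a simple consistency check on hand-entered frames. Attribute slots such as '[attribute begin yet soon]' are returned as a single string and are left for the caller to split.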