EDR ************************************************************ JAPANESE CORPUS

Chapter 9  The Japanese Corpus

The Japanese Corpus is composed of records encoded in EUC (Extended Unix
Code). Each record consists of the record number, sentence information,
constituent information, morphological information, syntactic information,
semantic information and management information.

The basic role of the Japanese Corpus is first to identify the constituents
of each sentence, and then to show how those constituents combine to form the
morphological, syntactic and semantic structure of the sentence, using a
large number of actual examples as the source data.

=====================Structure of Japanese Corpus Records=======================
 :Record type and identifier number
 :(Section 9.1)
 :Management number for sentence
 :Name of source from which text is taken
 :Notation of example sentence
 :(Section 9.2)
 :*1
 :(Section 9.3)
 :*2
 :(Section 9.4)
 :*3
 :(Section 9.5, and Tables 9-2 - 9-10)
 :*4
 :Information for dictionary development
 :Management information such as date of creation or record update

*1
 ::= 。。。
 ::=
 ::= | |
------------------------------------------
 :Morpheme number or compound number (Section 9.2.1)
 :Notation of morpheme or compound word (Section 9.2.2)
 :Reading in Katakana notation (Section 9.2.3)
 :(Section 9.2.5)
 :(Section 9.2.4 and Table 9-1)
 :Number that uniquely identifies the concept
 :Explanation of concept that is given when an appropriate concept is not
  available among the concepts of the words from the Japanese Word Dictionary
 :Number of the compound word

*2
Sequence of Morphemes ::= 。。。/
 ::=/:
 ::='{'。。。/'}'
 ::=
 ::=/{ | }。。。/
 ::=*/

*3
Syntactic Tree ::= |
 ::='('')'
 ::='('。。。')' | '(' 。。。')' | '('Formation Label>。。。')'
 ::= | |
-----------------------
 :Terminal Node
 :Non-Terminal Node
 :Index that indicates node is a leaf
 :Indicates one of four possible formation types (Section 9.4.2)
 :Main (most central) Sub-concept node

*4
Semantic Frame ::=[。。。]
 ::=[] | [ Relation Slot Name>。。。] | [。。。]
 ::= | main | which | attribute | S-attribute
 ::= : : |
 ::= |
 ::= | |
-----------------------
 :Concept relation label for expressing facts (reality) and events (phenomena)
 :Attribute label assigned to each constituent
 :Attribute label assigned to sentence
 :Number used when concept is added
================================================================================

========================Example of Japanese Corpus Records======================
JCO 000500017459 朝日新聞870301
会場は熱気に包まれ、集会後、周辺路上でのデモ行進に移った。

1    会場      カイジョウ    名詞    3c0841
2    は        ハ            助詞    2621d5
3    熱気      ネッキ        名詞    102ab4
4    に        ニ            助詞    2621d5
5    包        ツツ          動詞    3ce654
6    ま        マ            語尾    2621cb
7    れ        レ            助動詞  2621c1
8    、        、            記号    2621d7
9    集会      シュウカイ    名詞    3cec82
10   後        ゴ            名詞    3d0476
11   、        、            記号    2621d7
12   周辺      シュウヘン    名詞    3cf780
13   路上      ロジョウ      名詞    10ebf5
14   で        デ            助詞    2621d5
15   の        ノ            助詞    2621d5
16   デモ      デモ          名詞    I#1
17   行進      コウシン      名詞    I#1
18   に        ニ            助詞    2621d5
19   移        ウツ          動詞    0e5eca
20   っ        ッ            語尾    2621ce
21   た        タ            助動詞  2621c6
22   。        。            記号    2621d8
I#1  デモ行進  デモコウシン  名詞    =Z 何かに反対して行う路上での行進

01234567890123456789012345678901234567890123456789012345678901234567890123456789
/1:会場/2:は/3:熱気/4:に/5:包/6:ま/7:れ/8:、/9:集会/10:後/11:、/12:周辺/13:路上/
14:で/15:の/16:デモ/17:行進/18:に/19:移/20:っ/21:た/22:。/{I#1//16:デモ/17:行進/
/}

(S (t (M (S (t (M (S (t (W 1 "会場")) (W 2 "は"))
 (t (M (S (t (W 3 "熱気")) (W 4 "に"))
 (t (S (t (W 5 "包")) (W 6 "ま") (W 7 "れ")))))))
 (W 8 "、"))
 (t (M (S (t (M (W 9 "集会") (t (W 10 "後")))) (W 11 "、"))
 (t (M (S (t (M (S (t (M (W 12 "周辺") (t (W 13 "路上")))) (W 14 "で") (W 15 "の"))
 (t (W 16 "デモ行進"))
 (t (I 1 "デモ行進" (W 16 "デモ") (W 17 "行進"))))
 (W 18 "に"))
 (t (S (t (W 19 "移")) (W 20 "っ") (W 21 "た")))))))))
 (W 22 "。"))

[ [main 19:移:0e5eca]
  [time [ [main 10:後:3d0476]
          [modifier 9:集会:3cec82]]]
  [goal [ [main I#1:デモ行進:"=Z 何かに反対して行う路上での行進"]
          [place [ [main 13:路上:10ebf5]
                   [modifier 12:周辺:3cf780]]]]]
  [and [ [main 5:包:3ce654]
         [object 1:会場:3c0841]
         [implement 3:熱気:102ab4]]]]
================================================================================

9.1. Sentence Information

The sentence information of the Japanese Corpus record describes the sentence
itself. It is composed of the sentence, the text number, and the source
information.

The text number is a sentence management number that is given to every
sentence registered in the EDR Text Base. The sentences stored in the
Japanese Corpus have been extracted from source texts contained in the EDR
Text Base. The text number makes it possible to retrieve the context (source)
of a sentence from the EDR Text Base.

The source information contains the name of the text source.

The sentence is the notation of the example sentence itself.

9.2. Constituent Information

The constituent information indicates what constituents the sentence is
composed of and gives the information relevant to the morphemes or compound
words of the sentence. A morpheme is defined as the smallest linguistic unit
that combines with other morphemes to make up a sentence. In general terms,
morphemes are words, prefixes and suffixes. Idiomatic or set phrases fall
into the category of compound words.

Included in Constituent Information are: constituent number, notation, stem
form, part of speech and selected concept.

9.2.1. Constituent Number

The constituent number is the number of the morpheme or compound word. The
number given to the morpheme or compound word corresponds to the order in
which it appears in the sentence. That is, the first constituent of the
sentence is given constituent number '1', the second constituent of the
sentence is given constituent number '2', etc. The words that make up a
compound word are each treated as a single constituent and given a
constituent number of their own.
However, at the end of the constituent information block, the compound words
are regrouped and given a new constituent number prefixed with "I#". The
number assigned to a compound word also corresponds to the order in which it
appears in the sentence. If there are two compound words, the first is given
the constituent number I#1, and the compound word occurring next in the
sentence is given the constituent number I#2.

A compound word is a word composed of two or more words that together express
a single concept. The concepts of the individual words that make up the
compound word cannot, in isolation, express the concept represented by the
compound word. In addition, the constituents that make up the compound word
do not have independent syntactic functions. Idiomatic phrases, and words
that when affixed point to a concept different from the concept of the
unaffixed word, are also considered compound words.

The following are not considered compound words: a phrase consisting of a
word and attributive modifier(s) whose concept is not changed by the
attribute concept; a combination of words with different notations but the
same concept; and combinations of words with different concepts, such as
'organization name + position name' or 'place name + place name', in which
the constituents may be replaced with other words, making the combinations
too numerous to register in the dictionary.

The morphemes and compound words in the Japanese Corpus correspond to the
word entries in the Japanese Word Dictionary. All idiomatic or set phrases
and compound words in the Japanese Word Dictionary are further divided at the
constituent level into morphemes.

Example of Compound Words (Including Idiomatic Phrases)
-----------------------------------------------------------
Type               Sequence
-----------------------------------------------------------
Compound word      平均|値
                   成田|空港
                   高山|植物
                   取|り|出|す
                   引|き|上|げ|幅
                   音楽|教育
                   密室|型|政治
Idiomatic Phrase   油|を|売|る
                   光彩|を|放|つ
-----------------------------------------------------------
Note: the division between morphemes is indicated by a '|' symbol.

9.2.2. Notation

The notation is the character string of the morpheme or compound word.

9.2.3. Kana Notation

The Kana notation gives the reading of the notation. It is written in
Katakana, although Roman alphabet letters, numbers and symbols are also used.
The readings of Kanji characters and Kana are given in Katakana. The reading
of an English word is either the reading of the letters that make up the
word, or the English pronunciation of the word. As a general rule, all
numbers, including numbers written in Kanji characters, are given in Arabic
numerals. The details of the notation are given below.

Example of Kana Notation
----------------------------------------------------------------------------------
Type                   Notation       Kana Notation  Comment (Explanation)
----------------------------------------------------------------------------------
Kanji                  日本           ニホン         Katakana
Numbers (1)            300            300            Corresponding numeral
Numbers (2)            300万          3000000        Corresponding numeral
Alphabet Letters (1)   CNN            CNN            Reading of alphabet letter
Alphabet Letters (2)   COBOL          コボル         Reading of word
Alphabet Letters (3)   ppm            PPM            Small letters are capitalized
Hiragana               もろさ         モロサ         Corresponding Katakana characters
Katakana               イタリー       イタリー       Same as Katakana notation
Symbol (1)             WD−610         WD610          Symbols that can be abbreviated
                                                     are abbreviated
Symbol (2)             H・G・ウェルズ HGウェルズ     「・」 between surnames and names
                                                     is omitted
Chinese Name (Person)  趙紫陽         チョウシヨウ   Japanese Kanji reading
Chinese Name (Place)   西安           セイアン       Japanese Kanji reading
Korean Name (Person)   金泳三         キムヨンサム   Korean reading
Korean Name (Place)    釜山           プサン         Korean reading
----------------------------------------------------------------------------------

For the word form of the morphemes of inflecting words, see the explanation
in the Japanese Word Dictionary. Specifically, see Table 2-5 'Japanese Verb
Inflection', Table 2-6 'Japanese Adjective Inflection' and Table 2-7
'Japanese Adjectival Noun Inflection'. Note that the word stem used for
'Ichidan' conjugating verbs differs from the one used in most grammar
explanations.

9.2.4. Part of Speech

A part of speech is assigned to every morpheme and compound word. The fifteen
parts of speech used in the Japanese Corpus are:

    noun                    verb                adjective
    adjectival noun         adverb              adnoun
    conjunction             prefix              suffix
    suffix ending           particle/postposition
    auxiliary verb          interjection        symbol
    numeral

The names of the parts of speech are shown in Table 9-1. The parts of speech
used in the Japanese Corpus are not as detailed as the parts of speech used
in the Japanese Word Dictionary.

9.2.5. Selected Concept

The selected concept shows which meaning has been used, or what kind of
concept is indicated, by the morpheme or compound word. The selected concept
is either a concept identifier, a supplemental explanation of the concept or
a compound word number.

The concept identifier is given when the morpheme corresponds to a regular
concept. Of the concepts that correspond to the morpheme or to the compound
word, the one that most appropriately matches the meaning as used in the
original text is the selected concept.

The supplemental explanation of the concept is given when none of the
concepts of the words in the Japanese Word Dictionary that correspond to the
morpheme or compound word is appropriate. Supplemental explanations are given
in two formats: Z format and W format. The Z format is a character string
that explains the concept, while the W format is a list of words that are
synonymous with or similar in meaning to the specified concept. The
supplemental explanation given in a record may be in W format, in Z format,
or in a combination of the two. When both Z format and W format are used in
the same record, a slash (/) marks the division between the two.

The Z format supplements the insufficient concept by explaining or describing
the concept. In the Z format, the relation between the original word and the
specified concept is described first, followed by the explication of the
concept. The headconcept in these cases is preceded by a special headconcept
symbol, 'c#'.

Notation Indicating Relation Between Word and Specified Concept of the
Supplemental Explanation of the Concept (Z Format)
---------------------------------------------------------------------------
Symbol  Relation Between Original Word and Specified Concept
---------------------------------------------------------------------------
'='     Synonymous           Specified concept is the same in meaning as
                             the concept of the word used in the sentence.
'<'     Narrower in Meaning  Specified concept is included in the meaning
                             of the concept of the word used in the
                             sentence. In such cases, the specified concept
                             adds a more detailed meaning to the concept of
                             the word. (ポメラニアン < 犬)
'%'     Similar in Meaning   Specified concept is not the same as the
                             concept of the word in the sentence but
                             closely resembles it. (たぬき % むじな)
'>'     Broader in Meaning   Meaning of the concept of the word used in the
                             sentence is included in the meaning of the
                             specified concept. In such cases, the meaning
                             of the word adds a more detailed meaning to
                             the specified concept. (This is not usually
                             used.)
---------------------------------------------------------------------------

Example of Supplemental Explanation of Concept (Z Format)
------------------------------------------------------------------
1  せっ  セッ  動詞  =Z 円と直線が交差せずに一点で接触する
2  する  スル  語尾  2621d0
------------------------------------------------------------------

The W format supplements the insufficient concept by giving one or more words
(a group of words) or compound words. Each entry in the W format gives the
notation of the word (or compound word) in the predicative form, together
with its Kana reading, and the part of speech. Note, however, that the part
of speech may be omitted when the part of speech of the original word and
that of the listed word(s) are the same. When the part of speech is given, it
is placed to the right of the listed word(s). English words may also be used
in the W format.
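
The three kinds of selected-concept value described in this section can be
told apart mechanically. The following is a minimal sketch, not part of the
EDR specification: the function name classify_selected_concept is
hypothetical, and the patterns are assumptions read off the examples in this
chapter (six-digit hexadecimal concept identifiers such as 3c0841, compound
word numbers such as I#1, and explanations that start with a relation symbol
followed by Z or W). Anything else is reported as unknown rather than guessed.

```python
import re

# Relation symbols from the Z-format relation table in this section.
RELATIONS = {"=": "synonymous", "<": "narrower", "%": "similar", ">": "broader"}

# Assumed shapes, inferred from the examples in this chapter only.
_COMPOUND = re.compile(r"^I#(\d+)$")
_CONCEPT_ID = re.compile(r"^[0-9a-f]{6}$")
_EXPLANATION = re.compile(r"^([=<%>])([ZW])\s+(.*)$", re.S)


def classify_selected_concept(field):
    """Classify one selected-concept field (hypothetical helper)."""
    field = field.strip()
    m = _COMPOUND.match(field)
    if m:
        return {"kind": "compound", "number": int(m.group(1))}
    if _CONCEPT_ID.match(field):
        return {"kind": "concept_id", "id": field}
    m = _EXPLANATION.match(field)
    if m:
        symbol, fmt, text = m.groups()
        return {
            "kind": "explanation",
            "relation": RELATIONS[symbol],
            "format": fmt,
            "text": text,
        }
    # Shapes not covered by the assumptions above are left unclassified.
    return {"kind": "unknown", "raw": field}


if __name__ == "__main__":
    for value in ["3c0841", "I#1", "=Z 何かに反対して行う路上での行進"]:
        print(classify_selected_concept(value))
```

A combined Z/W explanation is returned as one text string; splitting it at
the slash is left out because the exact layout of combined entries is not
shown in this chapter.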

Example of Supplemental Explanation of Concept (W Format)
------------------------------------------------------------------------
1  せっ  セッ  動詞  =W 応対する(オウタイスル)<動詞>/応接する(オウセツスル)<動詞>
2  する  スル  語尾  2621d0
------------------------------------------------------------------------

When the morpheme is part of a compound word, the compound word number is
given. The compound word number is the number given to the compound word as
it occurs in the sentence and is expressed as 'I#' followed by the number
(for example, 'I#1').

9.3. Morphological Information

The morphological information shows the segmentation of the sentence into
morphemes. The constituents of the morpheme sequence are shown first. The
compound word sequence(s) are given after the morpheme sequence.

9.4. Syntactic Information

The syntactic information shows the syntactic structure of the sentence, that
is, how the syntactic constituents combine to form the sentence. The
syntactic tree used to show the syntactic information is a parse tree based
on dependency structure and is given in list form.

9.4.1. Structure of Syntactic Tree

The nodes that make up the syntactic tree are leaves and intermediate nodes.
Within each intermediate node, one of its sub-nodes (an intermediate node or
a leaf) is designated the main node.

A leaf is the terminal node that corresponds to a morpheme. A leaf is
described by the leaf identifier 'W', which precedes the morpheme.

An intermediate node is a non-terminal node that groups leaves that stand in
a syntactic relation. An intermediate node describes the main node and the
formation type. In the description of compound words, the entirety of a
compound word is first described in the leaf format as a single leaf. Then
the internal structure of that leaf is described as an intermediate node.

The main node is the node among a group of sub-nodes that is regarded as
central. In the syntactic trees shown in this chapter, the main node is the
sub-node enclosed in '(t ...)'.

9.4.2. Formation Types

The term 'formation' as used in this manual refers to the way in which
sub-nodes join or are grouped together. There are four types of formation,
each indicated by a different formation label: modification ('M'), synthesis
('S'), number formation ('N'), and compound word formation ('I'). The
formation label is given at the head of each intermediate node.

The first type of formation, modification, is composed of the modifying
constituent and the constituent that is modified. Modification covers
constituents that show a dependency relation, as well as modification by
adjectives and adverbs. The main node of this type of formation is the
modified constituent.

Examples of Formation By Modification
--------------------------------------------------
Type                    Example
--------------------------------------------------
Adnominal Modification  (M (W 1 "主要") (t (W 2 "財界人")))
Verbal Modification     (M (W 1 "すっきり") (t (W 2 "する")))
--------------------------------------------------

The synthesis formation is created when several sentence constituents are
grouped to form a single sentence constituent. This type of formation covers
constituents grouped through subordination and those grouped through
coordination.

Subordination is a formation made up of an autonomous word that has a concept
and subordinate word(s) that do not have a concept. The main node of the
formation is the autonomous word.
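
The list form used for syntactic trees can be read with a small recursive
parser. The sketch below is illustrative only and is not part of the EDR
distribution: the names parse_tree and head_leaf are hypothetical, and it
assumes, based on the examples in this chapter, that the main node is the
sub-node wrapped in '(t ...)' and that a leaf has the shape (W number
"notation"). Number ('N') and compound word ('I') formations, whose examples
carry no 't' marker, are reported as an error by head_leaf rather than
guessed at.

```python
import re

# Tokens: parentheses, double-quoted strings, and bare atoms.
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_tree(text):
    """Parse one parenthesised syntactic tree into nested Python lists."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip the quotes from a notation string
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def head_leaf(node):
    """Follow the (t ...) marked sub-nodes down to a W leaf."""
    while node[0] != "W":
        if node[0] == "t":
            node = node[1]
            continue
        marked = [c for c in node[1:] if isinstance(c, list) and c[0] == "t"]
        if not marked:
            raise ValueError("no main node marked in %r" % (node,))
        node = marked[0][1]
    return int(node[1]), node[2]


if __name__ == "__main__":
    tree = parse_tree('(M (W 1 "主要") (t (W 2 "財界人")))')
    print(tree)
    print(head_leaf(tree))
```

Applied to the adnominal modification example, head_leaf returns the modified
constituent, which matches the statement that the main node of a modification
is the modified constituent.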

Examples of Synthesis (Subordination)
--------------------------------------------------
Type                            Example
--------------------------------------------------
Stem of Word and Suffix Ending  (S (t (W 1 "咲")) (W 2 "く"))
Noun and Particle               (S (t (W 1 "庭")) (W 2 "の"))
Prefix                          (S (W 1 "お") (t (W 2 "話")))
Suffix                          (S (t (W 1 "田中")) (W 2 "さん"))
Punctuation Mark                (S (t (W 1 "しかし")) (W 2 "、"))
--------------------------------------------------

Coordination is a formation type used to group a noun (noun phrase) that has
a concept and another noun (noun phrase) that also has a concept by means of
a coordinating conjunction. The grouped elements form a single noun phrase.
The main node of the formation is the word that occurs closest to the end of
the sentence/phrase.

Examples of Synthesis (Coordination)
--------------------------------------------------
(S (W 1 "A") (W 2 "と") (t (W 3 "B")))
--------------------------------------------------

The compound word formation is the type of formation used to make compound
words. This includes idiomatic phrases and set phrases. In the compound word
formation, the entirety of the compound word is described in a leaf
structure, and then the internal structure of the compound word is described.

Example of Compound Word Formations
--------------------------------------------------
(W 16 "デモ行進")
(I 1 "デモ行進" (W 16 "デモ") (W 17 "行進"))
--------------------------------------------------

The number formation is the type of formation used to make a single formation
out of several numbers. The number formation is formed in the same way as the
synthesis formation. The value is preceded by the formation label N.

Example of Number Formations
--------------------------------------------------
(N -3 (W 1 "−") (W 2 "3"))
--------------------------------------------------

9.4.3. How Intermediate Nodes Are Formed

When there are multiple modifying constituents, in principle the syntactic
tree is described so that the nodes do not cross each other. This is done by
describing the innermost node first, then the next innermost node, and so on
until all the nodes are described. In the first example below, the first
formation described is '(卓球を)(する)' and the second formation described is
'(太郎が)(卓球をする)'. In the second example below, '(太郎が)(する)' is
described first. The second formation described is '(卓球を)(太郎がする)'.

Example of Formation of Intermediate Node
--------------------------------------------------
Character String Notation    Syntactic Tree
--------------------------------------------------
"太郎が卓球をする"           (M (S (t (W 1 "太郎")) (W 2 "が"))
                                (t (M (S (t (W 3 "卓球")) (W 4 "を"))
                                      (t (W 5 "する")))))
"卓球を太郎がする"           (M (S (t (W 1 "卓球")) (W 2 "を"))
                                (t (M (S (t (W 3 "太郎")) (W 4 "が"))
                                      (t (W 5 "する")))))
--------------------------------------------------

In cases where several consecutive nodes should be unified, and it is judged
that it makes no significant difference from which node the formation is
made, three or more nodes are either unified at the same time, or are first
made into subordinating formations and then into coordinating formations.

Example of Three or More Nodes Unified At the Same Time
--------------------------------------------------
Type of Formation            Syntactic Tree
--------------------------------------------------
Example: "行きました"
Unified At Same Time         (S (t (W 1 "行")) (W 2 "き") (W 3 "ま")
                                (W 4 "し") (W 5 "た"))
Not Unified At Same Time     (S (t (S (t (S (t (W 1 "行")) (W 2 "き")))
                                      (S (t (W 3 "ま")) (W 4 "し"))))
                                (W 5 "た"))
--------------------------------------------------

When an auxiliary verb that takes case particles is treated as a main verb,
the nodes of the syntactic tree may cross. In order to avoid such
intersections, the syntactic tree is described by changing the order of the
leaves. Since information regarding the position of each of the constituents
is contained in the S Format, it is possible to restore the word order of the
original sentence. The leaf order of the changed leaves is determined by the
first leaf that is described from the main node with the smallest morpheme
number. The phrase '本を彼は書かせた' becomes '本を書か彼はせた' in order to
avoid crossing.

Example of Change in Leaf Order
--------------------------------------------------
Example: 本を彼は書かせた
(M (M (S (t (W 1 "本")) (W 2 "を"))
      (t (S (t (W 5 "書")) (W 6 "か"))))
   (t (M (S (t (W 3 "彼")) (W 4 "は"))
         (S (t (W 7 "せ")) (W 8 "た")))))
--------------------------------------------------

9.5. Semantic Information

The semantic information describes the semantic structure of the sentence. It
shows how the concepts of the words used in the sentence are joined to form
the overall content structure of the sentence. This is provided by way of a
'concept relation representation', which is given in a frame format.

9.5.1. Structure of Semantic Information

The semantic information is a semantic frame in which the relations between
the concept of the predicate and the other concepts in the sentence are
listed. The first slot of the frame is the concept that makes up the
predicate of the sentence. The first slot indicates the main concept and is
given the slot name 'main'. The remaining slots of the semantic frame are
composed of the other concepts. Each slot shows the content of a concept and
gives the semantic relation it has to the main concept. The slots of the
frame following the main slot indicate what attribute (modifying or
determining) the concept of the slot has in relation to the main concept.

Example of Semantic Frame (Model)
--------------------------------------------------
Example Sentence: 彼が字を書いたらしい
[ [main 5:書:0e910d]
  [agent 1:彼:2dc304]
  [object 3:字:3d0797]
  [attribute already end]
  [S-attribute seem]]
--------------------------------------------------

The slot names used in the semantic frame are 'main', 'attribute',
'S-attribute', 'which' and the concept relation labels listed in Table 9-2.
The 'attribute', 'S-attribute' and 'which' slots are special slot names.

The attribute slot labels (Tables 9-3 and 9-7) are applicable to individual
constituents of the sentence and indicate information such as the viewpoint
of the speaker. The S-attribute labels (Table 9-8) are applicable to the
entire sentence and indicate information such as the viewpoint of the
speaker.

The which slot is used to describe an embedded sentence that modifies the
predicate. The concept of the element being modified is given the slot name
'main' and the embedded sentence is then described in the which slot. An
example of the notation used in the which slot is given below.

Example of Semantic Frame (which Slot)
--------------------------------------------------
Example: 彼の書いた字
--------------------------------------------------
[ [main 6:字:3d0797]
  [which [ [main 3:書:0e910d]
           [agent 1:彼:2dc304]
           [object 6:字:3d0797]
           [attribute already end]]]]
--------------------------------------------------

9.5.2. Concept Relation Representation in the Japanese Corpus

The concept relation representation describes the semantic structure of a
sentence written in natural language. It is composed of the concepts that
correspond to the words in the sentence and concept relation labels.
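
The bracketed frame notation shown in the examples above is regular enough to
be read by a short recursive-descent parser. The following is a sketch under
stated assumptions, not the format's official grammar: the name parse_frame
is hypothetical, a frame is taken to be a bracketed list of slots, and each
slot is taken to be a slot name followed either by a nested frame or by plain
text (with any double-quoted part kept intact).

```python
import re

# Tokens: brackets, or a run of characters in which quoted parts may
# contain spaces (for example 1:3:"=N 3").
_TOKEN = re.compile(r'\[|\]|(?:"[^"]*"|[^\s\[\]"])+')


def parse_frame(text):
    """Parse a semantic frame into a list of (slot name, value) pairs.

    A value is either a string or, for nested frames, another list.
    """
    tokens = _TOKEN.findall(text)
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise ValueError("expected %r at token %d" % (tok, pos))
        pos += 1

    def frame():
        expect("[")
        slots = []
        while pos < len(tokens) and tokens[pos] == "[":
            slots.append(slot())
        expect("]")
        return slots

    def slot():
        nonlocal pos
        expect("[")
        name = tokens[pos]
        pos += 1
        if tokens[pos] == "[":
            value = frame()  # nested frame, as in the which slot
        else:
            parts = []
            while tokens[pos] not in ("[", "]"):
                parts.append(tokens[pos])
                pos += 1
            value = " ".join(parts)
        expect("]")
        return (name, value)

    result = frame()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the frame")
    return result


if __name__ == "__main__":
    model = ('[ [main 5:書:0e910d] [agent 1:彼:2dc304] [object 3:字:3d0797] '
             '[attribute already end] [S-attribute seem]]')
    for name, value in parse_frame(model):
        print(name, "->", value)
```

Slots are kept as an ordered list of pairs rather than a dictionary so that
the 'main' slot stays first and repeated slot names are not lost.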

The concept relation representations in the Japanese Corpus provide
information on the following:

  1) facts (reality) and events (phenomena);
  2) the viewpoint of the speaker;
  3) the intention of the utterance, and the speaker's feeling or judgment
     regarding the content of the utterance;
  4) the structure of a piece of writing.

Facts and phenomena expressed in a sentence are described by a combination of
the concept, the concept relation label, and the concept attribute label.

The concept relation label is a label that expresses facts and phenomena. The
relation labels include 'agent', 'object', 'implement' and other labels that
indicate a relation between event concepts and thing concepts. Also included
are event concept labels such as 'condition' and 'sequence', and
pseudo-relation labels such as 'possessor'.

The concept attribute label is a label that indicates the authenticity of an
event, the amount of something, or a similar fact. For example, the concept
attribute label 'not' is used to indicate negation. For the full list of
concept attribute labels refer to Table 9-3.

Information in a sentence that indicates the viewpoint of the speaker towards
events or facts is described in either the constituent attribute slot
(attribute) or the sentence attribute slot (S-attribute). Time indicators
such as present, past and future, as well as aspect, mood, etc., are
described in one of these two slots. The difference between the constituent
attribute slot and the sentence attribute slot lies in the scope of the
described attribute. The attribute labels of the constituent attribute slot
indicate the attributes of the individual constituent concepts that make up
the sentence. This includes tense, aspect and emphasis. The attributes of the
sentence attribute slot indicate attributes of the sentence as a whole. This
includes sentence types such as command, question, and conjecture.

Tense is described in the constituent attribute slot with the following
attribute labels (Table 9-4): past, present, future.

Aspect, which is information that indicates the viewpoint of the speaker
regarding the progress and condition of an event or fact as seen from a
particular point in time, is described in the constituent attribute slot.
Tables 9-5 and 9-6 show the aspect attribute labels.

If the speaker's thoughts or feelings towards the facts or events described
in the sentence, or the intention of the sentence, are indicated by auxiliary
verbs, by re-wording, by the style of the sentence, etc., an attribute label
indicating the speaker's state and attitude when the sentence was produced is
given (Tables 9-7 and 9-8).

For sentences in which the speaker's intention and judgment are described
subjectively, a special headconcept code is given (Table 9-9).

9.5.3. Addition of Concepts

When a concept of a sentence is not represented by the words of the sentence
and the lack of that concept prevents the formation of the concept relation
representation, the omitted concept must be added. Concepts are added by
selecting the headconcept thought to be the most appropriate from Table 9-10.
If an appropriate headconcept does not exist, the code 'c#nil', meaning 'a
concept of some kind', is used in the description.

When a concept is added, an added concept number is assigned and that number
is used as the constituent number. The added concept number is written as '@'
followed by the number (for example, '@1').

9.6. Management Information

The management information of the Japanese Corpus contains the management
history record. The management history record provides information such as
the date of creation or of record update.
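
A slot value in a semantic frame refers to a constituent in one of three
ways: a morpheme number, a compound word number ('I#' prefix, Section 9.2.1)
or an added concept number ('@' prefix, Section 9.5.3). The sketch below
shows one way these could be distinguished. It is an illustration only: the
function name parse_concept_reference is hypothetical, and the colon-separated
layout 'number:notation:concept' is an assumption read off the frame examples
in this chapter, such as '5:書:0e910d' and the '@1::c#more' value used in
Table 9-2.

```python
def parse_concept_reference(value):
    """Split 'number:notation:concept' and classify the number part.

    Hypothetical helper; the three-part layout is inferred from examples.
    """
    # Split on the first two colons only, so that a quoted explanation
    # containing a colon stays in one piece.
    number, notation, concept = value.split(":", 2)
    if number.startswith("@"):
        kind, index = "added", int(number[1:])
    elif number.startswith("I#"):
        kind, index = "compound", int(number[2:])
    else:
        kind, index = "morpheme", int(number)
    return {
        "kind": kind,
        "index": index,
        "notation": notation,
        "concept": concept.strip('"'),
    }


if __name__ == "__main__":
    for value in ["5:書:0e910d", "@1::c#more",
                  'I#1:デモ行進:"=Z 何かに反対して行う路上での行進"']:
        print(parse_concept_reference(value))
```

An added concept has no notation of its own, so the notation part may be
empty, as in '@1::c#more'.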
9.a Tables 9-1 Part of Speech Assignments 9-2 Relation Labels For The Representation of Facts or Occurrences 9-3 Labels Used To Represent Attributes of Facts or Occurrences 9-15 EDR ************************************************************ JAPANESE CORPUS 9-4 Labels Used To Indicate Time Reference 9-5 Labels Used To Indicate Aspect 9-6 Aspect in Japanese 9-7 Constituent Labels Used To Indicate Speaker's Intention 9-8 Sentence Attribute Labels Used To Indicate Speaker's Intention or Decision 9-9 Special Headconcepts Used To Indicate Intention, Feeling or Judgment of An Utterance 9-10 Other Special Concepts and Headconcepts 9-16 EDR ************************************************************ JAPANESE CORPUS Table 9-1 Part of Speech Assignments in The Japanese Corpus _______________________________________________________________________________ Part of Speech Name Correspondence With Code Example ___in_Japanese_Corpus_______Japanese_Word_Dictionary___________________________ Nouns Common Noun JN1 太陽, 山 Proper Noun JN2 日本 Number JN3 0, 1 Temporal Noun JN4 今日 Formal Noun JN7 こと, もの Verbs Verb JVE 走る Adjectives Adjective JAJ 美しい Adjectival Nouns Adjectival Noun JAM 静か Adverbs Adverb JD1 すっかり Epistemic Adverb JD2 もし Adnouns Adnoun JNM 大きな Conjunctions Sentence Conjunction JC1 それで Word Conjunction JC3 また Prefixes Adjectival Prefix JT1 旧 Adverbial Prefix JT2 再 Adnomial Prefix JT3 各 Honorific Prefix JT4 さ, お Counter Prefix JN5 第, 約 Suffixes Suffix JB1 上, 別 Counter Suffix JN6 回, 章 Unit JUN メートル Conjugational Endings Verb Ending JEV Adjective Ending JEA Adjectival Noun Ending JEM Particles Particle JJO が, を, に Auxiliary Verbs Auxiliary Verb JJD せる, させる Interjection Interjection JIT おい, おや Symbol Symbol JSY 。、. 
Number Numeral JN3 1990 Error in Notation (*) MP ____Note:_Code_is_given_when_the_word_notation_of_a_morpheme_is_incorrect._____ 9-17 EDR ************************************************************ JAPANESE CORPUS Table 9-2 Relation Labels for the Representation of Facts or Occurrences Relation Label ________________________________________________________________________________ agent That which acts on its own volition and is the subject that brings about an action Ex. 父が食べる [ [main 3:食べ:3bc6f0] [agent 1:父:0e7c00]] object That which is affected by an action or change Ex. りんごを食べる [ [main 3:食べ:3bc6f0] [object 1:りんご:3bd8db]] a-object That which has a particular attribute Ex. トマトが赤い [ [main 3:赤:0e29cb] [a-object 1:トマト:3bc118]] implement That which is used in a voluntary action such as tools or other implements Ex. ナイフで切る [ [main 3:切:0ecff7] [implement 1:ナイフ:3c4e7d]] material That which is used to make up something Ex. 牛乳からバターを作る [ [main 5:作:0fe812] [object 3:バター:3be1c7] [material 1:牛乳:3c03b7]] source Location from which an event or occurrence begins Ex. 京都から来る [ [main 3:来る:3d144c] [source 1:京都:0ecb69]] goal Location from which an event or occurrence ends Ex. 東京に行く [ [main 3:行:1e84a2] [goal 1:東京:0ffee1]] place Place (physical location) at which something occurs Ex. 部屋で遊ぶ [ [main 3:遊:3cf67f] [place 1:部屋:1080e6]] scene Place (abstract location) at which something occurs Ex. ドラマで演じる [ [main 3:演じ:3cf94e] [scene 1:ドラマ:1013ed]] basis That which is used as the standard of comparison Ex. バラはチューリップより美しい [ [main @1::c#more] [object [ [main 5:美し:1e84c3] [a-object 1:バラ:0f6013]]] [basis [ [main @2:美し:1e84c3] [a-object 3:チューリップ:3c2801]]] manner Way in which an action or change occurrs Ex. ゆっくり話す 9-18 EDR ************************************************************ JAPANESE CORPUS [ [main 2:話:3ce6b9] [manner 1:ゆっくり:0f81ac]] Ex. 3時間見る [ [main 3:見:1e8643] [manner [ [main 2:時間:0f6fe4] [number 1:3:"=N 3"]]]] time Time at which something occurrs Ex. 
              8時に起きる
                  [ [main 4:起き:3cfbdf] [time [ [main 2:時:0f6f06] [modifier 1:8:"=N 8"]]]]

time-from     Time at which something begins
              Ex. 9時から働く
                  [ [main 4:働:0e2799] [time-from [ [main 2:時:0f6f06] [modifier 1:9:"=N 9"]]]]

time-to       Time at which something ends
              Ex. 9時まで働く
                  [ [main 4:働:0e2799] [time-to [ [main 2:時:0f6f06] [modifier 1:9:"=N 9"]]]]

quantity      Amount (quantity) of a thing, action, or change
              Ex. 3kgのりんご
                  [ [main 4:りんご:3bd8db] [quantity [ [main 2:kg:3c0285] [number 1:3:"=N 3"]]]]
              Ex. 3kg痩せる
                  [ [main 3:痩せ:3c049e] [quantity [ [main 2:kg:3c0285] [number 1:3:"=N 3"]]]]

modifier      Modification
              Ex. 机の上の本
                  [ [main 5:本:0e5097] [modifier [ [main 3:上:0e5797] [modifier 1:机:3d05cf]]]]

number        Number
              Ex. 3kg
                  [ [main 2:kg:3c0285] [number 1:3:"=N 3"]]

and           Coordination between concepts
              Ex. ローマとナポリに行く
                  [ [main 5:行:1e84a2]
                    [goal [ [main 3:ナポリ:1efc5a] [and 1:ローマ:10e979] [attribute focus]]]]
              Ex. 山は美しく水は澄んでいる
                  [ [main [ [main 8:澄:0f8f10] [a-object 6:水:3bd634]]]
                    [and [ [main 3:美し:1e84c3] [a-object 1:山:3ce994]]]]

or            Selection between concepts
              Ex. ローマかナポリに行く
                  [ [main 5:行:1e84a2]
                    [goal [ [main 3:ナポリ:1efc5a] [or 1:ローマ:10e979] [attribute focus]]]]
              Ex. 学校に行くか図書館に行く
                  [ [main [ [main 8:行:0f8f10] [goal 6:図書館:100648]]]
                    [or [ [main 3:行:0f8f10] [goal 1:学校:3cf8b1]]]]

condition     Condition of an occurrence or fact
              Ex. 雨が降ったので家に帰った
                  [ [main [ [main 9:帰:0e8e45] [goal 7:家:0e5cdb]]] [condition 1:雨:3bba1f]]

purpose       Purpose or reason for an action or occurrence
              Ex. 映画を見に行く
                  [ [main 5:行:1e84a2] [purpose [ [main 3:見:1e8646] [object 1:映画:3be65c]]]]

cooccurrence  Simultaneous occurrence of events or actions
              Ex. 家に帰る間中泣いていた
                  [ [main 7:泣:0f4cf1] [cooccurrence [ [main 3:帰:0e8e45] [goal 1:家:0e5cdb]]]]

sequence      Sequential occurrence of events or actions
              Ex.
              図書館へ行って本を借りた
                  [ [main 8:借り:0e97a9] [object 6:本:0e5097]
                    [sequence [ [main 3:行:0f8f10] [goal 1:図書館:100648]]]]

Pseudorelation Labels
________________________________________________________________________________
possessor     Possession or ownership
              Ex. 父の本
                  [ [main 3:本:0e5097] [possessor 1:父:0e7c00]]

beneficiary   Beneficiary (receiver) of an action or occurrence
              Ex. 父に買ってあげる
                  [ [main 3:買:1e84f1] [beneficiary 1:父:0e7c00]]

unit          Unit
              Ex. 1ダース当り500円
                  [ [main 5:円:0e6912] [number 4:500:"=N 500"]
                    [unit [ [main 2:ダース:3bf083] [number 1:1:"=N 1"]]]]

from-to       Range of items specified
              Ex. 大阪から東京までの都市
                  [ [main 6:都市:3cfc38] [modifier [ [main 3:東京:0ffee3] [from-to 1:大阪:0e7107]]]]
________________________________________________________________________________


Table 9-3 Labels Used to Represent Attributes of Facts or Occurrences
________________________________________________________________________________
not           Negation of state or condition
              Ex. 働かない
                  [ [main 4:働:0e2799] [attribute not]]

generic       General (non-specific)
              Ex. りんごが好きだ
                  [ [main 1:好き:3cee21] [a-object [ [main 4:りんご:3bd8db] [attribute generic]]]]

all           Quantification (totality)
              Ex. すべてのりんご
                  [ [main 3:りんご:3bd8db] [attribute all]]

some          Quantification (partial)
              Ex. あるりんご
                  [ [main 3:りんご:3bd8db] [attribute some]]

each          Quantification (numeration)
              Ex. 各々のりんご
                  [ [main 3:りんご:3bd8db] [attribute each]]

this          Demonstration (near)
              Ex. そのりんご
                  [ [main 3:りんご:3bd8db] [attribute this]]

that          Demonstration (far)
              Ex.
              あのりんご
                  [ [main 3:りんご:3bd8db] [attribute that]]

specific      Specific instance
________________________________________________________________________________


Table 9-4 Labels Used to Indicate References Made to Time

Time Attribute Label
_________________________________________________________________________
past                   Time made reference to is the past
present                Time made reference to is the present
future                 Time made reference to is the future
_________________________________________________________________________


Table 9-5 Labels Used to Indicate Aspect
_________________________________________________________________________
Aspect Labels #1
_________________________________________________________________________
begin                  Indicates an event or action starts
end                    Indicates an event or action has ended
progress               Indicates an event or action is not yet completed
continue               Indicates a repetitious action/movement continues
state                  Indicates the result (resulting state) of an action
                       is continuous
_________________________________________________________________________
Aspect Labels #2
_________________________________________________________________________
yet                    Indicates a pending state (the event has neither
                       started nor ended)
already                Indicates a state/event has either started or ended
soon                   Indicates a state/event will either start or end soon
just                   Indicates a state/event has just started or ended
complete               Indicates the completion of an event
come                   Indicates an event approaches the speaker's point of
                       reference
go                     Indicates an event gets further away from the
                       speaker's point of reference
_________________________________________________________________________


Table 9-6 Aspect in Japanese
________________________________________________________________________________
"Not Yet" Aspect          〜する
                          Ex.: 掃除する
                          [ [main 1:掃除:3ce70c] [attribute begin yet]]

Pre-Inchoative Aspect     〜するところ、〜しようと、〜しかけ、〜しそう
                          Ex.: 掃除するところ
                          [ [main 1:掃除:3ce70c] [attribute begin yet soon]]

Inchoative Aspect         〜し始め、〜し始ま、〜し出
                          Ex.: 掃除し始め
                          [ [main 1:掃除:3ce70c] [attribute begin]]

Ingressive Aspect         〜し始めた[ところ/ばかり]
                          Ex.: 掃除し始めたところ
                          [ [main 1:掃除:3ce70c] [attribute begin just already]]

Durative Aspect           〜してい、〜中、〜し続け、〜つつある、
                          〜ていく [usage examples: 船は次第に傾いていく、
                          花が萎んでいく]、〜てくる、てきた
                          Ex.: 掃除してい
                          [ [main 1:掃除:3ce70c] [attribute continue]]

Pre-Terminative Aspect    〜し終えるところ、〜し終えようと、〜し終えかけ、
                          〜し終えそう
                          Ex.: 掃除し終えるところ
                          [ [main 1:掃除:3ce70c] [attribute soon end yet]]

Terminative Aspect        〜した、〜してしま
                          Ex.: 掃除した
                          [ [main 1:掃除:3ce70c] [attribute already end]]

Effective Aspect          〜し終え、〜し終わ、〜し上げ、〜し上が、〜し切、
                          〜し尽く、〜し通、〜し抜、〜し果た
                          Ex.: 掃除し終え
                          [ [main 1:掃除:3ce70c] [attribute complete]]

Post-Terminative Aspect   〜した[ところ/ばかり]
                          Ex.: 掃除したところ
                          [ [main 1:掃除:3ce70c] [attribute just end already]]

Resultative Aspect        〜している、〜してある、
                          〜ておく [usage examples: 遊ばせておく、準備しておく]
                          Ex.: 掃除している
                          [ [main 1:掃除:3ce70c] [attribute state]]
________________________________________________________________________________


Table 9-7 Constituent Attribute Labels Used to Indicate Speaker's Intention
___________________________________________________________________________
emphasis                 Emphasis
focus                    Focus
topic                    Theme or subject
wh                       Indecision
___________________________________________________________________________


Table 9-8 Sentence Attribute Labels Used to Indicate Speaker's Intention or
          Decision
________________________________________________________________________________
Labels Indicating Speaker's Request or Consent
________________________________________________________________________________
imperative          Commands (〜せよ)
grant               Permission (〜してもよい, 〜で構わない, 〜ても構わない)
consent             Consent (〜してもよい, 〜で構わない, 〜ても構わない)
grant-not           Prohibition (〜してはならない)
advise              Advisory (〜することだ)
recommend           Recommendation (〜したほうがよい, 〜することがよい)
invite              Invitation (〜しよう)
require-agreement   Confirmation (〜ですね)
________________________________________________________________________________
Labels Indicating Sentence Style or Speaker's Manner of Speaking
________________________________________________________________________________
polite              Polite request (どうぞ, 〜してください)
respect             Respect (〜れる, 〜られる, 敬語(尊敬語))
________________________________________________________________________________
Labels Indicating Speaker's Decision Towards Facts Expressed in Concept
Relation Representation
________________________________________________________________________________
should              Obligation (〜するべきだ)
sufficiency         Sufficiency (〜だけでよい, 〜ればよい, 〜のみでよい)
duty                Duty or obligation (〜しなくてはならない,
                    〜しなければならない, 〜しなければいけない,
                    〜せねばならない, 〜することになっている)
interrogation       Question (〜か)
conclude            Conclusion (〜したのだ, 〜したのである)
sure                Conclusion drawn from facts (〜するに違いない,
                    〜するはずだ)
maybe               Possibility (〜するかもしれない, 〜することがある)
seem                Conjecture or supposition (〜するだろう, 〜そうだ,
                    〜ようだ, 〜らしい)
rumor               Hearsay (〜するらしい, 〜だそうだ, 〜ということだ,
                    〜とのことだ, 〜という, 〜ようだ)
appearance          Condition and comparison (〜にみえる, 〜のようだ,
                    〜ようにみえる, 〜みたい)
be-sorry            Regret (〜したかったのに)
natural-result      Natural or expected conclusion (〜するわけだ)
natural-thing       Ideal shape or form (〜ものだ)
if                  Supposition of something unknown
thought             Supposition of the opposite of fact/reality
                    (もし〜だったら)
reality             Fact or taken as fact
________________________________________________________________________________
Labels Indicating Speaker's Intention of Speaker's Utterance
________________________________________________________________________________
exclamation         Admiration/disbelief (〜!)
pity                Regret (太郎が両親に死なれた)
blame               Criticism
unexpected          Not anticipated (have + verb (past participle),
                    unexpectedly)
underestimate       Underestimation (〜に過ぎない, 〜しただけである)
________________________________________________________________________________


Table 9-9 Special Headconcepts Used to Indicate Intention, Feeling or Judgment
          of an Utterance
_______________________________________________________________________________
Labels of Decision
_______________________________________________________________________________
c#ability           Ability (Based on innate capability)
                    (〜できる, 〜することができる, 〜することが可能である,
                    〜することが可能となる)
c#difficulty        Difficulty
                    (〜しにくい, 〜しがたい, 〜することが難しい,
                    〜することが困難である)
c#easiness          Easiness
                    (〜しやすい, 〜することが容易である)
c#excess            Excessiveness
                    (〜しすぎる)
c#need              Necessity
                    (〜が必要, 〜がいる, 〜する必要がある, 〜する必要となる)
c#only              Distinctiveness
                    (〜しかない)
c#possibility       Possibility (Based on situation or circumstance)
                    (〜できる, 〜しえる, 〜しうる, 〜することができる,
                    〜することが可能である, 〜することが可能となる)
c#shortage          Insufficiency
                    (〜したりない)
c#tendency          Tendency (High frequency of occurrence)
                    (〜しがちだ, 〜しやすい, 〜する傾向がある)
_______________________________________________________________________________
Labels of Subjective Feelings or Intention
_______________________________________________________________________________
c#unwill-duty       Recognition (Situation in which one cannot avoid doing
                    something one finds disagreeable)
                    (〜せざるをえない, 〜するのはやむを得ない)
c#voluntary         Volitional (Having an almost uncontrollably strong
                    desire to realize an action)
                    (〜せずにいられない)
c#want              Desire (Having a desire for the realization of an
                    action)
                    (〜したい, 〜することを欲する)
c#will              Inclination or intention (Having the intention to
                    realize an action)
                    (〜つもりだ, 〜しようと思う, 〜するつもりでいる)
_______________________________________________________________________________
Labels of Past Actions or Events
_______________________________________________________________________________
c#experience        Experience (Event occurred at a point in time before
                    the utterance)
                    (〜したこと/経験がある)
c#failure           Failure (Attempt at something was not realized)
                    (〜し損なう, 〜し損じる)
c#missing-a-chance  Lost opportunity (Something one intends to do remains
                    unrealized due to lack of opportunity)
                    (〜しそびれる)
_______________________________________________________________________________
Labels of Future Actions or Events
_______________________________________________________________________________
c#effort            Effort (One will put forth effort to bring about the
                    realization of an action)
                    Note: realization is unclear
                    (〜そうとする, 〜する努力をする)
c#schedule          Schedule (One has a plan to realize an action at a
                    point in time after the utterance)
                    (〜ことになっている, 〜予定だ, 〜予定である(いる))
c#try               Attempt (One will attempt an action)
                    Note: eventual realization is unclear
                    (〜してみる, 〜することを試みる, 試しに〜する)
_______________________________________________________________________________
Labels of Order/Command and Request
_______________________________________________________________________________
c#causative         Causation
                    (〜させる, 〜さす)
c#give-benefit      Benefit (Actions of speaker benefit others)
                    (〜してあげる)
c#receive-benefit   Benefit (Actions of others benefit speaker; action of
                    others directed by request of speaker)
                    (〜してもらう)
c#request           Polite command
                    (〜してほしい, 〜することを希望する, 〜することを望む)
_______________________________________________________________________________


Table 9-10 Other Special Concepts and Headconcepts
________________________________________________________________________________
Labels Corresponding to Pronouns, etc.
________________________________________________________________________________
Substitute Concept Label                                              Referent
________________________________________________________________________________
c#I              First person singular                                Person
c#we             First person plural                                  Person
c#you-s          Second person singular                               Person
c#you-p          Second person plural                                 Person
c#you            Second person (number not specified)                 Person
c#she            Third person (gender female) singular                Person
c#he             Third person (gender male) singular                  Person
c#they           Third person plural                                  Person
c#it             Thing/matter                                         Thing
________________________________________________________________________________
Labels for Specifying/Restricting Concepts
________________________________________________________________________________
c#all            all
c#some           some
c#each           each
________________________________________________________________________________
Special Labels
________________________________________________________________________________
c#statement      Special headconcept notation that indicates a sentence and
                 provides information about the structure of a piece of
                 writing
c#nil            Special notation used to represent a blank headconcept
________________________________________________________________________________
Labels Used For Representing Comparison
________________________________________________________________________________
c#equal          Equality
c#just-as        Similarity
c#least          Least
c#less           Less (does not include cases in which numerical value is
                 the same; indicates insufficiency)
c#more           More (does not include cases in which numerical value is
                 the same)
c#most           Most
________________________________________________________________________________
Labels Used For Representing Numerical Values
________________________________________________________________________________
c#no-more        No more (includes cases in which the value is the same)
c#no-less        No less (includes cases in which the value is the same)
c#about          Approximation
________________________________________________________________________________
Labels Given To Classifiers
________________________________________________________________________________
c#frequency      Frequency
c#ordinal        Order
c#tuple          Group
________________________________________________________________________________
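
Note on reading the bracketed examples. The examples in Tables 9-2, 9-3 and 9-6 all share one bracketed shape: an outer frame containing pairs such as [main ...] or [agent ...], where the value of a pair is either one or more atoms or another frame. The Python sketch below reads that shape into nested lists. It is illustrative only and is not part of the EDR distribution: the grammar is inferred from the examples in these tables rather than taken from a formal definition, and the names TOKEN and parse_frame are hypothetical.

```python
import re

# A token is "[", "]", or a run of non-space characters. A double-quoted
# span such as "=N 3" stays inside its token even though it contains a space.
# (Assumption: this tokenization is inferred from the table examples.)
TOKEN = re.compile(r'\[|\]|(?:"[^"]*"|[^\s\[\]"])+')


def parse_frame(text):
    """Parse one bracketed frame into a list of (label, value) pairs.

    A value is either a list of atom strings or a nested frame,
    which is itself a list of (label, value) pairs.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at token {pos}, got {peek()!r}")
        pos += 1

    def frame():
        nonlocal pos
        expect("[")                      # opening bracket of the frame
        pairs = []
        while peek() == "[":             # each pair is "[label value]"
            pos += 1
            label = peek()
            if label in (None, "[", "]"):
                raise ValueError(f"missing label at token {pos}")
            pos += 1
            if peek() == "[":            # value is a nested frame
                value = frame()
            else:                        # value is one or more atoms
                value = []
                while peek() not in (None, "[", "]"):
                    value.append(tokens[pos])
                    pos += 1
            expect("]")                  # closing bracket of the pair
            pairs.append((label, value))
        expect("]")                      # closing bracket of the frame
        return pairs

    result = frame()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the frame")
    return result


# Usage: the "agent" example from Table 9-2.
pairs = parse_frame("[ [main 3:食べ:3bc6f0] [agent 1:父:0e7c00]]")
labels = [label for label, _ in pairs]   # ['main', 'agent']
```

Atoms are kept as opaque strings (for example 3:食べ:3bc6f0 or 1:3:"=N 3") because the tables do not define their internal fields; a frame with unbalanced brackets raises ValueError instead of being silently repaired.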