
OTHER TIERS:

Prosodic words, Words, Break Indices, Miscellaneous

 

The Prosodic Words Tier

The Prosodic Words (PrWords) Tier is a phonetic transcription using ASCII characters. By encoding the outcome of sandhi (connected speech phenomena, such as segment assimilations and deletions across word boundaries) and of fast speech rules, this tier facilitates their analysis. Like all transcriptions, it has its limitations and is not meant to be a substitute for acoustic analysis; rather, it allows annotators to flag instances of sandhi for more detailed acoustic analysis. The PrWords Tier also provides information about stress, since this information cannot be deduced from the transliteration in the Words Tier or derived from a dictionary. For example, as mentioned, Greek content monosyllables, as well as some function words, are normally stressed and pitch accented in speech but not in orthography; in contrast, disyllabic function words are orthographically accented, but most do not normally carry stress in speech. In this tier, each prosodic word (defined as a sequence of items showing total cohesion) is transcribed as one label.

 

The Words Tier

This tier can be a transliterated version of the text, equivalent to the Orthographic Tier in MAEToBI; in systems that support Greek fonts (such as Praat), the transliteration can be replaced by Greek orthography (as in the figures).

 

The Break Index Tier

There are four break indices, 0, 1, 2, and 3. Break indices mark cohesion (or the lack thereof) between constituents in an utterance.

 

  • BI 0 is used to mark total cohesion between orthographic words (e.g., clitics and their hosts). Orthographic words separated by BI 0 constitute a PrWord that may bear only one pitch accent (with the noted exception of hosts and clitics with two accents). Several types of sandhi may occur across a BI 0 boundary; however, sandhi is not necessary for a BI 0 to be used. For example, a proclitic particle like /na/ and the following verb are perceived as one PrWord by native speakers, but no sandhi can occur between /na/ and a consonant-initial verb.

 

  • BI 1 marks boundaries between PrWords. Items separated by BI 1 should carry at most one pitch accent each, although a PrWord need not be accented (e.g., all PrWords following an early focus are de-accented; Baltazani & Jun, 1999; Botinis, 1998). In general, if an item is accented, then it should be considered a separate PrWord. On the other hand, as mentioned, the absence of accent does not constitute evidence that an item is not an independent PrWord in Greek.

 

  • BI 2 marks the boundaries of ips (intermediate phrases).

 

  • BI 3 marks the boundaries of IPs (Intonational Phrases).
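The relation between BI 0 and PrWords described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the GRToBI tools: the function name group_prwords, the convention that each break index is stored after the word it follows, and the sample sequence (which reuses /na/ and /ton 'pono/ from this page together with a placeholder verb) are all assumptions made for the example.

```python
from typing import List


def group_prwords(words: List[str], breaks: List[int]) -> List[List[str]]:
    """Group orthographic words into PrWords.

    Assumption (hypothetical convention): breaks[i] is the break index
    that follows words[i]. A BI of 0 joins a word to the next one; any
    higher index (1, 2, or 3) closes the current PrWord.
    """
    if len(words) != len(breaks):
        raise ValueError("expected one break index per word")
    groups: List[List[str]] = []
    current: List[str] = []
    for word, bi in zip(words, breaks):
        if bi not in (0, 1, 2, 3):
            raise ValueError(f"unknown break index: {bi}")
        current.append(word)
        if bi != 0:
            groups.append(current)
            current = []
    if current:
        # A trailing BI 0 leaves an unfinished group; keep it anyway.
        groups.append(current)
    return groups


# Made-up sequence for illustration only, not a real utterance.
words = ["na", "VERB", "ton", "pono"]
breaks = [0, 1, 0, 3]
print(group_prwords(words, breaks))
```

Under these assumptions the four orthographic words are grouped into two PrWords, each of which would then be transcribed as one label in the PrWords Tier.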

 

Break Index Tier Diacritics

Four diacritics are used to provide more details on the prosodic structure of utterances.

 

  • s is used with all break indices when there is evidence of sandhi (Figure 2, Figure 6, Figure 7, Figure 9, Figure 10). At present there is no coherent description of the sandhi rules for Greek (see Nespor & Vogel, 1986; Kaisse, 1985; for a different point of view, see Arvaniti, 1991, and results in Baltazani 2006b; for a review of the relevant literature, see Arvaniti 2007). Our corpus confirmed previous studies that used naturally occurring data (e.g. Fallon, 1994) in showing that sandhi can apply across larger constituents than postulated by, e.g., Nespor & Vogel (1986); see Figure 7 and Figure 9 for sandhi across PrWords, Figure 6 for sandhi across an ip boundary. Further annotation of the GRToBI corpus has also shown that some of the sandhi rules of Greek are better described as gradient gestural overlap (Pelekanou, 2000; Arvaniti & Pelekanou 2002; Baltazani 2006b). Since the presence of sandhi does not necessarily signal cohesion, we have decided to use the diacritic s for sandhi at all prosodic levels, and thus provide an easy way of searching the database for such instances. We believe that the sandhi phenomena will be better understood if a large corpus of natural spoken data is investigated.

 

  • m is used for mismatch between the break index and the prosodic or tonal cues to this index. This diacritic should be used with BI 0 to mark cases in which the context for sandhi exists but sandhi nevertheless does not take place. For example, if a sequence like /ton 'pono/ "the pain" (accusative) is pronounced [ton 'pono], it should be marked as 0m, since in general it should be pronounced [to'mbono] or [to'bono]. The m diacritic should be used with BIs 1, 2, and 3 to mark the presence of a boundary without the tonal events that normally accompany it.

 

  • p should be used to mark pause at a given boundary.

 

  • ? should be used to mark uncertainty about the strength of a boundary (the higher of the two candidates should be marked).
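Since the s diacritic is meant to make sandhi easy to search for, a small parser for break index labels may be useful. The sketch below is hypothetical: it assumes that a label is a single digit (0 to 3) followed by zero or more of the diacritics s, m, p, and ?, in any order, as in the 0m example above. The names BreakLabel, parse_break_label, and find_sandhi are invented for this illustration and do not belong to any existing tool.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed label shape: one break index digit, then optional diacritics.
LABEL_RE = re.compile(r"([0-3])([smp?]*)")


@dataclass(frozen=True)
class BreakLabel:
    index: int        # break index, 0 to 3
    sandhi: bool      # s: evidence of sandhi
    mismatch: bool    # m: index and prosodic or tonal cues disagree
    pause: bool       # p: pause at the boundary
    uncertain: bool   # ?: boundary strength is uncertain


def parse_break_label(label: str) -> BreakLabel:
    """Split a label such as '0m' or '2s' into its index and diacritics."""
    match = LABEL_RE.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a break index label: {label!r}")
    index, diacritics = match.groups()
    return BreakLabel(
        index=int(index),
        sandhi="s" in diacritics,
        mismatch="m" in diacritics,
        pause="p" in diacritics,
        uncertain="?" in diacritics,
    )


def find_sandhi(labels: List[str]) -> List[int]:
    """Return the positions of all labels that carry the s diacritic."""
    return [i for i, lab in enumerate(labels) if parse_break_label(lab).sandhi]


# Made-up label sequence for illustration only.
print(find_sandhi(["0s", "1", "2s", "3p"]))
```

Because the same diacritic is used at every prosodic level, a single filter of this kind finds sandhi across BI 0, 1, 2, and 3 boundaries alike.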

 

The Miscellaneous Tier

This tier allows researchers to annotate non-structural information that may be useful in interpreting the file, such as coughing, disfluency, pitch halving or rate of speech.
