===============
Guide to transcriptions of child-caregiver interaction
By: Amalia Skilton
Date: March 14, 2022
===============

This document covers the information needed to access and interpret the EAF files in the directory Child_Caregiver_Interaction.

It has the following sections:

A. Accessing the transcriptions and linked media files
B. Tier types
C. Tier names
D. Orthography and abbreviations used in "tca" and "tns" tiers
E. Abbreviations used in "xds" tiers

===============
A. Accessing the transcriptions and linked media files
===============

Each ELAN file is linked to at least three media files - two video files from cameras at opposing angles, and at least one separate audio track.

To view the transcriptions with all linked media, follow these steps.

1. Download the transcription file (EAF) that you are interested in.

2. From the folder containing the transcription, download the media files (720p video, audio) and the offsets.txt file.

3. If you prefer to consult original resolution video instead of 720p video, open the Box Note titled 'Video Download Link' in the folder containing the transcription. Navigate to the linked folder in the California Language Archive (CLA) and download the video files indicated in the note.
	- If you want to listen to additional audio tracks, they can also be downloaded from the CLA using this procedure.

4. Open the EAF. On opening the transcript, ELAN will prompt you to locate the linked files on your hard drive. Point ELAN to the files which you downloaded in steps 2 and 3.

5. Open the Linked Files pane in ELAN and check the media offsets against the offsets.txt file downloaded in step 2.
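
The linked media and their offsets can also be inspected outside ELAN, because an EAF file is XML. The following is a minimal Python sketch, not part of the original workflow. It assumes that each linked file appears as a MEDIA_DESCRIPTOR element and that its offset is stored in milliseconds in the optional TIME_ORIGIN attribute, as in the standard EAF format. The sample file content and media file names are hypothetical, and the layout of offsets.txt is not assumed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAF content, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <HEADER MEDIA_FILE="" TIME_UNITS="milliseconds">
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///media/cam1_720p.mp4" MIME_TYPE="video/mp4"/>
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///media/cam2_720p.mp4" MIME_TYPE="video/mp4" TIME_ORIGIN="1520"/>
    <MEDIA_DESCRIPTOR MEDIA_URL="file:///media/audio.wav" MIME_TYPE="audio/x-wav" TIME_ORIGIN="310"/>
  </HEADER>
</ANNOTATION_DOCUMENT>"""


def linked_media(eaf_text):
    """Return (media_url, mime_type, offset_ms) for each linked media file."""
    root = ET.fromstring(eaf_text)
    media = []
    for descriptor in root.iter("MEDIA_DESCRIPTOR"):
        # A missing TIME_ORIGIN is treated as an offset of 0 ms.
        offset_ms = int(descriptor.get("TIME_ORIGIN", "0"))
        media.append((descriptor.get("MEDIA_URL"), descriptor.get("MIME_TYPE"), offset_ms))
    return media


for url, mime_type, offset_ms in linked_media(SAMPLE_EAF):
    print(f"{offset_ms:>6} ms  {mime_type:<12} {url}")
```

For a real transcription, read the file with open(path, encoding="utf-8").read() and compare the printed offsets with offsets.txt by hand.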

===============
B. Tier types
===============

Each EAF file in this collection contains the following five tier types:

1. "tca" (also called "tsc" in some files for early participants) - Contains a phonetic transcription of the participant's turn. See section D of this readme for a key to the orthography and abbreviations used in this tier.

2. "tns" - Contains a translation of the participant's turn into English or Spanish. Nonlinguistic turns, as defined in section D, do not always have annotation on this tier. Spanish translations are by Angel Bitancourt Serra. English translations are by me.

3. "tns-en" - Contains a translation of the participant's turn into English. Defined only if the language of the "tns" tier is Spanish.

4. "notes" - Contains notes on the transcription and/or translation, written by me.

5. "xds" - Contains a code representing the addressee(s) of the turn - who the participant is speaking to. See section E of this readme for a key to the abbreviations used in this tier. Coding done by me.

Some early EAF files have an additional tier type called "utterance". This tier contains no annotations and was used only to create a parent tier for annotations on other tiers.
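
To see which tiers of each type a file contains, the tier list can be read from the EAF XML. The sketch below is illustrative only. It relies on the standard EAF attributes TIER_ID and LINGUISTIC_TYPE_REF on each TIER element, and it folds the older "tsc" type into "tca". The tier IDs in the sample (for example "CHI-tca") are hypothetical; check the actual IDs in the file you are working with.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Older files use "tsc" for what later files call "tca".
TYPE_ALIASES = {"tsc": "tca"}

# Hypothetical, heavily trimmed EAF content, for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="CHI-tca" LINGUISTIC_TYPE_REF="tca" PARTICIPANT="CHI"/>
  <TIER TIER_ID="CHI-tns" LINGUISTIC_TYPE_REF="tns" PARTICIPANT="CHI"/>
  <TIER TIER_ID="CHI-xds" LINGUISTIC_TYPE_REF="xds" PARTICIPANT="CHI"/>
  <TIER TIER_ID="Caregiver-tsc" LINGUISTIC_TYPE_REF="tsc" PARTICIPANT="Caregiver"/>
</ANNOTATION_DOCUMENT>"""


def tiers_by_type(eaf_text):
    """Map each tier type to the IDs of the tiers of that type, in file order."""
    root = ET.fromstring(eaf_text)
    grouped = defaultdict(list)
    for tier in root.iter("TIER"):
        tier_type = tier.get("LINGUISTIC_TYPE_REF")
        grouped[TYPE_ALIASES.get(tier_type, tier_type)].append(tier.get("TIER_ID"))
    return dict(grouped)


print(tiers_by_type(SAMPLE_EAF))
```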

===============
C. Tier names
===============

C.1. Target participant tier names

In a recording with one target child participant and one target caregiver:
- The child's tiers have the prefix "CHI" and the participant "CHI".
- The caregiver's tiers have the prefix "Caregiver" and the participant "Caregiver".

In a recording with two target children and one caregiver:
- Each child's tiers have the prefix "ChildXX" and the participant "ChildXX", where the XX represents the child's participant number. 
- The caregiver's tiers have the prefix "Caregiver" and the participant "Caregiver".

In a recording with two target children and two caregivers:
- Each child's tiers have the prefix "ChildXX" and the participant "ChildXX", where the XX represents the child's participant number. 
- Each caregiver's tiers have the prefix "ChildXXCaregiver" and the participant "ChildXXCaregiver", where XX represents the participant number of the child they appear to be responsible for on the recording.

C.2. Non-target participant tier names

Participants other than target children and caregivers also appear. 

*Tiers* for non-target participants have a prefix of two letters and one number representing the gender and age of the speaker. Age codes are A = adult/adolescent (anyone whose voice sounds like they are at/past the age of puberty) and C = child. Gender codes are based on voice only and are F = female, M = male, and U = unknown. Following the letters, a number (1-9) uniquely identifies the participant. 

*ELAN participant names* for non-target participants repeat the tier prefix, then - in parentheses - give the participant's kin relationship to the target child(ren), if I know it. The statements of kin relationship use standard kinship abbreviations:

B = brother
Ch = child (kinship relation)
D = daughter
e = older (modifies B and Z)
F = father
H = husband
M = mother
S = son
W = wife
y = younger (modifies B and Z)
Z = sister

Examples of how to read the tier prefixes:
MC1 = male (M) child (C)
FA2 = female (F) adult (A)
UC3 = unknown gender (U) child (C)

Examples of how to read the participant names:
MC1(Childs-eB) = male (M) child (C), is older brother (eB) of target child
FA2(Childs-MZD) = female (F) adult (A), is mother's sister's daughter (MZD) of target child
UC3(Childs-FBCh) = unknown gender (U) child (C), is father's brother's child (FBCh) of target child
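
The kinship abbreviations compose from left to right, so a string such as MZD can be expanded mechanically. The following Python sketch shows one way to do this. The function name expand_kin is hypothetical, and the code covers only the abbreviations listed above.

```python
import re

KIN_TERMS = {
    "B": "brother", "Ch": "child", "D": "daughter", "F": "father",
    "H": "husband", "M": "mother", "S": "son", "W": "wife", "Z": "sister",
}
MODIFIERS = {"e": "older", "y": "younger"}

# "Ch" is listed first so that it is matched as a single unit.
TOKEN = re.compile(r"Ch|[BDFHMSWZey]")


def expand_kin(abbrev):
    """Expand a kinship abbreviation, e.g. "MZD" -> "mother's sister's daughter"."""
    tokens = TOKEN.findall(abbrev)
    if "".join(tokens) != abbrev:
        raise ValueError(f"unrecognized kinship abbreviation: {abbrev!r}")
    words = []
    modifier = None
    for token in tokens:
        if token in MODIFIERS:
            # "e" and "y" modify the kin term that follows them.
            modifier = MODIFIERS[token]
            continue
        term = KIN_TERMS[token]
        if modifier is not None:
            term = f"{modifier} {term}"
            modifier = None
        words.append(term)
    if modifier is not None:
        raise ValueError(f"modifier without a following kin term: {abbrev!r}")
    return "'s ".join(words)


for abbrev in ("eB", "MZD", "FBCh"):
    print(abbrev, "=", expand_kin(abbrev))
```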

In some cases, children who are study participants appear as non-target participants on others' recordings. These non-target participants are labeled according to the scheme for non-target participants, then have their participant number in parentheses. Thus "MC1(ChildsMZS;Child33)" means "male child; is mother's sister's son of target child; is enrolled in the study as participant #33."

Non-target participants who are heard over electronic devices, for example people speaking to participants on the phone, are treated differently for tier and participant naming purposes. For both tier prefixes and ELAN participant names, they are labeled only as EF = electronic female, EM = electronic male, and EC = electronic child.
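
The tier prefixes for non-target participants can be decoded with a short script. The sketch below is illustrative. It assumes that electronic participants are labeled with exactly EF, EM, or EC and no trailing number, which is one reading of the description above. The function name decode_prefix is hypothetical.

```python
import re

GENDER = {"F": "female", "M": "male", "U": "unknown gender"}
AGE = {"A": "adult/adolescent", "C": "child"}
ELECTRONIC = {
    "EF": "electronic female",
    "EM": "electronic male",
    "EC": "electronic child",
}

# Gender letter, then age letter, then a number from 1 to 9.
PREFIX = re.compile(r"^([FMU])([AC])([1-9])$")


def decode_prefix(prefix):
    """Decode a non-target tier prefix such as "MC1" or "EF"."""
    if prefix in ELECTRONIC:
        return {"description": ELECTRONIC[prefix], "number": None, "electronic": True}
    match = PREFIX.match(prefix)
    if match is None:
        raise ValueError(f"not a non-target tier prefix: {prefix!r}")
    gender, age, number = match.groups()
    return {
        "description": f"{GENDER[gender]} {AGE[age]}",
        "number": int(number),
        "electronic": False,
    }


for prefix in ("MC1", "FA2", "UC3", "EF"):
    print(prefix, "=", decode_prefix(prefix)["description"])
```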

===============
D. Orthography and abbreviations used in "tca" and "tns" tiers
===============

Orthography for Ticuna on the "tca" tiers follows the ASCII practical orthography described in the orthography file in this directory.

Orthography of Spanish and Portuguese follows the standards for those languages.

The following abbreviations are used to represent nonlinguistic sounds on the "tca" tier:

L = laughing
N = noncanonical babbling/fussing (used only for children)
Y = crying

Turns that consist only of nonlinguistic sounds either are not given a translation on the "tns" tier, or have the same code in the "tns" tier as in the "tca" tier.

The following symbols are used on the "tca" and "tns" tiers:

[] = on all tiers, single square brackets enclose a comment on the following material, e.g. [whispering] indicates that the following words are whispered. They also enclose portions of the turn which are unintelligible, e.g. "pa2 ma3 [unintell] ma3r+3" means that the participant intelligibly says "pa2 ma3", says something unintelligible, and then intelligibly says "ma3r+3".

() = on the "tca" tier, single parentheses enclose material that is unclear on the recording. Note that () in the "tns" tier can either mean this, or indicate that the material inside the () is understood but not overt (e.g. is subject to ellipsis/pro-drop) in the turn.

(()) = on the "tca" tier, double parentheses enclose annotations of visual behavior, e.g. ((nods)), and annotations of nonlexical vocalizations, e.g. ((bilabial trills)). If a nonlexical vocalization has a conventional meaning, it is given on the "tns" tier; for example, ((alveolar click)) on the "tca" tier is often translated "Don't do that" on the "tns" tier.
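
When processing the "tca" tier by script, the three kinds of marked material can be separated with regular expressions. The following Python sketch is one possible approach, not a definitive parser. The function name and the example annotation are hypothetical, and nested or unbalanced brackets are not handled.

```python
import re

# ((...)) = visual behavior or nonlexical vocalization
DOUBLE_PARENS = re.compile(r"\(\((.*?)\)\)")
# (...) = unclear material; the lookarounds skip the parentheses of ((...))
SINGLE_PARENS = re.compile(r"(?<!\()\(([^()]*)\)(?!\))")
# [...] = comments, unintelligible spans, and language names
BRACKETS = re.compile(r"\[([^\]]*)\]")


def extract_marked_spans(annotation):
    """Collect the contents of ((...)), (...), and [...] in one annotation."""
    return {
        "behavior": DOUBLE_PARENS.findall(annotation),
        "unclear": SINGLE_PARENS.findall(annotation),
        "bracketed": BRACKETS.findall(annotation),
    }


# Hypothetical annotation built from the forms cited above.
print(extract_marked_spans("((nods)) pa2 (ma3) [whispering] ma3r+3"))
```

On the "tns" tier, single parentheses may instead mark material that is understood but not overt, so the "unclear" label applies only to the "tca" tier.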

The following language names are used:
En = English
BP = Brazilian Portuguese
Port = Brazilian Portuguese
Sp = Spanish
Tca = Ticuna

A language name in square brackets, followed by text, indicates that the following text is in the given language. The language designation applies to all text until the end of the turn or the next language code, whichever is first. Thus "[Sp] mira [Tca] d+17ka4" means that the participant says "mira" (Look!) in Spanish, then "d+17ka4" (Look!) in Ticuna.

Turns entirely in languages other than Ticuna always begin with a statement of the language name. Turns entirely in Ticuna do not include language names. If it is unclear which language a turn represents because it consists of a single word which exists in two languages (e.g. Spanish mamá, Ticuna ma3ma5), the turn is not given a language name tag.
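
The language-tagging convention can be applied by script to split a turn into single-language segments. The sketch below is a minimal Python illustration. The function name split_by_language is hypothetical. Untagged text is labeled None rather than Ticuna, because ambiguous single-word turns are also untagged. Bracketed comments such as [whispering] are left in the text.

```python
import re

LANGUAGE_NAMES = {"En", "BP", "Port", "Sp", "Tca"}
BRACKETED = re.compile(r"\[([^\]]*)\]")


def split_by_language(turn):
    """Split a turn into (language, text) segments at each language tag."""
    segments = []
    current = None  # language in force; None until the first tag
    position = 0
    for match in BRACKETED.finditer(turn):
        if match.group(1) not in LANGUAGE_NAMES:
            continue  # a comment such as [whispering], not a language tag
        text = turn[position:match.start()].strip()
        if text:
            segments.append((current, text))
        current = match.group(1)
        position = match.end()
    tail = turn[position:].strip()
    if tail:
        segments.append((current, tail))
    return segments


print(split_by_language("[Sp] mira [Tca] d+17ka4"))
print(split_by_language("pa2 ma3 [unintell] ma3r+3"))
```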

===============
E. Abbreviations used in "xds" tiers
===============

In recordings with one target child participant, tiers of the type "xds" use the following abbreviations:

A = adult/adolescent
C = nontarget child
N = nonhuman addressee (e.g. animal)
NA = no addressee (used for nonlinguistic turns = laughing and/or crying)
T = target child
U = addressee unknown

Recordings with two target child participants use the following abbreviations:

A = adult/adolescent
C = nontarget child
N = nonhuman addressee (e.g. animal)
NA = no addressee (used for nonlinguistic turns = laughing and/or crying)
ET = older target child (T not used)
YT = younger target child (T not used)
U = addressee unknown

Turns with more than one type of addressee have the addressee types separated by +.

XDS codes are intended to be an exhaustive list of the *types* of addressees of each turn. For example, a turn is coded as T if addressed only to the target child, T+C if addressed to the target child plus at least one nontarget child. 

The scheme does not record how many addressees of the same type a turn has. Therefore, for example, turns addressed to one adult (and no one else) and turns addressed to two adults (and no one else) are both coded as A.

Broadcast talk (talk which does not appear to have an addressee, e.g. participants singing to themselves) was treated as addressed to everyone present.
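
XDS codes can be split and checked against the abbreviation lists above before analysis. The following Python sketch is illustrative. The function name parse_xds and the two_targets flag are hypothetical and are not part of the transcription files.

```python
ONE_TARGET_CODES = {"A", "C", "N", "NA", "T", "U"}
TWO_TARGET_CODES = {"A", "C", "N", "NA", "ET", "YT", "U"}


def parse_xds(code, two_targets=False):
    """Return the set of addressee types in an XDS code such as "T+C"."""
    allowed = TWO_TARGET_CODES if two_targets else ONE_TARGET_CODES
    parts = [part.strip() for part in code.split("+")]
    unknown = [part for part in parts if part not in allowed]
    if unknown:
        raise ValueError(f"unexpected addressee code(s) {unknown} in {code!r}")
    return set(parts)


print(parse_xds("T+C"))                      # target child plus nontarget child(ren)
print(parse_xds("ET+A", two_targets=True))   # older target child plus adult(s)
```

Because the result is a set of addressee types, it records which types were addressed but not how many individuals of each type, matching the coding scheme.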