starting just before 2pm 2022/04/24; getting to know the elan file
in the file tca_20170527_disc_004_en.eaf
last time slot/value
(time unit is ms, div/60000)
final annotation's id: a945, referring to a292
ANNOTATION_ID for the original text becomes the ANNOTATION_REF for the related annotations on different tiers
the original text for it will be under ANNOTATION > ALIGNABLE_ANNOTATION > ANNOTATION_VALUE
the related ones will be under ANNOTATION > REF_ANNOTATION > ANNOTATION_VALUE
alignable annotations have parameters TIME_SLOT_REF1 and TIME_SLOT_REF2 that refer to the start and end times
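quick sketch of how those references resolve, using only the stdlib. the EAF fragment is made up for illustration (it is not from the real file), and it assumes every REF_ANNOTATION points straight at an alignable annotation and every time slot has a TIME_VALUE, which may not hold in a real EAF:

```python
import xml.etree.ElementTree as ET

# Hand-written toy EAF fragment (hypothetical data).
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="3450"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation" PARENT_REF="utterance">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>hola</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

root = ET.fromstring(EAF)

# TIME_SLOT_ID -> time in ms (slots without a TIME_VALUE are skipped)
slots = {
    ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
    for ts in root.iter("TIME_SLOT")
    if ts.get("TIME_VALUE") is not None
}

# ANNOTATION_ID -> text, start/end times, and the values of annotations that refer to it
aligned = {}
for ann in root.iter("ALIGNABLE_ANNOTATION"):
    aligned[ann.get("ANNOTATION_ID")] = {
        "text": ann.findtext("ANNOTATION_VALUE") or "",
        "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
        "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
        "refs": [],
    }

for ref in root.iter("REF_ANNOTATION"):
    parent = aligned.get(ref.get("ANNOTATION_REF"))
    if parent is not None:  # refs to other REF_ANNOTATIONs are ignored in this sketch
        parent["refs"].append(ref.findtext("ANNOTATION_VALUE") or "")
```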
the numbering of annotations, while not arbitrary, kinda sucks: it's basically as though you were reading the file like a text. start on the first tier, number till the end of that tier; go to the next tier below that and continue incrementing until that tier has been exhausted, and so on
{so i'm curious how this will be affected upon an edit--will it shove it in and displace everything, or give it a new id?}
{also, what would my ideal solution to this look like?}
adding new annotation:
I31e3?
eh
pa2 A3, cincuenta tSo317#5 t+317#3 na1cambia
inserted the timestamp ID to be between the surrounding ones and incremented everything down the line from there (NB: if an annotation starts before another one has finished, that new timestamp will go between the opening and closing of the first annotation, because duh that's how time works: elan just numbers the time slots by literally time)
HOWEVER. annotation IDs seem to be strictly numbered based on when they were added. I'd like to go through the file and see if there are other examples besides my "eh" (ID a946)
it looks like all of the ALIGNABLE_ANNOTATIONs and REF_ANNOTATIONs are exactly 3 levels deep, which is very helpful
(test by looking for "^ <(REF|ALIGN)" and changing the amount of whitespace; obv not exactly how i did it and kinda inefficient but easy to explain this way)
kk so i just did a back-and-forth using https://www.convertjson.com/json-to-xml.htm going from the eaf xml to json back into the eaf xml, and pulled the eaf back up in elan. worked perfectly fine, as far as i can tell! which gives me a tentative green light to work using the json
2.5 hrs in, i think i have a good idea of how the EAFs work
Oh shoot, what happens when I DELETE an annotation and add a new one--will the new one have an ID +1 of the previous or will it overtake? bc if the former, annotation IDs might be pretty solid
now, i know that amalia said she'd like it to be linked to timestamps, but that won't work well i don't think, so let's check this out first
so it's the eh, a946, that i added on TAA-tca, that's the one i'm going to delete
IT WORKS
THE NEW ONE I ADDED BECAME A947 AND THERE IS NO A946 NOW
i would like to write a program that just like, displays the tree's structure? like
word >
item
morphemes
item
morphemes
item
but maximal and minimal of these (like how many items do we get at most? are there other fields excluded in this that we get elsewhere?)
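one way that structure summary could look: for every (parent tag, child tag) pair, record the fewest and most children of that tag seen under any one parent. `child_ranges` and the sample XML are made up for this sketch; they are not from the real files:

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def child_ranges(root):
    """Map (parent tag, child tag) -> (min, max) count of that child per parent element."""
    per_parent = defaultdict(list)  # parent tag -> one Counter of child tags per element
    for el in root.iter():
        per_parent[el.tag].append(Counter(child.tag for child in el))
    ranges = {}
    for parent, counters in per_parent.items():
        child_tags = set().union(*counters)  # every child tag seen under this parent tag
        for tag in child_tags:
            values = [c[tag] for c in counters]  # Counter gives 0 when the child is absent
            ranges[(parent, tag)] = (min(values), max(values))
    return ranges

# Toy structure (hypothetical), shaped like the word > item / morphemes outline above.
SAMPLE = (
    "<word><item/><morphemes>"
    "<morph><item/><item/></morph>"
    "<morph><item/></morph>"
    "</morphemes></word>"
)
ranges = child_ranges(ET.fromstring(SAMPLE))
```

a minimum of 0 in the result means some parents lack that child entirely, which answers the "are there fields we only get elsewhere" question.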
stopped working at 5 btw
(starting again at 5:15)
now looking at the flextext
we should actually make the IDs based on the speaker/tier name in addition to the annotation ID, so that we can assure ourselves that it will go to the right tier, right? ugh idk though hopefully that won't be necessary
moving on, flex
paragraphs are numbered, sentences are numbered as Para.Sentence
Aside: these cats are very cute
there are surely defined punctuation marks to say when a sentence ends (in my first nonce example i used question marks in a couple places and that's where it cut it off for each sentence)
it is already treating + as a word-forming character
two annotations in a row shouldn't be treated as one, right? i'm pretty sure we'll need to make sure we separate them even if they're adjacent, and i think to make it flex friendly it should be done with a period or some sentence-ending character, so we don't have to make it go to another paragraph
flex seems to carry timestamps within it, in the form of
phrases > phrase @begin-time-offset and @end-time-offset
so that's nice, i think. but it really sucks that i can't stick that in manually even if just to check it out
2pm 2022/04/25
maybe let's write something that will list out all of the classes?
All of the text itself is located in root > interlinear-text ([0]) > paragraphs ([1])
docs for the Element object's methods and attribs: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element
literally just spending time trying to figure out how to represent the structure, even mentally it's so hard to grok bc looking at the code it's like, deeeeeep nesting (which is ultimately understandable): text[1][0][0][0][1][0][1][0][0] is where the first text is located
text[1][0][0][0][1][0][1][0][0]
text[1][0][0][0][1][0][0]
flextext structure things?
a lot of these are just of tag "item"
Importing the .flextext of tca_20180714_hcg_ahs_haldi.flextext into ELAN, there's an option for "import participant information from 'Note' field"
okay so when importing a flextext into ELAN it's really weird and they force in like a million tiers that are just empty.
hey so this is weird--tca_20180714_hcg_ahs_haldi.flextext doesn't actually have els for some reason?
let's just do a hella nested, hella inefficient crawl through a file and see all unique "type"s on item els
actually wasn't awful to run
{
'cf',
'gls',
'hn',
'msa',
'note',
'pos',
'punct',
'segnum',
'title',
'txt',
'variantTypes'
}
all unique element tags:
{
'document',
'interlinear-text',
'item',
'language',
'languages',
'media',
'media-files',
'morph',
'morphemes',
'paragraph',
'paragraphs',
'phrase',
'phrases',
'word',
'words'
}
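for the record, the crawl doesn't need to be nested at all: `Element.iter()` walks every descendant. minimal sketch on a made-up flextext fragment (the sample data and the `lang` values are assumptions, not copied from a real export):

```python
import xml.etree.ElementTree as ET

# Toy flextext fragment (hypothetical).
SAMPLE = """<document><interlinear-text><paragraphs><paragraph><phrases><phrase>
<item type="segnum" lang="en">1</item>
<words>
  <word><item type="txt" lang="tca">apenas</item></word>
  <word><item type="punct" lang="tca">.</item></word>
</words>
</phrase></phrases></paragraph></paragraphs></interlinear-text></document>"""

root = ET.fromstring(SAMPLE)

# every unique element tag in the file, root included
tags = {el.tag for el in root.iter()}

# every unique value of the type attribute on item elements
item_types = {el.get("type") for el in root.iter("item")}
```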
just a note (2022/04/30, started at 3, break 6:15-6:45, break again 7:30-7:50)--today i'm trying to write a JSON made from an ELAN XML back into a new, completely identical XML (or at least as identical as possible, such that I can pull it back up in ELAN), so that I can get a handle on just sending JSONs into XMLs; then I can be confident in making a way to go to FLEx. Is that a waste of time? I hope not. It involves getting everything in the JSON into XML anyways, so re-sorting it into the way FLEx wants it will be more a matter of reorganization than starting over
i wonder if i'm missing a way to get an XML into a dict though actually
yeah actually xmltodict is less annoying than i thought when i first gave it a go; just .parse(file) gives a very simple dict object, and unparse(object) gives a very simple XML string!
btw--refer to tca_20180714_hcg_ahs_haldi_original-formatting.eaf to see what we're going to be importing to FLEx i think? -reformatted might be fine but like, that has word-level separation which will happen immediately upon import
in fact, that's something to consider--we're gonna get that for free, but will the way FLEx does it fuck with things we want? should we explicitly delineate words to avoid that? will FLEx want to override that anyways? I can't think of a reason we'd want to override tbh tho
task: gotta figure out correspondences ELAN ~ FLEx
perhaps something we should maintain in our report is the unique element tags--
FLEx
{
'document',
'interlinear-text',
'item': {
'cf',
'gls',
'hn',
'msa',
'note',
'pos',
'punct',
'segnum',
'title',
'txt',
'variantTypes'
},
'language',
'languages',
'media',
'media-files',
'morph',
'morphemes',
'paragraph',
'paragraphs',
'phrase',
'phrases',
'word',
'words'
}
ELAN
{
'ALIGNABLE_ANNOTATION',
'ANNOTATION',
'ANNOTATION_DOCUMENT',
'ANNOTATION_VALUE',
'CONSTRAINT',
'HEADER',
'LINGUISTIC_TYPE',
'MEDIA_DESCRIPTOR',
'PROPERTY',
'REF_ANNOTATION',
'TIER',
'TIME_ORDER',
'TIME_SLOT'
}
okay so one thing to look at: there's a "language" (given as the type attr in item els) called 'cf' and it's identical to 'txt'
at least i think. we should check to see if it's always going to be identical because if so, there's no need to worry about it, we can just duplicate it.
so we gotta check per right
2022/05/01 (starting at 1)
cf means CITATION FORM !!! maybe we can just ask amalia if any citation forms are gonna be different from txt forms (bc intuition says probably if things are being transcribed in special ways (like if all Spanish words have the citation form ))
YUP
- apenas
- SpanishAdverb
- SpanishAdverb
- adv
frick
but okay so that's for the FLEx -> ELAN part of the pipeline anyways, since that's post-glossing
there are fields that exist in the maximally filled flextext that we can ignore since we're going in unglossed (like what we are given in the Example Texts folder)
2022/05/02
ALIGNABLE_ANNOTATION
2022/05/04 ~1:30
test out .findall to see if we can find all nodes where the text == "SpanishAdverb", and retrieve the txt element above it
findall only goes for direct children
oh but i can use XPath syntax (unfortunately rn i'm unfamiliar with it, but it seems like it'll help a good bit so i'll learn (about) it)
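small sketch of the XPath route with plain ElementTree. it assumes the "SpanishAdverb" item sits inside a morph next to that morph's txt item; the nesting in a real flextext may differ, and the sample is made up:

```python
import xml.etree.ElementTree as ET

# Toy fragment (hypothetical nesting and data).
SAMPLE = """<words>
  <word>
    <item type="txt" lang="tca">apenas</item>
    <morphemes><morph>
      <item type="txt" lang="tca">apenas</item>
      <item type="cf" lang="tca">SpanishAdverb</item>
    </morph></morphemes>
  </word>
  <word><item type="txt" lang="tca">na</item></word>
</words>"""

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants instead of only direct children;
# [item='SpanishAdverb'] keeps morphs that have an item child with exactly that text.
hits = root.findall(".//morph[item='SpanishAdverb']")

# ElementTree has no parent pointers, so select the container and then look down for txt.
txts = [morph.findtext("item[@type='txt']") for morph in hits]
```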
going from ELAN to FLEx!! okay!!
let's first get the time slots figured out!
let's take all of the related tiers (alignable, then all relevant references) and group them in a dict so we can put it into flex items
there's an item w/ attr @type="segnum" with the number of the phrase in FLEx
eugh i think these are associated with words
truly, a major question here is to what extent will FLEx automatically convert stuff from an import file to conform? do i need to add punctuation tags and word tiers? cuz like it's gonna separate it on its own, right? so what's the big idea! (what i mean is: should i even bother doing it explicitly?)
another thing to look for: if i add in fun lil attributes (like "note type" ["note note"/"speaker"]), will it mercilessly annihilate them?
also what if i manually manip'd the guids?
OKAY SO!!!
changing the guids does *not* work at all
the new attributes are totally acceptable tho--let's see if we still get it when it comes out!
uhhh pulling it out, all the guids just...changed?
and the attribs go bye-bye
also it duplicates all the txt tiers? idk why and idk where it actually happens though lol
it will be the PARTICIPANT attribute in the ANNOTATOR tier (for those els without a PARENT_REF attr; ie only taking the PARTICIPANT attrs from the parent tiers, which unfortunately are unmarked so we'll have to look for them by making sure there is no PARENT_REF attr)
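sketch of that lookup: keep only TIER elements with no PARENT_REF and read PARTICIPANT off them. tier names and participants are made up:

```python
import xml.etree.ElementTree as ET

# Toy EAF fragment (hypothetical tier IDs and participants).
EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="A_utt" PARTICIPANT="A" LINGUISTIC_TYPE_REF="utterance"/>
  <TIER TIER_ID="A_tns" PARTICIPANT="A" PARENT_REF="A_utt" LINGUISTIC_TYPE_REF="tns"/>
  <TIER TIER_ID="B_utt" PARTICIPANT="B" LINGUISTIC_TYPE_REF="utterance"/>
</ANNOTATION_DOCUMENT>"""

root = ET.fromstring(EAF)

# Parent tiers are unmarked, so "no PARENT_REF attribute" is the only signal used here.
speakers = {
    tier.get("TIER_ID"): tier.get("PARTICIPANT")
    for tier in root.iter("TIER")
    if tier.get("PARENT_REF") is None
}
```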
dude FLEx just stores texts by word. like, no baseline tier.
okay so in FLEx, you of course have the option of doing Paragraphs and then Phrases, but at least in the tca db, each para only has one phrase (so it's not 1.1, 1.2, 2.1...; rather just 1, 2, 3...)
2022/05/07 ~1:30
So like. The guids are really gonna be what give me grief in the ELAN -> FLEx portion of this
cuz it seems like i can't really just go edit them willy nilly? i'll just have to keep trying
OMG I THINK IT'S BECAUSE THEY'RE ALL HEX CHARACTERS!!!!
YESYESYESYESYESYFUCK YES IT WORKED
the guids have to be hex code 229d8a92-bc98-fdae-6593-ebc9294bccea
in the form of:
[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}
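a checker for that shape. spelling the class out as [0-9a-f] matters, because a range written as [0-f] also admits uppercase letters and some punctuation (everything between "0" and "f" in code-point order). whether FLEx also accepts uppercase hex is not something these notes establish, so this sketch only allows lowercase:

```python
import re

GUID_RE = re.compile(r"[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}")

def looks_like_guid(value: str) -> bool:
    """True if value is 8-4-4-4-12 lowercase hex digits and nothing else."""
    return GUID_RE.fullmatch(value) is not None

ok = looks_like_guid("229d8a92-bc98-fdae-6593-ebc9294bccea")
bad = looks_like_guid("229d8a92-bc98-fdae-6593-ebc9294bccez")

# Demonstration of the range pitfall: "Z" falls between "0" and "f".
loose_accepts_z = re.fullmatch(r"[0-f]", "Z") is not None
```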
what are the things that have guids?
look through every node and add the type (word, paragraph, etc) to a set if it has the guid attrib
{'interlinear-text', 'media', 'paragraph', 'phrase', 'word'}
We'll probably need to keep a track of guids so we don't reuse?
how should we even go about assigning guids lol
one issue will be that any new word delimiting will make there be a new guid
frick
anyways, gonna do hex incrementation
in python you can parse a string as an int in any base (2 through 36)
val = int("9a", base=16)
okay so let's just take the whole 32 digit thing as one integer, increment it, and put the hyphens back into the string rendition
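sketch of that incrementation. `next_guid` is a name made up here; the wraparound at 128 bits is a choice for the sketch, not something FLEx requires as far as these notes know:

```python
def next_guid(guid: str) -> str:
    """Treat the 32 hex digits as one integer, add 1, and re-insert the hyphens."""
    number = (int(guid.replace("-", ""), 16) + 1) % (1 << 128)  # wrap instead of overflowing
    digits = f"{number:032x}"  # zero-padded back to 32 lowercase hex digits
    return "-".join([digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32]])

after = next_guid("229d8a92-bc98-fdae-6593-ebc9294bccea")
carried = next_guid("00000000-0000-0000-0000-0000000000ff")
```

the stdlib's `uuid.uuid4()` produces strings of the same shape, which might be a simpler way to avoid reuse than counting, if sequential guids aren't needed.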
we'll need to associate words' guids with the elan annotation IDs somehow i think?
but then again, ELAN annotations IDs will be assigned upon re-entry
literally was taking a bite of spaghetti and paused when i realized: i can't just put in the annotations from ELAN into FLEx in the order they appear, since it's turns being taken, but ELAN stores it in tiers and not time; FLEx wants it as time
i guess...go by time?
and now this means that the speaker metadata *must* be in each annotation dict
so let's go very manually, tier by tier; per alignable+symbolics
so like
for i in alignable tiers:
do alignable stuff on i and put it in dict
for j in symbolic:
put j into subdicts in i
if within the tier you see an alignable, use that
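the loop above plus the go-by-time idea, as a sketch on plain dicts. the data layout (`tiers`, `children`) is invented for illustration and is not how the script actually stores things:

```python
from operator import itemgetter

# Hypothetical pre-parsed data: alignable annotations per top-level tier...
tiers = {
    "A_utt": {"speaker": "A", "annotations": [
        {"id": "a1", "start_ms": 0, "text": "first turn"},
        {"id": "a3", "start_ms": 5000, "text": "third turn"},
    ]},
    "B_utt": {"speaker": "B", "annotations": [
        {"id": "a2", "start_ms": 2500, "text": "second turn"},
    ]},
}
# ...and symbolic (REF) annotation values keyed by the alignable annotation they point at.
children = {
    "a1": {"translation": "primer turno"},
    "a2": {"translation": "segundo turno"},
}

turns = []
for tier_id, tier in tiers.items():
    for ann in tier["annotations"]:
        turns.append({
            **ann,
            "speaker": tier["speaker"],               # speaker must travel with each annotation
            "children": children.get(ann["id"], {}),  # symbolic annotations become subdicts
        })

# ELAN stores by tier; FLEx wants the turns in time order.
turns.sort(key=itemgetter("start_ms"))
```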
hey is there a reference tier attr in the symb-linked tiers?
make end user tokenize on their own, assume it's been done
sometimes ppl might have multiple phrases/para, figure it out
look at the symb ref and use the annotation ID in a note field on the phonetic and target phrases in flex so it references aN in both and we can use that to collapse it when we go back back into ELAN (do consider though, maybe we can go FLEx -> CSV so the transformations aren't necessarily a thing and we go from dicts to csv)
XDS tier only for adults but watch out for it (include in notes) (it just says X is the addressee, not the actual speech)
if you mess w/ baseline while in FLEx, ur gonna get frickt (give clear warning)
2022/05/11
talked to Amalia:
we do need to fully go back into ELAN because she's got interaction stuff she needs from ELAN's interface
we need to be doing the tokenizing manually because doing it in ELAN is a pain in the ass
{this will mean having a place for a user to input word-forming characters}
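rough idea of a tokenizer that takes the word-forming characters as input. `tokenize` and its default are hypothetical; it leans on Python's `\w` for letters, so decomposed combining marks would need separate handling:

```python
import re

def tokenize(text: str, word_forming: str = "'") -> list[str]:
    """Split text into word tokens and punctuation runs.

    Characters in word_forming count as part of a word; whitespace is dropped.
    """
    extra = re.escape(word_forming)
    # first alternative: a run of word characters; second: a run of anything else non-space
    pattern = re.compile(rf"[\w{extra}]+|[^\w\s{extra}]+")
    return pattern.findall(text)

with_saltillo = tokenize("ve'e xtsa'pxnü?", word_forming="'")
without = tokenize("ve'e xtsa'pxnü?", word_forming="")
```

with the apostrophe declared word-forming, "ve'e" stays one token; without it, the word is split around the apostrophe.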
oh btw, what we've been calling "debugging for the uses cases of sophie and amalia" is really more like seeing in what ways i need to tune it so i can generalize it
2022/05/15
~3
(make sure to submit hours from the past week, from Wed forward. Include the hours dealing with Heather)
okay so the ANNOTATION nodes only have one child so like maybe we could get rid of that
what if i just stored all the children annotations alone and like brought them for the item[gls] and item[note]
2022/05/19
just making word item with txt tag, not gls or pos
2022/05/20
okay so do know that there is some kind of hash in the eaf headers
urn:nl-mpi-tools-elan-eaf:418c5893-1b74-4e5a-90a1-7dce84a25951
oh boy we don't have a way to put languages in the flextext from the EAF (bc fonts and languages aren't defined in the eaf)
^is what we need in the flextext
2022/05/21
2022/05/22
times:
EAF:
¿Qué es lo que se escuchó Josep?
flextext:
- 1
...
2022/05/27
[6:11 PM, 5/10/2022] Amalia Skilton: No I would like that to happen automatically, it is a hassle if you have a lot of participants
[6:11 PM, 5/10/2022] Sunny Ananthanarayan: When you say automatically, do you mean that this tool should do it or that they will use the ELAN tokenize function?
[6:12 PM, 5/10/2022] Amalia Skilton: I would like this tool to do it as the ELAN tokenize function is cumbersome
[6:12 PM, 5/10/2022] Amalia Skilton: This will require the tool to edit the EAF as well most likely
okay actually maybe i should be doing the word stuff in the ELAN file, saving the ELAN file, and then doing whatever the hell else
alternatively, could just make the word annotation tier separately and pop it into ELAN
wait that's the same thing
but yeah the question is kinda moreso, do we want to have it in the ELAN file ever, bc will we even want it if it doesn't have the glossing etc anyways?
like just wait until it comes out of FLEx into a more full EAF
HEY UHHH ARE WE GONNA WANT AN IMPROVED IMPORT OF FILLED OUT EAFS INTO FLEX SO WE CAN OVERWRITE STUFF AND EDIT IF SOMEONE FINDS AN ERROR OR WANTS TO EXPAND SOMETHING????
2022/05/29
i think probably one of the input things we should ask for is which tiers are notes and which are translations. honestly i might call everything a note at this point
2022/05/30
wait we can have multiple media files in flex
i think the user is going to have to define language/font
uhh what if we just take the speaker attribute in the phrase and use that to populate the note field for the phrase that would say speaker
there's a lot of trying things and then seeing they don't work. and that takes time and gives no demonstrable results :\
for some reason flex decides to reassign guids? annoying but i guess whatever. preserves the ones for media though? idk
2022/05/31
Ask what language each tier is as input
2022/06/01
sophie's YDN_202001_a_1.eaf has
which is just completely empty
gotta be able to deal with fully empty annotations (above eaf, a539)
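the catch with ElementTree is that an empty ANNOTATION_VALUE has `.text` equal to None, not "". a small guard, sketched on a made-up annotation (`annotation_text` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

def annotation_text(annotation) -> str:
    """Return the annotation's text, or '' when the value element is empty or missing."""
    value = annotation.find("ANNOTATION_VALUE")
    if value is None or value.text is None:
        return ""
    return value.text.strip()

# Toy annotations (hypothetical IDs): one empty, one filled.
empty = ET.fromstring(
    '<ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"><ANNOTATION_VALUE/></ALIGNABLE_ANNOTATION>'
)
filled = ET.fromstring(
    '<ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"><ANNOTATION_VALUE> eh </ANNOTATION_VALUE></ALIGNABLE_ANNOTATION>'
)
raw_empty_text = empty.find("ANNOTATION_VALUE").text
```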
2022/06/02
God I need to preserve when something is a target huh
like, per utterance
as a note
but you're gonna do so based on tier...id? maybe?
maybe for now actually put it as an attribute in the note items, even though it prob won't go through when you pull it back out of flex
show each tier and ask language code? ask if it's translation?
you know what. let's make a function, just so it's there, that will let a user manually define the translation (ie type gls) tiers by name
maybe we go through each tier with the user and ask hey is this a translation tier? or present all the child tiers of a tier and ask hey pick 1,2, or 3, which is a translation
Met with Amalia, next tasks:
- Fix the line order bug we just identified
when doing this, the tool goes by annotation order instead of time order. make it go by time order
- Fix the bug that prevented import of a file with "Included In" tiers. This should not be complicated since you can use the info about tier stereotype to exclude them and only look at other non time aligned tiers.
Everyone I know who uses the "included in" stereotype uses it for visual behavior so you do not need to try to design for it at all.
- Add a feature where the user specifies which tiers/tier types contain the translations. ok to use tier type and allow only one translation for now.
Prioritize fixing the line order bug and the translation issue
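sketch for the second and third tasks: read each tier's stereotype from its LINGUISTIC_TYPE (the CONSTRAINTS attribute), drop "Included_In" tiers, and pick translation tiers by tier type. tier and type names are made up, and 'tns' as the translation type is an assumption taken from the tca setup:

```python
import xml.etree.ElementTree as ET

# Toy EAF fragment (hypothetical tiers and types).
EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="A_utt" LINGUISTIC_TYPE_REF="utterance" PARTICIPANT="A"/>
  <TIER TIER_ID="A_tns" LINGUISTIC_TYPE_REF="tns" PARENT_REF="A_utt"/>
  <TIER TIER_ID="A_gaze" LINGUISTIC_TYPE_REF="gaze" PARENT_REF="A_utt"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="utterance" TIME_ALIGNABLE="true"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="tns" CONSTRAINTS="Symbolic_Association" TIME_ALIGNABLE="false"/>
  <LINGUISTIC_TYPE LINGUISTIC_TYPE_ID="gaze" CONSTRAINTS="Included_In" TIME_ALIGNABLE="true"/>
</ANNOTATION_DOCUMENT>"""

root = ET.fromstring(EAF)

# tier type -> stereotype (None for top-level types with no constraint)
stereotype = {
    lt.get("LINGUISTIC_TYPE_ID"): lt.get("CONSTRAINTS")
    for lt in root.iter("LINGUISTIC_TYPE")
}

# skip every tier whose type uses the "Included In" stereotype
kept = [
    tier for tier in root.iter("TIER")
    if stereotype.get(tier.get("LINGUISTIC_TYPE_REF")) != "Included_In"
]
kept_ids = [tier.get("TIER_ID") for tier in kept]

TRANSLATION_TYPE = "tns"  # user-specified; only one translation type for now
translation_ids = [
    tier.get("TIER_ID") for tier in kept
    if tier.get("LINGUISTIC_TYPE_REF") == TRANSLATION_TYPE
]
```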
2022/06/07
having a flextext validator that diagnoses issues that prevent import would be really nice
okay so target tier
basically i have to tokenize then like baseline, but the only note should be the annotation id (aN) to which it refers
and actually i'll need to add that to the baseline ones too
i am to ignore all of the tiers w/o "PARTICIPANT" attrib, according to Sophie. This is something to tell the end user explicitly
okay annotation attribute_id's now exist top_tiers[0]["CHILD_TIERS"][0][0][0][0] and
2022/07/12
refamiliarize myself with the current issue (import of MTO)
for MTO: ignore ayöök, amaaxün, observaciones, and inglés tiers
2022/07/14 start 1pm
oh right do we need to store the parent aN annotation ID in a note for the target+phonetic so we can match them? i think so
- add note on everything with parent annotation ID
- make target utterances their own line
- add note to say if it's a target utterance or phonetic one
so the notes should be: translation, annotation ID, Target/Phonetic, actual notes, speaker, XDS if applicable
1-2pm hour spent talking to sophie about her needs and realizing that for some reason?? i can't actually do analyses? like the asterisks don't appear under the utterances in FLEx!! aaaa!!! and it looks like that's the case with Amalia's stuff??
so yeah just spent an hour adding date+time to output so i can keep track of things better
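roughly what that stamping could look like. the pattern is inferred from run labels such as 2022_07_15-00_26 elsewhere in these notes; `stamped_name` and the filename layout are assumptions for the sketch:

```python
from datetime import datetime

def stamped_name(stem: str, when=None, ext: str = ".flextext") -> str:
    """Append a YYYY_MM_DD-HH_MM stamp to an output filename."""
    when = when or datetime.now()
    return f"{stem}_{when.strftime('%Y_%m_%d-%H_%M')}{ext}"

example = stamped_name("output", datetime(2022, 7, 15, 0, 26))
```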
gonna try it out to make sure it works then get to the priority which is trying to get analysis to work
it works, the hour included confirming that actually yeah it does allow analysis for tca stuff! (and the thing directly below popped up)
noticing that only the spanish translation is showing up in the tca stuff (as English translations, mind you)--maybe I should add an extra thing specific to tca saying that the -abs translation tiers are Spanish, and the translation tiers without -abs are English? do we want both translations?
for now the decision I'm making is to only include the Spanish translation, as it does right now
1 nicpin tlaniztli
a20
I have a question
haplology?
A (To an adult)
Phonetic
2 nicpya tlatlaniliztli
a20
Target
20 min to clarify ^that with Sophie, 3:20 moving to target.ipynb bc i'd love to avoid the big problem of analysis being impossible and i need to do this anyways
20min break 3:40 (20-40 was working on figuring out target utterances solutions)
4pm back to target utterances issue
5pm checking in, just continuing. added "Phonetic/Target" tier as well as aID (although haven't checked if successful)
but yeah still at it
eventually make sure the speaker shows up in the flex export, just in case it's an asshole
got the target utterance thing (+aID showing up) to work! 6pm
guess it's time to figure out why i can't analyze the mto stuff
so it looks like it takes any token with ' and makes the entire thing of type punct...
i think we're gonna need to redefine those things in the flibl stuff
quittin at 7
popped in at 12am to update Amalia and realized that I could just take care of adding a note for participant name, so i went ahead and did that
2022/07/15 started ~3:30pm
trying to figure out the issue with sophie's db!
- ü
line 373 in flibl out of YDN202001_a_1, still thinks things are punctuation
happens...most? of the time?
utterance 75 on 2022_07_15-00_26
utterance 10 on 2022_07_15-16_40
glottal still not respected as word-forming
Charis SIL is used for child lg, use a different LANGUAGE AND FONT. figure out how to notice this in ELAN and carry that over
Sophie rightly pointed out that we are hardcoding a lot of things. We should figure out all of these things and figure out ways we can let people use their own formatting to get what they want/need from it
apparently the mto script takes the last word and repeats it in a note??
spent ~2hrs getting the analyze things to show, debugging until figured out the font issue
spent ~1hr between amalia and sophie to talk about present problems
[8:44 PM, 7/15/2022] Amalia Skilton: User specs: While in principle I like the user interaction it is annoying to do when tier names already show which tiers are translation vs not. So I would suggest adding a few commented lines to the script that change the behavior from interacting with the user to get translation tiers to assuming for example that all tiers of the type ‘tns’ are translation tiers. Just make explicit what to comment/un comment to change the behavior
[8:45 PM, 7/15/2022] Amalia Skilton: Tiers to exclude: do not spend time creating specs for this, deleting tiers is easy if user wants to do that
[8:46 PM, 7/15/2022] Amalia Skilton: Font and language setup: this is great info to go into a read me file that can be distributed with the script
[8:47 PM, 7/15/2022] Amalia Skilton: The readme can say something like ‘look in xx place in flex to find your font and then change line yy of the script to the name of that font’
[8:48 PM, 7/15/2022] Amalia Skilton: Tiers with target utterances: I’d say default should be user interaction to identify these but include an option for specifying by tier name or type like with the translation tiers n
[8:49 PM, 7/15/2022] Amalia Skilton: Treat tiers as different languages: Yes I know Sophie wants to do this and I disagree with it. perhaps my opinion will change once I begin using flex more with the child language data. refer to her for her specs
2022/07/16
starting at 4pm to fix encoding spacing issue
took until 6 to finish fixing!
for some reason this is giving two phrases in a paragraph, the first of which has no notes or translation
105(.1/2) has this issue
117 is extra interesting because it has the same issue PLUS it's got a target utterance, so it has .1/2/3
this punctuation stuff starts as early as line 8
the first target (or at least an early one) is at 21
- 120
- Të
- yë
- ve'e
- xtsa'pxnü
- Ös
- ?
- -
- tsyëjx
- ?
- -
- ¿Ya lo rompiste Osvaldo?
- C
- -
- Phonetic
- a3170
- BCA
it's 7, i spent an hour trying to figure out what was going on with FLEx's punctuation nonsense and i still don't know. i am going to do the thing to make it so people can specify what language they want to use
it's 9, it works with Sophie's one language translation thing, but for amalia's it only takes in the spanish translation tiers. it does successfully take all of the spanish translation tiers, but not the english ones for some reason
oh so since i'm making a dict with all the attributes and 'tns' is the type used for both English and Spanish translations, it's just replacing the english one when it finds the spanish one
might do a list of tiers instead of a dict bc i never call a tier by name, so i could just use indices? ehhhh but i might actually use names, i should check to make sure
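the overwrite in miniature, plus one possible way out that keeps name lookups: map each type to a list of tiers. tier IDs here are placeholders, not the real tca names:

```python
from collections import defaultdict

# Hypothetical (tier id, tier type) pairs; both translation tiers share type 'tns'.
tiers = [("utt", "utterance"), ("tns_en", "tns"), ("tns_es", "tns")]

# Keying a plain dict by type: the later tier silently replaces the earlier one.
by_type = {}
for tier_id, tier_type in tiers:
    by_type[tier_type] = tier_id

# Keying by type but storing a list keeps every tier of that type.
grouped = defaultdict(list)
for tier_id, tier_type in tiers:
    grouped[tier_type].append(tier_id)
```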
9:45 got the "target language as separate language" to work, quitting
2022/07/17 starting around 4
the punctuation issue can be fixed by putting all consecutive punctuation into a single element
ëtë ya'iy... ëts xa yë njä'ä...
no se entiende
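sketch of the merge: walk the tokens and glue each punctuation token onto the previous one when that was punctuation too. the (text, kind) pairs are a made-up representation, with "txt" and "punct" borrowed from the flextext item types:

```python
def merge_punct(tokens):
    """Collapse runs of consecutive punct tokens into a single punct token."""
    merged = []
    for text, kind in tokens:
        if kind == "punct" and merged and merged[-1][1] == "punct":
            merged[-1] = (merged[-1][0] + text, "punct")  # extend the previous punct run
        else:
            merged.append((text, kind))
    return merged

tokens = [("ya'iy", "txt"), (".", "punct"), (".", "punct"), (".", "punct"), ("ëts", "txt")]
merged = merge_punct(tokens)
```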
2022/07/18 starting at 2:30
well here's the thing. if the punctuation thing is gonna be an issue, we do know for sure that every time there's a phon/targ break w/in a para, there's going to be a section character (§) so we can work with that when going to ELAN
but yeah line 117 is a major example of this issue
144 has a weird saltillo issue going on, doesn't have asterisks below it
oh since i made the language cps for the child speech, i need to make sure the font is set up with all the special characters
we should have a place where the user defines the langs and fonts so it can be used throughout, instead of only defining it at the bottom
worked for like 1 hour, didn't make actual progress
2022/07/19 starting at 2:15
current issues:
reading child speech as not analyzable
punctuation line breaks
it's calling the first line of phonetic speech mto instead of the second, target utterance
define language and font
write guide
guide things
code V for directed to Vijay, code R for directed to Ravi; for speaker it's full name.
that's the system we have set up in our db, could be different in yours, but it needs to be discernable to *you* which is being used because we don't have a special indicator for which is target and which is speaker (but can add in the future; might be annoying to users though, will be to sophie from our conversations)
āj mere pās buildings hai
property hai
bank balance hai
banglā hai
ghar hai§gāṛī hai
kyā hai tumhāre pās?
mere pās mā hai
to add to word-forming: āīṛ
woah okay so when it's . or .. it breaks, but when it's ... it doesn't
... is considered punctuation that doesn't break up a phrase. Which i guess makes sense
Amalia says she puts commas next to her hyphens sometimes so it doesn't do that. it's a known issue though yeah
I might need to hardcode something to counteract the behavior in order to make it more intuitive, but it is inferrable.
If there is § anywhere in the utterance, we know that what precedes is phonetic and what follows is adult
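that rule as a tiny helper. `split_phon_target` is a made-up name, and labelling the second half "Target" follows the Phonetic/Target notes used elsewhere in this log; it assumes at most one § per utterance:

```python
def split_phon_target(utterance: str, sep: str = "§") -> dict:
    """Split an utterance into its phonetic part and (if present) its target part."""
    if sep not in utterance:
        return {"Phonetic": utterance.strip(), "Target": None}
    phonetic, _, target = utterance.partition(sep)  # splits at the first separator only
    return {"Phonetic": phonetic.strip(), "Target": target.strip()}

both = split_phon_target("ghar hai§gāṛī hai")
only_phonetic = split_phon_target("property hai")
```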
also yeah if there's multiple languages in an utterance, the second part of the utterance, the part in a different language, is treated as punctuation. actually, anything that isn't the vernacular in focus is considered punctuation. THAT'S why the asterisks for analysis didn't show up. freakin. wtf.
2022/07/20 just woke up and am doing a thing with sophie
i think what i'm going to do is have each target/phonetic be its own paragraph
ask sophie if she wants to have the utterance metadata in the target one too then?
multiple punctuation marks: this will create multiple phrases in the same paragraphs until we find a solution
would you be okay with me fully removing instances of multiple punctuation marks? (probably not, so we'll have to think through what we can do; one option could be trying to recover it by looking at the original ELAN file, but that would either be kinda complicated or involve taking the original utterance text which might have been mutated while analyzing)
keep all punctuation
add in "Addressee:" "Speaker:" (maybe aID: but also that's just a\d+) "Phon/Targ:"
this is for the export, and it will involve the user telling us which tiers are addressee/speaker/etc
switch target and language for the ones w/ both
docs writing to get first draft took about 3hrs, with having to get input that adds at least 2 hrs
3.1 to add commenting
the 1.5-tca script does work. 3.x doesn't
3.1 was comments. 3.2 will be for that tca issue
pulling back up 2022/08/01 off claire's clock, also this is me retroactively stating that i did spend an hour or so talking to sophie about this issue where target is no longer appearing as its own utterance
first thing i'm going to do is use Deewar to test things, instead of just mto or just tca. (issue fixed by flattening the list as being the tier ids instead of the tier type)
in the ELAN import, there's an option: "Smallest time-alignable element" either "phrase" or "word"
first try is using phrase; also "create new tier type for each item type" with "create new tier type for new item language" checked
for flex -> eaf
https://josephlovestrand.wordpress.com/2021/08/31/flex-%E2%86%92-elan-%E2%86%92-burned-subtitles/
would love to check out https://zenodo.org/record/6548993
next directions:
tweaks to flibl eaf -> flextext, esp for generalizability and directions/instructions
flextext -> eaf
writeup on process (will need some guidance on what that will mean)
js implementation
633-634, a4111, myanajxnü
cps doesn't glom, mto does
256 has a separate comma
a3410:
no target, doesn't glom