Python: module flexible07

flexible07

/home/sunny/Documents/lx/flexible/flexible07.py

Modules

xml.etree.ElementTree
json
re

Functions


add_word_el(tokenized_utt: List[str], phrase_el: xml.etree.ElementTree.Element, lg: str)
Populate a phrase element with a tokenized utterance.
Parameters:
    tokenized_utt: a list of the tokens from an utterance
    phrase_el: the element that will be the parent of the words added (below the words element in phrase_el come the items with translations and notes)
    lg: the language whose word-forming characters are to be used in tokenization
Makes changes in place (returns nothing).
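A minimal sketch of what add_word_el might do. It assumes the FLExText layout in which a phrase holds a `<words>` element whose `<word>` children each carry an `<item type="txt">` or `<item type="punct">`. The `WORD_FORMING` table and the `add_word_el_sketch` name are hypothetical stand-ins, not the module's actual code.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical stand-in for the module's word_forming table: each pattern
# matches a single non-word-forming character for that language.
WORD_FORMING = {"mto": re.compile("([^A-Za-zäëöü'])")}


def add_word_el_sketch(tokenized_utt: List[str], phrase_el: ET.Element, lg: str) -> None:
    """Insert a <words> element into phrase_el, one <word> per token (in place)."""
    # Insert <words> as the first child so translation/note items come after it
    words_el = ET.Element("words")
    phrase_el.insert(0, words_el)
    non_word = WORD_FORMING[lg]
    for token in tokenized_utt:
        if not token.strip():
            continue  # whitespace carries no word or punctuation
        word_el = ET.SubElement(words_el, "word")
        # Assumption: a token made only of non-word-forming characters is punctuation
        is_punct = all(non_word.fullmatch(ch) for ch in token)
        item = ET.SubElement(word_el, "item", {"type": "punct" if is_punct else "txt", "lang": lg})
        item.text = token
```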

generate_guid()
Generate a FLEx GUID based on the offset defined in offset.txt.
Increments the offset upon use.
Return the GUID, matching the pattern [0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}
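The docstring does not say how the offset becomes a GUID. One simple reading, sketched here purely as an assumption, treats offset.txt as a decimal counter and renders it as 32 lowercase hex digits; `generate_guid_sketch` and `GUID_PATTERN` are hypothetical names.

```python
import os
import re
import tempfile

GUID_PATTERN = re.compile(r"[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}")


def generate_guid_sketch(offset_path: str = "offset.txt") -> str:
    """Return a GUID derived from a stored counter, then increment the counter."""
    # Assumption: the offset file holds a single non-negative decimal integer
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path, encoding="utf-8") as f:
            offset = int(f.read().strip() or "0")
    # Render the counter as 32 lowercase hex digits and split them 8-4-4-4-12
    digits = format(offset, "032x")
    guid = "-".join([digits[:8], digits[8:12], digits[12:16], digits[16:20], digits[20:]])
    # Persist the incremented offset so the next call yields a different GUID
    with open(offset_path, "w", encoding="utf-8") as f:
        f.write(str(offset + 1))
    return guid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "offset.txt")
        print(generate_guid_sketch(path))  # 00000000-0000-0000-0000-000000000000
        print(generate_guid_sketch(path))  # 00000000-0000-0000-0000-000000000001
```

With a counter like this, GUIDs are unique only among files generated from the same offset file, which is presumably why the offset is persisted between runs.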

make_charset_regex(characters: str)
Convert the config file's character set into the form flibl uses.
Parameters:
    characters: the string of valid characters in the language
Return a regex pattern that matches non-word-forming characters.
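The word_forming entries in the Data section suggest the shape of the result: the valid characters wrapped in a negated, capturing character class. A sketch under that assumption (`make_charset_regex_sketch` is a hypothetical name):

```python
import re


def make_charset_regex_sketch(characters: str) -> "re.Pattern[str]":
    """Compile a pattern that matches one non-word-forming character."""
    # The characters are inserted unescaped so that ranges such as "A-Za-z"
    # keep working; a literal backslash in the config string would need escaping.
    return re.compile(f"([^{characters}])")
```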

print_el_info(el: xml.etree.ElementTree.Element)
Print the tag, attributes, text, and number of children of an ET.Element.
Parameters:
    el: the element to be printed
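A sketch of an equivalent debugging helper. The exact output format is an assumption, and the formatting is split into a hypothetical `el_info` function so the result is easy to check.

```python
import xml.etree.ElementTree as ET


def el_info(el: ET.Element) -> str:
    """Summarize an element: tag, attributes, text, and direct-child count."""
    return (
        f"tag: {el.tag}\n"
        f"attributes: {el.attrib}\n"
        f"text: {el.text!r}\n"
        f"children: {len(el)}"  # len() counts direct children only
    )


def print_el_info_sketch(el: ET.Element) -> None:
    print(el_info(el))


if __name__ == "__main__":
    print_el_info_sketch(ET.fromstring('<TIER TIER_ID="A"><ANNOTATION/><ANNOTATION/></TIER>'))
```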

remove_included_in(eaf: xml.etree.ElementTree.Element)
Remove the tiers that use the "Included In" stereotype constraint.
Parameters:
    eaf: the root element of an EAF file
Makes changes in place (returns nothing).
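In EAF, each TIER names a LINGUISTIC_TYPE through its LINGUISTIC_TYPE_REF attribute, and the linguistic type records its stereotype in a CONSTRAINTS attribute (Included_In for this one). Assuming the tiers are direct children of the ANNOTATION_DOCUMENT root, as in standard EAF files, the removal could be sketched as follows (`remove_included_in_sketch` is a hypothetical name):

```python
import xml.etree.ElementTree as ET


def remove_included_in_sketch(eaf: ET.Element) -> None:
    """Remove tiers whose linguistic type uses the Included_In constraint (in place)."""
    # IDs of linguistic types declared with the Included_In stereotype
    included_types = {
        lt.get("LINGUISTIC_TYPE_ID")
        for lt in eaf.findall("LINGUISTIC_TYPE")
        if lt.get("CONSTRAINTS") == "Included_In"
    }
    # findall() returns a list, so removing while iterating over it is safe
    for tier in eaf.findall("TIER"):
        if tier.get("LINGUISTIC_TYPE_REF") in included_types:
            eaf.remove(tier)
```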

time_values(eaf_root: xml.etree.ElementTree.Element)
Get time IDs and their values from an EAF file.
Parameters:
    eaf_root: the root element of an EAF document parsed with ElementTree
Return a dictionary mapping each time ID to its value.
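EAF stores time slots under TIME_ORDER as TIME_SLOT elements with a TIME_SLOT_ID and, when aligned, a TIME_VALUE in milliseconds. A sketch of the lookup, assuming the values are wanted as integers and unaligned slots are skipped (both choices are assumptions, and `time_values_sketch` is a hypothetical name):

```python
import xml.etree.ElementTree as ET
from typing import Dict


def time_values_sketch(eaf_root: ET.Element) -> Dict[str, int]:
    """Map each TIME_SLOT_ID to its TIME_VALUE in milliseconds."""
    times: Dict[str, int] = {}
    for slot in eaf_root.iterfind("TIME_ORDER/TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        # Unaligned slots have no TIME_VALUE attribute; skip them
        if value is not None:
            times[slot.get("TIME_SLOT_ID")] = int(value)
    return times
```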

tokenize(phrase: str, lg: str)
Tokenize an utterance based on specified word-forming characters.
Parameters:
    phrase: the string to be tokenized
    lg: the language whose word-forming characters are to be used in tokenization
Return a list of tokens.
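Because the word_forming patterns wrap their character class in a capturing group, `re.split` keeps every separator in its result, so punctuation survives as separate tokens. A minimal sketch built on that property. The `WORD_FORMING` table is a hypothetical, shortened stand-in, and discarding whitespace is an assumption.

```python
import re
from typing import List

# Hypothetical, shortened stand-in for the module's word_forming table
WORD_FORMING = {"mto": re.compile("([^A-Za-zäëöü'])")}


def tokenize_sketch(phrase: str, lg: str) -> List[str]:
    """Split a phrase into word and punctuation tokens."""
    # The capturing group makes split() keep each separator character
    pieces = WORD_FORMING[lg].split(phrase)
    # Drop the empty strings and whitespace left between adjacent separators
    return [p for p in pieces if p.strip()]
```

For example, `tokenize_sketch("jä tsë, ja!", "mto")` yields `["jä", "tsë", ",", "ja", "!"]`.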

Data

List = typing.List
__warningregistry__ = {'version': 0}
config = {'exclude_tier_id': ['ayöök', 'amaaxün', 'observaciones', 'comentarios', 'inglés', 'Interlinear-title-mto', 'Notes'], 'exclude_tier_type': ['Words'], 'file_names': ['./data/YDN202012_b_3.eaf'], 'language_fonts': [{'font': 'Charis SIL', 'lang': 'cps', 'vernacular': 'true'}, {'font': 'Charis SIL', 'lang': 'mto', 'vernacular': 'true'}, {'font': 'Charis SIL', 'lang': 'es'}, {'font': 'Charis SIL', 'lang': 'en'}], 'languages': {'child_language': 'cps', 'flex_language': 'en', 'main_language': 'mto'}, 'target_utterance_tier_type': ['Target Utterance'], 'translation_tiers': {'BCA_Translation-gls-es': 'es', 'Car_Translation-gls-es': 'es', 'Mary_Translation-gls-es': 'es', 'OS_Translation-gls-es': 'es', 'POL_Translation-gls-es': 'es', 'YDN_Translation-gls-es': 'es'}, 'valid_characters': {'child_language': "A-Za-zäëöüÄËÖÜ'`ꞌꞋ'‘’äëöüÄËÖÜ̈áéíóúÁÉÍÓÚáéíóúÁÉÍÓÚ", 'main_language': "A-Za-zäëöüÄËÖÜ'`ꞌꞋ'‘’äëöüÄËÖÜ̈áéíóúÁÉÍÓÚáéíóúÁÉÍÓÚ"}}
word_forming = {'cps': re.compile("([^A-Za-zäëöüÄËÖÜ'`ꞌꞋ'‘’äëöüÄËÖÜ̈áéíóúÁÉÍÓÚáéíóúÁÉÍÓÚ])"), 'mto': re.compile("([^A-Za-zäëöüÄËÖÜ'`ꞌꞋ'‘’äëöüÄËÖÜ̈áéíóúÁÉÍÓÚáéíóúÁÉÍÓÚ])")}
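The config and word_forming values above line up: each key under valid_characters (child_language, main_language) names a role whose language code is found under languages, and word_forming is keyed by that code. A sketch of the derivation, assuming config has already been parsed into a dict (for example from JSON) and using the hypothetical name `build_word_forming`:

```python
import re
from typing import Dict


def build_word_forming(config: dict) -> Dict[str, "re.Pattern[str]"]:
    """Key a non-word-forming-character pattern by each language code."""
    return {
        # e.g. role "main_language" -> code "mto"
        config["languages"][role]: re.compile(f"([^{chars}])")
        for role, chars in config["valid_characters"].items()
    }
```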
