Hi everyone, this is a recording of the Chirila Historical Linguistics Lab’s presentation at the recent Linguistic Society of America's Annual Meeting. We're going to be talking about "Accessibility, discoverability, and functionality: an audit of and recommendations for digital language archives."

Thanks, Claire. Okay, so we're going to start with some background about language archives. For our study we did a review of digital language archives, particularly for endangered and under-resourced languages. As we all know, a large portion of the world's languages are in some state of loss, which makes language documentation, and the accessibility of the materials that are being documented and preserved, crucial for future use and reclamation purposes. Digital language documentation is now the standard, but the materials are very heterogeneous across archives, and archives vary in standards and quality.

So what is an archive? For our purposes we define an archive to be a repository of language data with the aim of preserving and disseminating those materials. From Austin (2021): “an archive should appraise materials; preserve them long term; make their existence discoverable; and facilitate their appropriate distribution.”

Digital language archives have many pros and cons, and if you've ever used a digital language archive you're probably familiar with these. On the pro side, digital materials can be more compact than analog materials, they allow for digital manipulation of the data, they allow wider access to data over the internet, and in theory you can copy digital materials an infinite number of times without altering the original.
However, some cons are that electronic storage media can have poor longevity, and they are still vulnerable to environmental conditions as well as to more digital problems like data corruption, server issues, and internet outages. Additionally, long-term readability varies, and as technology changes we need to make sure that digital language archives can still be used.

So the problem we're looking at is that digital language corpora are very heterogeneous. A lot of the standards surrounding digital language archives were formalized a long time ago, before much of the recent digital work was possible. Additionally, much of linguistic training focuses on producing material for archives, such as gathering language materials through fieldwork, rather than on working with materials from archives, where you're using archives as a starting point for some type of linguistic work. Finally, some collections have fallen out of maintenance; websites are not always actively maintained.

So now we're going to move on to the review. For the audit that we performed, we surveyed roughly 50 archives from the perspective of an end user, rather than the perspective of a depositor who would deposit materials in the archive. We focused primarily on three criteria: (1) accessibility and findability, which includes what languages a user needs to know to access an archive and to navigate its collections; (2) discoverability, which refers to the availability of the material within the collections and how users can navigate those archives and find it; and finally (3) functionality, which refers to how usable the materials are within those archives and what functions the site and archive do or do not support, that is, usability.
There are a few caveats to our survey. Primarily, archives do not receive the funding they need, and a lot of this work is unpaid and very much a thankless job. We do not wish to discredit the value of this work; it's very important and it does need to be done. At the same time, we do believe that a collection that is incomplete or imperfect is better than having no archive at all, so we do not wish the standards we set out in these recommendations to discourage archiving. Finally, we need to clarify that our opinions and our positionality are those of academics, not necessarily those of the communities and language groups that might be using these materials for purposes of cultural reclamation or language education, so that needs to be acknowledged. At the same time, we do believe that our perspective offers some valuable insights into the nature of language archives and points where they could improve.

We're not naming archives directly. We're not trying to name and shame individual archives; these issues are happening across the board, and pretty much everyone can be doing better. So with this in mind we're focusing on issues that happen on a large scale. Furthermore, we really want to raise these points so that future archivists can raise the standards of their work, and this in turn will help make sure the field is tenable for the future.

So for our results, we'll focus first on accessibility. Accounts and registration are the first tier of accessibility of any archive or any website. Some archives are open access, some require at most a free account registration, and some require specific permission to access individual collections. Many archives combine these three modes of access.
This is not an issue in itself, since it is often used to respect the wishes of individual language communities and researchers, but it can be clumsily executed in ways that unnecessarily impede the research of academics and language communities. For example, for some archives account registration is built on Google Forms, meaning that lost and forgotten passwords can't be retrieved, so if you make an account and lose your password you have to make an entirely new account to access the archive. Furthermore, the time elapsed between account registration and actual access to the archive can be a matter of hours or sometimes days. Not all archives streamline the process of requesting permission to access individual collections. One archive requires that you mail in a physical copy of the request form, which is not a viable option for every researcher. And some collections in some archives were only available through institutional access, meaning you needed an institutional email or password. While this can provide access to academics, we have to wonder what language communities who are not affiliated with academic institutions are losing out on in terms of access to these cultural materials.

Secondly, interface language is a very important aspect of accessibility. Of the archives we surveyed, only one third offered more than one interface language, and it was particularly striking that the available languages tended to be very well-resourced European languages, namely English, French, and Spanish. Plugging sites into Google Translate was not fully an option either, because Google Translate does not necessarily translate the entire site; sometimes translations were inaccurate, especially when languages were structurally very different from European languages; and these translations often removed functionality, meaning that some links became
unclickable once the sites were plugged into the translation filter. Furthermore, this emphasizes that the majority of digital materials available to Indigenous users are in English, even though English isn't the primary language for many, and this further exacerbates existing divides in Indigenous communities, where access to English language knowledge remains a critical issue.

We point to PARADISEC as a great example of an archive that makes its content available not only in multiple languages but in languages that are very relevant to a specific region, for example Tok Pisin, given that it focuses on Australian and Oceanic languages. But other languages just did not have viable translations through translation filters. For example, in this translation the name James Woodward is translated literally, which makes it lose its connection to the tag in the archive, so not only are we getting poor translations but we're also losing functionality of the archive. This issue is repeated where we see that buttons and links are no longer clickable once they're translated, as in this Kyrgyz translation example. In some other instances only parts of the website were translated, so we see that "elicitation" is translated into Kyrgyz in one place and not in the other.

Disability accommodations are another critical aspect of accessibility for archives, and indeed for any other website. It is essential that archives are designed in a way that is accessible to users who rely on assistive technology to navigate the internet. Because we are not experts and are not experienced in using screen readers to navigate the internet, we felt that we were not able to properly survey archives for their compatibility with screen readers; it wouldn't have been an honest reflection of our understanding. Nevertheless, we strongly recommend that principles of accessible web design, which have long since been standardized, be followed.
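As one illustration of what those standardized principles look like in practice, the sketch below computes the WCAG 2.x contrast ratio between a text color and a background color and checks it against the 4.5:1 level AA threshold for normal-size text. It is only an illustrative sketch, not a tool that was used in the audit, and the function names (`relative_luminance`, `contrast_ratio`, `passes_aa`) are made up for the example.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, threshold=4.5):
    """True if the pair meets the AA threshold for normal-size text."""
    return contrast_ratio(fg, bg) >= threshold


# Hypothetical site colors: mid-gray text on a white background.
text_color = (0x77, 0x77, 0x77)
background = (0xFF, 0xFF, 0xFF)
print(round(contrast_ratio(text_color, background), 2), passes_aa(text_color, background))
```

Black text on a white background gives the maximum ratio of 21:1; light gray text on white is the typical case that falls below 4.5:1.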
Most archives were friendly to users with color deficiencies, but a few did suffer from low contrast between the font and background color, which can be an issue for users who are colorblind or who have other visual impairments. These issues are pretty easily rectified in terms of the overall scope of web design issues, so we strongly recommend that they be fixed.

Overall, our recommendations for accessibility are, broadly: to streamline the process for account registration and permission requests, without necessarily removing these barriers; to implement a wider array of display languages when possible, especially those that are relevant to the language communities hosted in these archives; and finally to follow principles of accessible web design. These are baseline standards that will improve accessibility of archives across the board.

Our next category is discoverability. One issue we saw with discoverability was in search functions and mislabeling. Users often can't see information about a collection before opening it, which makes browsing more cumbersome. Collections also frequently do not outwardly indicate their size or the types of files they store; again, you have to click into the collection to find out more about it. Also, the ability to perform searches within collections was absent for most archives; you could only search for collections. Searches were often hampered by missing metadata, incorrect tags, and case sensitivity, which allowed some collections to disappear in certain searches even if they were relevant. Also, despite its importance for researchers, only five archives allowed users to filter searches by media and file type, which hampers those who are looking for specific types of files, and sometimes EAF files were mislabeled in browsers as XML files.

This is an example of an archive that did a good job of allowing for that sort of filtering and search. This is the Alaska Native Language
Archive, and the picture shows that it allows the user to filter their search by whether a collection contains text files, audio files, or both.

We also saw some problems in metadata and how it was recorded. There's little consistency between archives in how much or what kind of metadata is offered, as well as how it's displayed. Even within archives, collections vary wildly in the sort of metadata they provide, and that makes navigation difficult, because the metadata also affects searching for collections, so it can also allow certain collections to disappear in a targeted search. Metadata is also sometimes layered, so there would be one set of metadata for a whole collection, then a different set of metadata for individual folders within the collection, and then sometimes even a whole different set of metadata just for individual files.

We also saw some issues with site maintenance. Some sites required Yale VPN access for us, researching through Yale. Some sites still require Adobe Flash Player, which was discontinued as of December 2020 and blocked as of January 2021, so accessing these sites is now impossible. Also, server loss or movement can cause data loss, especially if a website is not maintained properly, and the Wayback Machine does not capture sufficient snapshots to fully restore lost archives and does not maintain links. Many archives also suffered from pervasive broken links, which makes navigation extremely difficult.

Okay, so we offer these recommendations. We recommend that archives offer clearer and more detailed descriptions of collection content, available before clicking into the collection. We recommend that archives allow users to search by file type and that they ensure the correct labeling of file types.

Our next category is functionality. In this category we mainly saw issues in site content structure and downloads. Most critically, most archives that we looked at, 34 out of the 41, offered no option to
bulk download collections, only one file at a time. This makes it extremely difficult and time consuming to download large collections, like some that have over 15,000 files, and even smaller collections could be difficult to download in their entirety. Also, file downloads cause a loss of nested file structure, which again leads to more time spent trying to maintain the format of large collections, especially those with imprecise file names. This makes it especially hard to find matched content, such as an audio track that has a separate transcription file and a separate translation file.

Okay, so for this category we recommend that archives offer bulk download options, and also specifically a shopping-cart style of download, so that users can download the specific files they need all at once rather than downloading an entire collection.

Okay, so moving on to our conclusions. Again, to reiterate, we as academics do not represent the entirety of archive users, and these results do not encompass the full breadth of what needs to be addressed, especially for community members using language archives as starting points for language reclamation and pedagogy. We also reiterate that archives often depend on volunteer labor and are often very underfunded, and it usually takes a lot more funds and labor to sustain this type of long-term maintenance. But we can still take steps to ensure that the language materials that rely on archives for their preservation are well kept, safe, and accessible for future use.

So just to go over our overall recommendations again. In terms of accessibility: following accessible web design, streamlining the process for permission requests and registration, and expanding the interface languages of the archive sites. In terms of discoverability: having the ability to search by file type, having clearer and more consistent
metadata, and making sure that file types are correctly labeled. And in terms of functionality: having the option to bulk download, or having that shopping-cart download option, will help streamline the process of using archive materials.

Going forward, there are a few things we want to look at. In linguistic circles there's the idea of corpus linguistics, which uses corpora like COHA or COCA, whereas we looked at corpora describing under-resourced and endangered languages, so a comparison of the corpora and materials between these two types would be important. We would also like to look into what documentation archives have about their long-term backup plans: do they have some sort of system in case their website is taken down or links are broken? Because this material is so important, and because it's stored in these archives, we want to make sure that the materials are also backed up in a long-term and secure way. We'd also like to look at mobile access to archives, because in a lot of communities phones are the primary means of accessing the internet, so if these archives are available on the internet we need to make sure that they are also very usable on tablets and phones. And lastly, we would like to look at more experiences from end users, especially community members who use language archives for language reclamation and pedagogy.

So thank you. These are our references.