Thus far, we have been using HTML tags primarily as a means for arranging bits of text into a hierarchical structure which expressed the documentary hierarchy of an interlinear text. The final outcome of the previous chapter — the rendered extract — had many advantages compared, for instance, to interlinears created in a word processing application. That presentation enables content to be styled in terms of categories of content; its content can reflow adaptively into different screen sizes (such as a mobile phone or tablet) without corrupting the structural hierarchy; and CSS may be used to drastically alter presentation of the same interlinear content. But despite the flexibility that the browser environment imparts to the rendered interlinear text, it nonetheless remains a static document as far as the content itself is concerned. Given that our aim is to create an editor which enables linguists to efficiently edit interlinear content, we need to move beyond the basic capabilities of markup and styling into the realm of interactivity.

In this chapter, we will consider a handful of new HTML tags, including <video>, <audio>, <button>, and <input>. As HTML markup, these tags are not different in essence from those of the preceding chapters, but they are particularly associated with interactive content: the user plays video or audio, clicks a button, and types in an input. As we shall see, we can use the browser’s built-in programming language, Javascript, to program the behavior of such elements (indeed, of any element) in useful ways. Thus, in this chapter we will expand our use of the browser development tools.

Interactive elements

Consider the HTML video tag below, and the video itself following:

<video controls src="school_memories.webm" type="video/webm"></video>
School Memories! by “Ilonggo Boy” with a controls HTML attribute

There are three HTML attributes present. The src attribute indicates the name of the video file which the embedded video should present (in this case, a file named school_memories.webm). The type attribute specifies the format of the video itself. There are some complexities involved here regarding differences between browsers and the formats they support, but suffice it to say that in some situations the format specified by the string video/webm is supported. (Strictly speaking, the HTML standard defines type as an attribute of the <source> tag rather than of <video> itself, so here it is best read as a note about the format of the file.) Note also that on some browsers, you may need to hover the cursor over the video in order to see the video controls. Finally, and most importantly for our purposes here, there is the controls attribute. The presence of this attribute (it does not need to specify a value) indicates that the video should be rendered with the standard user interface buttons and other widgets to control playback of the video.

The use of this interface is obvious to anyone familiar with the web platform. This video may be played by clicking the play button, paused by clicking the pause button, and navigated over its duration by clicking or dragging on the timeline. The volume can be adjusted with a similar dragging interface, and finally, there is a “full screen” button which will expand the video to take up the device’s entire screen.1 Clicking any of those buttons will result in “behavior” which is predictable to anyone accustomed to using such interfaces: “play” starts the video, dragging the cursor on the timeline resets the “current time” of the video, adjusting the volume meter changes the audio intensity, and so forth.

One might assume that such controls would be required in all <video> elements, but this is not the case. A <video> tag will also “work” without the controls HTML attribute present. The only difference in the video presented below is the lack of that attribute. Note that not only is there no user interface in the video, clicking the video does nothing. It seems completely inert: try clicking the video below.2

<video src="school_memories.webm" type="video/webm"></video>

Compare the HTML markup above — without a controls attribute — to the rendered video below. Note that if you move your mouse over the video, no controls appear.

School Memories! by “Ilonggo Boy” without a controls HTML attribute

This raises an immediate question: what good is a video that doesn’t seem to be controllable by the user? This turns out to be a very useful question, because the answer to it helps to foreground some of the most important aspects of the browser programming environment. In order to understand the utility of a video without controls, we must understand (if only in a general way) what the browser actually does when it loads an HTML file.

  1. explain a particular DOM object: <video>
  2. explain what the DOM is
  3. explain how to interact with the DOM via the inspector and console
  4. show how to use javascript to extract data from the DOM
  5. explain variables
  6. explain objects generically
  7. create a sentence object from the DOM
  8. explain arrays
  9. put the new sentence object into an array

The browser

When the browser loads an HTML file, it reads each of the tags in sequence and constructs an object of the corresponding type. For instance, if the HTML file contains the markup <video src="somevideo.mp4"></video>, it will create a “video object” which represents that element. It is the object, not the sequence of characters that constitute the tag, that is actually programmable. It is important to note that using a video object does not require any understanding of how such an object “works” internally. Somewhere deep in the code of the web browser, someone has written computer code that does handle the complex details of just what it takes to decode a video file, render the imagery it contains, keep its audio in sync, and much more. But there is a key concept in programming: objects should have a programming interface that requires understanding only of those attributes of an object which are of direct use to a programmer.

In the case of a video, it only requires familiarity with web-based video to make a fairly accurate prediction of the functionality that a video object should have. Most obviously, one should be able to play it! Other necessary functions would include stopping the video, adjusting the volume, and “jumping” to some other point in time within the video’s full duration. And indeed, these are exactly the capabilities (among some others) that a browser will impart to any video object it creates.
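
As a minimal sketch of what such an interface looks like in practice, the function below uses only properties and methods that browsers provide on video objects (pause, play, volume, and currentTime). The function name previewFrom and its parameters are invented here for illustration; in a browser, the video argument would be a real video object.

```javascript
// previewFrom is a hypothetical helper, not something built into the browser.
// `video` is assumed to be a video object of the kind the browser creates
// for a <video> tag.
function previewFrom(video, seconds) {
  video.pause()                 // stop playback if it is under way
  video.volume = 0.5            // volume runs from 0 (silent) to 1 (full)
  video.currentTime = seconds   // "jump" to a point in time, in seconds
  return video.play()           // start playing from that point
}
```

Given a video object, previewFrom(video, 10) would jump ten seconds into the video and start playing it at half volume. Nothing in the function needs to know how the video is decoded or rendered.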

Interacting with the Document Object Model in the Developer Tools Inspector

In this section we will learn how to look behind the scenes of a web page by using the developer tools, which will allow us to work with the object model of an HTML document. We will see that the programming interface of a video object (the kinds of things one may program a video to do), while specific to a video, is not essentially different from the interfaces of any of the other kinds of objects that the browser constructs when reading a page. Indeed, when the browser loads an HTML document, it creates an object corresponding to every tag in the page, including each of the tags used in the previous chapter (div, article, p, and so forth). Moreover, the browser keeps track of the structure of interrelationships between the elements in the HTML document. In a rare case of an easily understood and useful technical acronym, this structured collection of objects is referred to as the Document Object Model, or DOM. It is the DOM that the programmer actually interacts with.
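
To make the idea of a structured collection of objects more concrete, here is a small sketch that walks such a structure and lists the tag name of each object, indented according to its depth. The function name outline is made up for this example; tagName and children are standard properties of element objects in the DOM.

```javascript
// outline is a hypothetical helper. It assumes `node` has the standard
// element properties tagName (a string) and children (a list of elements).
function outline(node, depth = 0) {
  // One line for this node, indented two spaces per level of nesting.
  let lines = ['  '.repeat(depth) + node.tagName.toLowerCase()]
  // Then the lines for each child, one level deeper.
  for (let child of Array.from(node.children)) {
    lines = lines.concat(outline(child, depth + 1))
  }
  return lines
}
```

In a browser console, outline(document.body).join('\n') would produce an indented list of every tag in the body of the current page, which is a plain-text cousin of what the Inspector displays.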

Because working with the DOM is so important, modern browsers provide a set of developer tools for exploring it.

Here, the upper half of the browser is showing the web page itself (as per normal use), and the lower half of the browser is showing the developer tools. Note that the developer tools are not designed to be used by “end users” of the browser, but rather are aimed at helping “web developers” (people who are creating web content) to experiment with the content of their pages interactively. Unfortunately, the details of how to make use of developer tools are not standardized, so while they all support the same kinds of operations, the layout of the developer tools can vary from browser to browser. For the sake of simplicity, examples here will be taken from the Firefox web browser, but more information about the developer tools in various browsers is linked in the table below.

Developer tools in various browsers

| Browser | OS | Key combination | Menus | More information |
|---|---|---|---|---|
| Firefox | Mac | Cmd+Option+I | Tools > Web Developer > Toggle Tools | |
| Firefox | Windows, Linux | Ctrl+Shift+I | | |
| Chrome | Mac | Cmd+Option+J | View > Developer > Developer Tools | |
| Chrome | Windows, Linux | Ctrl+Shift+J | | |
| Safari | Mac | Cmd+Option+I | First, Safari > Preferences > Advanced, check “Show Develop in menu bar”. Then, Develop > Show JavaScript Console | |
| Edge | Windows | Ctrl+Shift+I | | |

As indicated above, particular keystroke sequences or dropdown menu items may be used to open the developer tools on various browsers across various operating systems. But let us use a more direct method. In all browsers, it is possible to “right click” on the page (Control-click on a Mac), at which point a contextual menu will appear.

A screenshot of a contextual menu in Firefox, showing the ‘Inspect Element’ option

One of the options in this menu is labeled “Inspect Element”, or in other browsers, “Inspect”. Selecting that option will automatically open the developer tools. Here I have placed the mouse over the word estudyante in the sample text from the end of chapter 3.

Inspecting a paragraph tag in the inspector by right-clicking in the web page.

The developer tools are presented as a tabbed panel interface, with the top bar listing the available panels by name. The developer tools are highly configurable, and your defaults may differ somewhat from those shown in the screenshot. For instance, the list of tabs (here showing Inspector, Console, Debugger, and Style Editor) is configurable. (Selecting the ellipsis button at the right end of the top bar, and then the Settings option, will reveal a checklist which includes “Default Developer Tools”.) We will mostly be making use of the Inspector, Console, and Style Editor tabs. Note that there are also options to set whether the developer tools are “docked” to the bottom, right, or left of the browser window, or shown in a separate window. Both the “bottom” and “right” settings will appear in screenshots below.

Note that the “Inspect Element” command above opens the Developer Tools with the Inspector pane revealed.3 The “raw” HTML is rendered in the Inspector pane, but there is more going on than is at first apparent. The content of the Inspector pane is not just the HTML source of the page: it is a sort of visualization of the browser’s own computational model of the content and the state of the web page. Note that in the screenshot, a particular HTML tag, the paragraph tag containing the Hiligaynon form estudyante, has been selected. In the inspector, the opening tag of the element (<p class=form>) is shown with a blue background. Each tag within the inspector’s rendering of the HTML source code is indented to show its level of nesting. This nesting is also expressed by the way that any tag which contains other tags is prefixed by a gray triangle: clicking that triangle will toggle whether that tag’s “child” tags are visible in the inspector.

The reader will soon notice that moving the cursor around the Inspector’s HTML content will interactively highlight the corresponding rendered element in the web page itself. In fact, many aspects of the content of the web page can actually be edited directly through this inspector pane.

The Firefox developer tools opened as the bottom pane of the browser.

Independence of styling from DOM

A crucial feature of CSS is that it is substantially independent from the Document Object Model. This can be made clear by comparing the selection of the very same word (estudyante) from the heavily re-styled version of the Hiligaynon text extract from the end of Chapter 3:

Inspecting the same paragraph tag in the inspector in the heavily restyled version of the text extract.

Controlling the video with Javascript

Javascript is a programming language. Unless the browser user intentionally disables Javascript, it is always running whenever a web page is loaded by the browser. The strong link between Javascript and the web browser distinguishes it from other programming languages such as Python or R. Javascript and the web have always been in a symbiotic relationship, although recently the power and robustness of the Javascript side of that symbiosis has been gaining importance.

Controlling the tokenizer

Here is sample markup for two new HTML tags, input and button:

  <input placeholder="transcription" class="transcription-input"/>
  <button class="tokenize-button">tokenize</button>

When rendered, that code appears as below:

A text input and a “tokenize” button

But if we type in the box, or click the button, nothing happens. Although the default presentation of these elements affords familiar interactions (the text input is for typing and the button is for clicking), HTML itself doesn’t prescribe what the result of those actions should be.

  1. Select a node
  2. Then:
    1. Get or set the value of an attribute directly
    2. Move the node elsewhere in the DOM
    3. Hook up an “event listener” which can call other functions in response to user action
    4. Destroy the node

begin to fix

But my people don’t care about that. I have to convince them that procedures are useful at all. I’m fighting the “I can just put it in Excel!” standpoint.

So I’m arguing for a top-down “curriculum”. Well, not really a curriculum, I guess a vade mecum is the right term. I’m trying to show people that look, there is this thing called

What does it mean to “implement” a data structure?

In a top-down approach, you start by saying “Look at this batteries-included abstraction called a video. You can make it do things which are in the nature of a video, like play, pause, change time, speed up… by telling it to do those things. You can even tell it to destroy itself.” Whatever. The idea is that you’re telling people about the thing, and the fact that the thing must have a name of some sort seems kind of obvious.

From there you say, there are other built-in objects with their own set of “commands”. In fact, every tag in an HTML page becomes a thing with commands. From there:

document - another built-in

document.querySelector - like CSS selectors, gets you a reference to a node object

DOM - now we’re in a position to understand the tree

but if we do:

document.querySelector('video').play()

we will probably want to do more than one operation on that object. So, we save a reference to it. That’s a variable.

let myVideo = document.querySelector('video')

And then


myVideo.currentTime = 10

And so forth, which is of course better than



document.querySelector('video').currentTime = 10


Finally, the kicker: this video thing is an object. It happens to be an object which is a node in the DOM, but Javascript can also create objects outside of the DOM. Then:

let word = { "form": "casa", "gloss": "house" }

And explain that up to:

let sentence = {
  "transcription": "mi casa su casa",
  "translation": "My home is your home",
  "words": [
    { "form": "mi", "gloss": "my" },
    { "form": "casa", "gloss": "house" }
    // …
  ]
}



Now we can get strings out of the DOM by using <input>s, which gets us a “transcription” string. Same for translation.

let transcription = document.querySelector('input.transcription-input').value

To get forms and glosses, we will need a function:

let tokenize = transcription => transcription.split(" ")

let forms = tokenize(transcription)

Now we create an input for each gloss, which requires a .forEach and document.createElement('input').
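
Roughly, that step might look like the sketch below. The function name addGlossInputs, its parameters, and the gloss-input class name are all made up for illustration; createElement, className, placeholder, and append are standard.

```javascript
// addGlossInputs is a hypothetical helper. `doc` is assumed to be the
// browser's document object and `container` an element already in the page.
function addGlossInputs(doc, container, forms) {
  forms.forEach(form => {
    let glossInput = doc.createElement('input')
    glossInput.className = 'gloss-input'   // assumed class name
    glossInput.placeholder = form          // show which form is being glossed
    container.append(glossInput)           // add the new node to the page
  })
}
```

In the browser this could be called as addGlossInputs(document, document.body, forms), with forms being the array that tokenize returns.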

Finally we need a method to read the whole structure back from the DOM, and in the end we can create that sentence structure above.
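
As a sketch of where this ends up, the function below assembles a sentence object of the shape shown earlier from plain strings, so it can be tried without a page at all. The names buildSentence and glosses are assumptions; in the editor, the strings would be read from the value of each input. Unlike the tokenize function above, it trims the transcription and splits on any run of whitespace.

```javascript
// buildSentence is a hypothetical helper. `glosses` is assumed to be an
// array of strings in the same order as the words of the transcription.
function buildSentence(transcription, translation, glosses) {
  let forms = transcription.trim().split(/\s+/)
  let words = forms.map((form, i) => {
    // A missing gloss is stored as an empty string rather than undefined.
    return { form: form, gloss: i < glosses.length ? glosses[i] : '' }
  })
  return { transcription: transcription, translation: translation, words: words }
}
```

For example, buildSentence("mi casa su casa", "My home is your home", ["my", "house"]) gives back an object with four words, the last two of which have empty glosses waiting to be filled in.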

I feel like this is a path that the documentary linguist might be willing to follow. Because the end of this road is a simple but functional application: a sentence editor. In the process of working through that, most of the key ideas necessary to make a (barely) usable text editor are established.

At any one of these steps, there might be a need to go off and explain very basic things like variables, assignment, data types, etc.

When I look back on my own trajectory into programming, I can remember many many points where I found myself thinking “how is this going to help me do linguistics”, because that was all I ever wanted to do. To me, and I think, to many other potential documentary linguistics programmers, the question was always, “where is the stuff I care about?”. So if I can give them the stuff I care about out of the gate, I am hopeful that people will want to keep going.

end to fix

Handling user input

Selecting inputs

Using CSS selectors to select elements with Javascript

{ “title”: “Engineering Language Documentation”, “author”: “Patrick Hall”, “date”: “2019-06-14” }

  1. There is some variation in the default behavior of embedded video from browser to browser and operating system to operating system: for instance, on iPhones, videos play in fullscreen by default.

  2. Most browsers do allow the user to toggle the presence of default controls on any video, even this one, by “right-clicking” and selecting “Show controls”. Let’s ignore that route for the sake of discussion.

  3. The sub-panes within each view may be resized by clicking and dragging on the borders of each sub-pane. I have minimized the lower pane in the screenshot.