The data for this project was derived from the National Library of Medicine’s “The Public Health Film Goes to War” collection, a series of eighteen health education films created and distributed between 1940 and 1945, the majority of which are geared towards members of the armed forces.

Because my primary source was created and distributed in a wartime climate and catered mostly to a specific audience (members of the armed forces), I came to the material with expectations about the kinds of social trends I would discover. My perceptions of the social trends and mores of the time made the lens through which I assessed these films critical from the outset.

What technology did you use and why?

The technology aspect of my project relied heavily on textual analysis tools. I used the following:

samediff tool on

The samediff tool is designed to visualize differences between texts by comparing words that appear in both texts with words unique to each individual script. I used it to create a more concrete visual representation of how conversation topics differed between videos created for men and videos created for women. When I noticed the trend in command phrases in the video created for a female audience, I set out to compare words found solely in the “strictly personal” video against words found in other videos from the archive that were targeted to men and that also concerned the maintenance of health and hygiene (specifically hygiene, food, and personal cleanliness). Among others, words such as “proper”, “calories”, “don’t”, “weight”, and “exercise” appeared exclusively in the video pertaining to women’s health and in none of the videos catered towards men in the armed forces.
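The shared-versus-unique comparison described above can be approximated in a few lines of Python. This is only a sketch under assumptions, not the tool's own code: the function names, the simple tokenizer, and the two sample sentences are hypothetical and are not quotations from the film transcripts.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    # Normalize curly apostrophes so "don’t" and "don't" count as one word.
    text = text.lower().replace("’", "'")
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text))

def compare_vocab(text_a, text_b):
    """Split two vocabularies into shared words and words unique to each text."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    return {
        "shared": words_a & words_b,   # words appearing in both texts
        "only_a": words_a - words_b,   # words found solely in text A
        "only_b": words_b - words_a,   # words found solely in text B
    }

# Invented sample sentences for illustration only.
sample_a = "Don't skip exercise. Watch your weight."
sample_b = "Keep your mess kit clean."
result = compare_vocab(sample_a, sample_b)
print(sorted(result["only_a"]))
```

To compare one script against several others, the other transcripts could be joined into a single string before being passed in as the second text, so that "only_a" holds words that appear in none of them.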

ShelfWatch named entity recognition

ShelfWatch was used to identify unique names, places, and other proper nouns, with the goal of identifying which foreign countries were mentioned within the corpus and where. After the names were collected and recorded on a spreadsheet, the terms were searched in Voyant Tools for collocates and other concordant words. This tool helped showcase instances of racial and ethnic othering and made it possible to compare how developing and colonized nations were treated relative to European ones.

Voyant Tools was used to search for specific terms of interest across the entire corpus and to identify concordant words and phrases in order to quickly and efficiently contextualize word usage. To better identify phrases that may indicate health messaging techniques, I searched for words typically involved in phrases of action/consequence or action/result, or that indicated sequence, comparison, or risk.

Terms searched to assess framing techniques and compare command language: risk, die, healthy, should, not, shouldn’t, shall, can, may, ought, do, don’t, won’t, will, live, save, result, consequence, avoid, likely, more, worse, less, else, dangerous, prevent, must, have, has, never, survive

*Stems for each term were also assessed; for example, both “survive” and “survival” were included in the search. Terms that are bolded yielded phrases of interest.
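The idea of searching for a term together with its stem variants and reading each hit in context can be sketched as a small keyword-in-context routine. This is an illustration under assumptions rather than a description of how Voyant Tools works internally: the function name, the prefix-matching rule, the window size, and the sample sentence are all hypothetical.

```python
import re

def concordance(text, stem, window=3):
    """Return (left context, matched word, right context) for every word
    that begins with the given stem."""
    tokens = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", text)
    hits = []
    for i, tok in enumerate(tokens):
        # Prefix matching is a crude stand-in for real stemming.
        if tok.lower().startswith(stem.lower()):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented sample text for illustration only.
sample = "You must boil the water. Survival depends on it, and you will survive."
hits = concordance(sample, "surviv", window=2)
for left, word, right in hits:
    print(f"{left} [{word}] {right}")
```

Because the match is a simple prefix test, short stems can over-match (a stem such as “do” would also catch “doctor”), so results for short terms would still need to be read and filtered by hand.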

Data Management

For the data derived from ShelfWatch, unique words denoting places, nationalities, and ethnicities were identified in the corpus and tracked on a spreadsheet, where each was attributed to its script of origin. These words were then entered into Voyant Tools and, using the collocates function, assessed for the context in which they were used; phrases of interest were entered into a spreadsheet.

Words of interest that were searched for across the entire corpus with Voyant Tools were entered into a spreadsheet that recorded their occurrence count for each video; occurrences that included phrases of interest were also recorded in the spreadsheet.
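A per-video occurrence table of this kind could also be produced programmatically. The sketch below is hypothetical: the column names, the function names, and the two sample transcripts are invented for illustration, and it counts exact word matches only (no stems).

```python
import csv
import io
import re

def count_terms(transcripts, terms):
    """Build one row per (film, term) pair with the term's occurrence count."""
    rows = []
    for title, text in transcripts.items():
        # Normalize curly apostrophes, then split into lowercase words.
        normalized = text.lower().replace("’", "'")
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
        for term in terms:
            rows.append({"film": title, "term": term, "count": tokens.count(term)})
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["film", "term", "count"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Invented sample transcripts for illustration only.
transcripts = {
    "film_a": "You must wash. You must not wait.",
    "film_b": "Never drink untreated water.",
}
rows = count_terms(transcripts, ["must", "never"])
csv_text = to_csv(rows)
print(csv_text)
```

In practice the dictionary of transcripts would be filled by reading each .txt file from the project folder, and the CSV text would be written to a file instead of printed.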

In terms of data organization, the transcript for each film in the database was downloaded as a .txt file and stored in its own folder so that the corpus could be easily uploaded to Voyant Tools for analysis and individual texts could be analyzed in ShelfWatch and DataBasic. The corpus and spreadsheets were stored within a larger project folder.

Many words that I expected to yield insights, and that I searched for in the corpus through Voyant, did not produce anything I deemed noteworthy for the project. The specific pages for each theme show which search terms “made the cut” and which didn’t. My decision was based on whether the words pointed to phrases that fit the themes I sought or the observations I had noted. If they did not fit a theme, the phrases they were embedded in are not shown.

When deciding which words to use to detect commands and positive or negative framing, I consulted phrasing from public health campaigns and research studies to find language that would likely lead to command phrases and phrases depicting health risk. This approach may be limited by the breadth of the words I chose.


excerpt of concordance data
