Reflections on Data

I study environmental history, a subfield of history that I think requires frequent out-of-the-box thinking about data. Data, as Owens argues, “are a species of artifact.” Because data are “manufactured” entities, Owens points out, the idea of “raw data” is a little “misleading.” For humanists, the idea of raw data itself needs some untangling, mostly in the ways in which we conceive of data. As curated and created artifacts, data are inherently interpretable and thus can be read as texts. This insight was especially important for my own research. I broadly chronicle the origins, motivations, and effects of dredging of the Detroit River from the end of the Civil War to the beginning of the Great Depression, offering a new analytical lens by thinking about dredging as a historical, material, and ecological process. Because the dredged channel straddles the international border (for the most part), I argue that dredging is as much a territorializing process as it is a technological and political one. This is not a story about the rise and fall of dredging, for dredging continues as a maintenance activity, worth multimillion-dollar contracts, every year; rather, it is a story of how nature was constructed, understood, and altered for returns. Between 1865 and 1930, the Detroit River went from a transportation chokepoint to an efficient and managed waterway with the creation of cross-border and along-the-border infrastructure. Thus, in my case, ‘data’ are artifacts that are disparate at best. For instance, one of my biggest data sets records how much material was dredged during the period I am interested in, and when and where it was dredged. The location of the dredging will reveal the relative importance of that site. But dredging data are only one aspect of my project. As much as I did not enter environmental history to study people (more on that in a bit), the most important actors for my project include engineers, fisherfolk, and politicians.
Conversations about the border and dredging are carried on between the U.S. Army Corps and the government of Canada. It is a fascinating exchange. But the most important and challenging source of data for me has been the non-human. How do fish speak? My efforts at bringing in the non-human have begun with broadening my definition of what I consider data. Michigan, and the Detroit River in particular, was an important testing ground for fish hatcheries and nurseries, especially in the nineteenth and early twentieth centuries. Access to fish hatchery data is one part of it, but making fish speak, I think, would (unfortunately) involve having humans speak. I recently found a series of testimonies of Canadian fishermen talking about their experiences of fishing in the Detroit River and the effects of dredging on their catch. I would not have thought of testimonies as data, but they are. They point to specific locations and times of the year, as well as the amount and kind of fish caught. Visualizing such data along with dredging data could possibly show where fish spawning sites were most severely affected. Rethinking what I consider data has helped me rethink how I process it. For the longest time, I thought of data as data points and data sets. But it was in design school, with its emphasis on translating text into visual communication, that I began rethinking how I would process and communicate information. Yet an important aspect of communicating information in interesting and visually attractive ways is being cognizant of the curation that goes into the data selection itself. While data collection should be an ethical process, it is also important to acknowledge and understand how data are curated, both at the archives and in representing them. Again, Owens’ insights on this are extremely useful. As humanists, we need to be cognizant of and responsive to the elisions in the archives and in the data sets we gather.

As OCR technologies have multiplied and improved, it has become markedly easier to find information. For instance, I came across the testimonies I mentioned earlier on HathiTrust while looking for something else, and they showed up because of OCR. Thus, in browsing for one set of archival materials, I came across another. By giving us greater access through more reliable, searchable databases, OCR has opened up a whole new world. Yet, as Milligan points out, “the browsing model can lend itself to useful contextualization of a research project, learning about topics seemingly unrelated to your specific queries, getting a sense even as images skim across your screen of the zeitgeist of the source or time, and gaining a comprehensive rather than focused survey of the past.”[1] Furthermore, which newspapers’ materials become accessible has an effect on what is studied. As Milligan shows through a survey of Canadian dissertations written after the Toronto Star and the Globe and Mail were digitized and made accessible, digitization has had an effect on the total number of citations, led to declining use of non-digitized newspapers, and increased attention to Toronto as opposed to other places.[2] As someone who has used (and continues to use) digital newspaper archives through ProQuest and the INK-ODW Newspaper Collection, I found that these insights particularly spoke to me. During my own research, I have become aware of the limitations of digital collections. While they are an excellent starting point, they represent a skewed data set. For instance, it was only at Library and Archives Canada that I found a great article from 1893 on American territorial ambitions toward Canada. This was arguably a time when the two countries were seen as friendly neighbors, and common knowledge tells us that territorial disputes and questions were well on their way to being dealt with.
The article, a long-form single-column piece, is not available online. But in the file, it was an important piece of evidence used to justify Canadian defenses. This discovery altered my thinking about the time period in question and has pushed me to look at newspaper archives from other non-digitized sources and publications.

The readings this week have only reinforced my desire to look beyond historical archives to find new spaces and places where stories about dredging might exist. Furthermore, they reemphasize the need to be cognizant of the inherent biases in the ways in which data collections are created, curated, and communicated. At the end of the day, data and archives are manufactured. Being aware of and responsive to that reality will push me to question how I gather data, as well as how I process it.

[1] Ian Milligan, “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010,” The Canadian Historical Review 94, no. 4 (2013): 540–569 (accessed January 23, 2019).

