On Distant Viewing
Ten years after Franco Moretti’s Distant Reading,[1] Taylor Arnold and Lauren Tilton released their book Distant Viewing (MIT Press, 2023), in which they explore the ‘methodological and epistemological implications of using computer vision as a tool for the study of visual messages’ (p. 11). When Moretti introduced his concept of ‘distant reading’, the study of text through computational methods was not new, but he was the first to challenge established hermeneutical approaches and propose a computer-driven solution for the study of world literature. Back then he prophetically wrote that a ‘new “science” emerges where a new problem is pursued by a new method’.[2] This form of quantitative approach to interpretation, as Ted Underwood[3] has claimed, can be traced back to nineteenth-century experiments and later works such as the one by Janice Radway.[4] Moretti’s groundbreaking contribution was to combine computational methods with ‘looking beyond the canon’, and therefore to oppose the convention of close reading with a ‘little pact with the devil’: to learn how not to read text, but to approach it from a distance in order ‘to focus on units that are much smaller or much larger than the text: devices, themes, tropes – or genres and systems’.[5]
Distant Viewing extends this mantra to an exploration that, for film and media studies, may sound like a similar hell; nevertheless, this research by Arnold and Tilton positions itself within a context where film and media studies and digital humanities have already blended, even if just recently. This work is the outcome of extensive research they started in the 2010s, making use of computational analysis applied to visual media. Therefore, although computational methods may not be our pact with the devil anymore, what Moretti was saying, that if we study something in its entirety we might lose something, may still be true. What is it that we lose, and what might we gain?
The distant approach addresses the human limits of looking at large corpora. If in literary studies text mining could approach thousands of books, the task of viewing large visual media corpora is also constrained by technological means. Arnold and Tilton present a compelling argument for the adoption of computer vision techniques in the interpretation and analysis of large collections of digital images, opening new avenues for interdisciplinary inquiry. Three main case studies, treated in three separate chapters, support Arnold and Tilton’s inquiry; each presents an iterative approach that integrates earlier tools and approaches with new revisions that follow the most recent developments in computer vision.
The question that the authors bring to the table is not if visual culture can be approached through computational methods, but rather, following McLuhan’s findings,[6] what it means to study it by these methods. What does it mean to ‘view’ from a distance, and what are the relevant and helpful strategies to make this process fruitful? They examine these theoretical underpinnings of Distant Viewing in the early chapters, where they grapple with questions of interpretation, methodology, and the epistemological implications of computational analysis.
Arnold and Tilton have the privileged position of working within a recent field of interdisciplinary work and can bring their discipline further not only by questioning the established methodologies but also by taking a critical stand on computer vision itself. As the authors make clear, at the base of the process of distant viewing there is often computer vision software that was originally developed for other ends. An example they cite is the systematic use of computer vision methods in algorithms for face detection and object tracking. As they argue, using such methodologies is never neutral, as they were developed to support other needs, sometimes for military and surveillance purposes. Sometimes these technologies are also trained on different data: an example is the use of computer vision to explore an archive of black-and-white photos, where the original algorithm was trained on modern digital photos and was thus not well adapted to view that particular corpus.
Moreover, computational processes are also shaped by a set of social and cultural practices of looking and ways of seeing.[7] Following Stuart Hall, images have culturally coded elements that require ‘decoding’. In other words, visual and other messages are interpreted differently depending on the cultural background of a person. Images, and their interpretation through algorithms, are thus subjected to cultural norms too. Famous examples are the use of standards for image processing: the so-called ‘Lenna’, originally an image from Playboy magazine used in image processing; or ‘Shirley’, a white girl working for Kodak whose skin tone was used as a way to calibrate all of Kodak’s film up to the 1970s.[8] Such examples make evident how standards developed in a racist and sexist environment might affect how technology for image interpretation works. In Distant Viewing, these relationships are acknowledged beforehand, allowing the authors to highlight the possibilities and limits of viewing from a distance.
Therefore, because the process of making meaning is key to interpreting results by computer vision, Arnold and Tilton argue that this can be mediated by structuring annotations, thus helping computer vision algorithms see what we want them to see. This process of annotation, as the authors stress, is at the center of Distant Viewing and must not remain rigid, but be an iterative process that models such new structures to capture diverse layers of meaning. Distant Viewing proposes to adapt current pipelines for data extraction and analysis and offers a methodology for those who wish to engage with it. This approach is structured in four steps: annotate, organise, explore, and communicate. Sub-chapters are organised following those phases. The steps of annotation and organisation are fundamental to adapting automated annotation to research goals and material or aggregating those results with metadata. The phase of exploration serves to develop hypotheses and reiterate the process of annotation and organisation once again if needed.
Annotation can be done in different ways. For example, the analysis of color in film has been explored already in other digital projects, showing for instance the use of different technologies for coloring film strips throughout film history.[9] In Distant Viewing, the authors chose instead a more limited corpus. Focusing on the 100 highest-grossing films, they identify the characteristics of color they want to look at. In this experiment, they first show how to approximate human vision by converting RGB values into HCI values (hue, chroma, and intensity), thus creating useful annotations. Because some patterns in such a large corpus are impossible to detect by eye, computer vision can help distinguish specific relationships between colors and make connections between the hue, chroma, and intensity values of an image. Once properly annotated, the results can be analysed statistically to determine, for instance, the presence of more or less white or black in posters and to find a pattern of change over time, which a transformation in printing techniques can explain. Computer vision makes clear how genres are related to specific color relationships, pairs, and palettes that do not always reflect perceived cultural norms, but that mostly fit into conventions, such as the use of cold color combinations for sci-fi films.
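To make the color annotation step more concrete, the following is a minimal sketch of an RGB-to-HCI conversion, followed by a summary of how much near-white and near-black an image contains. It is not the authors' code: the function names, the hexagonal definition of hue and chroma, the mean-based intensity, and the threshold values are all assumptions chosen for illustration.

```python
import colorsys


def rgb_to_hci(r, g, b):
    """Convert one 8-bit RGB pixel to (hue in degrees, chroma, intensity).

    Assumption: hue and chroma follow the common hexagonal model, and
    intensity is the plain mean of the three channels. The book may
    define these quantities differently.
    """
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    # colorsys returns hue as a fraction of a full turn in [0, 1).
    hue_fraction, _, _ = colorsys.rgb_to_hsv(rf, gf, bf)
    chroma = max(rf, gf, bf) - min(rf, gf, bf)
    intensity = (rf + gf + bf) / 3.0
    return hue_fraction * 360.0, chroma, intensity


def white_black_share(pixels, chroma_max=0.1, white_min=0.8, black_max=0.2):
    """Return the share of near-white and near-black pixels in an image.

    `pixels` is an iterable of (r, g, b) tuples. All three thresholds
    are hypothetical and would need tuning on a real corpus.
    """
    white = black = total = 0
    for r, g, b in pixels:
        _, chroma, intensity = rgb_to_hci(r, g, b)
        total += 1
        if chroma <= chroma_max:  # low chroma: close to the grey axis
            if intensity >= white_min:
                white += 1
            elif intensity <= black_max:
                black += 1
    if total == 0:
        return 0.0, 0.0
    return white / total, black / total


if __name__ == "__main__":
    # A toy four-pixel "poster": two whites, one black, one saturated red.
    poster = [(255, 255, 255), (250, 250, 250), (0, 0, 0), (255, 0, 0)]
    print(rgb_to_hci(255, 0, 0))
    print(white_black_share(poster))
```

Aggregating such shares per image and per release year is one way the kind of change over time described above could be examined, under the stated assumptions.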
The versatility of distant viewing is explored in two further case studies involving a publicly available digital archive of documentary photos from the FSA-OWI collection[10] and a copyrighted corpus of moving images from two US sitcoms (Bewitched and I Dream of Jeannie). The case study of the FSA-OWI collection (accessible on the platform Photogrammar[11]) explains the authors’ methodological principle of organisation; they describe the challenges of aggregating curatorial metadata. Such data were already present in the digitised collection and were based on manual tagging and historical categorisations of subjects. These categories were put into dialogue with the annotated metadata and aggregated to reveal particular correlations and thus to support the exploration of the corpus.
The step of exploration, which follows a specific research question, can be exemplified by the case study of Bewitched and I Dream of Jeannie. Here they asked whether a character was dominant in the series by detecting shot breaks and identifying the presence and the prominence of characters. Arnold and Tilton explore the relationships between characters and how shots may determine key points in the narrative. They find that the two sitcoms are not, contrary to popular convention, terribly similar, but employ quite different gender dynamics. Although this study shows a pioneering approach in the use of facial recognition and shot boundary detection algorithms, exploring corpora larger than two sitcoms, if copyrighted material is available for study, would perhaps better serve the purpose of an exploration that goes beyond the canon, as Franco Moretti wished. Such a case study, when compared to other examples of publicly available corpora, speaks for the importance of making material publicly accessible. As the authors stress, ‘more work needs to be done exploring the possibilities of working with material that is under copyright’ (p. 223).
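As an illustration of what detecting shot breaks can involve, here is a minimal sketch that compares the brightness histograms of consecutive frames and marks a break where they differ sharply. This is a deliberately simple, classic technique and not necessarily the algorithm the authors used; the frame representation (flat lists of greyscale values from 0 to 255), the function names, the number of bins, and the threshold are assumptions.

```python
def brightness_histogram(frame, bins=16):
    """Return a normalised histogram of greyscale values (0-255)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_shot_breaks(frames, threshold=0.5, bins=16):
    """Return the indices of frames that start a new shot.

    Two consecutive frames are compared by the total variation distance
    between their histograms (0 = identical, 1 = no overlap). The
    threshold is hypothetical and would need tuning per corpus.
    """
    if not frames:
        return []
    breaks = []
    previous = brightness_histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = brightness_histogram(frames[index], bins)
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            breaks.append(index)
        previous = current
    return breaks


if __name__ == "__main__":
    # Toy sequence: two dark frames, two bright frames, one dark frame.
    dark = [10] * 100
    bright = [240] * 100
    print(detect_shot_breaks([dark, dark, bright, bright, dark]))
```

A histogram comparison of this kind reacts to hard cuts but can miss gradual transitions such as fades and dissolves, which is one reason a real pipeline would likely rely on more robust methods.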
This surfaces the question of communication as the final step of the Distant Viewing methodology, also an explicit part of common data science pipelines. The last chapter, dedicated to the publicly available images from the Metropolitan Museum of Art in New York, provides a strong argument for the implementation of distant viewing and the step of communicating results. The authors claim that combining archival metadata and computer vision is fundamental to creating recommendation systems to support and facilitate the exploration of corpora while unveiling unexpected connections. Looking at large collections is for certain the most useful application of computer vision, especially as a way to approach corpora from users with different expertise and interests. This use holds the promise of an expanding field that is being revolutionised by recent developments in deep learning.
Another example of how to communicate results is outlined by questions of accessibility and transparency. Arnold and Tilton adopt inclusive language and provide detailed footnotes to demystify technical concepts, ensuring that readers from various backgrounds can engage with the material. Moreover, the book is committed to openness, with published datasets, code, and additional visualisations that are available under open source licenses on a dedicated website (distantviewing.org/book). This underscores the authors’ dedication to fostering collaborative research practices.
In a recent publication, Franco Moretti has argued, with disappointment, that despite the promise that quantitative and hermeneutic approaches would integrate, the two methodologies have not been able to complement each other to produce new theories.[12] Going back to the question of what we lose by taking the road of computational methods, I would hazard some provocative questions. If the purpose of a distant approach was to go beyond the canon, why are we analysing color in a corpus of the highest-grossing films? Why are we analysing documentary images related exclusively to US history? Why are we not considering the gender dynamics in sitcoms with an intersectional perspective? I am convinced that the approach in Distant Viewing is an innovative first step; what it may need is more comparative research, looking at other cinemas and/or departing from alternative perspectives. Recent comparative studies have shown the importance of exploring broader scenarios, e.g. collecting data about historical audiences and comparing it on a European level.[13]
In film studies, even for hermeneutical approaches, it took a long time to look beyond the canon. Studying patterns and tropes and cultural productions as systems has the potential of looking from a distance, thus studying film and media culture at scale. To go beyond the triad of canon, index, and apparatus, we might think about cinema, as De Rosa and Hediger suggest, in terms of ‘configurations’.[14] Arnold and Tilton are conscious that their study is a first step, but they have paved the way for more research on this path, and it is essential not to forget that scholarship needs to be prepared to traverse this terrain with both technical and theoretical abilities. Both competencies are addressed in Distant Viewing.
In conclusion, Arnold and Tilton’s book is a seminal work that advances our understanding of the intersection between visual culture and computational analysis. Through a meticulous blend of theory and practice, the book offers a compelling framework for leveraging computer vision techniques. This book is essential reading for scholars and practitioners alike, poised to shape the future of film and media studies and digital humanities.
Nicole Braida (Johannes Gutenberg-Universität Mainz)
References
Berger, J. Ways of seeing. Penguin on Design 1. London: Penguin. 2008.
De Rosa, M. and Hediger, V. ‘Post-What? Post-When? A Conversation on the “Posts” of Post-Media and Post-Cinema’, Cinéma & Cie, 16.26/27, 2016: 9-20.
Flueckiger, B. ‘A Digital Humanities Approach to Film Colors’, The Moving Image, 17, 2, 2017: 71-94.
Hall, S. ‘Encoding/Decoding’ in Media studies: A reader 3. 2000: 28-38.
Heftberger, A. Digital humanities and film studies: Visualising Dziga Vertov’s work. Quantitative Methods in the Humanities and Social Sciences. Berlin: Springer, 2018.
McLuhan, M. Understanding media: The extensions of man. London: McGraw-Hill Education, 1964.
Moretti, F. ‘Conjectures on World Literature’, New Left Review, no. 1, February 2000: 54-68.
_____. Distant reading. London: Verso, 2013.
_____. ‘The Roads to Rome’, New Left Review, no. 124, August 2020: 125-136.
Pater, R. The politics of design: A (not so) global manual for visual communication. Amsterdam: BIS Publishers, 2016.
Radway, J. Reading the romance: Women, patriarchy, and popular literature. Chapel Hill: University of North Carolina Press, 1991.
Sturken, M. and Cartwright, L. Practices of looking: An introduction to visual culture. New York: Oxford University Press, 2009.
Treveri Gennari, D., Van der Vijver, L., and Ercole, P. ‘Defining a Typology of Cinemas across 1950s Europe’, Participations, Vol. 18, no. 2, November 2021: 395-418.
Underwood, T. ‘A Genealogy of Distant Reading’, Digital Humanities Quarterly, 11, no. 2, June 2017: https://www.digitalhumanities.org/dhq/vol/11/2/000317/000317.html (accessed on 15 March 2024).
[1] Moretti 2013.
[2] Moretti 2000, p. 55.
[3] Underwood 2017.
[4] Radway 1991.
[5] Moretti 2000, p. 57.
[6] McLuhan 1964.
[7] Hall 2000; Berger 2008; Sturken & Cartwright 2009.
[8] Pater 2016.
[9] Flueckiger 2017; Heftberger 2018.
[10] Farm Security Administration/Office of War Information Black-and-White Negatives by the Library of Congress.
[11] Photogrammar provides a web-based visualisation platform for exploring the 170,000 photographs taken by the FSA and OWI agencies of the US Federal Government between 1935 and 1943.
[12] Moretti 2020.
[13] Treveri Gennari & Van der Vijver & Ercole 2021.
[14] De Rosa & Hediger 2016.