Archiving Europe: Unveiling the visual world through stock shots in French television (2001-2021)
by Shiming Shen
Introduction
The ANR CROBORA[1] (Crossing Borders Archives) project, led by Matteo Treleani at Côte d’Azur University, aims to study the dynamics surrounding the construction of the European Union in television and online media. CROBORA posits that the dissemination of stock shots fosters a common European imaginary, albeit potentially selective, relying on a limited number of visual elements[2]. The project explores the impact of various mediations – technical, political, cultural – on the recycling of these materials, perceived as migratory flows crossing national, media, and institutional borders. An empirical study has been performed to determine the most reused stock shots and analyse the evolution of their meaning over time, space, and contexts. The goal is to decipher the mechanisms of circulation of these images and achieve objectives such as identifying frequently used visual forms, mapping their use, and examining their semantic transformation.
In the realm of journalism, news coverage does not systematically rely on fresh materials. Indeed, with the advent of digital technology and considering cost, efficiency, and various production challenges, journalists are increasingly favoring the reuse of stock shots as visual support for their reporting[3]. This practice can apply to the evocation of historical events (like the fall of the Berlin Wall), the illustration of present situations (such as the production of Euros to symbolize the state of the economy), or references to future events (like a European summit scheduled for the following week). As a matter of fact, regarding these stock shots produced in the past, ‘the object is less an echo of the past than the material for future exploration’[4]. From this perspective, stock shots are an essential tool for analysing representations of European integration, playing a crucial role in how Europe’s past, present, and future, in relation to its identity[5], are perceived and constructed.
In this data paper, my focus will be on French television data, whose collection I carried out alongside interns Aloïs Déras and Nikolina Golubović. The study of stock shots has significantly increased over the last two decades, driven by the archival turn,[6] which underscores the relevance of this research in contemporary academic discourse.
Data collection
In the context of French television, all data is archived within the legal television deposit maintained by INA. To delve into the stock footage, we examined the collection of these audiovisual materials along with their accompanying metadata. This original metadata, including documentary records crafted by INA archivists, provides crucial context for the videos we gathered. At INA, every news report is meticulously documented, typically involving indexing fields such as record identifier, news report title, program title, broadcasting channel, date and time of broadcast, duration, thematic focus, show genre, descriptors, and a lead-in or summary provided by the archivists, among others. However, in television the world of images is governed by words,[7] or at least serves essentially as an extension of verbal narration principles.[8] This phenomenon is mirrored in the metadata world of images. Rather than treating visual elements as the core of the documentary record, INA archivists have primarily indexed the words of journalists. This approach means that report indexing does not map directly onto each shot but rather abstracts the journalist’s discourse throughout the report. As a result, during traditional indexing, the visual aspects of audiovisual content are often subordinated to verbal elements, underscoring the dominance of language in shaping the archival narrative.
This motivated our decision to implement detailed annotations for the stock shots we collected. This newly-introduced metadata enriches the existing institutional metadata. By integrating additional descriptors to each stock shot, our project team identified four key aspects for characterising each shot: personality, event, location, and illustration. All annotations are conducted in English to maintain consistency and broad accessibility.
We specifically targeted news bulletins aired between 2001 and 2021 by prominent national channels: TF1,[9] France 2,[10] and ARTE.[11] Our search within the INA database hinged on the keywords ‘archiv*’ and ‘europ*’, enabling us to amass a comprehensive collection of stock footage pertinent to the EU. This approach was thorough, aimed at maximising data collection, albeit at the expense of encountering numerous ‘false positives’, which refer to instances where the search results include items that match the search criteria but do not meet the specific intent or relevance of our research. Initially, some collected reports pertained to European countries without directly relating to Europe as a political entity. Consequently, following this broad initial collection, we applied a more focused filtering process. This step was dedicated to identifying television news reports specifically concerning the EU, including its initiatives, institutions, or interactions with member states. Ultimately, our collection encompasses 2,212 news reports, comprising a total of 10,140 stock shots that have been annotated in detail.
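The two-step collection described above (a broad truncated-keyword search followed by a relevance filter) can be sketched as follows. This is only an illustration under stated assumptions: the record structure, the field names, the requirement that both keywords match, and the reviewer decision are hypothetical and do not reproduce INA's actual search interface or schema.

```python
import re

# Hypothetical records; the "id" and "summary" fields are assumptions,
# not the actual INA documentary record schema.
RECORDS = [
    {"id": "r1", "summary": "Images d'archives : sommet européen à Bruxelles"},
    {"id": "r2", "summary": "Archives : tourisme en Europe du Sud"},
    {"id": "r3", "summary": "Météo du week-end"},
]


def wildcard_to_regex(keyword):
    """Turn a truncated keyword such as 'europ*' into a word-prefix regex."""
    stem = re.escape(keyword.rstrip("*"))
    return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)


def broad_search(records, keywords):
    """First pass: keep records whose summary matches every keyword."""
    patterns = [wildcard_to_regex(k) for k in keywords]
    return [r for r in records if all(p.search(r["summary"]) for p in patterns)]


def keep_relevant(records, relevant_ids):
    """Second pass: keep only reports a human reviewer judged EU-related."""
    return [r for r in records if r["id"] in relevant_ids]


candidates = broad_search(RECORDS, ["archiv*", "europ*"])
# In this toy example a reviewer has flagged r2 (tourism) as a false positive.
final = keep_relevant(candidates, relevant_ids={"r1"})

print([r["id"] for r in candidates])  # ['r1', 'r2']
print([r["id"] for r in final])       # ['r1']
```

The first pass deliberately favours recall; the second pass stands in for the manual judgement that no keyword rule can replace.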
We chose the period from 2001 to 2021 for our data collection to focus on a notably dynamic era of European integration. This span could be divided into two distinct phases. The first phase, extending from 2001 to 2010, was marked by significant institutional and diplomatic developments, from the Maastricht to the Lisbon Treaty. This era witnessed key milestones such as the establishment of the Euro and the European Central Bank, the initial rejection and subsequent quiet ratification of the Lisbon Treaty, and the EU’s involvement in the Western Balkans. The latter phase, from 2010 to 2021, encountered unparalleled challenges, including the European debt crisis, the so-called migration dilemma, the resurgence of Euroscepticism fueled by rising illiberalism, the complex dynamics of Brexit, and the global upheaval caused by the COVID-19 pandemic.[12]
In a closer analysis of our dataset, we noted two distinct spikes in 2005 and 2011, coinciding with the rejection of the European Constitution in France and the crisis in the financial market linked to the Greek government debt crisis, respectively. Moreover, the data reveals a general uptrend in stock footage from 2014 to 2019, peaking in 2015 – a year marked by a significant increase in the movement of refugees and migrants into Europe, according to data published by the UNHCR.[13], [14] Concurrently, we hypothesise that the increased utilisation of stock shots in news reporting may be attributed to the advent of digital technologies. This development has significantly enhanced journalists’ access to reusable materials, surpassing the limitations of the analog era.
Challenges and choices in manual versus automated annotation
Given the popularity of image annotation in both computational domains and audiovisual studies, one might wonder why there is a preference for intensive manual annotation over utilising algorithm-generated annotations. Automatic tagging has indeed become a cornerstone of visual analysis, with technologies such as the Distant Viewing Toolkit capable of autonomously generating metadata that captures some important aspects of the content (such as people, dialogues, scenes, and objects) as well as of the style (including camera angles, lighting, framing, and sound) of images. This approach benefits from the extraction and amalgamation of semantic elements from visual content, paving the way for subsequent exploratory data analysis.[15] The conversion of a visual medium’s pixel matrix into a structured semantic coding system exemplifies the advancements in metadata extraction. For instance, research involving the German historical broadcast archives utilised a content-based video retrieval strategy,[16] with the goal of automatically applying semantic tags to video sequences.
Present models exhibit sufficient versatility to facilitate research in diverse fields by leveraging algorithms for generating meaningful annotations. Nonetheless, this frequently demands additional processing or fine-tuning.[17] Deep learning, for instance, heavily depends on large volumes of data, and vision models are typically trained on manually-annotated datasets. These datasets are expensive to create and cover only a limited array of predefined visual concepts. The ImageNet dataset is a prime example; it is one of the largest in this field, with over 25,000 people involved in labeling 14 million images across 22,000 different object categories. Moreover, applying an initial model to other tasks necessitates that a machine learning professional creates a new dataset, adds an additional output layer, and fine-tunes the model accordingly.[18] For instance, constructing a deep Convolutional Neural Network (CNN) requires millions of images, a requirement that would exceed our budget and project timeline and that highlights the limitations of existing tools in meeting specific needs.
Multimodal Large Language Models (MLLMs) have been advancing rapidly in recent times. These models integrate images into traditional large language models (LLMs) and leverage their powerful capabilities, showcasing remarkable proficiency in tasks such as understanding images, answering visual questions, and following instructions. Notably, the recently released GPT-4V(ision) has elevated performance to unprecedented levels. In early 2021, Microsoft-supported OpenAI unveiled a multimodal machine learning model that can process visual concepts from imagery without the need for prior labeling. This model, known as Contrastive Language Image Pre-training (CLIP), transcends traditional computer vision models, which are generally limited to recognising a narrow spectrum of objects and people. CLIP innovatively pairs images with texts, utilising a vast training set of 400 million image-text pairs.[19] Such multimodal models revolutionise the way we access and analyse extensive visual collections, introducing a bottom-up approach that eliminates the necessity for manual metadata annotation.[20] However, recent studies indicate that there are still unresolved limitations, such as the potential for vision models to become bottlenecks in multimodal systems. For instance, MLLMs often struggle with simple questions because their pre-trained CLIP vision encoders miss critical visual details in images and consistently fail to recognise key visual patterns.[21]
Based on the computer vision techniques available at the time of our project, in the absence of tailored training and a lexicon or algorithms designed for a European context, automatic annotation tools revert to general annotations that lack the specificity needed for detailed, contextual annotations (such as Konrad Adenauer, Treaty of Rome, Capitoline Hill). Panofsky[22] referred to this phenomenon as ‘pre-iconographic description’. Dahlgren and Hansson[23] have observed that, although machine learning is adept at identifying basic elements like birds or cars, it can still be inadequate for the nuanced demands of academic research. Therefore, the application of detailed, context-aware annotation methods is critical to meet the unique needs of our study.
Another significant challenge we face is our interest not in the entirety of video content but rather in the often quite short stock shots concealed within each video. This necessitates a manual, qualitative approach, requiring researchers to meticulously view each report to discover and exclusively collect these hidden stock shots. Despite the extensive development of keyframe selection techniques in the computer science domain[24] – which involve detecting sequence boundaries, selecting sequences, and extracting keyframes – we opted for manual collection and annotation of audiovisual data.
However, it is important to recognise that manual annotation is subject to inherent variability due to individual interpretation. To guarantee data consistency and reliability, standardisation of the annotation process is imperative. This involves the creation of a comprehensive thesaurus and standardised annotation guidelines to ensure metadata and descriptions are uniformly assigned.
Thesaurus: Standardisation of data
Manual annotation is not a practice of describing images in an unrestricted fashion. Rather, it involves a disciplined approach to ensure the annotations are as controlled and standardised as possible. To achieve this level of harmonisation, several steps have been implemented. Our process began with an in-depth examination of stock shots from the national broadcaster TF1, setting a foundational understanding of our corpus and establishing the basic criteria for descriptors of each visual element: determining which elements are essential for annotation, which can be ignored, and how each should be characterised. This preliminary phase also informed our decision to systematically categorise each stock shot according to four key dimensions: personality, event, location, and illustration.
Building on our initial vocabulary, we expanded to include content from additional television channels. For each dimension of annotation, we have developed a controlled vocabulary consisting of three key components: the encoding name (the term to be used in the metadata), lexicon (synonyms of the encoding name), and description (a brief overview of the annotated entity). This controlled vocabulary was made available to all team members, ensuring the consistent use of terms across our project. In the event new terms needed to be introduced, researchers were encouraged to directly add the term, its synonyms, and definitions to the thesaurus, provided that all team members were notified of these updates. These updates occurred daily, ensuring prompt additions, maintaining consistency, quickly correcting errors, and adapting to new insights. The dynamic process concluded with a comprehensive review at the end of the annotation stage. This final step was critical for refining the vocabulary, removing any redundancies, and upholding a unified lexicon among team members.
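A minimal sketch of how such a three-part controlled vocabulary could be represented and used to normalise free-form terms is given below. The class names, the method names, and the two sample entries are hypothetical and serve only as illustration; they are not the project's actual thesaurus or tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    encoding_name: str                           # term to be used in the metadata
    lexicon: list = field(default_factory=list)  # synonyms of the encoding name
    description: str = ""                        # brief overview of the entity


class Thesaurus:
    def __init__(self):
        self.entries = {}  # encoding name -> ThesaurusEntry
        self._index = {}   # case-folded term or synonym -> encoding name

    def add(self, entry):
        """Register an entry and index its encoding name and synonyms."""
        self.entries[entry.encoding_name] = entry
        for term in [entry.encoding_name, *entry.lexicon]:
            self._index[term.casefold()] = entry.encoding_name

    def normalise(self, term):
        """Return the encoding name for a term, or None if it is unknown."""
        return self._index.get(term.strip().casefold())


# Sample entries invented for illustration.
thesaurus = Thesaurus()
thesaurus.add(ThesaurusEntry(
    "European flag",
    ["EU flag", "flag of Europe"],
    "Circle of twelve gold stars on a blue field.",
))
thesaurus.add(ThesaurusEntry(
    "MEP",
    ["Member of the European Parliament"],
    "Elected member of the European Parliament.",
))

print(thesaurus.normalise("eu flag"))    # European flag
print(thesaurus.normalise("cityscape"))  # None
```

A term that returns None is a candidate for the update procedure described above: it is either mapped to an existing entry as a synonym or added as a new entry after the team has been notified.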
A noteworthy aspect of our annotation system is the presence of hierarchical relationships among different terms, particularly within the ‘event’ dimension. We have adopted a comprehensive approach for this dimension, incorporating encoding names, synonyms, user terms (associated but not necessarily synonymous terms), series, themes, and descriptions into its controlled vocabulary list. This approach acknowledges the variable scale of events, recognising that an event is not an ontologically constant entity.[25] Instead, an event’s definition relies on a framing process, with parameters pragmatically selected to enhance analysis. For instance, Brexit could be viewed as a singular overarching event or as a collection of smaller events (e.g. the referendum, negotiations, and official departure). This hierarchical structuring is similarly applied in the controlled vocabulary for locations, where Paris is categorised within France. However, distinguishing between the relationships among locations tends to be more straightforward. This structured approach could facilitate nuanced and layered analysis for later examination of the data.
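The hierarchical relationships described above can be sketched as a simple parent mapping that lets annotations be counted at a coarser level. The mapping below is an assumption made for illustration: only the Paris and France relation and the Brexit sub-events come from the text, and the function names are hypothetical.

```python
from collections import Counter

# Child term -> parent term. Illustrative only; not the project's thesaurus.
PARENT = {
    "Paris": "France",
    "Strasbourg": "France",
    "France": "Europe",
    "Brexit referendum": "Brexit",
    "Brexit negotiations": "Brexit",
}


def ancestors(term, parent=PARENT):
    """Return the chain of broader terms above `term`, nearest first."""
    chain, seen = [], {term}
    while term in parent:
        term = parent[term]
        if term in seen:  # guard against an accidental cycle in the mapping
            raise ValueError("cycle in hierarchy at " + term)
        seen.add(term)
        chain.append(term)
    return chain


def roll_up(annotations, level):
    """Count each annotation under the nearest term that belongs to `level`."""
    counts = Counter()
    for term in annotations:
        for candidate in [term, *ancestors(term)]:
            if candidate in level:
                counts[candidate] += 1
                break  # annotations with no match in `level` are left out
    return counts


annotations = ["Paris", "Strasbourg", "France", "Brexit referendum", "Brussels"]
counts = roll_up(annotations, level={"France", "Brexit"})

print(ancestors("Paris"))  # ['France', 'Europe']
print(dict(counts))        # {'France': 3, 'Brexit': 1}
```

Choosing the set passed as `level` corresponds to the framing decision discussed above: the same annotations can be read as one overarching event or as several smaller ones.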
Within the illustration dimension, team members were given the latitude to annotate a broad array of visual elements found in stock shots, which sometimes lacked rigid categorisation. Nevertheless, we employed statistical methods to discern potential themes among these descriptors. For instance, a dendrogram – a tool commonly used in hierarchical clustering – provides a visual interpretation of how clusters generated by the algorithm are organised. Each branch denotes a cluster, populated by descriptors that define its unique characteristics. For example, in the diagram (Fig. 2), Cluster 8, highlighted in pink, appears to center on political environments or parliamentary jargon, featuring terms like ‘hemicycle’, ‘MEP’ (Members of the European Parliament), ‘parliament’, and references to European nations and policy debates. Cluster 4, in green, encapsulates terms linked to political campaigns or civic demonstrations, including ‘militant’, ‘campaign’, ‘crowd’, and ‘demonstration’. Apart from detecting principal themes in our metadata, such a visualisation enables us to detect patterns in the visual representation of stock shots, which often closely align with prevalent media stereotypes of governmental entities.
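To make the clustering step more concrete, the following sketch groups descriptors by agglomerative clustering and records the sequence of merges that a dendrogram would draw. The paper does not specify the features, distance measure, or linkage used, so the Jaccard distance over shared shots, the average linkage, and the toy shot identifiers below are all assumptions.

```python
from itertools import combinations

# Toy data: descriptor -> set of hypothetical shot ids in which it appears.
SHOTS = {
    "hemicycle": {1, 2, 3},
    "MEP": {1, 2},
    "parliament": {2, 3},
    "crowd": {4, 5},
    "demonstration": {4, 5, 6},
}


def jaccard_distance(a, b):
    """1 minus the share of shots two descriptors have in common."""
    return 1 - len(a & b) / len(a | b)


def agglomerate(items, n_clusters):
    """Average-linkage clustering; returns final clusters and the merge log."""
    clusters = [frozenset([name]) for name in items]
    merges = []

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(items[x], items[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters and log the merge height.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, merges


clusters, merges = agglomerate(SHOTS, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

With this toy input the parliamentary descriptors end up in one cluster and the demonstration descriptors in the other. In practice a library implementation would normally replace this hand-written loop, which is quadratic in the number of clusters at every step.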
Throughout the annotation process, we have adhered to a structured methodology that provides a uniform framework for annotation. This consistency is intended to pave the way for a robust examination of the visual content in subsequent stages of data analysis.
Initial insights from CROBORA metadata
Following the establishment of the CROBORA annotation framework, we delved into an exploratory analysis to gain a preliminary understanding of the metadata. This section offers a snapshot of the insights gleaned from our annotations, focusing on four descriptive dimensions as they pertain to French television. The findings presented here aim to lay the groundwork for more detailed future use of the data.
In examining the personality dimension of our metadata, we have the opportunity to explore the co-occurrence of political personalities within the same stock shot. Utilising network graph analysis, we can visually map the interconnections between these individuals; it is a type of sociogram which is used in network analysis to represent relationships. The nodes (labels with names) represent people, and the lines (edges) indicate some form of association or connection between them. The graph (Fig. 3) shows a dense web of connections with some nodes more centrally located and with more connections, suggesting these individuals may be key figures within the network. The size of the node labels such as ‘Angela Merkel’, ‘Viktor Orbán’, and ‘Jean-Claude Juncker’ indicates their prominence or a higher degree of connectivity within the network, implying they may be central or influential figures in the context being analysed. The different colors of the lines could represent various types of relationships or interactions, such as political affiliations, collaborative work, or other forms of connections.
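The construction of such a co-occurrence network can be sketched as follows: every pair of personalities annotated in the same stock shot yields an edge, and the summed edge weights of a node give a simple measure of its connectivity. The names are taken from the text, but the shot lists are invented for illustration and do not reflect the dataset, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented toy annotations: one list of personalities per stock shot.
SHOT_PERSONALITIES = [
    ["Angela Merkel", "Jean-Claude Juncker"],
    ["Angela Merkel", "Viktor Orbán"],
    ["Angela Merkel", "Jean-Claude Juncker", "Viktor Orbán"],
    ["Jean-Claude Juncker"],
]


def cooccurrence_edges(shots):
    """Count how often each unordered pair of people shares a shot."""
    edges = Counter()
    for people in shots:
        # Sorting makes (a, b) and (b, a) the same key; set() drops repeats.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each person."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = cooccurrence_edges(SHOT_PERSONALITIES)
degree = weighted_degree(edges)

print(edges[("Angela Merkel", "Jean-Claude Juncker")])  # 2
print(degree.most_common(1))                            # [('Angela Merkel', 4)]
```

The resulting edge list can be passed to any graph drawing tool; scaling label size by weighted degree is one possible way to obtain the visual emphasis described above.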
The horizontal bar chart (Fig. 4) represents the frequency with which various geographical locations are mentioned or appear within our dataset. The length of each bar is directly proportional to how often a location is cited. From the visualisation, we observe that Brussels tops the list, indicative of its central role as the de facto capital of the EU and its prevalence in news reports concerning EU affairs. It is followed by broad entities such as ‘Europe’ and then by specific locales like France and Paris; London, Strasbourg, Athens, and the United Kingdom also emerge as significant locations, reflecting their importance in the European context.
While the Mediterranean Sea appears with notable frequency, which could be linked to specific events like the 2015 European migrant crisis, the chart provides insight into the level of detail captured in our metadata regarding location. A considerable portion of our stock shots initially lacked clear geographical indicators, either visually or within the provided institutional metadata. Consequently, we strived for precision, aiming to identify the city level as the smallest unit of location. However, as the persistence of annotations like ‘Europe’ demonstrates, achieving such specificity remains a challenge, highlighting the ongoing need to enhance the granularity of our location metadata. Meanwhile, this challenge also indicates the fact that in news reports, most stock shots serve an illustrative function. ‘The more a sequence can be decontextualized, the more it will be favored by archivists for preservation.’[26] Indeed, stock footage is not necessarily limited to capturing specific, irreplaceable moments, but can instead be generic, denoting general classes or types of people, places, and things, rather than specific individuals, locations, or objects.[27] In our case, journalists often aim to represent the concept of a ‘European city’ by images without specifying a particular city.
The bar charts (Fig. 5) illustrate the frequency of annotated elements in the illustration dimension within the stock shots broadcast across three channels. Common terms like ‘European flag’, ‘crowd’, ‘meeting’, and ‘population’ feature prominently on TF1, France 2, and ARTE. This recurrence indicates a shared editorial interest in representation characterised by public assemblies, policy discussions, and population-related themes within a European context. Additionally, annotations such as ‘cityscape’, ‘building’, ‘arrival’, ‘factory’, and ‘migrant’ suggest comprehensive coverage of urban development, economic activities, travel, industrial sectors, and migration topics. Despite small variations between different channels, the data reveals a highly homogeneous visual representation across different broadcasters, as evidenced by the consistency in standardised annotations. This homogeneity underscores the overarching visual trends in European news reportage and the efficacy of our annotation process in capturing them.
The metadata about events on different channels also unveils a notable uniformity, underscoring the influence of the agenda-setting phenomenon within the media landscape. Bar charts visualising the frequency of event-related terms in the stock footage of the three French television channels show that ‘Debt’ is the most frequently occurring term. Other terms like ‘Brexit’, ‘External immigration’, ‘TCE’ (Treaty establishing a Constitution for Europe), and ‘EP’ (European Parliament) also appear prominently across the charts, suggesting these are also key topics of interest. TF1 displays a strong focus on the ‘Debt’ issue, with ‘TCE’ and ‘External immigration’ also being notable. ‘Brexit’ is somewhat less present, but still significant. France 2 shows a very similar pattern for the top issues but also includes additional terms that do not appear in the TF1 chart: ‘SGP’ (Stability and Growth Pact), ‘Internal immigration’ (a term referring to the mobility of residents within the EU), and ‘EC’ (European Commission). ARTE has distinctive terms like ‘France Germany relations’ and ‘Russia EU relations’, along with ‘Presidency of the Council of the EU’, which are not highlighted in the other two charts. This may suggest a more focused interest in bilateral and specific regional relationships within Europe and point to coverage of more nuanced EU governance topics.
For subsequent investigations, it is conceivable to undertake a more nuanced analysis that intersects various factors. For instance, scholars could cross-examine the thematic aspect of events with their visual depiction, identifying recurrent imagery linked to particular EU-related occurrences in news reports. Additionally, they might employ the geographical annotations present in the dataset to create a detailed cartographic representation, offering spatial insights into how EU events are related to various regions. The scope of analysis can also extend beyond mere textual analysis. Indeed, with the metadata for French television reports published in this paper, the CROBORA project anticipates the imminent rollout of a visual platform, targeted for release this coming autumn. This platform is designed to empower researchers with an interest in this area to delve into both the metadata and the audiovisual content with greater freedom and analytical flexibility.
CROBORA Visual Platform[28]
The culmination of the CROBORA project will be its integration into a sophisticated visual platform, currently being developed by the project’s team members, including Matteo Treleani as the principal investigator, Jean-Marie Dormoy, Aline Menin, and Marco Winckler. This platform will serve as a centralised resource, offering seamless access to a wealth of data, encompassing metadata and audiovisual content from French and Italian television and online. Designed with the end user in mind, the platform is equipped with advanced functionalities that not only elevate the analytical process but also enhance the overall user engagement.
One of the principal features of the platform is its dynamic search capability, which allows users to navigate the corpus with ease by combining various keywords. This feature is complemented by two analytical functions: ‘Distributions’ and ‘ARViz’[29]. Within ‘Distributions’ we have integrated ‘Muvin’[30], a visualisation tool that combines the fluidity of streamgraphs with the clarity of node-link diagrams, enabling users to investigate multidimensional collaboration networks. Muvin supports interactive exploration of keyword co-occurrence and its temporal dynamics within the CROBORA dataset.
The ‘ARViz’ function is engineered to refine the process of association rule mining within the CROBORA corpus. It provides a user-friendly interface that lays out a broad spectrum of association rules in a scatter plot for an overarching view, delineates subsets of rules in a chord diagram for focused analysis, and presents itemsets in an association graph for detailed scrutiny. Simultaneously, users have the capability to import and annotate new audiovisual materials, as well as explore their annotations via tools provided on the platform within their personalised user space.
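For readers unfamiliar with association rule mining, the sketch below shows how rules of the form 'A implies B' can be derived from shot annotations using the usual support, confidence, and lift measures. It is a generic illustration and not the ARViz implementation; the annotation sets, the thresholds, and the restriction to rules between single items are assumptions.

```python
from itertools import permutations

# Invented toy data: one set of annotations per stock shot.
TRANSACTIONS = [
    {"Brussels", "European flag", "meeting"},
    {"Brussels", "European flag"},
    {"Brussels", "meeting"},
    {"Athens", "crowd"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # the pair is too rare to be reported
        confidence = s_ab / support({a}, transactions)
        if confidence >= min_confidence:
            lift = confidence / support({b}, transactions)
            rules.append((a, b, s_ab, confidence, lift))
    return rules


rules = pair_rules(TRANSACTIONS)
for a, b, s, c, lift in rules:
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
```

In this toy example the rule from 'European flag' to 'Brussels' has a confidence of 1, since every shot annotated with the flag is also annotated with Brussels, whereas the reverse rule is weaker. Asymmetries of this kind are what a rule-based view adds to plain co-occurrence counts.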
Conclusion
This data paper seeks to enhance media and visual studies in journalism, exploring the construction of a collective imaginary through the media’s repetitive use of images. In our research project, we seek to identify redundant phenomena in audiovisual production, which reinforce types or categories that can include diverse groups of people, locations, and lifestyles[31]. Contemporary trends in audiovisual production are increasingly moving away from an analogical representation of images towards one that values their conventional and symbolic semiotic significance, where the images connote more than they denote. Paradoxically, this shift often leads to stereotypical portrayals of social diversity[32].
This shift places greater emphasis on the symbolic meanings of images rather than their realistic depictions. In the fields of media processing and media effects, journalists tend to select images that resonate with viewers’ mental models. These models are dynamic mental representations of situations, events, or objects[33]. Consequently, the most frequently circulated images are likely those that are most emblematic of cultural stereotypes. Cultural dynamics rely heavily on the effects that result from the repetitive and cumulative nature of images and media formats,[34] which in turn contributes to the evolution of visual culture. This repetition leads to a proliferation of images that can influence and even saturate visual culture[35].
In this study, we focused on three nationwide media outlets in France, hypothesising that their visual representations might be quite similar due to their comparable scopes. To further investigate this issue, it would be beneficial to compare our data with that from alternative sources, such as local or online media, to gain a deeper understanding. Through this approach, we can allow the data to reveal potential insights. In the end, we must ask ourselves: are we moving toward a visual world that is becoming increasingly homogeneous due to repetitive media practices, or does a diversity still exist that ensures a heterogeneous visual landscape?
Author
Shiming Shen is a PhD student in Communication Studies under the supervision of Nicolas Pélissier and Matteo Treleani, currently serving as an ATER (Attaché Temporaire d’Enseignement et de Recherche) at Côte d’Azur University. Her research, supported by the ANR CROBORA Project, explores the intersection of media semiotics and digital humanities, focusing on conventional representations of the European Union.
References
Aiello, G., Severo, M., and Dondero, M. Communication, espace, image. Les presses du réel, 2022.
Arnold, T. and Tilton, L. ‘Distant Viewing: Analyzing Large Visual Corpora’, Digital Scholarship in the Humanities, 34, Supplement_1, 2019: i3-16.
_____. Distant viewing: Computational exploration of digital images. Cambridge: The MIT Press, 2023.
Bourdieu, P. Sur la télévision. Paris: RAISONS D’AGIR, 1996.
Carnel, J. Utilisation Des Images d’archives Dans l’Audiovisuel, edited by Hermes Science. Systèmes d’information et Organisations Documentaires. Lavoisier, 2012.
_____. ‘Ces Images d’archives Qui Font l’actualité Dans Les Journaux Télévisés’ in L’image d’archives: Une Image En Devenir, edited by J. Maeck and M. Steinle. Rennes: Presses universitaires de Rennes, 2016: 167-181.
Casati, R. and Varzi, A. ‘Events’ in The Stanford encyclopedia of philosophy, edited by E. Zalta. Metaphysics Research Lab, Stanford University, 2020.
De Oliveira, J. ‘Yves CITTON (2017), Médiarchie’, Communication. Information médias théories pratiques, no. vol. 35/2, November 2018.
Desgoutte, J.-P. L’énonciation audiovisuelle, 2002.
Dijk, T. and Kintsch, W. Strategies of discourse comprehension. Academic Press, 1983.
Dirfaux, F. ‘Key Frame Selection to Represent a Video’ in Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101), 2:275-78, vol. 2, 2000.
Frosh, P. ‘Inside the Image Factory: Stock Photography and Cultural Production’, Media, Culture & Society, 23, 5, 2001: 625-646.
Hansson, K. and Dahlgren, A. ‘Open Research Data Repositories: Practices, Norms, and Metadata for Sharing Images’, Journal of the Association for Information Science and Technology, 73, 2, 2022: 303-316.
Kahn, S. Histoire de la construction de l’Europe depuis 1945. Manuels hors collection. Paris cedex 14: Presses Universitaires de France, 2021.
Krcmar, M. and Haberkorn, K. ‘Mental Representations’ in The international encyclopedia of media psychology, 1-17. John Wiley & Sons, Ltd, 2020.
Leeuw, S. ‘European Television History Online: History and Challenges’, VIEW Journal of European Television History and Culture, 1, 1, 2012: 3-11.
Lits, M. ‘Les Télévisions Belges Au Carrefour Européen’ in Les Lucarnes de l’Europe: Télévisions, Cultures, Identités, 1945-2005, edited by M. Lévy and M. Sicard. Internationale. Paris: Éditions de la Sorbonne, 2009: 139-149.
Machin, D. ‘Building the World’s Visual Language: The Increasing Global Importance of Image Banks in Corporate Media’, Visual Communication, 3, 3, 2004: 316-336.
Machin, D. and Jaworski, A. ‘Archive Video Footage in News: Creating a Likeness and Index of the Phenomenal World’, Visual Communication, 5, 3, 2006: 345-366.
Messaris, P. Visual ‘literacy’: Image, mind, and reality. Boulder: Westview Press, 1994.
Mühling, M., Meister, M., Korfhage, N., Wehling, J., Hörth, A., Ewerth, R., and Freisleben, B. ‘Content-Based Video Retrieval in Historical Collections of the German Broadcasting Archive’ in Lecture Notes in Computer Science, 9819, 2016: 67-78.
Panofsky, E. Studies in iconology: Humanistic themes in the art of the Renaissance. Westview Press, 1972.
Radford, A., Kim, J., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. ‘Learning transferable visual models from natural language supervision’ in International Conference on Machine Learning, PMLR, virtual, 2021: 8748-8763.
Rebillard, F., Fackler, D., Marty, E., Lanctot, J., Loicq, M., and Libbrecht, L. ‘Is News More Diversified on the Web than on Television?’, Reseaux, 176, 6, 2012: 141-172.
Saracco, C. ‘Politique Des Archives Audiovisuelles’, Thèse de doctorat, Université Stendhal, Grenoble, 2002.
Seurrat, A. and Bruneel, E. ‘Figurer « la diversité » ?’, Semen. Revue de sémio-linguistique des textes et discours, no. 45, October 2018.
Shen, S., Treleani, M., Compagno, D., and Winckler, M. ‘From Stock Shots to Ghost Data: Tracking Audiovisual Archives about the European Union’, VIEW Journal of European Television History and Culture, 12, 23, 2023: 4.
Smits, T. and Wevers, M. ‘A Multimodal Turn in Digital Humanities. Using Contrastive Machine Learning Models to Explore, Enrich, and Analyze Digital Visual Historical Collections’, Digital Scholarship in the Humanities, 38, 3, 2023: 1267-1280.
Tong, S., Liu, Z., Zhai, Y., Ma, Y., LeCun, Y., and Xie, S. ‘Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs’, arXiv, 2024.
Treleani, M., Shen, S., Menin, A., and Winckler, M. ‘Circulation et Répétition Des Images de Stock Dans Les Médias Audiovisuels. La Représentation Visuelle Des Migrants Dans Les JT Européens’, Les Cahiers Du Numérique, 19, 1, 2023: 13-34.
Zhao, G. ‘A Novel Approach for Shot Boundary Detection and Key Frames Extraction’ in 2008 International Conference on MultiMedia and Information Technology, 2008: 221-224.
Zheng, F., Yang, C., Chong, P., Wang, G., G.G.Md., Ali, N., and Lam, P. ‘Deep Learning Algorithm for Picture Frame Detection on Social Media Videos’ in 2021 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS), 2021: 149-155.
[1] https://crobora.huma-num.fr/.
[2] Treleani & Shen & Menin & Winckler 2023; Shen & Treleani & Compagno & Winckler 2023.
[3] Carnel 2012.
[4] Saracco 2002.
[5] Lits 2009.
[6] Leeuw 2012.
[7] Bourdieu 1996.
[8] Messaris 1994.
[9] French commercial television network owned by TF1 Group.
[10] French public national television channel.
[11] European public service channel dedicated to culture.
[12] Kahn 2021.
[13] https://data.unhcr.org/en/situations/mediterranean.
[14] Treleani & Shen & Menin & Winckler 2023.
[15] Arnold & Tilton 2019.
[16] Mühling & Meister & Korfhage & Wehling & Hörth & Ewerth & Freisleben 2016.
[17] Arnold & Tilton 2023.
[18] Radford & Kim & Hallacy & Ramesh & Goh & Agarwal & Sastry & Askell & Mishkin & Clark & Krueger & Sutskever 2021.
[19] Idem.
[20] Smits & Wevers 2023.
[21] Tong & Liu & Zhai & Ma & LeCun & Xie 2024.
[22] Panofsky 1972.
[23] Hansson & Dahlgren 2022.
[24] Dirfaux 2000; Zhao 2008; Zheng & Yang & Chong & Wang & G.G.Md & Ali & Lam 2021.
[25] Casati & Varzi 2020; Rebillard & Fackler & Marty & Lanctot & Loicq & Libbrecht 2012.
[26] Carnel 2016.
[27] Machin 2004.
[28] http://dataviz.i3s.unice.fr/crobora.
[29] Menin & Cadorel & Tettamanzi & Giboin & Gandon & Winckler 2021.
[30] Menin & Buffa & Tikat & Molinet & Pelerin & Pottier & Michel & Winckler 2022.
[31] Frosh 2001; Aiello & Severo & Dondero 2022.
[32] Desgoutte 2002; Machin & Jaworski 2006; Seurrat & Bruneel 2018.
[33] Dijk & Kintsch 1983; Krcmar & Haberkorn 2020.
[34] Treleani & Shen & Menin & Winckler 2023.
[35] Citton 2017.