AI and the challenge to journalism
Charlie Beckett (Director of Polis and the Polis/LSE JournalismAI project)
A systematic review of automated journalism scholarship: guidelines and suggestions for future research
Samuel Danzon-Chambaud, Alessio Cornia (School of Communications, Dublin City University)
The use of advanced algorithmic techniques is increasingly changing the nature of work for highly trained professionals. In the media industry, one of the technical advancements that often comes under the spotlight is automated journalism, a solution generally understood as the automatic generation of journalistic stories through software and algorithms, with no human intervention except for the initial programming (Graefe, 2016). Also known as “robot journalism” and “algorithmic journalism,” the technology has been employed for several decades in domains such as sports, finance and weather forecasting (Dörr, 2016), and builds on Natural Language Generation (NLG), a computer process that generally involves the use of pre-written templates, but also more advanced machine learning techniques (Diakopoulos, 2019). In this paper, we systematically review the field of automated journalism and thus provide guidelines and suggestions that can be used for future research in this area.
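To make the template-based NLG mechanism mentioned above concrete, the following is a minimal illustrative sketch (with invented field names and templates, not drawn from any specific system discussed in the paper) of how structured data can be slotted into pre-written templates:

```python
# Minimal sketch of template-based Natural Language Generation (NLG) as used
# in automated journalism: structured data is slotted into pre-written
# templates. Field names, templates and example data are illustrative only.

def generate_match_report(match: dict) -> str:
    """Render a short sports story from structured match data."""
    if match["home_goals"] > match["away_goals"]:
        template = ("{home} beat {away} {home_goals}-{away_goals} "
                    "in front of {attendance:,} fans on {date}.")
    elif match["home_goals"] < match["away_goals"]:
        template = ("{away} won {away_goals}-{home_goals} away at {home} "
                    "on {date}.")
    else:
        template = ("{home} and {away} drew {home_goals}-{away_goals} "
                    "on {date}.")
    return template.format(home=match["home"], away=match["away"],
                           home_goals=match["home_goals"],
                           away_goals=match["away_goals"],
                           attendance=match.get("attendance", 0),
                           date=match["date"])

print(generate_match_report({
    "home": "Sparta", "away": "Slavia",
    "home_goals": 2, "away_goals": 1,
    "attendance": 18250, "date": "3 October 2020",
}))
```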
In order to conduct a systematic review of existing empirical research on automated journalism, we analysed a range of variables that can account for the semantic, chronological and geographical features of a selection of academic articles, as well as their research methods, theoretical backgrounds and fields of inquiry. We then engaged with and critically assessed the meta-data obtained in order to provide researchers with a good understanding of the main debates dominating the field.
Our findings suggest, among other things, that a deeper investigation into media practitioners’ reactions to the introduction of the technology is needed, and that well-established sociological theories such as Institutionalism and Bourdieu’s Field theory may constitute two adequate frameworks for studying its practice. The lenses provided by Field theory could be especially worth exploring, since they are likely to add “a vector of power dynamics” to a field of technological innovation “too often understood from within an ‘all boats will rise’ mentality” (Anderson, 2012, p. 1013).
Computational Propaganda, the mother of modern misinformation. An exploration of understanding and detecting it.
Sebastian Garcia (Czech Technical University, Prague; AVAST)
There will be no war anymore without misinformation, propaganda and cyberattacks. Fake news is only a tool, the tip of the iceberg, and that iceberg is propaganda. The Internet and our social networks, our very connectedness, have helped propaganda evolve with automation, armies of people, and a flood of misinformation. We are in an era of computational propaganda. Computational propaganda has made it possible, for the first time in history, to conduct automated, targeted propaganda: to know each of us so well that personalized, automated messages can be delivered to sway our decisions. Computational propaganda is a powerful novel technique that fosters dissent, manipulation, distrust, division and confusion like never before. This keynote will explore with the audience the new era of computational propaganda, its impact on our perception of news, its mechanisms, its failures, and its history. It highlights that we need to understand and address the problem of propaganda if we want to be able to make informed decisions. It will present some new ideas in the area of AI for detecting computational propaganda and how they can affect our consumption of news.
AI for detecting Computational Propaganda Campaigns on the Internet
Sebastian Garcia (Czech Technical University, Prague; AVAST)
Elnaz Babayeva (AVAST)
November 2016 marked the beginning of an important event. Not the new president of the United States, but something related to it: the birth of the world’s interest in fake news. Since then, thousands of researchers have devoted themselves to finding and detecting fake news. But fake news is the tip of the iceberg, and that iceberg is propaganda. Propaganda is the factory where misinformation is designed, and fake news is only one of its products. As valuable as detecting fake news can be, a larger impact would come from detecting when the propaganda machine was the origin of some news: knowing when we are being deliberately manipulated and not just looking at false news. But detecting propaganda is a very hard problem, since countries spend incredible resources to create and support domestic and foreign propaganda. More importantly, propaganda has become targeted, able to powerfully focus on, and manipulate, individuals. Targeted propaganda is automatically created, based on online social networks, and able to profile users and predict their behavior with high precision. It is computational propaganda. This research first presents a tentative definition of computational propaganda that is operational in nature: a definition that may be used to better identify those technical aspects that can be exploited for better detection. This definition was part of a social exploration with journalists and a technical exploration of real cases of computational propaganda. With this definition in mind, our research focuses on applying AI to better detect computational propaganda. Our research builds an online social distribution graph of how propaganda was spread on the Internet. It is already a known fact that false news spreads faster and deeper than true news. So, because computational propaganda is an organized and coordinated force that distributes ideas, we hypothesize that it is possible to detect distinguishable patterns in how such news was spread compared to non-propaganda news. Our research analyses the distribution patterns over time for different news items, together with text analysis and profiling of social accounts, to find when a certain topic was artificially pushed into our society as propaganda. We will show, from our ongoing project, how this distribution graph can help better identify computational propaganda.
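As an illustration of the kind of distribution-graph analysis described here, the following is a minimal sketch (assuming share records of the form source account, target account, timestamp; it is not the authors’ actual pipeline) of how simple spread features such as cascade depth, breadth and duration could be computed and then compared between suspected propaganda and ordinary news:

```python
# Sketch of a distribution-graph analysis of how a story spread online:
# build a directed graph of who shared the story from whom, then compute
# simple spread features (depth, breadth, speed) that could be compared
# between suspected propaganda and ordinary news. Data fields are assumptions.
import networkx as nx

def spread_features(shares):
    """shares: list of (source_account, target_account, timestamp_seconds)."""
    g = nx.DiGraph()
    for src, dst, ts in shares:
        g.add_edge(src, dst, ts=ts)
    roots = [n for n in g.nodes if g.in_degree(n) == 0]
    depth = max(
        (nx.shortest_path_length(g, r, n)
         for r in roots for n in nx.descendants(g, r)),
        default=0,
    )
    timestamps = [ts for _, _, ts in shares]
    return {
        "accounts": g.number_of_nodes(),
        "max_depth": depth,  # how many hops deep the cascade goes
        "max_breadth": max((g.out_degree(n) for n in g.nodes), default=0),
        "duration_s": max(timestamps) - min(timestamps) if timestamps else 0,
    }

# Toy cascade: one seed account, four resharers
cascade = [("seed", "a", 0), ("seed", "b", 30), ("a", "c", 60), ("c", "d", 90)]
print(spread_features(cascade))
```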
Automation in NLP to optimize editorial workflows for international broadcasters
Peggy van der Kreeft (Deutsche Welle)
Deutsche Welle, the German world broadcaster producing in 30 languages, has been exploring language technologies for use in the media production workflow for over 10 years. The technology is now advanced enough to embrace it and to implement and integrate it into everyday editorial work. Close collaboration between technology and industry partners is key to a successful and sustainable outcome. We briefly present a few cases of research project concepts becoming efficient tools adapted to an editorial environment.
What the metrics say. Online news popularity on the web and social media pages of mainstream media outlets
Kenza Lamot, University of Antwerp, Communication Studies
Traditional practices of gatekeeping were constructed in an age when news reporters and editors had little to no direct contact with their audiences, and editorial decision-making was based on normative assumptions about what is relevant for society (Tandoc & Vos, 2016). Nowadays, audience analytics allow news consumption to be monitored and fed back into professional gatekeeping decisions and assessments of newsworthiness (Anderson, 2011; Tandoc, 2014; Vu, 2013). As the algorithms that enable those systems define “newsworthiness” through the narrow lens of what can be quantified (Diakopoulos, 2019), journalists are likely to focus on content that is potentially “engaging” and “clickable”. Several scholars have found that algorithmically derived audience metrics like “popularity” are influencing editorial decisions ranging from story placement, selection and deselection to imitation (Lamot & Paulussen, 2019; Zamith, 2018). However, there has been hardly any content-analytical work that investigates the degree of algorithmic influence on journalistic news content itself. One can expect that certain content characteristics are considered to be of more value to the algorithm and thus to contribute more strongly to news popularity, high audience metrics and prominence on news outlets’ web and social media channels. The present study seeks to explore the factors that explain the popularity of (or: audience engagement with) online news, by comparing popularity on websites (clicks and attention time) with popularity on Facebook (likes, shares, comments). Concretely, we aim to shed light on the following research questions:
RQ1: To what extent can news popularity on websites and social media be explained by news values, news topics, and article types?
RQ2: To what extent does news popularity on the web and social media pages of ‘quality’ media outlets differ from those of ‘popular’ media outlets?
A computational and manual content analysis was conducted of all news stories published online over four consecutive weeks in January 2020 by five major Belgian Dutch-speaking news media outlets, including two quality newspapers, two popular newspapers and the Flemish public broadcaster (N = 10,400). We computationally collected all news website articles, their publication date, and their length. Subsequently, these articles were scrutinized for their presence on social media and manually coded on a range of other factors such as topic (e.g. politics, sports, lifestyle, …), article type (e.g. opinion piece, news video, feature story, …) and news values (e.g. conflict, celebrity, …). To measure “online news popularity”, we used real-time data analytics. From each of these five media outlets, we were granted access to their Google Analytics platform and in-house dashboard. Hence, we have a wide range of metrics at our disposal, such as pageviews, reach and attention time. We supplemented the onsite user data with Facebook (using CrowdTangle) and Twitter engagement data (number of likes, comments, retweets) for each news story posted on the Facebook and Twitter pages of the five news outlets. At the time of writing (July 2020), the automated and manual coding of the entire dataset has been completed, so we will be able to present the first results of our analyses during the conference and discuss how gatekeeping decisions in newsrooms are increasingly being challenged by the proliferation of algorithms.
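As a rough illustration of how the coded content features and the collected metrics could be joined and related to the research questions, the following sketch uses hypothetical file and column names and is not the study’s actual analysis code:

```python
# Sketch of relating manually coded content features to popularity metrics.
# File names and columns (url, topic, article_type, conflict, celebrity,
# pageviews, fb_shares, fb_comments, outlet_type) are illustrative
# assumptions, not the study's actual data model.
import pandas as pd
import statsmodels.formula.api as smf

articles = pd.read_csv("coded_articles.csv")      # topic, article type, news values
web = pd.read_csv("google_analytics_export.csv")  # pageviews, attention time
social = pd.read_csv("crowdtangle_export.csv")    # likes, shares, comments

df = (articles.merge(web, on="url", how="left")
              .merge(social, on="url", how="left")
              .fillna(0))

# RQ1: explain web popularity from news values, topic and article type
model = smf.ols("pageviews ~ conflict + celebrity + C(topic) + C(article_type)",
                data=df).fit()
print(model.summary())

# RQ2: compare 'quality' vs 'popular' outlets on Facebook engagement
print(df.groupby("outlet_type")[["fb_shares", "fb_comments"]].mean())
```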
Building Czech AI Journalist – Prerequisites
Radek Marik (Czech Technical University, Prague)
Vaclav Moravec (Charles University, Prague)
As historically in many other fields, the complex, rapid dynamics of the current news production environment leads to increased demand for the automation of simple tasks. On the one hand, automation increases news accuracy and, under certain conditions, its objectivity, and it allows journalists to focus on topics requiring human intellect; on the other hand, it produces a large amount of news relatively cheaply and within very limited time horizons. Automation thus enables the production of news items even at a very regional level, items adjusted to clients’ needs, which often has a beneficial effect on readability. The efficiency of such news production depends on appropriately selected domains, which can be reported on even with the limited level of knowledge available in current state-of-the-art artificial intelligence methods. The vast majority of implemented systems rely on relatively simple fact gathering based on repeatable access to and processing of online data produced in a standardized way in selected domains at regular intervals. Modern methods of data mining, combined with machine learning, make it possible to focus journalists’ attention on exciting events selected from a vast number of unreliable sources, or to evaluate their credibility. Moreover, advanced machine learning allows us to discover stories, detect and monitor events, make predictions, find the truth, and curate content, and it might also help build and configure automatic report generation subsystems.
In our paper, we present our experience with generating reports in the Czech environment. We deal with the analysis of the conditions for automated report generation in an inflectional language such as Czech. The analysis is performed on a corpus of Czech News Agency reports. We present the essential statistical characteristics of the corpus and the typical life cycle of a news series. The results point to one underlying problem with the possible use of such corpus data for machine learning purposes, namely the lack of data. On the other hand, we show how a simple tool based on the perplexity of domain messages can determine the suitability of a particular domain of data journalism for automated message generation and define the complexity of a template-based system. The same tool can be used to identify the language structures of inflectional languages that need to be addressed in order to simplify the task of identifying the necessary phrases and generating them.
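As an illustration of how perplexity can signal how formulaic, and hence how automatable, a reporting domain is, the following is a minimal sketch using a simple bigram language model with add-one smoothing; the actual tool described in the paper may work differently:

```python
# Sketch of scoring a news domain by the perplexity of a simple bigram
# language model trained on that domain's reports. Lower perplexity suggests
# more repetitive, template-like language, and hence easier automation.
# Illustrative stand-in for the tool described above, not its implementation.
import math
from collections import Counter

def bigram_perplexity(train_texts, test_texts):
    unigrams, bigrams = Counter(), Counter()
    for text in train_texts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    log_prob, n_tokens = 0.0, 0
    for text in test_texts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            # add-one smoothing to cope with unseen bigrams
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

weather = ["Tomorrow will be sunny with highs of 25 degrees",
           "Tomorrow will be cloudy with highs of 18 degrees"]
politics = ["The coalition collapsed after a dispute over the budget",
            "Parliament rejected the amendment in a late-night vote"]
print(bigram_perplexity(weather, ["Tomorrow will be rainy with highs of 20 degrees"]))
print(bigram_perplexity(politics, ["Ministers debated pension reform for hours"]))
```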
Rethinking the economics of local public interest news – Combining journalists, data and automation to make quality content affordable at a granular level
Alan Renwick
In a digital economy, the economic incentives to produce localised news have been severely reduced, and so supply has fallen. While most debate has been about the revenue side of this equation, RADAR (Reporters and Data and Robots) has set out to remove the marginal cost of granularity from the content side. A team of five reporters supplies over 400 news titles with a daily feed of localised, high-impact news articles.
Connect the dots: automation in investigative journalism workflow
Joana Rodrigues da Silva, Doctoral Program in Digital Media, Faculty of Engineering, University of Porto
As we reach the pinnacle of the digital era, we recognize that technological developments, now central in newsrooms, are characterized not only by increasing automation but also by the changing paradigm of journalists working with and through artificial intelligence. This growing need for automated content production and the so-called omnipresent algorithms were the main motivation for the design of the investigation behind the non-functional prototype “connect the dots”. By studying the standard workflow of investigative reporters at the main public radio and TV broadcaster in Portugal, we were able to identify patterns that could be replicated and reproduced. These same patterns will be integrated into a system intended to help investigative journalists sort and organize data and to reduce the time spent on the most tedious tasks. In this investigation, our purpose is to incorporate AI into the standard workflow of investigative reporting and, by doing so, strengthen the role of investigative journalism in the current media landscape. By focusing our investigation on news production, we will be able to understand how far algorithms and the automation of processes can influence and change newsroom organization and routines.
The Data Logic of Public Service Media Publishing - Concerns and Approaches when Implementing Recommender Systems for Public Service Media
Jannick Kirk Sørensen, Aalborg University, Denmark
When public service media adopt algorithmic recommender systems for the personalized presentation of content, the conceptualization of public service media and its obligations is challenged (Sørensen & Hutchinson, 2018). For example, the exposure of a diversity of viewpoints and worldviews, which was previously curated by editors, must now be expressed in the unambiguous meta-data language of recommender systems (Sørensen & Schmidt, 2016). Other aspects of human editorial curation, such as editorial prioritization, must also be expressed in metadata numbers. On the other hand, new opportunities for creating targeted exposure and building audiences must be understood. With the Danish Broadcasting Corporation (DR) as a case – supported by an outlook to eight other cases of implementation of recommender systems (cf. Sørensen, 2019) – we analyze dilemmas and strategies for combining public service media obligations with recommender algorithms. Over a period of three years we followed, via 13 in-depth interviews with project staff, DR’s implementation of a recommender system for its video-on-demand service ‘DRTV’. Our findings indicate that traditional editorial control is still maintained and that the automated exposure of content plays only a secondary role. The path to automation is not straightforward; every step is taken with care. We also notice organizational tensions between performance-oriented data specialists and traditional editorial staff. Generally, the process of interpreting and domesticating recommender systems technology exposes a structural problem in the public service media (PSM) value proposition: at the same time as PSM should be driven by ideals of enlightenment, education and entertainment (cf. e.g. UNESCO, 2001), it must also respond to the materiality of algorithms and key performance indicators (Buchner, 2018). In this way, the process of implementing recommender systems clearly exposes public service organizations’ old dilemma of trying both to have good reach and to have a specific and distinct purpose in the media landscape (Nissen, 2006). It also highlights the structural tensions between the quantitative and qualitative rationales for public service broadcasting/media (McManus, 1994; Tracey, 1998).
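One simple way a diversity obligation can be expressed in “metadata numbers” is as a re-ranking step on top of relevance scores; the following is an illustrative sketch under assumed field names and weights, not DR’s actual implementation:

```python
# Sketch of a diversity-aware re-ranking step of the kind a public service
# recommender might add on top of relevance scores: greedily pick the next
# item while penalising editorial categories that are already well
# represented. Field names and the weighting are assumptions.
from collections import Counter

def rerank_with_diversity(candidates, k=10, diversity_weight=0.3):
    """candidates: list of dicts with 'id', 'score' (relevance) and 'category'."""
    selected, category_counts = [], Counter()
    pool = list(candidates)
    while pool and len(selected) < k:
        def adjusted(c):
            # relevance minus a penalty for over-represented categories
            return c["score"] - diversity_weight * category_counts[c["category"]]
        best = max(pool, key=adjusted)
        selected.append(best)
        category_counts[best["category"]] += 1
        pool.remove(best)
    return selected

items = [{"id": 1, "score": 0.9, "category": "politics"},
         {"id": 2, "score": 0.85, "category": "politics"},
         {"id": 3, "score": 0.6, "category": "culture"},
         {"id": 4, "score": 0.5, "category": "science"}]
print([i["id"] for i in rerank_with_diversity(items, k=3)])
```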
From “Class Conflict” to “Class Identity”: A Study of Narrative Rhetoric of Fake News on Chinese New Media Based on Burke’s Identification Theory
Lei Shi (School of Journalism and Communication, Shanghai International Studies University)
The post-truth phenomenon represented by fake news spread rapidly from Western politics to Western society as a whole, and has become a common phenomenon in the development of international society. Alterman (2005) stated that all regimes were built upon fake news and a post-truth political environment. Post-truth has become a new political culture in the West, and post-truth politics has been interpreted as a kind of populist movement against the elite (Speed & Mannion, 2016).
Although most previous research focuses on post-truth phenomena within a Western context, the concept of “post-truth” is not exotic to China. Two thousand years ago, Lao-Tzu had already realized the falsity of the media environment: “The Tao, which can be told, is not the true Tao; the Name, which can be named, is not the true Name.” “The Tao, once communicated, has deviated from the original Tao, and so has the Name; this shows the distinction between the original message and the message after several rounds of transmission” (Xie & Yu, 2018). Looking at the current state of communication in China, with the emergence of fake news represented by the Shenzhen Luoer fraudulent donation incident, the Shaanxi Yulin gravida’s falling incident, and the Heilongjiang Tanglanlan incident, the post-truth phenomenon, characterized by emotions prevailing over facts, has become very common in China recently. “The current online public opinion field in China is gradually taking on the characteristics of ‘post-truth’, which manifests as prejudice prevailing over fact, emotion over objectivity, discourse over truth, and attitude over cognition” (Zhang, 2017). Considering that the post-truth phenomenon is regarded as a populist movement in the West, this research analyzes the post-truth phenomenon, characterized by abundant fake news, within the Chinese context from the perspective of social class. Research on the Chinese post-truth phenomenon can not only optimize the ecology of public opinion and improve China’s media environment, but also enhance understanding and recognition among classes and hence ease class conflicts. Moreover, analyzing the rhetoric of fake news can contribute to the development of AI journalism in terms of news production, supervision and the elimination of fake news.
This research proposes the hypothesis that the producers of fake news make use of metaphor and rhetoric to fabricate conflicts among different classes and identity within the same class, which triggers the audience’s strong emotions and prompts them to express themselves through “comments”, “likes” and “reposts”. This research employs a discourse analysis method based on Kenneth Burke’s identification theory to analyze the rhetoric of fake news on Chinese new media, represented by Weibo and WeChat, over roughly one year, from July 31, 2019 to July 31, 2020. According to Burke (1968), there are three paths to identification, namely “identification by sympathy”, “identification by antithesis” and “identification by inaccuracy”. “Identification by sympathy” is based on the common feelings shared by the same class; “identification by antithesis” targets the “hostile” class to achieve congregation and consistency of attitudes and views; “identification by inaccuracy”, or “unconscious identification”, is the deepest level within the rhetorical environment, reflecting humans’ unconscious behavior and producing hallucinatory, unconscious identification through rhetoric.
Czech Radio on the Road to the AI Future
Jiří Špaček (Czech Radio)
At Czech Radio we are developing a broadcast analytics system that detects and identifies sounds, as well as a speech-to-text (STT) system that transcribes speech and recognizes and identifies speakers. We are working on a full description of our radio broadcasting archive, founded in 2003. In addition, we continuously work on giving journalists and our audiences real-time access to STT outputs of our live broadcasting. Internal employees, especially journalists, will also get access to on-demand STT or live transcription of the phone calls they conduct with respondents.
Looking at the very near future, we are going to offer our online audience a new, modern audio player on the mujRozhlas website with automated transcription, identified speakers, recommended content, contextual information, time-shift, etc. Next year, I would like to start work on projects such as a Journalist AI Assistant, which helps to mine data, verify information and its sources, and search for similar texts and contexts, or an app for creating radio shows in which users only need a written dialogue and then choose the voices they need to produce their radio programme. From this perspective, we are getting very close to autonomous, personalized and customized radio stations, with content producers on one side, consumer devices on the other, and autonomous AI systems between them.
Online News Video: What Viewers Demand and Automation Supplies
Neil Thurman (Ludwig-Maximilians-Universität, München)
Automation is playing a role in the creation of news texts, initially generating natural language—the written word—but now also producing short-form news videos. Professor Thurman’s talk will give an overview of this relatively new development; discuss what matters to consumers about online news videos, from balance and bias to the presence of music and moving images; and share some results from a survey experiment designed to find out how audiences rate automated news videos against human-made equivalents, and why.
Avast's approach to fake news
Tomáš Trnka, Dan Martinec, Jakub Sanojca (AVAST)
Applying algorithms to help us understand and mitigate the fake news phenomenon is a challenging task, not only because of current technical limitations but mainly because of our own human shortcomings. Although ever-increasing personalisation and recommendation options can make us believe we are shaping our view ourselves, information conflicting with how we are seen by online providers is conveniently hidden, which only strengthens confirmation bias.
At Avast, we don’t think that technology is a silver bullet for everything, but we are convinced that it can simplify tedious information verification to make our lives easier and let us focus on more meaningful endeavours. Though this is a complex task that won’t be solved tomorrow, we believe in a step-by-step approach in which providing people with more context around the information is a good starting point. We try to achieve that through automated metadata extraction, which allows more informed decisions to be made.
Avast News Companion is a browser extension that automatically and consistently evaluates the bias of media pages. By relying on trustworthy and unbiased third parties, journalists included, we provide something that even a very motivated reader would need to spend a long time to achieve. To provide multiple viewpoints, existing sources were integrated together, yet considerable effort, including algorithmic work, was still required to achieve that goal.
The follow-up to the extension addresses the fact-checking problem: specifically, how to use Natural Language Processing algorithms and publicly available online sources such as the Google Fact Check Tools API to provide the user with additional information about the article they are reading. In the talk, we will also show interesting statistics, for instance, what typically caused false positives or the most prevalent ‘facts’ that we encountered over the past few months.
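As an illustration of the kind of lookup involved, the following sketch queries the Google Fact Check Tools API (claims:search endpoint) for fact checks matching a claim; the extraction of the claim from the article is assumed to happen elsewhere, and the exact approach used in the extension may differ:

```python
# Sketch of querying the Google Fact Check Tools API (claims:search) for
# published fact checks matching a claim taken from an article. Requires an
# API key; error handling and claim extraction are omitted for brevity.
import requests

FACT_CHECK_URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

def lookup_fact_checks(claim_text, api_key, language="en"):
    resp = requests.get(FACT_CHECK_URL, params={
        "query": claim_text,
        "languageCode": language,
        "pageSize": 5,
        "key": api_key,
    })
    resp.raise_for_status()
    results = []
    for claim in resp.json().get("claims", []):
        for review in claim.get("claimReview", []):
            results.append({
                "claim": claim.get("text"),
                "publisher": review.get("publisher", {}).get("name"),
                "rating": review.get("textualRating"),
                "url": review.get("url"),
            })
    return results

# Example usage (requires a valid API key):
# for hit in lookup_fact_checks("5G towers spread the coronavirus", "YOUR_API_KEY"):
#     print(hit["rating"], "-", hit["publisher"], hit["url"])
```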
We will conclude the talk by sharing insights that we have gathered as well as things that we have learned to do and not to do.
‘Algorithmed public sphere’: Theoretical challenges and possibilities
Yanshuang Zhang (Guangxi Normal University, China)
Against the backdrop of the big data era underpinned by algorithms and artificial intelligence (AI), the modalities of media have undergone tremendous transformation. Nowadays many emerging news aggregation services have adopted collaborative filtering aided by big data algorithms to select like-minded audiences with similar experiences, who consume information while unconsciously entering a collaboration mechanism by rating or marking the content, so that other recipients are able to filter media messages more quickly. Driven by these technologies, the public, as participants in the public sphere, has gained paramount power in freely selecting and disseminating information. Those marginalised, ignored, suppressed voices prone to sinking into the spiral of silence have seized opportunities to surface and get noticed. This freedom of choice has laid a good foundation for the reconstruction of the public sphere while at the same time posing great challenges to that task. This research first discusses the primary theoretical challenges brought about by the reality of increasing group polarisation in the public, the extreme consumer society, and the deprivation of informational diversity caused by ‘filter bubbles’. Habermas’ public sphere theory, which originally conceptualises a separate social space between society and state where the emerging bourgeoisie can use media to discuss or debate freely over societal problems and, by doing so, influence political agendas, is expected to be revisited and revitalised by new academic inquiries in order to make sense of the new circumstances. After discussing how AI technologies hinder the construction of tenable online publics, this research tentatively suggests possibilities for reconstituting sound and functioning public spheres in the age of the algorithm: on the one hand, to return to and rediscover the complexity and richness of human nature, and to attempt to regain a dynamic balance between instrumental rationality and value rationality; on the other hand, to take advantage of new technologies to promote the production of pure relationships and deliberative, dialogic democracy, and to cultivate a new foundation for social trust and common identity in increasingly disrupted publics.