Data Driven Journalism
Giulia Dezi, Giorgio Dimino, Maurizio Mazzoneschi,
Alberto Messina, Sabino Metta, Giuseppe Mondelli, Maurizio Montagnuolo
RAI – Radiotelevisione Italiana
Centre for Research and Technological Innovation
FIAT/IFTA World Conference 2016
“Rethink the future of AV”
October 12 to October 15, Warsaw
FIAT/IFTA World Conference 2016, October 12-15, Warsaw Data Driven Journalism @ RAI
Agenda
Definitions
Best practices
Our approach
The data team
Tools & architectures
Some preliminary results
Development ideas
Conclusions
Role of Data in Journalism
• as origin of the news and of related content (data driven journalism)
• as accurate and verifiable description of inherent semantic aspects (precision journalism)
• Basic building block for presentation / usage of news content (visual journalism)
Data Driven Journalism
Extracting sense out of data to create newsworthy
stories
This implies
Having data
Analysing data
Identifying “sense”, or “sense the news”
Presenting data
Story Publication
Data Driven Journalism
some insight
Story Publication
Data Harvesting
Which data?
Which formats?
Which time window?
Data Analysis:
Content Analysis
Semantic Analysis
Statistical Analysis
Data Classification:
Ontologies
Automated categorisation
Team
Collaboration
Editorial Process
Detecting the Story
Modelling the Story
Developing the Story
Platform Adaptation
Second Screen
Interactivity
One example
Distribution of feminicides in Italy between Jan 2012 and Aug 2015
From Federica Quaglia's MSc thesis, 2015
4 Best practices
Editorial objectives / Production & org. approach
Wanting a 100% digital product
Huge investments in integrated tech. resources
Data Journalism as a public service
Visual Journalism and ad hoc apps
Continuous improvement of product quality
Perfectly integrated desks (journalists & techies)
Data as source for narration
Notable technical competence of journalists
Open Source and Open Data Journalism
Crowdsourcing as a resource
Investigative Journalism through data
International collaborations for data exchange
Expanding scope from local to international
Experimental editorial techniques
Our Approach
A combination of the main features detected in best practice
RAI wants to foster an approach to Data Journalism oriented to
Increase information trustworthiness, based on harvested, analysed and verifiable data
Improve user experience through visual presentation of data
Obtain increasingly “full digital” products exploitable on a variety of platforms
Objectives
• Define a workflow model for the editorial staff, identifying organisational impacts
• Identify the most appropriate practices and approaches to Data Journalism among the many available
• Design and implement a toolbox and an integrated platform supporting the workflow
Architectural Overview
Project Management (team management, collaboration)
Story Modelling (FreeMind)
Sources (Concept Book)
Data Platform (CKAN)
Visualization (Datawrapper)
Multiplatform Publishing (e.g. WordPress)
«Data Team»
«Customer»
The Data Team
A group of people with multidisciplinary skills
Journalistic skills
Technology skills
Agile interaction towards the objectives
Technical members support journalists in finding / harvesting / processing data
Journalists set the editorial line, develop the story, assess data relevance
The result is that
Efficiency increases thanks to synergy and delegation
Cross-fertilisation gives birth to unexpected explorations
Modelling the Story
https://webbrain.com/brainpage/brain/434E72FE-3EED-7B13-2F44-561D8F294F28
Which Information Sources?
[Diagram: sources around the Information Domain of Interest. Labels: Internet, News, Media, News, News in RSS, RAI Programmes, Open Data, National TV News (incl. RAI), RAI CMS, EVN, News Agencies, Other subscr., Infotain. Input: Google]
So what?
No single source covers everything needed for data journalism
Not even Google
We need a wider approach than simply “search on the web”
Solution:
Integrated and flexible search and analysis of heterogeneous sources at enterprise level
Which Information Sources?
[Diagram: sources around the Information Domain of Interest. Labels: Internet, News, Media, News, News in RSS, RAI Programmes, Open Data, National TV News (incl. RAI), RAI CMS, EVN, News Agencies, Other subscr., Infotain, Thematic Aggregations (Hyper Media News). Inputs: Concept Book, Google]
RAI Concept Book
RAI Concept Book is a portal for professional information services that addresses DDJ tasks with a uniform and holistic approach
Artificial intelligence and advanced statistical tools are used to automate tasks such as information extraction and multimedia content analysis
The system lets users define customised search profiles that are automatically and dynamically updated with the relevant content found in the monitored information sources
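The idea of a search profile that fills itself as new items arrive can be sketched as follows. This is only an illustration under stated assumptions: the `SearchProfile` class, the keyword matching rule and the sample titles are all hypothetical and are not taken from the Concept Book itself, which the slides describe as using more advanced analysis.

```python
from dataclasses import dataclass, field


@dataclass
class SearchProfile:
    """Hypothetical saved search: a name plus the lowercase keywords it watches."""
    name: str
    keywords: set[str]
    matches: list[str] = field(default_factory=list)

    def update(self, item_title: str) -> bool:
        """Store the item if any keyword occurs in its title (case-insensitive)."""
        words = {w.strip(".,;:!?\"'()").lower() for w in item_title.split()}
        if self.keywords & words:
            self.matches.append(item_title)
            return True
        return False


def dispatch(profiles: list[SearchProfile], incoming: list[str]) -> None:
    """Offer every newly ingested item to every registered profile."""
    for title in incoming:
        for profile in profiles:
            profile.update(title)


# Placeholder titles, invented for illustration only.
profiles = [
    SearchProfile("energy", {"energy", "solar"}),
    SearchProfile("transport", {"rail", "trains"}),
]
dispatch(profiles, ["New solar plant opens", "Rail strike announced", "Weather update"])
```

Each call to `dispatch` stands in for one monitoring cycle over the sources; a real system would match on extracted entities and categories rather than raw title words.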
Processing Pipeline
1. Ingest
• RSS feeds
• Blogs
• DTT streams
• EBU Eurovision News
• Rai Archives
2. Process
• Speech to Text
• Natural Language Processing
• Document Classification
• Named Entity Recognition
3. Understand
• News aggregation
• Topic identification
• Data warehousing
4. Archive & Access
• Indexing
• Search & Retrieval
• Browsing & filtering
• Recommendation
• Export
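The four stages listed for the pipeline can be sketched end to end on a toy scale. The code below is a minimal illustration, not the RAI implementation: the function names are hypothetical, the RSS items are placeholders, and the capitalised-word rule is a deliberately crude stand-in for real Named Entity Recognition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny inline RSS document with placeholder items (not real news).
RSS = """<rss version="2.0"><channel>
<item><title>Budget vote in Rome</title></item>
<item><title>Film festival opens in Rome</title></item>
<item><title>Flooding closes roads in Turin</title></item>
</channel></rss>"""


def ingest(rss_text: str) -> list[str]:
    """1. Ingest: pull item titles out of an RSS feed."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


def process(title: str) -> set[str]:
    """2. Process: crude NER stand-in, capitalised words after the first word."""
    return {w for w in title.split()[1:] if w[:1].isupper()}


def understand(titles: list[str]) -> dict[str, list[str]]:
    """3. Understand: aggregate items that share an extracted entity."""
    groups = defaultdict(list)
    for title in titles:
        for entity in process(title):
            groups[entity].append(title)
    return dict(groups)


def search(groups: dict[str, list[str]], entity: str) -> list[str]:
    """4. Archive & Access: retrieve the items indexed under an entity."""
    return groups.get(entity, [])


groups = understand(ingest(RSS))
rome_items = search(groups, "Rome")
```

Keeping each stage as a separate function mirrors the slide's layout and makes it easy to swap the placeholder step for a real speech-to-text or NER component.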
Personalised Dashboard
Available profiles
New profile registration
Sources
Open Data
Infotain
News in RSS
News Agencies
EVN
National TV News (incl. RAI)
Thematic Aggregations
Example: TV Content
Filter & sort
Interactive charts of semantic entities
Exploring facts
Localising in space
… and time
A platform for the Data Value Chain
Comprehensive Knowledge Archive Network (CKAN)
Open source, used by many organisations as a platform for open data publication
Used in the project as a platform for data journalism production
Editorial staff ↔ CKAN organisations
Easy integration with other CKAN-based open data repositories
Updates data automatically
Extensible via plugins
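Integration with other CKAN-based repositories typically goes through CKAN's Action API. The sketch below builds a `package_search` request URL and reads dataset titles from a response; the base URL `https://ckan.example.org` and the sample response are assumptions made for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

CKAN_BASE = "https://ckan.example.org"  # hypothetical instance, not from the slides


def package_search_url(query: str, rows: int = 10) -> str:
    """Build a URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CKAN_BASE}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list[str]:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


url = package_search_url("air quality", rows=5)

# Hand-written response in the shape CKAN returns; the dataset is a placeholder.
sample = (
    '{"success": true, "result": {"count": 1, '
    '"results": [{"name": "demo", "title": "Demo dataset"}]}}'
)
titles = dataset_titles(sample)
```

In practice the URL would be fetched with an HTTP client and the response body passed to `dataset_titles`.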
Visualisation
Many options available
Datawrapper is the default choice in this phase of the project
Open source
Many chart types & many options
Extensible via plugins
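Charting tools such as Datawrapper take tabular data, commonly as CSV. As a rough sketch of the hand-off, the snippet below turns a list of entity mentions into a small CSV table ready to paste or upload; the function name and the sample mentions are hypothetical and invented for illustration.

```python
import csv
import io
from collections import Counter


def entity_counts_to_csv(mentions: list[str], top_n: int = 3) -> str:
    """Count entity mentions and return the top N as CSV text."""
    counts = Counter(mentions)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "mentions"])
    # Sort by count (descending), then by name, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for entity, n in ranked[:top_n]:
        writer.writerow([entity, n])
    return buffer.getvalue()


# Placeholder entity mentions, invented for illustration.
csv_text = entity_counts_to_csv(["Rome", "Turin", "Rome", "Milan", "Rome", "Turin"])
```

The two-column layout (label, value) is the simplest shape for a bar chart; other chart types would need additional columns.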
Integration
Source data selected from the RAI Concept Book become datasets in CKAN
Datasets harvested by CKAN are classified and analysed by the RAI Concept Book toolbox
Integration between CKAN datasets and Datawrapper
Single sign-on (SSO) between CKAN and Concept Book
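One plausible way for selected source items to become a CKAN dataset is a call to CKAN's `package_create` action. The sketch below only prepares such a request and does not send it; the instance URL, organisation name, token and source URL are placeholders, and how the Concept Book actually hands data to CKAN is not described in the slides.

```python
import json
import re
import urllib.request

CKAN_BASE = "https://ckan.example.org"  # hypothetical instance


def slugify(title: str) -> str:
    """Derive a lowercase, URL-safe dataset name from a human-readable title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_package_create(
    title: str, org: str, source_urls: list[str], api_token: str
) -> urllib.request.Request:
    """Prepare (but do not send) a package_create request for the Action API."""
    body = {
        "name": slugify(title),
        "title": title,
        "owner_org": org,  # the editorial staff's CKAN organisation
        "resources": [
            {"url": url, "name": f"source-{i}"}
            for i, url in enumerate(source_urls, start=1)
        ],
    }
    return urllib.request.Request(
        f"{CKAN_BASE}/api/3/action/package_create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


# All argument values below are placeholders.
request = build_package_create(
    "Road Safety 2016", "newsdesk", ["https://example.org/item/1"], "TOKEN"
)
```

Passing the prepared request to `urllib.request.urlopen` would submit it; that step is left out so the example runs without a CKAN server.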
Some initial products α
Conclusions
Finding the “right” approach to Data Journalism, taking into account RAI’s peculiarities
Multidisciplinarity is key
Integration of proprietary and state-of-the-art tools works
Still much work to do at all levels
Integration
Workflow
Processes and skills
Very good feedback from RAI top-level management
(some) Future R&I activity
Tighter integration with Semantic Data
Open data in RDF
Developing new automatic classification technologies
On more relevant taxonomies than those currently used
Improve model for “concepts”
Evolving towards more semantic structuring
Does Visual Search play a role in Data Journalism?
Second screen & data journalism
…
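As an illustration of what "open data in RDF" could look like for a dataset record, the sketch below serialises a title and keywords as N-Triples using the W3C DCAT and Dublin Core Terms vocabularies. The dataset URI and values are placeholders, and the choice of DCAT is an assumption, not something stated in the slides.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCAT_KEYWORD = "http://www.w3.org/ns/dcat#keyword"
DCT_TITLE = "http://purl.org/dc/terms/title"


def literal(text: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def dataset_to_ntriples(uri: str, title: str, keywords: list[str]) -> list[str]:
    """Describe one dataset as a list of N-Triples statements."""
    triples = [
        f"<{uri}> <{RDF_TYPE}> <{DCAT_DATASET}> .",
        f"<{uri}> <{DCT_TITLE}> {literal(title)} .",
    ]
    triples += [f"<{uri}> <{DCAT_KEYWORD}> {literal(k)} ." for k in keywords]
    return triples


# Placeholder URI and values, for illustration only.
lines = dataset_to_ntriples(
    "https://ckan.example.org/dataset/demo", 'Demo "alpha" dataset', ["news"]
)
```

A production setup would more likely use an RDF library and a published mapping, but the hand-built triples show the shape of the output.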
Data is the new soil
- David McCandless -
Sabino Metta
RAI – Radiotelevisione Italiana
Centre for Research and Technological Innovation
sabino.metta@rai.it
