Research Data Leeds Repository

Headlines data for social media popularity prediction

Citation

Piotrkowicz, Alicja (2017) Headlines data for social media popularity prediction. University of Leeds. [Dataset] https://doi.org/10.5518/174

This item is part of the Social Media Popularity Prediction Using Headlines collection.

Dataset description

This dataset is part of a larger project on using headlines to predict the social media popularity of news articles. It consists of two headline corpora, The Guardian and New York Times, collected in 2014 through each outlet's API. Each corpus includes a unique headline identifier (so that the corpus can be recreated by querying the relevant API), the extracted features (news values, style, metadata), and the corresponding popularity on Twitter and Facebook.

Keywords:

headlines, news values, style, social media popularity, prediction from text

Subjects:

I000 - Computer sciences > I400 - Artificial intelligence > I410 - Speech & natural language processing

Divisions:

Faculty of Engineering and Physical Sciences > School of Computing

Related resources:

Location	Type
https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15775	Publication
https://eprints.whiterose.ac.uk/115200/	Publication
https://ojs.aaai.org/index.php/ICWSM/article/view/14951/14801	Publication
https://eprints.whiterose.ac.uk/115024/	Publication
https://etheses.whiterose.ac.uk/20430/	Ethesis

License:

Creative Commons Attribution 4.0 International (CC BY 4.0)

Date deposited:

11 May 2017 09:50

URI:

https://archive.researchdata.leeds.ac.uk/id/eprint/147

Additional details

Creators or authors:

Creators	ORCID	Other ID	Email	Primary affiliation	Primary affiliation ID	Primary affiliation ID type	Secondary affiliation	Secondary affiliation ID	Secondary affiliation ID type
Piotrkowicz, Alicja	https://orcid.org/0000-0002-7723-699X			University of Leeds	https://ror.org/024mrxd33	ror

Type of data:

Dataset

Contributors:

Contribution	Name	ORCID	Other ID	Email	Primary affiliation	Primary affiliation ID	Primary affiliation ID type	Secondary affiliation	Secondary affiliation ID	Secondary affiliation ID type
Supervisor	Dimitrova, Vania	https://orcid.org/0000-0002-7001-0891			University of Leeds	https://ror.org/024mrxd33	ror

Research funders:

EPSRC

Grant numbers:

Doctoral Training Grant

Publication date:

2017

Resource language:

English

Metadata language:

English

Publisher:

University of Leeds

Contact email:

scap@leeds.ac.uk

Last Modified:

13 Dec 2024 10:04

Files

Documentation

Dataset description [120kB]

Data

Feature vectors for the main model (The Guardian) [2MB]

Feature vectors for the main model (New York Times) [508kB]

Popularity measures (The Guardian) [663kB]

Popularity measures (New York Times) [94kB]

Feature vectors for the baseline models (The Guardian) [3MB]

Feature vectors for the baseline models (New York Times) [629kB]
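
Because the feature vectors and popularity measures are distributed as separate files that share a unique headline identifier, one plausible way to use them together is to join the two tables on that identifier. The following is a minimal sketch only. The file format (comma-separated text), the column names (`headline_id`, `feature_1`, `feature_2`, `twitter`, `facebook`) and the sample rows are assumptions made for illustration and are not taken from the dataset; the dataset description file should be consulted for the actual layout.

```python
import csv
import io

# Placeholder rows for illustration only. They are NOT taken from the dataset,
# and the column names are hypothetical.
FEATURES_CSV = """headline_id,feature_1,feature_2
id-001,0.5,1
id-002,0.1,0
id-003,0.9,1
"""

POPULARITY_CSV = """headline_id,twitter,facebook
id-001,12,30
id-003,4,7
"""


def read_by_id(handle, id_column="headline_id"):
    """Read delimited rows into a dict keyed by the identifier column."""
    rows = {}
    for row in csv.DictReader(handle):
        key = row.pop(id_column)
        rows[key] = row
    return rows


def join_on_id(features, popularity):
    """Inner join: keep only identifiers present in both tables."""
    joined = {}
    for key, feats in features.items():
        if key in popularity:
            joined[key] = {**feats, **popularity[key]}
    return joined


# In practice the handles would be opened files rather than in-memory strings.
features = read_by_id(io.StringIO(FEATURES_CSV))
popularity = read_by_id(io.StringIO(POPULARITY_CSV))
merged = join_on_id(features, popularity)

for headline_id, row in merged.items():
    print(headline_id, row)
```

An inner join is used here so that headlines without a recorded popularity value are dropped rather than filled with a default. Values are left as strings, since the real column types are not stated on this page.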