Temporal Preference Data Collection
SOI was designed by PREFLIB.ORG and stands for incomplete strict orders. For a definition, see PREFLIB.ORG.
TSOI (temporal incomplete strict order) is based on the SOI format but is deanonymized, i.e. each voter is named in some form before their own ranking. This format was designed to allow the use of voting data over a timeframe. Therefore, each TSOI file in a data set uses IDs for the alternatives that are consistent across all TSOI files in that data set. This means the numbering does not necessarily run from 1 to the number of alternatives but can skip many numbers.
Each TSOI file is structured as follows:
It starts with the number of available alternatives, followed by a list of the alternatives/candidates:
Number of alternatives
1, first alternative name
2, second alternative name
5, fifth alternative name (fifth in the full data set, but only the third listed here)
...
Afterwards, some data about the vote is given: the number of voters, the total weight of all voters combined, and the number of unique orders. In most cases these will all be the same number. In particular, the number of unique orders was set equal to the number of voters, as each voter gets its own row.
Number of voters, sum of vote count, number of unique orders
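The header described so far could be read along the following lines. This is only a minimal sketch: the function name parse_header and the sample lines are made up for illustration, and it assumes the file has already been split into lines and that the three summary values are integers.

```python
def parse_header(lines):
    """Parse the alternatives block and the summary line of a TSOI file.

    Hypothetical helper: returns the id -> name mapping, the three summary
    numbers, and the index of the first vote line.
    """
    num_alternatives = int(lines[0].strip())
    alternatives = {}
    for line in lines[1:1 + num_alternatives]:
        # Split on the first comma only, in case a name contains commas.
        alt_id, name = line.split(",", 1)
        alternatives[int(alt_id)] = name.strip()
    summary = lines[1 + num_alternatives]
    num_voters, vote_sum, num_unique = (int(x) for x in summary.split(","))
    return alternatives, (num_voters, vote_sum, num_unique), 2 + num_alternatives


# Made-up sample: ids 1, 2 and 5 are used, so the numbering skips 3 and 4.
sample = [
    "3",
    "1, first alternative name",
    "2, second alternative name",
    "5, fifth alternative name",
    "2, 2, 2",
]
alternatives, counts, first_vote_line = parse_header(sample)
```

The ids are kept in a dictionary rather than a list because they are not guaranteed to be contiguous within a single file.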
The votes are structured as follows. As each voter has its own row, the value for "count" will be 1 unless the voter has a higher weight:
voter: count, list of preferences
Example:
mr.x: 1, 2, 1
...
Every alternative not listed by a voter is considered ranked last by that voter.
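As an illustration, a single unweighted vote row could be parsed as sketched below. The names parse_vote and unranked are hypothetical, and the sketch assumes the voter name is everything before the last colon on the line.

```python
def parse_vote(line):
    """Split 'voter: count, a, b, ...' into (voter, count, ranking)."""
    # Assumption: the last colon separates the voter name from the vote.
    voter, rest = line.rsplit(":", 1)
    fields = [field.strip() for field in rest.split(",")]
    count = int(fields[0])
    ranking = [int(field) for field in fields[1:]]
    return voter.strip(), count, ranking


def unranked(ranking, alternative_ids):
    """Return the alternatives the voter did not list (all ranked last)."""
    listed = set(ranking)
    return sorted(a for a in alternative_ids if a not in listed)


voter, count, ranking = parse_vote("mr.x: 1, 2, 1")
missing = unranked(ranking, [1, 2, 5])
```

Here mr.x ranks alternative 2 first and alternative 1 second, so alternative 5 ends up in the group that is ranked last.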
It is also allowed to give a weight to each alternative in a ranking, e.g.:
mr.y: 1, 5[400], 1[300], 2[5]
These weights need to be consistent with the ordering of the alternatives, from the highest weight in first position to the lowest weight in last position.
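The consistency requirement can be checked mechanically. The following sketch uses hypothetical helper names (parse_weighted_vote, weights_consistent) and makes two assumptions that the format description does not settle: weights are plain numbers, and two neighbouring entries with equal weight are accepted.

```python
import re

# One ranking entry: an alternative id with an optional weight, e.g. "5[400]" or "5".
ENTRY = re.compile(r"^(\d+)(?:\[([0-9.]+)\])?$")


def parse_weighted_vote(line):
    """Split 'voter: count, a[w], b[w], ...' into (voter, count, [(id, weight), ...])."""
    voter, rest = line.rsplit(":", 1)
    fields = [field.strip() for field in rest.split(",")]
    count = int(fields[0])
    ranking = []
    for field in fields[1:]:
        match = ENTRY.match(field)
        if match is None:
            raise ValueError(f"unexpected ranking entry: {field!r}")
        weight = float(match.group(2)) if match.group(2) is not None else None
        ranking.append((int(match.group(1)), weight))
    return voter.strip(), count, ranking


def weights_consistent(ranking):
    """True if the given weights never increase from first to last position."""
    weights = [w for _, w in ranking if w is not None]
    return all(a >= b for a, b in zip(weights, weights[1:]))


voter, count, ranking = parse_weighted_vote("mr.y: 1, 5[400], 1[300], 2[5]")
is_consistent = weights_consistent(ranking)
```

A row such as "mr.y: 1, 5[5], 1[300]" would fail this check, because the second position carries a higher weight than the first.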
A possible file format that is not provided here would be TTOI (temporal incomplete tied order). The difference is that, similar to TOI from PREFLIB.ORG, tied alternatives are written within "{}", e.g.:
mr.z: 1, {5,2}[100], 1[10]
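Since TTOI is not actually provided, the following is only a sketch of how such a row might be read. The name parse_ttoi_vote is hypothetical, and it is assumed that a weight written after a "{}" group applies to the whole group. Because tied groups contain commas themselves, the row cannot simply be split on commas, so a regular expression picks out each group or single alternative.

```python
import re

# Either a tied group "{5,2}" or a single id "1", each with an optional "[weight]".
TOKEN = re.compile(r"\{([^}]*)\}(?:\[([^\]]+)\])?|(\d+)(?:\[([^\]]+)\])?")


def parse_ttoi_vote(line):
    """Split a TTOI-style row into (voter, count, [(ids, weight), ...])."""
    voter, rest = line.rsplit(":", 1)
    # The count is the first comma-separated field; the remainder is the ranking.
    count_text, body = rest.split(",", 1)
    ranking = []
    for match in TOKEN.finditer(body):
        if match.group(1) is not None:
            ids = tuple(int(x) for x in match.group(1).split(","))
            weight = match.group(2)
        else:
            ids = (int(match.group(3)),)
            weight = match.group(4)
        ranking.append((ids, float(weight) if weight is not None else None))
    return voter.strip(), int(count_text), ranking


voter, count, ranking = parse_ttoi_vote("mr.z: 1, {5,2}[100], 1[10]")
```

Each position in the result is a tuple of ids, so a strict position is simply a group of size one.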
As additional info: every file in all the collections has a date in its name to allow sorting from newest to oldest or the other way around.
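Sorting by the date in the file name could look like the sketch below. Note the assumptions: the exact naming scheme is not specified here, so the code assumes the date appears as YYYY-MM-DD somewhere in the name, and the file names in the example are invented.

```python
import re
from datetime import date

# Assumption: the date is embedded in the file name as YYYY-MM-DD.
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_name(filename):
    """Extract the first YYYY-MM-DD date found in a file name."""
    match = DATE_IN_NAME.search(filename)
    if match is None:
        raise ValueError(f"no date found in file name: {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


# Hypothetical file names, for illustration only.
names = ["charts_2019-05-15.tsoi", "charts_2019-03-13.tsoi", "charts_2019-04-01.tsoi"]
oldest_first = sorted(names, key=date_from_name)
newest_first = sorted(names, key=date_from_name, reverse=True)
```

If the real files use a different date layout, only the regular expression and the conversion in date_from_name would need to change.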
Eurovision Song Contest
Represents the jury voting data for the finals of the yearly Eurovision Song Contest. These are in general top-10 votes. Every file represents a different year with different candidates, but it was decided to consider only the country the contestant represents to allow for a more consistent alternative set.
This data was collected from https://data.world/datagraver/eurovision-song-contest-scores-1975-2019
The data set starts in the year 1975 and ends in 2019 for a total of 45 data points.
iPhone App Store Charts
These data sets represent the rankings of the Apple App Store for iPhone over a timespan of about 2 months (2019-03-13 until 2019-05-15, 62 data points). The charts are per region and each region has up to their top-200 apps as votes. They were collected through https://appfollow.io/
iPhone Game Charts
Downloads:
Top Paid Games
tsoi (1.4 MB) soi (1.4 MB)
Top Free Games
tsoi (1.1 MB) soi (1.1 MB)
Top Grossing Games
tsoi (1.2 MB) soi (1.2 MB)
iPhone News Charts
Downloads:
Top Paid News Apps
tsoi (640 KB) soi (516 KB)
Top Free News Apps
tsoi (1.9 MB) soi (1.9 MB)
Top Grossing News Apps
tsoi (1.4 MB) soi (1.4 MB)
Spotify Charts
Spotify provides its charts at https://spotifycharts.com/regional. They are categorised as daily or weekly, and each of these is available either by streaming numbers on the service or by how viral the songs are online.
All of the data sets start around the beginning of 2017 and run up to the end of November 2019. This means the daily data sets have over 1000 data points and the weekly ones around 150 each.
Viral Charts
These are top-50 charts (some votes have fewer than 50 alternatives ranked).
Streaming Charts
These are top-200 charts (some votes have fewer than 200 alternatives ranked). For these data sets streaming numbers are available, but as they have no real place in SOI, they are provided as weights in the TSOI files.
All the above tsoi as one download
This collection of all data sets is used for https://github.com/martinlackner/perpetual to allow experiments on real data.
Download:
tsoi (259 MB)