This project aims to build a dataset of painters from sources such as WikiArt and Art500k by combining their features, filling in missing painter data via the Wikipedia API, and making corrections and additions both automatically and manually. After mapping painters across the two sources and Wikidata, the dataset includes around 10,000 painters with many attributes.
Currently, the dataset includes 29 attributes:
The dataset is intended to be used for various purposes, including data analysis, machine learning, and visualization projects.
One long-term goal is to create a JSON file that combines everything hierarchically. The top level of the structure could be the art movement; inside it would be the artists with some basic biographical data; a lower layer could hold each painter's paintings. Even better, each painter could have a substructure of eras, with the paintings placed inside them.
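As a rough illustration of that hierarchy, the sketch below nests flat records as movement, then artist, then paintings. The field names (`movement`, `artist`, `birth_year`, `title`, `year`), the sample rows, and the `build_hierarchy` helper are all hypothetical placeholders, not the actual columns or code of this project.

```python
import json

# Hypothetical flat records; the field names and values are placeholders,
# not the real columns of artists.csv.
paintings = [
    {"movement": "Movement A", "artist": "Painter A", "birth_year": 1840,
     "title": "Painting 1", "year": 1872},
    {"movement": "Movement A", "artist": "Painter A", "birth_year": 1840,
     "title": "Painting 2", "year": 1906},
    {"movement": "Movement B", "artist": "Painter B", "birth_year": 1882,
     "title": "Painting 3", "year": 1910},
]


def build_hierarchy(rows):
    """Nest flat rows as movement -> artist -> list of paintings."""
    tree = {}
    for row in rows:
        # Create the movement level on first sight, then the artist level.
        artists = tree.setdefault(row["movement"], {})
        artist = artists.setdefault(
            row["artist"], {"birth_year": row["birth_year"], "paintings": []}
        )
        artist["paintings"].append({"title": row["title"], "year": row["year"]})
    return tree


hierarchy = build_hierarchy(paintings)
print(json.dumps(hierarchy, indent=2))
```

An era layer could be added the same way, by inserting one more `setdefault` level between the artist and the painting list.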
We have created multiple networks of painters in another project (see: ArtProject): networks based on being in the same places at the same time plus nationality, additionally on style similarity, and networks of who influenced whom. A network of styles and movements was also created.
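To make the "same places at the same time" criterion concrete, here is a minimal sketch of how such edges might be derived. It is not the code used in ArtProject: the input layout (a list of `(place, start_year, end_year)` stays per painter), the sample data, and the helper names are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical input: each painter maps to (place, start_year, end_year) stays.
stays = {
    "Painter A": [("Paris", 1870, 1880), ("London", 1881, 1885)],
    "Painter B": [("Paris", 1878, 1890)],
    "Painter C": [("London", 1860, 1870)],
}


def overlaps(a, b):
    """True if two stays share a place and their year ranges intersect."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def colocation_edges(stays_by_painter):
    """Return the set of painter pairs with at least one overlapping stay."""
    edges = set()
    for p, q in combinations(sorted(stays_by_painter), 2):
        if any(overlaps(a, b)
               for a in stays_by_painter[p]
               for b in stays_by_painter[q]):
            edges.add((p, q))
    return edges


edges = colocation_edges(stays)
print(edges)
```

In this toy data only Painter A and Painter B are linked, since both were in Paris between 1878 and 1880; the two London stays do not overlap in time.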
In any case, the final ("compiled") dataset is always stored in the artists.csv file. The raw file (here: raw) is often the better one to import or look at: its URL does not contain a commit ID, so it always returns the latest version.