cancel culture impact

Analysis of the impact of cancel culture on Kanye West, Seungri, R. Kelly, and Marilyn Manson

June 2024
Python
LibreTranslate
Statistics
Data Science

The project leverages multiple datasets gathered using custom Python crawlers:

  • GNews: Crawls news articles using the GNews API.
  • YouTube: Retrieves video comments and statistics via the YouTube crawler.
  • Spotify: Downloads daily metrics on followers and listeners.
  • Billboard Charts: Extracts chart performance data from 2015 to today.

Data preprocessing and sentiment analysis were performed in Python. The workflow includes text translation, stopword removal, lemmatization, and sentiment scoring using the VADER lexicon. Paired t-tests were then used to check whether sentiment changed significantly between the periods before and after each cancellation event.
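As a rough illustration of the statistical step, the paired t-statistic can be computed from per-item differences in sentiment score. This is a minimal sketch using only the standard library, not the project's actual code: the function name `paired_t_statistic`, the sample values, and the choice of what forms a pair (here, one mean score per item before and after the event) are all assumptions made for the example.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on two matched samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-pair differences (after minus before).
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical mean sentiment scores for five matched items (illustrative only).
before = [0.30, 0.10, 0.25, 0.40, 0.05]
after = [0.10, -0.05, 0.20, 0.15, -0.10]

t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A negative t here would point to lower sentiment after the event. To turn the statistic into a p-value, a library routine such as `scipy.stats.ttest_rel` is the usual choice.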

The following table summarizes whether different data sources indicate a measurable impact on the celebrities' careers:

Impact metrics based on data analysis
Kanye West | Seungri | R. Kelly | Marilyn Manson
YouTube
Articles
Spotify

The repository is organized to facilitate both data collection and analysis:

  • data: Raw and processed datasets (news articles, YouTube comments, Spotify metrics, Billboard charts)
  • src/data_crawlers: Python scripts for crawling various data sources
  • src/analysis: Jupyter notebooks and scripts for data preprocessing, sentiment analysis, and statistical testing
  • src/helpers: Utility functions for file handling and configuration
  • config.json: Configuration file with celebrity search parameters and cancellation dates
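To make the role of the configuration file more concrete, the sketch below shows one way a cancellation date from a config could be used to split records into "before" and "after" groups. The JSON layout, the keys (`celebrities`, `name`, `search_terms`, `cancellation_date`), the placeholder artist, and the `split_by_cancellation` helper are all hypothetical; the real config.json may be structured differently.

```python
import json
from datetime import date

# Hypothetical config content; the real file's keys and values may differ.
config_text = """
{
  "celebrities": [
    {
      "name": "Example Artist",
      "search_terms": ["example artist"],
      "cancellation_date": "2021-02-01"
    }
  ]
}
"""


def split_by_cancellation(records, cancellation_date):
    """Split records (dicts with an ISO 'date' field) around the cancellation date."""
    before = [r for r in records if date.fromisoformat(r["date"]) < cancellation_date]
    after = [r for r in records if date.fromisoformat(r["date"]) >= cancellation_date]
    return before, after


config = json.loads(config_text)
entry = config["celebrities"][0]
cutoff = date.fromisoformat(entry["cancellation_date"])

# Illustrative records standing in for crawled articles or comments.
records = [
    {"date": "2021-01-15", "text": "first item"},
    {"date": "2021-01-30", "text": "second item"},
    {"date": "2021-02-10", "text": "third item"},
]

before, after = split_by_cancellation(records, cutoff)
print(len(before), "records before;", len(after), "records after")
```

Keeping the dates in one config file means each crawler and notebook can apply the same cutoff per celebrity.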

The project uses Python scripts and Jupyter notebooks for data crawling, preprocessing, sentiment analysis, and statistical testing. For instance, the gnews_data_crawler.py script fetches news articles, and the analyzer_functions.py module provides utilities for text cleaning and analysis.