Case Cardiff University

Recognising social media hate speech with cloud-based analytics.

Gofore works with Cardiff University to provide scalable, cloud-based social media analytics tools for officials and organisations fighting the spread of online hate speech.

Cardiff University, founded in 1883 and with more than 30,000 students currently enrolled, is among the ten largest universities in the UK. Its Social Data Science Lab conducts social data analysis for research, policy and practice, and develops tools for that purpose.

Following the 2016 Brexit referendum and the 2017 terror attacks, Britain saw a surge in hate crimes. The principal investigator, Professor Matthew Williams, said: “Our analysis shows that social media posts from the press, hate crime charities and police gain significant traction in the immediate aftermath of terrorist incidents. As a result, they have an opportunity to engage in counter-speech messaging, such as dispelling rumours and challenging stereotypes, to support victims and stop hate speech from spreading.”

Cardiff University had developed the first version of the Hate Speech Dashboard to show aggregate trends in hate speech. It had limitations, especially in the scalability of the service, and its overall architecture and user interface needed an upgrade. The objective was to move the solution entirely onto public cloud infrastructure, for efficient operation and the scalability needed in the future. In January 2019, following a public procurement, Gofore was awarded a contract to take over development of the tool. The project has been funded by the Economic and Social Research Council (ESRC).

From focused piloting to expanded utilisation

The pilot version of the tool was built on Amazon Web Services. The entire frontend user interface was designed and developed, together with the required data collection operations and visualisations, informed by user research with the first anticipated pilot users. In the backend, serverless technology was used to build a scalable data flow that ingests and classifies tweets in real time. After classification, metadata was extracted and aggregated to provide useful statistics. A web service was created to serve the data and handle user authentication for end users. The hate speech classifiers developed by Cardiff University were integrated into the serverless architecture. The tool was provided to the UK National Online Hate Crime Hub for pilot use.
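The article does not publish the pipeline code. As a minimal sketch of the classify-then-aggregate step described above, the Python below assumes a Lambda-style handler that receives a batch of JSON-encoded tweets. The event shape (`records`, `body`), the keyword lookup standing in for Cardiff University's trained classifiers, and the placeholder terms are all hypothetical. Counts are bucketed per minute and category, in line with the one-minute granularity listed in the project highlights.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Dict, List

# HYPOTHETICAL keyword lists standing in for the trained NLP/ML classifiers;
# the placeholder terms are not real vocabulary.
KEYWORDS: Dict[str, set] = {
    "anti_muslim": {"placeholder_term_a"},
    "anti_lgbt": {"placeholder_term_b"},
}


def classify(text: str) -> List[str]:
    """Return every category whose placeholder keywords occur in the text."""
    tokens = set(text.lower().split())
    return [category for category, words in KEYWORDS.items() if tokens & words]


def minute_bucket(created_at: str) -> str:
    """Truncate an ISO-8601 timestamp to one-minute granularity."""
    ts = datetime.fromisoformat(created_at)
    return ts.replace(second=0, microsecond=0).isoformat()


def handler(event: dict, context=None) -> List[dict]:
    """Lambda-style entry point (hypothetical event shape): classify each
    tweet in the batch, then count hits per (minute, category)."""
    counts: Counter = Counter()
    for record in event["records"]:
        tweet = json.loads(record["body"])
        for category in classify(tweet["text"]):
            counts[(minute_bucket(tweet["created_at"]), category)] += 1
    # Emit aggregates sorted by minute, then category, ready for a dashboard.
    return [
        {"minute": minute, "category": category, "count": n}
        for (minute, category), n in sorted(counts.items())
    ]
```

In a real deployment the aggregates would be written to a data store that the dashboard's web service reads; that step is left out of this sketch.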

In the second phase, the system was prepared for expansion and scalability. In addition to the Online Hate Crime Hub, the expectations and requirements of three new organisations were gathered through comprehensive user research. The backend was architected so that multiple parallel cloud environments can easily be created with Terraform, allowing a dedicated instance to be provisioned for each new organisation. The user interface was updated for a smoother user experience and extended with new features.
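The case study does not describe how the per-organisation environments are parameterised. One possible approach, sketched here under stated assumptions, is a single shared Terraform configuration driven by a per-organisation `.tfvars.json` variable file and one Terraform workspace per organisation. The variable names, the `hatelab-` prefix and the default region below are hypothetical and would have to match `variable` blocks in that assumed configuration.

```python
import json
import re
from typing import Dict, List


def org_slug(name: str) -> str:
    """Turn an organisation name into a lowercase, hyphenated identifier."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def tfvars_for(org_name: str, region: str = "eu-west-2") -> Dict[str, str]:
    # HYPOTHETICAL variable names; they must match variables declared
    # in the (assumed) shared Terraform configuration.
    slug = org_slug(org_name)
    return {
        "environment_name": slug,
        "aws_region": region,
        "resource_prefix": f"hatelab-{slug}",
    }


def tfvars_json(org_name: str) -> str:
    """Serialise the variables; Terraform accepts JSON files named *.tfvars.json."""
    return json.dumps(tfvars_for(org_name), indent=2)


def plan_commands(org_name: str) -> List[List[str]]:
    """Terraform CLI invocations for one organisation's dedicated instance."""
    slug = org_slug(org_name)
    return [
        ["terraform", "workspace", "new", slug],
        ["terraform", "apply", f"-var-file={slug}.tfvars.json"],
    ]
```

The commands are returned as argument lists (suitable for `subprocess.run`) rather than executed, to keep the sketch side-effect free. Note that `terraform workspace new` fails if the workspace already exists, so a real script would check for that first.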

The system uses natural language processing and machine-learning classifier models to recognise hate speech in several categories, such as Extreme / Far Right, Anti-Semitism, Anti-Muslim and Anti-LGBT. Recent developments have made it more straightforward to add and integrate new classifiers based on users' requirements.
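One common way to make adding classifiers straightforward is a registry that the pipeline iterates over, so that a new category needs only one new registered function. The sketch below illustrates that pattern; it is an assumption, not HateLab's documented design, and the stub scoring functions and placeholder terms are hypothetical stand-ins for trained NLP models.

```python
from typing import Callable, Dict, List

# Registry mapping a category name to a scoring function that returns
# a probability-like score in [0, 1].
CLASSIFIERS: Dict[str, Callable[[str], float]] = {}


def register(category: str):
    """Decorator that adds a classifier to the registry under a category name."""
    def wrap(fn: Callable[[str], float]) -> Callable[[str], float]:
        CLASSIFIERS[category] = fn
        return fn
    return wrap


@register("anti_semitism")
def _stub_anti_semitism(text: str) -> float:
    # HYPOTHETICAL placeholder scoring; a trained model would go here.
    return 0.9 if "placeholder_term_b" in text.lower() else 0.1


# Adding a category requested by a new organisation is one more registration.
@register("anti_lgbt")
def _stub_anti_lgbt(text: str) -> float:
    return 0.9 if "placeholder_term_c" in text.lower() else 0.1


def classify(text: str, threshold: float = 0.5) -> List[str]:
    """Run every registered classifier; return categories at or above threshold."""
    return sorted(
        category for category, model in CLASSIFIERS.items()
        if model(text) >= threshold
    )
```

Because the pipeline only calls `classify`, the ingestion and aggregation code does not change when a category is added.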

Gofore's Service Centre has been responsible for operating and maintaining the system throughout.

Quick recognition of and reaction to online hate speech

Even with the pilot tool, the UK National Online Hate Crime Hub was able to follow bursts of hate speech on social media in close to real time. They could quickly spot the topics of hate and recognise associated hashtags, which improved their ability to monitor and detect tensions and to react to them rapidly with counter-speech.

With the extended and scalable second-phase solution, organisations fighting discrimination, inequality and the spread of hate speech on social media will have better tools to focus their efforts.

Professor Matthew Williams summarises the achievements: “A volatile technical environment made our set of requirements for HateLab complex and changing, creating a challenging development process. Gofore was up to the challenge and delivered on all the requirements in good time. Clear lines of communication made the whole process run smoothly.”

Technologies involved

  • Twitter Enterprise API, Twitter API v2 (free access), Pushshift API for Reddit, etc.
  • AWS Lambda, Amazon Kinesis Data Firehose, Amazon Elastic Container Service (ECS), REST API
  • Docker, Terraform, Infrastructure as Code (IaC)
  • Machine Learning, Natural Language Processing (NLP)

Project highlights

Scalable usage of cloud services

Efficient use of native AWS services. Various data sources. Automated creation of dedicated instances for additional organisations.

Real-time data analysis and classification

Continuous data inflow. Visualisation of hate speech at one-minute granularity, plus a view of longer-term trends.

Cohesion over tension

Quick reaction to hate speech. Objective, fact-based counter-speech. Mitigation of toxic reactions.

Interested? Let's talk more.

Jussi Puustinen

Cloud & Continuous Services