Social Media Made Safer Through UH Group’s Machine Learning Models
In a constantly changing digital world shaped by shifting political, economic and social climates, strengthening cybersecurity is increasingly crucial to future stability.
University of Houston computer science doctoral students Fatima Zahra Qachfar and Bryan Tuck are taking active roles in this effort.
Under the direction of Rakesh Verma, professor of computer science at the UH College of Natural Sciences and Mathematics, the team earned first place in a competition focused on identifying Arabic and Turkish hate speech and offensive language on social media. The competition was part of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024), co-located with the 18th Conference of the European Chapter of the Association for Computational Linguistics, held in Malta.
This achievement marks the latest accolade for the group.
In fall 2023, Qachfar and Tuck, along with research scientist Dainis Boumber of Verma’s lab, earned first place in two categories at the first Arabic Natural Language Processing Conference, held with the Conference on Empirical Methods in Natural Language Processing in Singapore. The group placed first in detecting disinformation in Arabic social media as a binary problem and in categorizing Arabic social media disinformation into multiple classes.
“I’ve been working in hate speech and persuasion techniques to detect phishing, propaganda, and so on, to help protect others,” said Qachfar. “Bryan has also been working in social media hate speech.”
Tackling a Bigger Challenge
The CASE workshop posed a more difficult challenge than the competitions the group faced in the fall. At CASE, teams were limited to three submission attempts each day of the workshop.
“For this event, they gave us a training set for our machine learning-based model,” said Tuck. “You train your model on the training set, and you evaluate it on different data that it has never seen before.” The test set was kept private, and teams had no access to it.
During previous competitions, the team received training and evaluation sets and two test sets for their model.
“We had no need to create an evaluation set in previous competitions, but that was not the scenario for CASE 2024,” said Tuck. “This was problematic and created new challenges.”
The team had the opportunity to test their skills in a more authentic setting.
“The organizers wanted to make it more realistic,” said Verma. “In this competition (CASE 2024), you don’t know how you’re going to test your model.”
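In practice, that constraint means carving a validation set out of the released training data so a model can be checked before submission. The snippet below is a minimal sketch of that step, assuming the data arrives as a labeled CSV; the file name and column names are illustrative stand-ins, not the competition’s actual schema.

```python
# Minimal sketch: building a held-out validation split when the official
# test set is private. "train.csv", "text", and "label" are illustrative
# stand-ins, not the competition's actual files or schema.
import pandas as pd
from sklearn.model_selection import train_test_split

train_df = pd.read_csv("train.csv")

# Stratify on the label so the validation split mirrors the class balance
# of the full training set; hate speech corpora are often heavily imbalanced.
train_split, val_split = train_test_split(
    train_df,
    test_size=0.1,
    stratify=train_df["label"],
    random_state=42,
)

print(f"train: {len(train_split)} rows, validation: {len(val_split)} rows")
```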
To detect hate speech in Arabic and Turkish text, the team paired deep learning models with linguistic analysis and carefully curated training data, aiming for robust performance in separating benign from harmful language.
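The article does not name the team’s architecture, but a common recipe for this kind of task is fine-tuning a multilingual transformer on labeled posts. The sketch below illustrates that general approach with the Hugging Face Transformers library; the model choice, hyperparameters and toy data are assumptions, not the team’s actual system.

```python
# Hedged sketch of one standard approach: fine-tuning a multilingual
# transformer for binary offensive-language classification. Model name,
# hyperparameters, and the toy dataset below are all assumptions.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-multilingual-cased"  # covers Arabic and Turkish
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy examples standing in for the curated training data.
data = Dataset.from_dict({"text": ["example post", "another post"], "label": [0, 1]})

def tokenize(batch):
    # Truncate/pad each post to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=data,
)
trainer.train()
```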
“Facebook, Instagram and X (Twitter) have filters based on machine learning that can detect potentially harmful language in posts,” said Qachfar. “Online groups find ways around these filters by altering spelling and grammar in their messaging.”
Qachfar is hopeful that machine learning will help shore up these defenses.
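As a toy illustration of what countering those tricks can involve (not a description of the team’s method), the sketch below normalizes a few common character substitutions before text would reach a classifier; real moderation pipelines are far more sophisticated.

```python
# Illustrative only: a tiny normalizer for the kind of spelling tricks the
# quote describes, e.g. "h@te" or "haaaate", applied before classification.
import re
import unicodedata

# Undo a handful of common symbol-for-letter substitutions.
LEET_MAP = str.maketrans({"@": "a", "0": "o", "1": "l", "3": "e", "$": "s", "!": "i"})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text).lower()  # fold look-alike characters
    text = text.translate(LEET_MAP)                     # reverse symbol substitutions
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # "haaaate" -> "haate"
    return text

print(normalize("h@te"))      # -> "hate"
print(normalize("haaaaate"))  # -> "haate"
```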
A Natural Fit for Participation
Qachfar discovered CASE 2024 over the winter break and contacted Tuck and Verma. “Once I found this workshop and realized our potential given our skillset, it seemed like a natural fit for us to participate,” said Qachfar.
For Tuck, participating in the workshop was an opportunity to build on previous success.
“This was right down our alley, especially coming off a first-place win in the fall semester,” said Tuck.
Identifying Potential Problems
Addressing the spread of harmful content has become increasingly important to fostering inclusive online communities. The team’s work contributes to safer digital spaces where diverse voices can thrive without fear of discrimination or harassment.
“We are putting the University of Houston on the map when it comes to finding processes to detect and mitigate hate speech and offensive language in social media,” said Verma. “We need to protect people.”
- Chris Guillory, College of Natural Sciences and Mathematics