Inappropriate Comment Scanner
The purpose of this blog is to explain how to use NLP to create a secure online environment. By detecting harmful comments on social media, you can easily report and remove them. In the long run, this will help people better connect with each other in our increasingly digital world.
This problem is best addressed with a neural network. This is because hard-coded algorithms have difficulty understanding and recognizing human speech, and it is costly for companies to hire humans to identify these comments. Therefore, NLP using neural networks has proven to be the most effective method in the industry, and the NLP market size is estimated to reach $26.4 billion by 2024.
BACKGROUND
Before the introduction of deep learning for NLP, companies relied on ineffective hate speech detection methods such as simple bag-of-words keyword searches. This method “has a high recall rate but a high false positive rate”, resulting in the erroneous deletion of normal speech.
Recently, research has been conducted to identify hate speech using deep learning. In a paper published in August 2019, a multiple-view stacked Support Vector Machine (mSVM) was used to achieve approximately 80% accuracy on data from various social media companies. In another paper published in 2018, different word embeddings were used to train a CNN_GRU model on three different classes, achieving 90% accuracy.
Additionally, many social media companies are investing in ways to eradicate hate speech online. In July 2020, Facebook Canada announced that it would “merge with the Center for Hate, Bias and Extremism at the Ontario Institute of Technology to create what it calls the Global Network Against Hate”.
ILLUSTRATION
Data source, labeling, and processing
Data source
We used data from Kaggle’s Toxic Comment Classification Challenge. It contains comments from Wikipedia editors, individually labeled by human volunteers. Comments fall into six categories: Toxic, Severe Toxic, Obscene, Threat, Insult, and Identity Hate. Each comment carries a multi-hot label; for example, a comment labeled 100100 is toxic and threatening.
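As a rough illustration, the labels can be read straight from the competition CSV with pandas; the file name and column names below follow the standard Kaggle layout for this challenge.

```python
# Minimal sketch of loading the Kaggle data and reading one multi-hot label.
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("train.csv")           # comment_text plus six 0/1 label columns
first = df.iloc[0]
print(first["comment_text"][:80])
print(first[LABELS].tolist())           # e.g. [1, 0, 0, 1, 0, 0] = toxic + threat
```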
Exploring the Data
Next, we used pandas to explore the data. From Figure 2 we can see that there are over 150,000 comments in total. However, the mean label value for each class is very low, meaning most comments are good comments. We therefore had to balance good and bad comments during the processing stage.
We also noticed that some classes do not have enough comments: over 15,000 comments are flagged as toxic, but fewer than 500 are flagged as threatening. When the number of comments in a class is that low, the model tends to overfit, resulting in poor validation and test set accuracy.
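The imbalance is easy to reproduce with a few lines of pandas; this is a hedged sketch rather than our exact exploration notebook.

```python
# Count flagged comments per class and the fraction of clean ("good") comments.
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df = pd.read_csv("train.csv")

print(len(df))                                # total number of comments (~150k)
print(df[LABELS].sum())                       # flagged comments per class
print((df[LABELS].sum(axis=1) == 0).mean())   # fraction of comments with no flags
```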
Data Augmentation
To solve the above problem, we used the nlpaug package to artificially increase the amount of data in the minority classes through synonym replacement. The number of comments in the majority classes also increased, since many comments are flagged with more than one class.
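A minimal sketch of the synonym-replacement step with nlpaug is shown below; the example comment is hypothetical and the exact augmenter settings we used may differ.

```python
# WordNet-based synonym replacement with nlpaug (requires the nltk wordnet corpus).
import nlpaug.augmenter.word as naw

aug = naw.SynonymAug(aug_src="wordnet")

minority_comment = "I will find you and hurt you"   # hypothetical threat-class example
augmented = aug.augment(minority_comment)           # string or list, depending on version
print(augmented)                                    # e.g. "I will discover you and injure you"
```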
Data Cleaning
We cleaned the data with regular expressions, matching patterns in the comments and replacing them with cleaner counterparts. We removed extra whitespace and newlines, expanded abbreviations, and so on. Cleaner data leads to more efficient models and higher accuracy.
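The sketch below shows the kind of regex cleaning we mean; the actual pattern list was longer, and the abbreviation table here is only illustrative.

```python
import re

# A few illustrative abbreviation expansions; the real mapping was larger.
ABBREVIATIONS = {r"\bwon't\b": "will not", r"\bcan't\b": "cannot", r"\bit's\b": "it is"}

def clean(text: str) -> str:
    text = text.lower()
    for pattern, replacement in ABBREVIATIONS.items():
        text = re.sub(pattern, replacement, text)
    text = re.sub(r"[\n\r\t]", " ", text)      # drop newlines and tabs
    text = re.sub(r"[^a-z' ]", " ", text)      # keep letters and apostrophes only
    return re.sub(r"\s+", " ", text).strip()   # collapse extra whitespace

print(clean("You WON'T\nbelieve this!!!"))     # -> "you will not believe this"
```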
Data Processing
More than 10 datasets were created for prototyping and testing models. For the final dataset, we created 6 sub-datasets (one for each class), and each sub-dataset was split into training, validation, test, and overfit sets. Each sub-dataset contains all bad comments flagged for that particular class, evenly balanced with good comments.
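A hedged sketch of how one balanced binary sub-dataset can be built is shown below; the split ratios are illustrative, the overfit subset is omitted, and the input file name is a placeholder for the cleaned, augmented data.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df = pd.read_csv("train_cleaned.csv")   # placeholder for the cleaned/augmented data

def make_binary_subset(df, label, seed=0):
    """All comments flagged with `label`, balanced with an equal number of clean comments."""
    bad = df[df[label] == 1]
    good = df[df[LABELS].sum(axis=1) == 0].sample(len(bad), random_state=seed)
    sub = pd.concat([bad, good]).sample(frac=1, random_state=seed)   # shuffle
    n = len(sub)
    return sub[: int(0.7 * n)], sub[int(0.7 * n): int(0.85 * n)], sub[int(0.85 * n):]

train, val, test = make_binary_subset(df, "threat")
```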
Architecture
To achieve multi-label classification, the team used 6 CNN_LSTM binary classification models. The input data were first converted to word vectors using GloVe embeddings. We found that bad comments tend to use similar vocabulary and phrases, so two convolutional layers were used to identify word patterns in a sentence regardless of their position. To handle varying comment lengths, each convolutional layer is followed by a max-pooling layer that produces an output feature per kernel for each comment.
The pooled outputs were then concatenated into vectors and fed to an LSTM (Long Short-Term Memory) network, a special type of RNN capable of learning long-range dependencies. The LSTM output is decoded by fully connected layers.
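One plausible PyTorch reading of this architecture is sketched below; layer sizes, kernel widths, and the way the pooled convolution outputs are restacked before the LSTM are assumptions rather than our exact configuration.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, glove_weights, n_filters=64, hidden=64):
        super().__init__()
        emb_dim = glove_weights.shape[1]
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        # two 1-D convolutions pick up word patterns regardless of position
        self.conv1 = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(n_filters, n_filters, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=2)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)                 # binary output per classifier

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.transpose(1, 2)                          # (batch, seq_len', n_filters)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1]).squeeze(-1)              # logit; apply sigmoid outside
```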
The comment labels are multi-hot encoded, so this blog focuses on multi-label classification. To ensure that each classifier is fully trained, we adopted the training approach shown in Figure 10: we trained a CNN_LSTM binary classifier on each of the 6 balanced binary datasets, one per class, and combined the outputs of the binary classifiers to generate a final multi-hot prediction for each comment.
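At inference time the six binary classifiers can be combined along these lines; model loading and tokenisation are assumed to happen elsewhere, and the 0.5 threshold is an assumption.

```python
import torch

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def predict_multilabel(models, token_ids, threshold=0.5):
    """models: dict mapping label name -> trained binary classifier."""
    preds = []
    with torch.no_grad():
        for label in LABELS:
            prob = torch.sigmoid(models[label](token_ids))   # (batch,)
            preds.append((prob > threshold).long())
    return torch.stack(preds, dim=1)                         # (batch, 6) multi-hot prediction
```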
Baseline Model
The baseline model takes the GloVe embedding vector for each word in a comment, averages those vectors, and passes the mean vector to a fully connected layer to generate an output prediction. Comments are generally short, consisting of a few sentences, so the mean vector should be enough for the model to abstract some sense of toxicity.
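A minimal sketch of this baseline, again assuming pre-trained GloVe weights and padded token ids:

```python
import torch
import torch.nn as nn

class BaselineClassifier(nn.Module):
    def __init__(self, glove_weights):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.fc = nn.Linear(glove_weights.shape[1], 1)

    def forward(self, token_ids):                          # (batch, seq_len)
        mean_vec = self.embedding(token_ids).mean(dim=1)   # average the word vectors
        return self.fc(mean_vec).squeeze(-1)               # binary logit
```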
Quantitative Results
The best performing baseline model has a training loss of 0.617 with an accuracy of 79.7%, and a test loss of 0.619 with an accuracy of 79.4%. This shows that our project idea is feasible. The best models were trained with the hyperparameters in Table 1. The total loss and accuracy are computed as the average loss and accuracy across the 6 binary classifiers. The best model has a training loss of 0.510 with an accuracy of 98.6%, and a test loss of 0.528 with an accuracy of 95.3%. As Figure 13 shows, the curves are smooth and follow exponential trends, with no noticeable sign of overfitting.
To ensure the best training approach, we trained the same CNN_LSTM classifier with the three approaches in Table 2. The first two approaches were trained on a general, imbalanced dataset. In approach 3, models were trained on 6 separate balanced datasets. As the accuracy and loss data show, the third approach used in the best model performs the best. The curves for the other two approaches are irregular and show signs of overfitting.
For multi-label classification problems, we also need to evaluate the number of false positives and the F1 score of the model. Table 3 shows the confusion matrices for the six classes, tested on the balanced binary datasets. There are more true positives and true negatives than false positives and false negatives. Moreover, most classes have fewer false positives than false negatives, showing that the best model gives the desired results. The F1 score for each class is over 90%, indicating high precision and recall for the classification model. Comparing the F1 scores with the confusion matrices shows that the F1 score is relatively low for the toxic class, which also has more false predictions. This is due to inconsistency in the labeling of harmful comments; in addition, most bad comments in the dataset are flagged as toxic, so the classifier struggles to recognize features specific to toxic comments.
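For reference, the per-class confusion matrix entries and F1 scores can be computed with scikit-learn as below; the labels and predictions here are placeholders, not our actual test outputs.

```python
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # placeholder test labels for one class
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # placeholder model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)
print(f1_score(y_true, y_pred))      # harmonic mean of precision and recall
```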
Qualitative Results
To gain more insight into the performance of the model, we selected four representative comments from the dataset, shown in Table 4. Case 1 contains only neutral vocabulary, and the model made a correct prediction. Case 2 includes the words “ugly” and “hate”, which are commonly used in bad comments; the model picked up on these words and made a correct prediction. However, in case 3 the model was only 50% accurate, partly because of the inherent subjectivity of the labels in the dataset. The model’s prediction for case 4 is completely wrong: it recognized the word “freaking” but misunderstood the meaning of the phrase. This suggests that the model is good at recognizing word patterns but lacks contextual understanding.
Discussion and Learnings
Categorizing comments is a difficult task due to their poor structure and wide variety of words used.
Considering the challenge of multi-label classification and working with a large amount of messy, imbalanced data, we are pleased with the test results. Our model may make errors on specific classes, but it can successfully tag comments that exhibit some features of toxicity.
Due to the subjective nature of toxicity, some errors may also result from mislabeling of the training data. By using CNN and LSTM layers in the architecture and passing the comments in word by word, the final model performed much better than the baseline model and the models cited in the literature, suggesting that sequence and word-pattern extraction play a role in determining the toxicity of a sentence.
However, our model still has room for improvement and is outperformed by the top Kaggle model, which achieves 98.9% accuracy.
Our results emphasized the importance of maintaining class balance for the binary classifiers. After training the model on balanced data, we were able to mitigate one of our main concerns, false positives, as shown in Table 5. After data augmentation, the model was able to generalize from the increased training samples and reach a smaller loss, as shown in Table 6. Since this augmentation proved effective, more sophisticated methods, perhaps back-translation, may yield an even more diverse and robust training set.
During our data exploration, we found significant correlations between some classes (for example, 74% between Obscene and Insult). As a next step, we could explore how this pattern can be exploited by implementing a chained classifier to improve the model’s accuracy.
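The inter-class correlations can be inspected directly from the label columns; this is a sketch assuming the standard Kaggle column names.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
df = pd.read_csv("train.csv")

corr = df[LABELS].corr()
print(corr.loc["obscene", "insult"])   # roughly 0.74, matching the figure above
```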
Ethical Framework
The driving force behind our project is to make online communities safer by identifying harmful comments and taking action on them. This especially benefits those who prefer a safe and productive environment free of negative distractions.
For those who post toxic comments, this limits autonomy by restricting free speech, and it can also limit the diversity of perspectives expressed on forums. In the long run, this can contribute to a “culture of political correctness” in destructive ways. Therefore, it is important to minimize the number of false positives the model produces in order to foster constructive conversation.
This model also benefits platform hosts by eliminating the need to manually moderate discussions, saving time and resources. Filtering comments using a machine learning model promotes fairness by treating all comments equally.
References
- https://harshalvaza.medium.com/filtering-toxic-comments-using-nlp-a6da851379aa
- https://www.irjet.net/archives/V7/i6/IRJET-V7I61123.pdf
- https://www.researchgate.net/publication/338355806_A_Review_on_Offensive_Language_Detection
- https://towardsdatascience.com/toxic-comment-classification-using-lstm-and-lstm-cnn-db945d6b7986
- https://jayspeidell.github.io/portfolio/project05-toxic-comments/
- https://www.researchgate.net/publication/348635009_Detecting_Toxic_Remarks_in_Online_Conversations