Short: Detecting SMS Spam in the Age of Legitimate Bulk Messaging

Bradley Reaves, Logan Blue, Dave Tian, Patrick Traynor, Kevin R. B. Butler

Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification and 2FA) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3%, respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.
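
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are computed for a spam classifier. The labels, the example messages, and the function names `confusion_counts` and `precision_recall` are illustrative assumptions; none of them come from the paper or its dataset.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "spam") -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives
    for the `positive` class."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif pred == positive and truth != positive:
            fp += 1  # legitimate message (e.g. bulk 2FA) flagged as spam
        elif pred != positive and truth == positive:
            fn += 1  # spam that slipped through
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: a classifier that mistakes legitimate bulk
# messages for spam loses precision even if it catches most spam.
y_true = ["spam", "spam", "spam", "spam", "bulk", "bulk", "ham"]
y_pred = ["spam", "spam", "spam", "ham", "spam", "spam", "ham"]

tp, fp, fn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the two bulk messages flagged as spam count as false positives, which lowers precision while leaving recall unaffected. That is the kind of effect the abstract attributes to the rise of legitimate bulk traffic.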

Review:
This short paper tackles several interesting challenges stemming from the rise of bulk text messages, which are now also used for a variety of non-cellular services such as two-factor authentication. This change limits the usefulness of traditional approaches to characterizing SMS spam. The authors present a novel approach that substantially improves precision and recall compared to previously proposed classifiers. To evaluate the approach empirically, the authors crawl public SMS gateways and create a new, larger SMS dataset, which they have labeled and publicly released. In the process of creating this dataset, the authors also show that existing SMS spam corpora do not sufficiently reflect the prevalence of bulk messages in modern SMS communications.

The reviewers appreciated the experimental work showing that existing spam detection methods do not work well after the rapid increase in legitimate bulk traffic, e.g., two-factor authentication messages. The authors also addressed the problem by significantly improving on the precision and recall of these methods. The reviewers were also happy to see a useful dataset released for future work. Finally, some of the reviewers had concerns about the ethical implications of collecting messages from public SMS gateways, which may inadvertently contain private information. However, the final consensus of the program committee was that the authors properly handled user privacy in the publicly released dataset.