More Semantics More Robust: Improving Android Malware Classifiers

Wei Chen, David Aspinall, Andrew D. Gordon, Charles Sutton, Igor Muttik

Automatic malware classifiers often perform badly at detecting new malware, i.e., their robustness is poor. We study machine-learning-based mobile malware classifiers and reveal one reason: the input features used by these classifiers cannot capture general behavioural patterns of malware instances. We extract the best-performing syntax-based features, such as permissions and API calls, and some semantics-based features, such as happen-befores and unwanted behaviours, and train classifiers using popular supervised and semi-supervised learning methods. By comparing their classification performance on industrial datasets collected across several years, we demonstrate that using semantics-based features can dramatically improve the robustness of malware classifiers.

This paper provides a deeper understanding of how machine learning techniques fail and succeed at classifying Android malware. Previously developed mobile malware classifiers seem to rely on features that vary widely over time, which makes them impractical. The main goals of this paper are to study how malware changes over time, to identify invariant features that allow a machine learning classifier to retain its value, and to enable detection of malware that was not seen during the training phase. To motivate the problem, the paper demonstrates that while syntax-based classifiers work well when the training and validation malware are collected around the same time, they perform badly when the malware being classified is collected later than the training set.
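The time-aware evaluation described above can be sketched as follows. This is only a minimal illustration, not the authors' code: the `App` record, its fields, and the `temporal_split` helper are hypothetical names, and the sample years and labels are made-up toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str         # hypothetical app identifier
    year: int         # year the sample was collected (toy data)
    is_malware: bool  # ground-truth label (toy data)


def temporal_split(apps, cutoff_year):
    """Train on samples collected up to cutoff_year, test on strictly later ones."""
    train = [a for a in apps if a.year <= cutoff_year]
    test = [a for a in apps if a.year > cutoff_year]
    return train, test


# Toy dataset spanning several collection years (assumed values).
apps = [
    App("a", 2011, True),
    App("b", 2011, False),
    App("c", 2012, True),
    App("d", 2013, False),
    App("e", 2014, True),
]

train, test = temporal_split(apps, cutoff_year=2012)
print([a.name for a in train])
print([a.name for a in test])
```

Unlike a random split, this keeps every test sample strictly newer than every training sample, which is the setting in which the paper reports that syntax-based classifiers degrade.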

The paper suggests the use of semantic features of mobile apps to retain classifier value over time, building on the intuition that certain semantic attributes of mobile malware are invariant. Experiments are provided to verify that the incorporation of semantic features can significantly improve the performance of Android malware classification.

The reviewers particularly appreciated the intuitive explanations of an important aspect of how machine learning algorithms succeed and fail in malware classification. The experimental approach is very detailed and lends itself to repeatability and follow-on studies. The authors also provide strong insight into semantic feature selection and its associated issues. By revisiting previous measurement studies, the paper has provided an important challenge to the research community in regard to understanding what machine learning algorithms are actually doing and how they can be improved.