Referred to as STAtic Malware-as-Image Network Analysis (STAMINA), the research leverages Intel’s previous work on static malware classification through deep transfer learning and applies it to a real-world dataset from Microsoft to determine its practical value.
The approach is based on the inspection of malware binaries plotted as grayscale images, which has revealed that there are textural and structural similarities between binaries from the same malware families, and differences between different families or between malware and benign software.
In their whitepaper on STAMINA, researchers from Intel (Li Chen and Ravi Sahita) and Microsoft (Jugal Parikh and Marc Marino) argue that the classic malware detection approach that relies on signature matching is becoming less straightforward due to the rapid increase in signatures, while static and dynamic approaches might not be accurate or time-efficient.
STAMINA, the researchers explain, consists of four steps: preprocessing (image conversion), transfer learning, evaluation, and interpretation.
Preprocessing involves pixel conversion (a pixel stream is created: every byte gets a value between 0 and 255, directly corresponding to pixel intensity), reshaping (pixel streams are turned into two dimensions: width and height are determined by the file size after conversion) and resizing (“to 224 or 299 so that the image models trained on ImageNet can be used for fine tuning on the images”).
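The three preprocessing operations can be illustrated with a minimal, standard-library-only Python sketch. It is not the researchers' code. The near-square width rule in `choose_width` and the nearest-neighbour resize are assumptions made for illustration, and every function name is hypothetical; the article says only that width and height depend on file size and that images are resized to 224 or 299.

```python
import math
from typing import List


def choose_width(n_bytes: int) -> int:
    """Assumed rule: pick a width that gives a roughly square image."""
    return max(1, math.isqrt(n_bytes))


def bytes_to_pixel_rows(data: bytes, width: int) -> List[List[int]]:
    """Pixel conversion and reshaping: each byte (0-255) becomes one
    grayscale pixel; the last row is zero-padded to full width."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def resize_nearest(rows: List[List[int]], size: int) -> List[List[int]]:
    """Resize to size x size with nearest-neighbour sampling (an assumption)."""
    h, w = len(rows), len(rows[0])
    return [
        [rows[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]


def binary_to_image(data: bytes, size: int = 224) -> List[List[int]]:
    """Run all three steps on the raw bytes of a file."""
    if not data:
        raise ValueError("empty input")
    rows = bytes_to_pixel_rows(data, choose_width(len(data)))
    return resize_nearest(rows, size)
```

Calling `binary_to_image(raw_bytes, 299)` would produce the larger of the two input sizes mentioned in the whitepaper quote.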
Next, transfer learning is employed to train a malware classifier for static malware classification. The step is performed on the malware and benign images produced during preprocessing. The researchers note that, in practice, it would be difficult to train an entire deep neural network from scratch because the available datasets are limited.
“What has been done in the computer vision space is that, for specific tasks, models pre-trained on a large number of images are used, and transfer learning is conducted on target tasks,” the researchers note.
During the evaluation step, the researchers look at the accuracy of their method, along with “false positive rate, precision, recall, F1 score, and area under the receiver operating curve (ROC).” The study was performed on a Microsoft dataset that included 2.2 million malware binary hashes, along with 10 columns of data, and was split 60:20:20 into training, validation, and test sets.
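For readers unfamiliar with these metrics, the following Python sketch shows how most of them follow from the four confusion-matrix counts. It assumes binary labels (1 for malware, 0 for benign), uses hypothetical function names, and leaves out the area under the ROC curve, which requires classifier scores rather than hard predictions.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); assumes labels are 1 (malware) or 0 (benign)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the scalar metrics named in the article from raw counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

The false positive rate is the share of benign files flagged as malware, which is why practitioners tend to watch it as closely as accuracy.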
“In particular, per feedback from malware analysis practitioners, we also reported recall at 0.1%–10% false positive rate via ROC,” the whitepaper reads.
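Reporting recall at a fixed false positive rate amounts to sweeping the decision threshold over the classifier's scores and keeping the best recall that stays within the false positive budget. The Python sketch below is one plausible way to compute it, not the whitepaper's procedure; the function name and the convention that higher scores mean "more likely malware" are assumptions.

```python
from typing import Sequence


def recall_at_fpr(y_true: Sequence[int],
                  scores: Sequence[float],
                  max_fpr: float) -> float:
    """Highest recall reachable with a false positive rate <= max_fpr.

    Assumes labels are 1 (malware) or 0 (benign) and that a higher score
    means the sample is more likely to be malware.
    """
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malware and one benign sample")

    # Lower the threshold step by step, from the highest score downwards.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # any lower threshold only adds more false positives
        best_recall = tp / positives
    return best_recall
```

Calling it with `max_fpr` values from 0.001 to 0.1 would cover the 0.1% to 10% range the researchers mention.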
The tests revealed that STAMINA can achieve an accuracy of 99.07% with a false positive rate of 2.58% (precision of 99.09% and recall of 99.66%).
However, the approach is only effective when applied to smaller applications. For larger software, STAMINA is less effective, as it cannot convert “billions of pixels into JPEG images” and then resize them, making metadata-based methods more advantageous in such circumstances.
“For future work, we would like to evaluate hybrid models of using intermediate representations of the binaries and information extracted from binaries with deep learning approaches – these datasets are expected to be bigger but may provide higher accuracy. We also will continue to explore platform acceleration optimizations for our deep learning models so we can deploy such detection techniques with minimal power and performance impact to the end-user,” the researchers conclude.