Abstract: The wide adoption of deep neural networks (DNNs) in real-world applications raises increasing security concerns. Neural Trojans embedded in pre-trained neural networks are a harmful attack against the DNN model supply chain: they generate false outputs when certain stealthy triggers appear in the inputs. While data-poisoning attacks have been well studied in the literature, code-poisoning and model-poisoning backdoors have only recently begun to attract attention. We present a novel model-poisoning neural Trojan, LoneNeuron, which responds to feature-domain patterns that transform into invisible, sample-specific, and polymorphic pixel-domain watermarks. With high attack specificity, LoneNeuron achieves a 100% attack success rate while leaving main-task performance unaffected. Thanks to LoneNeuron's unique watermark polymorphism property, the same feature-domain trigger resolves to multiple watermarks in the pixel domain, which further improves watermark randomness, stealthiness, and resistance to Trojan detection. Extensive experiments show that LoneNeuron evades state-of-the-art Trojan detectors. LoneNeuron is also the first effective backdoor attack against vision transformers (ViTs).
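As a loose intuition only (not the paper's actual construction), the watermark-polymorphism idea can be sketched with a toy "backdoor neuron": the neuron fires on a feature-domain projection, so any pixel-domain pattern that perturbs the input only in directions orthogonal to the secret feature direction yields a visually different watermark that triggers the same response. All names (`lone_neuron`, `make_watermark`, the direction `v`, the threshold) are hypothetical illustration, not LoneNeuron's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64                        # flattened input size (toy 8x8 "image")
v = rng.standard_normal(D)    # secret feature direction (hypothetical)
v /= np.linalg.norm(v)
THRESH = 3.0                  # activation threshold for the toy backdoor neuron

def lone_neuron(x):
    """Toy backdoor neuron: fires when the feature-domain projection exceeds THRESH."""
    return float(x @ v) > THRESH

def make_watermark(strength=4.0):
    """Many distinct pixel-domain patterns share one feature-domain projection:
    take strength * v plus any perturbation orthogonal to v."""
    noise = rng.standard_normal(D) * 0.05
    noise -= (noise @ v) * v          # remove the component along v
    return strength * v + noise

clean = rng.standard_normal(D) * 0.1
w1, w2 = make_watermark(), make_watermark()

print(lone_neuron(clean))        # clean input does not fire the neuron
print(lone_neuron(clean + w1))   # watermarked input fires it
print(lone_neuron(clean + w2))   # a visually different watermark also fires it
print(np.allclose(w1, w2))       # the two pixel-domain triggers differ
```

In this toy picture, the "same feature-domain trigger" is the projection onto `v`, while the pixel-domain watermarks `w1` and `w2` differ sample by sample, mirroring the randomness and stealthiness claim in the abstract.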
For Reference: Liu Z., Li F., Li Z., et al. "LoneNeuron: A Highly-Effective Feature-Domain Neural Trojan Using Invisible and Polymorphic Watermarks." In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22), 2022, pp. 2129-2143.