Machine Learning and Security: The Good, The Bad, and The Ugly - Wenke Lee
Distinguished Cybersecurity Lecture
Thoughts on the interactions between machine learning and security.
The good: We now have more data, more powerful machines and algorithms, and, better yet, we no longer always need to engineer features manually. The ML process is now much more automated and the learned models are more powerful. This is a positive feedback loop: more data leads to better models, which lead to more deployments, which lead to more data. All security vendors now advertise that they use ML in their products.
The bad: There are more unknowns. In the past, we knew the capabilities and limitations of our security models, including ML-based models, and understood how they could be evaded. But state-of-the-art models such as deep neural networks are not as intelligible as classical models such as decision trees. How do we decide whether to deploy a deep learning-based model for security when we don't know for sure that it has learned correctly? Data poisoning also becomes easier. Online learning and web-based learning use data collected at run time, often from an open environment. Because such data often results from human actions, it can be intentionally polluted, e.g., in misinformation campaigns. How do we make it harder for attackers to manipulate the training data?
The ugly: Attackers will keep exploiting the holes in ML and will automate their attacks using ML. Why don't we just secure ML? That would be no different from trying to secure our programs, systems, and networks, and we cannot fully do that either. We have to prepare for ML failures. Ultimately, humans have to be involved. The question is how and when. For example, what information should an ML-based system present to humans, and what input can humans provide to the system?
Prof. Wenke Lee is the John P. Imlay Jr. Chair in the School of Computer Science in the College of Computing at Georgia Tech, and the executive director of the Institute for Information Security & Privacy (IISP). His research expertise includes systems and network security, botnet detection and attribution, malware analysis, virtual machine monitoring, mobile systems security, and detection and mitigation of information manipulation on the Internet. He regularly leads large research projects funded by the National Science Foundation (NSF), the U.S. Department of Defense, the Department of Homeland Security, and private industry. Significant discoveries from his research group have been transferred to industry; in 2006, this technology transfer enabled Prof. Lee to co-found Damballa, Inc., which focused on detection and mitigation of advanced persistent threats.
Prof. Lee is one of the most prolific and influential security researchers in the world. He has published several dozen often-cited research papers at top academic conferences, including the ACM Conference on Computer and Communications Security, USENIX Security, the IEEE Symposium on Security & Privacy ("Oakland"), and the Network & Distributed System Security (NDSS) Symposium. Prof. Lee's awards and honors include election as an ACM Fellow in 2017 and an IEEE Fellow in 2020, the ACM SIGSAC Outstanding Innovation Award in 2019, the "Internet Defense Prize" awarded by Facebook and USENIX in 2015, an "Outstanding Community Service Award" from the IEEE Technical Committee on Security and Privacy in 2013, a Raytheon Faculty Fellowship in 2005, an NSF CAREER Award in 2002, and best paper awards at the IEEE Symposium on Security and Privacy and the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. He received his Ph.D. in Computer Science from Columbia University in 1999.