IT teams use AI and machine learning to improve data insights and aid their businesses, but incorporating these technologies into workloads can also intensify existing cloud security challenges -- and even create new ones.
Compared with traditional workloads, AI and machine learning applications draw on a wider variety of data sources during the development, or data science, phase. This expands the attack surface that must be defended to protect personally identifiable information (PII) and other sensitive data.
A variety of attack vectors -- still emerging and unique to AI applications -- can compromise machine learning apps and result in corrupted models and stolen algorithms.
AI and machine learning data management challenges
Effective use of AI starts with secure data stores, said James Kobielus, lead analyst at SiliconANGLE Wikibon. Enterprises must treat sensitive data responsibly and consider the implications of how that data is used. This requires:
- restrictions on what data streams can be used to load information into data lakes;
- the ability to track who accesses that data and how it's aggregated, transformed and cleansed;
- an understanding of how collected data may be used to build and train AI and machine learning models; and
- awareness of how the data and its machine learning models could be consumed in downstream applications.
To help address cloud security challenges introduced by AI and machine learning, organizations can use scoring mechanisms for PII and sensitive data, according to Justin Richie, data science director at Nerdery, an IT consultancy. Those mechanisms should be built into the API management services used to retrieve the data, so that areas needing additional security can be prioritized.
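A minimal sketch of such a scoring mechanism, assuming hypothetical field names and weights -- neither comes from any real standard or product -- might look like this:

```python
# Hypothetical PII scoring sketch: each field gets a sensitivity score so the
# API management layer can prioritize controls on high-risk data. Field names
# and weights are illustrative placeholders only.
PII_SCORES = {
    "ssn": 10,
    "credit_card": 10,
    "email": 6,
    "phone": 5,
    "zip_code": 3,
    "page_views": 0,
}

def record_sensitivity(record):
    """Return the highest sensitivity score among a record's fields.

    Unknown fields default to a low, nonzero score so they are not
    silently treated as safe.
    """
    return max((PII_SCORES.get(field, 1) for field in record), default=0)

record = {"email": "a@example.com", "page_views": 42}
print(record_sensitivity(record))  # 6 -> route through stricter access controls
```

In a real API gateway, a score like this would decide which requests require extra authentication, masking or audit logging before the data reaches a training pipeline.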
There haven't been many high-profile attacks on cloud AI apps so far, but researchers have nonetheless begun joint efforts to find potential vulnerabilities, said Tabish Imran, founding platform engineer at S20.ai, an AI startup.
"Defending against attacks on machine learning workloads is tricky because there is no standard way of testing a machine learning model for security vulnerabilities yet, as compared to web apps, mobile apps or cloud infrastructure," Imran said.
Known attacks on AI and machine learning systems can be broadly classified into three types:
- Data poisoning attacks find ways to feed data into machine learning classifiers to train for goals that are against an enterprise's interests. This kind of attack could occur when a model uses data captured from public sources or end users.
- Model stealing attacks seek out insecure model storage or transfer mechanisms. Encrypting models is now becoming standard practice when a proprietary model must be deployed at the edge or on premises, Imran said.
Another way to steal a model is through reverse engineering. Hackers repeatedly query the model and gather responses for a large set of inputs. This could enable an attacker to recreate the model without actually stealing it.
- Adversarial input attacks morph or augment data so the classifier fails to correctly assign the input. These types of attacks can range from tricking a spam filter into believing spam mail is legitimate to fooling an object detection security system into misclassifying a rifle as an umbrella.
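The query-based extraction described under model stealing can be sketched in a few lines. The "victim" below is a hypothetical stand-in for a remote prediction API; a real attack would need far more queries and a richer substitute model:

```python
# Sketch of query-based model extraction: repeatedly query a victim model and
# recover its behavior from the responses alone. The victim here is a
# hypothetical stand-in for a remote prediction endpoint.
def victim_api(x):
    # Unknown to the attacker: a simple threshold rule.
    return 1 if 3.0 * x - 1.0 > 0 else 0

# The attacker gathers (input, label) pairs from the public endpoint...
queries = [i / 10.0 for i in range(0, 11)]
labels = [victim_api(x) for x in queries]

# ...then approximates the decision boundary from the responses, effectively
# recreating the model without ever touching the stored artifact.
boundary = min(x for x, y in zip(queries, labels) if y == 1)
substitute = lambda x: 1 if x >= boundary else 0

print(all(substitute(x) == victim_api(x) for x in queries))  # True
```

Rate limiting, query auditing and returning coarse labels instead of raw confidence scores are common mitigations against this class of attack.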
"While you can safely bet that using something like Amazon SageMaker should provide you a reasonable degree of security against basic vulnerabilities, such as insecure model storage, these APIs can be just as vulnerable to adversarial attacks," Imran said.
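The spam-filter scenario above can be illustrated with a toy, hand-rolled logistic regression and an FGSM-style (fast gradient sign method) perturbation. The weights, input and step size here are hypothetical, chosen purely so the effect is visible:

```python
import math

# Toy logistic-regression "spam filter": score = sigmoid(w . x + b).
# Weights are hypothetical, picked so the clean input scores as spam.
w = [2.0, -1.5]
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

x = [1.0, 0.2]                 # clean input, classified as spam (score > 0.5)
clean_score = predict(x)

# FGSM-style adversarial input: nudge each feature a small step eps against
# the gradient of the score with respect to the input. Since sigmoid is
# monotonic, that gradient has the sign of the corresponding weight.
eps = 0.6
grad_sign = [1 if wi > 0 else -1 for wi in w]
x_adv = [xi - eps * gi for xi, gi in zip(x, grad_sign)]
adv_score = predict(x_adv)

print(round(clean_score, 3), round(adv_score, 3))  # spam score drops below 0.5
```

Real attacks perturb high-dimensional inputs such as images or email token counts, where changes of this size are imperceptible to humans yet flip the classifier's decision.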
Secure the data science pipeline
Organizations use an iterative process to train and customize AI workloads. Data scientists experiment with different machine learning models to explore relationships between data sets. On the deployment side, data engineers operationalize these workloads with a range of tools, which can introduce security vulnerabilities.
However, it is important for companies to encourage AI scientists to train and deploy models without worrying about cloud security threats. "It is not realistic to assume that all AI scientists should understand and employ cloud security best practices," said Sewook Wee, director of data engineering at Trulia, a real estate service.
Trulia built security best practices into its machine learning platform to make it easier for AI scientists to develop product-ready models. Trulia also implemented security guardrails to automatically check its cloud infrastructure and alert application owners when an issue is detected.
Multiple researchers have conducted proof-of-concept attacks on various commercial machine learning services. For example, a group of researchers from the University of Washington successfully attacked Google's Cloud Video Intelligence API. The researchers inserted barely visible adversarial images into a video, which caused the API to return labels describing only the inserted images.
If an attacker can introduce nearly invisible alterations that can fool AI-powered classification tools, it is difficult to trust that the tool will do its job effectively, Kobielus said.
Safeguard cloud AI services
To protect against these cloud security challenges, risk assessments must become a standard practice at the start of development, and adversarial examples should be generated as a standard activity in the AI training pipeline, Kobielus said.
In addition, data scientists should rely on a range of AI algorithms in order to detect adversarial examples and reuse adversarial defense knowledge through transfer learning approaches, he added.
Organizations can use open source tools, such as IBM's Adversarial Robustness Toolbox and the TensorFlow CleverHans library, to test a model's robustness and harden it against known adversarial attacks.
Platforms such as Amazon Rekognition or Lex have some built-in security safeguards, but IT teams must secure the calls to these services appropriately, said Torsten Volk, managing research director at Enterprise Management Associates. If attackers figure out how to eavesdrop on calls to a service like Rekognition, they can learn about a company's business processes. Attackers could even pose as an AI service and send back false or misleading information to an application.
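One basic safeguard against the spoofing scenario Volk describes is to authenticate payloads cryptographically so an impostor service cannot inject unauthenticated responses. A minimal sketch using Python's standard library HMAC support follows; the key and payload are illustrative placeholders, and a real deployment would rely on TLS plus the provider's own request-signing scheme, such as AWS Signature Version 4:

```python
import hashlib
import hmac

# Hedged sketch: HMAC-sign each payload exchanged with an AI service so a
# spoofed endpoint cannot pass off tampered data. The key and payload below
# are illustrative placeholders, not any real provider's scheme.
SHARED_KEY = b"example-key-rotate-me"

def sign(payload: bytes) -> str:
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(payload), signature)

payload = b'{"image_id": "cam-042", "label_request": true}'
sig = sign(payload)
print(verify(payload, sig))      # True
print(verify(b'{"x": 1}', sig))  # False: tampered payload fails verification
```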
Volk recommended implementing a centrally governed, federated data plane that also ties into services such as SageMaker, Azure Machine Learning, Google AI or IBM Watson. Such offerings provide an auditing trail for where, when and how data is used. Then, in the event one of these services is compromised, the enterprise is in a better position to undertake a remediation strategy that involves tuning the data pipelines and configurations that feed the service.