As confidential information is made available to multiple users for data analysis in the cloud, security controls become vital. NoSQL databases are popular choices for working with large data sets, but early versions of NoSQL data stores were light on data security features. Now, big data administrators can leverage the benefits of NoSQL and still maintain some control over who can access subsets of data in the cloud.
IT admins and cloud specialists can implement a variety of fine-grained NoSQL access controls, which vary according to the database's data model. To avoid unintentionally disclosing personal data, consider the access controls built into your NoSQL database.
This article looks at three approaches to securing NoSQL data stores, each with its own benefits and drawbacks: Accumulo's cell-based access controls, Amazon DynamoDB's use of AWS Identity and Access Management (IAM) policies, and MarkLogic's compartment controls and execute privileges.
Accumulo data store
Accumulo is a distributed, key-value data store based on Google's Bigtable design. This open source option was created by the NSA and released in 2011. Accumulo is an Apache project that runs on top of Hadoop and adds features not found in Bigtable, including cell-based access controls.
Accumulo keys include a visibility attribute that specifies security labels, such as admin, finance or manager. Because each key is associated with a single value (a cell, which is finer-grained than a row in a relational table), these controls limit the set of cells that a user can query or manipulate. Labels can be combined in logical expressions, such as manager&finance, to create access controls as needed. Users are then assigned authorizations that name the security labels they hold. For example, a manager in the finance department would be assigned both the "manager" and "finance" labels.
Application and database administrators determine the set of labels and how they are applied based on the organization's security policies.
User authorizations are stored by trusted third-party authentication servers. When an application executes an operation, such as a query or an update, the user's authorizations are retrieved from the third party and passed to Accumulo. Accumulo then compares the set of user authorizations to the visibility labels on the data cells, and limits the results to the data that particular user is authorized to access.
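The comparison of user authorizations against visibility labels can be sketched in a few lines of Python. This is a simplified model, not Accumulo's API: the `Cell` class, `is_visible` and `scan` are hypothetical names introduced for illustration, the evaluator does no validation of malformed expressions, and it reads mixed `&` and `|` left to right (Accumulo itself requires parentheses when mixing the two operators).

```python
import re
from dataclasses import dataclass


def is_visible(expression, authorizations):
    """Return True if the label expression is satisfied by the user's authorizations."""
    if not expression:
        return True  # a cell with no visibility labels is readable by everyone
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations  # a bare label: does the user hold it?

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    return parse_expr()


@dataclass
class Cell:
    row: str
    column: str
    visibility: str  # label expression, e.g. "manager&finance"
    value: str


def scan(cells, authorizations):
    """Return only the cells whose visibility expression the user satisfies."""
    return [c for c in cells if is_visible(c.visibility, authorizations)]


# Hypothetical sample data: one employee record split across labeled cells.
cells = [
    Cell("emp1", "salary", "manager&finance", "90000"),
    Cell("emp1", "badge", "admin", "A17"),
    Cell("emp1", "dept", "finance|admin", "FIN"),
    Cell("emp1", "name", "", "Pat"),
]
finance_manager_view = scan(cells, {"manager", "finance"})
```

In this sketch, a user holding the "manager" and "finance" authorizations gets back the salary, dept and name cells, while the badge cell, which requires "admin", is filtered out.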
Amazon DynamoDB
Amazon DynamoDB is one of the fastest growing Amazon Web Services (AWS) offerings. DynamoDB is a key-value data store service that provides automated scalability and provisioned throughput. It is a good option for developers and application managers who prefer a hosted service to administering their own NoSQL database. It may be especially appealing for AWS users who have invested time and resources establishing IAM policies, because these can be used to finely control access to data stored in DynamoDB.
To maintain fine-grained access control in Amazon DynamoDB, admins must specify conditions in an IAM policy. Conditions allow or deny access to particular items and attributes in the key-value data store. This model can limit access to particular items, such as the data associated with a particular customer account, so that customers can see only their own data. It also allows application administrators to define rules for access to particular attributes. For example, a policy may specify that only employees, and not customers, can view a category attribute in the database.
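As a rough illustration of such a policy, the sketch below builds an IAM policy document as a Python dictionary. The condition keys `dynamodb:LeadingKeys`, `dynamodb:Attributes` and `dynamodb:Select` are the ones AWS documents for fine-grained access control; everything else is an assumption made for the example, including the table ARN, the attribute names, the `build_customer_policy` helper and the use of Login with Amazon web identity federation (the `${www.amazon.com:user_id}` policy variable).

```python
import json


def build_customer_policy(table_arn, visible_attributes):
    """Build a read-only policy limited to the caller's own items and listed attributes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": [table_arn],
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Item-level control: the partition key must equal the caller's ID.
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                        # Attribute-level control: only these attributes may be requested.
                        "dynamodb:Attributes": visible_attributes,
                    },
                    # Require requests to name specific attributes rather than all of them.
                    "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
                },
            }
        ],
    }


# Hypothetical table and attributes; "category" is left out so customers cannot read it.
customer_policy = build_customer_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders",
    ["customer_id", "order_id", "status"],
)
policy_json = json.dumps(customer_policy, indent=2)
```

A separate policy for employees would list the category attribute as well, which is how the customer and employee views described above can differ.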
MarkLogic
Document-based NoSQL databases, such as MarkLogic, can extend role-based access controls by grouping roles and documents into compartments. Compartments add a further access control check on top of the standard role-based controls. If a document has a compartment assigned to it, only users with that compartment assigned to them can access the document.
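The compartment check can be modeled in a few lines of Python. This sketch follows the description in this article rather than MarkLogic's actual API or its exact permission semantics; the `Role`, `Document` and `can_read` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Role:
    name: str
    compartment: Optional[str] = None  # None means the role has no compartment


@dataclass
class Document:
    uri: str
    read_roles: set                                  # names of roles allowed to read
    compartments: set = field(default_factory=set)   # compartments assigned to the document


def can_read(user_roles, doc):
    """Apply the standard role check, then the additional compartment check."""
    role_names = {r.name for r in user_roles}
    user_compartments = {r.compartment for r in user_roles if r.compartment}
    has_role = bool(role_names & doc.read_roles)
    # Every compartment on the document must also be held by the user.
    has_compartments = doc.compartments <= user_compartments
    return has_role and has_compartments


# Hypothetical roles and document.
analyst = Role("analyst")
project_x = Role("project-x-member", compartment="project-x")
report = Document("/reports/q3.xml", read_roles={"analyst"}, compartments={"project-x"})
```

Under this model, a user holding only the analyst role passes the role check but is still denied the report, because the document's compartment is missing from the user's roles.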
MarkLogic also provides access control over the execution of operations. The database includes a set of predefined execute privileges that deal with data management, security and other administration operations. Database administrators can control the ability to execute queries by creating execute privileges. An execute privilege is included in the definition of a query and also assigned to roles. Only users with roles that hold the necessary execute privilege will be able to execute queries.
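The relationship between execute privileges, roles and queries can be sketched as follows. This is an illustrative Python model only, not MarkLogic code: the `ROLE_PRIVILEGES` table, the `requires_privilege` decorator and the privilege name are assumptions made for the example.

```python
import functools

# Hypothetical mapping of roles to the execute privileges they hold.
ROLE_PRIVILEGES = {
    "analyst": {"run-sales-report"},
    "guest": set(),
}


class PrivilegeError(PermissionError):
    """Raised when the caller's roles lack the required execute privilege."""


def requires_privilege(privilege):
    """Attach an execute privilege to a query function and enforce it on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user_roles, *args, **kwargs):
            # Collect the privileges granted by all of the user's roles.
            held = set().union(*(ROLE_PRIVILEGES.get(r, set()) for r in user_roles))
            if privilege not in held:
                raise PrivilegeError(f"missing execute privilege: {privilege}")
            return func(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@requires_privilege("run-sales-report")
def sales_report(user_roles, region):
    # Placeholder for the protected query.
    return f"sales report for {region}"
```

Calling `sales_report(["analyst"], "EMEA")` succeeds in this model, while the same call with `["guest"]` raises `PrivilegeError`, mirroring the rule that only roles holding the necessary privilege can run the query.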
About the author:
Dan Sullivan holds a Master of Science degree and is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence, and worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.