What Are the Best Practices For Test Data Management in Multi-Cloud AppDev?

January 24, 2020



As technology evolves at lightning speed, the pace of application development has accelerated with it. Amid this, many companies are adopting the cloud for flexible capacity and enhanced operational efficiency, so that products reach the market as quickly as possible.

Application development (appdev) teams leverage these attributes of the cloud to accelerate agility. Moreover, experience with cloud-based lower-level environments provides an opportunity to re-architect IT processes and set up security practices, while building knowledge and confidence for when it is time to migrate production workloads.

However, the models for appdev in the cloud vary. Whether teams work in single, hybrid, or multi-cloud models, DevOps practices are regularly evaluated to avoid processes that add complexity and overhead, and the same should be true for the data pipeline that feeds the release train.

Moreover, the emerging practice of DataOps focuses on the fast and secure movement of data: think DevOps for data. DataOps modernizes test data management and eliminates long-standing wait states that limit release velocity. As noted by Delphix, several DataOps best practices can increase the efficiency of CI/CD workflows within and across clouds and help teams build better software faster.

Firstly, DevOps teams can spin cloud-based test environments up and down quickly as they iterate on new code. Validating every change speeds integration into master, and feature branches can be quickly retired. But agility is lost when test data delivery doesn’t match this optimized model.

Fast-moving release trains get stuck waiting on serial ticketing and manual operations to deliver data into non-production environments. According to research, around 80 percent of enterprises in North America take four days or more to provision test data. Automating data delivery within the CI/CD toolchain breaks this bottleneck, so continuous integration can truly scale.

In a multi-cloud model, this codified data delivery must remain cloud-agnostic to preserve portability. Creating shareable code eliminates the need to tweak logic for cross-cloud integration testing and deployments.
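To make the idea of cloud-agnostic, codified data delivery concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a vendor API: `DataTarget`, `InMemoryTarget`, `provision`, and the dataset and environment names are hypothetical. The point is only that the pipeline calls one provider-neutral function, while any cloud-specific detail would live inside an adapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class DataTarget(Protocol):
    """Hypothetical interface that each cloud adapter would implement."""

    name: str

    def load(self, dataset: str, environment: str) -> str:
        ...


@dataclass
class InMemoryTarget:
    """Stand-in adapter; a real one would wrap a provider's own tooling."""

    name: str

    def load(self, dataset: str, environment: str) -> str:
        # Returns a made-up locator so the sketch runs without any cloud account.
        return f"{self.name}://{environment}/{dataset}"


def provision(dataset: str, environment: str, targets: List[DataTarget]) -> Dict[str, str]:
    """Deliver one dataset to every target using the same provider-neutral logic."""
    return {target.name: target.load(dataset, environment) for target in targets}


if __name__ == "__main__":
    clouds = [InMemoryTarget("cloud-a"), InMemoryTarget("cloud-b")]
    print(provision("orders_masked", "ci-build", clouds))
```

A CI job could call `provision` as a pipeline step; adding another cloud would then mean writing one more adapter, not changing the pipeline logic.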

Secondly, DevOps teams leverage small batch sizes to increase agility and maintain a tighter feedback loop. Shifting left keeps defects from moving down the pipeline, where they get harder and more expensive to triage. To find issues sooner in the SDLC, the spectrum of test environments used should closely simulate production, including data.

Lenore Adam, Director of Product Marketing at Delphix, says that “out of convenience, developers often use synthetic data or subsets for testing, but that significantly weakens results. Data should reflect the production instance to ensure comprehensive test coverage and improve software quality.”

Thirdly, destructive testing requires datasets to be returned to their original state so that tests can resume. Given how often this occurs in test-driven development, delays in restoration create yet another bottleneck for the CI/CD pipeline. Treating data like code solves this problem. Version-controlling data creates a point-in-time reference, so data can be automatically rolled back to its original state during testing or when reproducing errors at a later date. Linking the state of the test database to specific application changes increases the flow of planned work because data becomes as agile as the code.
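The snapshot-and-rollback idea can be sketched with Python's built-in `sqlite3` module. This is a toy illustration, not how any particular test data product works: the `VersionedTestData` class and the `commit-abc123` label are hypothetical, and an in-memory SQLite database stands in for a real test database. Each snapshot is labelled with the application change it belongs to, so a destructive test can be undone on demand.

```python
import sqlite3
from typing import Dict


class VersionedTestData:
    """Keeps labelled point-in-time copies of a SQLite test database."""

    def __init__(self) -> None:
        self.live = sqlite3.connect(":memory:")
        self._snapshots: Dict[str, sqlite3.Connection] = {}

    def snapshot(self, label: str) -> None:
        """Record the current state under a label, e.g. a commit id."""
        self.live.commit()  # make sure pending writes are part of the snapshot
        copy = sqlite3.connect(":memory:")
        self.live.backup(copy)  # copies the whole database into `copy`
        self._snapshots[label] = copy

    def rollback_to(self, label: str) -> None:
        """Replace the live database with a fresh copy of a snapshot."""
        restored = sqlite3.connect(":memory:")
        self._snapshots[label].backup(restored)
        self.live.close()
        self.live = restored


if __name__ == "__main__":
    data = VersionedTestData()
    data.live.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    data.live.executemany(
        "INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Grace",)]
    )
    data.snapshot("commit-abc123")

    data.live.execute("DELETE FROM customers")  # destructive test step
    data.rollback_to("commit-abc123")           # data is back in its original state
    print(data.live.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
```

Copying the full database is only practical at toy scale; the sketch is meant to show the workflow of labelling, destroying, and restoring, not an efficient storage strategy.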

Additionally, enterprises rely on diverse data sources, tailored to different applications, that sit in an equally diverse set of environments, on-premises and across clouds. Adam notes that siloed procedures for provisioning test data have caused organizations to become “data blind,” meaning they don’t have a clear picture of what data they have and who has access to it.

In this regard, centralizing governance brings visibility and standardized control of who has access to what data, when, and for how long—no matter where environments are located. As with automated data delivery, permissioning of test data should not include cloud-specific logic. Cloud-agnostic controls result in policy-based processes that span cloud providers, and infrastructure-wide administration creates traceability for audits and reporting.
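As a rough illustration of policy-based control over who can access which data and for how long, the following Python sketch keeps time-limited grants in one place and records every access decision for later audits. The `AccessGrant` and `AccessPolicy` names, the user ids, and the dataset name are all hypothetical; note that nothing in the policy refers to a specific cloud.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class AccessGrant:
    user: str
    dataset: str
    expires_at: datetime


@dataclass
class AccessPolicy:
    """Central, cloud-neutral record of grants plus an audit trail."""

    grants: List[AccessGrant] = field(default_factory=list)
    audit_log: List[Tuple[datetime, str, str, bool]] = field(default_factory=list)

    def grant(self, user: str, dataset: str, granted_at: datetime, duration: timedelta) -> None:
        self.grants.append(AccessGrant(user, dataset, granted_at + duration))

    def is_allowed(self, user: str, dataset: str, at: datetime) -> bool:
        allowed = any(
            g.user == user and g.dataset == dataset and at < g.expires_at
            for g in self.grants
        )
        # Every decision is logged, whether access was allowed or denied.
        self.audit_log.append((at, user, dataset, allowed))
        return allowed


if __name__ == "__main__":
    policy = AccessPolicy()
    start = datetime(2020, 1, 24, 9, 0)
    policy.grant("dev-1", "orders_masked", start, timedelta(hours=8))
    print(policy.is_allowed("dev-1", "orders_masked", start + timedelta(hours=1)))
    print(policy.is_allowed("dev-1", "orders_masked", start + timedelta(hours=9)))
```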

Lastly, non-production environments are often less secure, simply for reasons of cost and convenience, which heightens the risk of sensitive data exposure. A centralized strategy to identify and safeguard sensitive data is essential to create a consistent line of defense across clouds. Protecting PII also requires obfuscating data before it is distributed into lower-level environments. The anonymization method should both remove sensitive information and ensure the data still behaves like production data for testing purposes.

According to Adam, encryption is common, but it removes logical relationships between database tables, which in turn limits test coverage.

By contrast, data masking replaces real data with fictitious but realistic data while maintaining referential integrity during testing. Masked data is of no use to a hacker and helps non-production environments stay compliant with privacy laws such as the GDPR and CCPA.
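One way masking can keep referential integrity is to make it deterministic: the same real value always maps to the same fictitious value, so joins between tables still line up. The Python sketch below shows that idea using a keyed hash from the standard library. It is an assumption-laden illustration, not the method used by any specific masking product; `mask_email`, `mask_rows`, the demo key, and the sample rows are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: a real key would come from a secrets manager, never from source code.
SECRET_KEY = b"demo-only-key"


def mask_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Map a real email to a stable, realistic-looking fictitious one."""
    digest = hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


def mask_rows(rows: List[Dict], column: str, key: bytes = SECRET_KEY) -> List[Dict]:
    """Return copies of the rows with one column masked."""
    return [{**row, column: mask_email(row[column], key)} for row in rows]


if __name__ == "__main__":
    customers = [{"email": "ada@corp.test", "tier": "gold"}]
    orders = [
        {"email": "ada@corp.test", "total": 42},
        {"email": "grace@corp.test", "total": 7},
    ]

    masked_customers = mask_rows(customers, "email")
    masked_orders = mask_rows(orders, "email")

    # The join on the masked column still matches the same customer's orders.
    known = {c["email"] for c in masked_customers}
    print([o for o in masked_orders if o["email"] in known])
```

Because the mapping is keyed, someone without the key cannot simply recompute it from a list of known addresses, but the key itself then becomes sensitive and would need to be protected like any other secret.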
