Ever since GDPR entered our world, there have been many discussions about personal data being used for purposes other than the application the user actually signed up for. This includes testing, where many companies still rely on production data to (regression) test some of their systems, as this is believed to be the only acceptable way to test real-life scenarios. Lately, regulators have increasingly focused on this topic. This article explains some requirements for test data and leading practices for dealing with them.
Many requirements govern and influence which test data to use for adequate testing of software changes before going live. On the one hand, testers want data that is close to existing production data so that they can more easily and more realistically determine test scenarios resembling real-life scenarios. On the other hand, regulators and IT Security professionals want to limit the amount of actual data from production to be used for testing, as many test environments are typically not as secure as production environments, and this usage also makes the “right to be forgotten” much more difficult to implement.
Therefore, quite a few regulatory requirements expand on the GDPR definitions in Articles 13, 17, and 32 with details, especially for financial firms. In Germany, these are covered in the Kreditwesengesetz (KWG) and further detailed in industry-specific requirements laid out in MaRisk, BAIT (or VAIT, KAIT, etc.), and company-specific policies.
Let’s take GDPR Article 32 as an example. In summary, it states that the data controller and processor shall implement appropriate technical and organizational measures to ensure a level of security appropriate to the risk. For banks, this is further elaborated in MaRisk AT 7.2, which defines the need to safeguard the security, integrity, availability, and authenticity of data, and in BAIT 7.11, which specifically addresses the level of data protection for testing purposes. However, these requirements do not cover all details; every organization needs to determine (aided by its data security and IT security officers) what “appropriate” means to it.
How can you approach compliance with these requirements, especially in a legacy application environment?
A starting point needs to be a systematic risk analysis of the as-is situation. Which kind of test data is being used in which system and environment? Use the (hopefully existing) business impact analysis to determine a risk score for each application as it pertains to data. Take specific note of any personal data being used – where it comes from, whether it is stored or manipulated locally, and who is in charge of updating test environments with this data. Then, turn to the testers themselves and examine how they document tests, how they determine which test data is needed for their testing, and how integrated their scenarios are.
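The assessment step described above can be sketched as a simple scoring exercise. The field names and weights below are illustrative assumptions, not a prescribed standard – each organization would derive its own criteria from its business impact analysis:

```python
from dataclasses import dataclass

@dataclass
class AppAssessment:
    name: str
    stores_personal_data: bool  # personal data present in test environments?
    high_availability: bool     # high availability requirement defined?
    local_copies: bool          # production data copied or manipulated locally?

def risk_score(app: AppAssessment) -> int:
    """Additive score; higher means higher data risk. Weights are assumptions."""
    score = 0
    if app.stores_personal_data:
        score += 3
    if app.high_availability:
        score += 2
    if app.local_copies:
        score += 1
    return score

# Hypothetical inventory: only applications above a threshold
# are flagged for masking or synthetic test data.
apps = [
    AppAssessment("core-banking", True, True, True),
    AppAssessment("marketing-cms", False, False, False),
]
high_risk = [a.name for a in apps if risk_score(a) >= 3]
print(high_risk)  # only "core-banking" exceeds the threshold
```

Even a rough score like this makes the follow-up prioritization (which applications get masking first) defensible and repeatable.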
Using the assessment results as guidance, you can determine your further course of action. For high-risk applications (those that store personal data or where high availability requirements have been defined) you will need to look into data masking techniques or the usage of synthetic test data to perform your test cases. While many tools exist for this job, tool selection should be less important than:
a. Understanding how testing is typically being performed and
b. Mapping out the data flow between different systems to ensure that you maintain data integrity between applications (and therefore tests) when masking data.
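One common way to maintain referential integrity when masking, as point b demands, is deterministic pseudonymization: the same input always maps to the same masked value, so a customer ID masked in one system still matches the masked ID in a downstream system during integrated tests. A minimal sketch using a keyed hash (the key handling and output format are illustrative assumptions, not a specific tool's behavior):

```python
import hmac
import hashlib

# In practice the key would live in a secrets store, not in source code.
SECRET_KEY = b"test-data-masking-key"

def mask_value(value: str, length: int = 12) -> str:
    """Deterministically pseudonymize a value with an HMAC.

    The same (key, value) pair always yields the same pseudonym,
    which preserves join keys across applications after masking.
    """
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return digest[:length]

# The CRM and the payments system mask independently,
# yet the masked IDs still line up for cross-system test scenarios.
crm_id = mask_value("customer-4711")
payments_id = mask_value("customer-4711")
print(crm_id == payments_id)  # True
```

Note the trade-off: deterministic masking keeps data integrity between applications, but because it is repeatable, the key must be protected as carefully as the original data.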
A real-life example
As a real-life example, a German bank first went through all applications classified as high-risk and mission-critical (KRITIS) for the institution. Interviews with testers and operations staff were used to understand the dependencies of these applications on others during testing and to identify common sources of personal data. Using this knowledge, masking algorithms were developed in parallel with selecting an appropriate tool that fulfilled both functional requirements (use cases such as data masking, archiving, and synthetic data generation) and technical ones (cloud storage, in-place data masking, support for various application platforms).
A pilot was conducted, replacing the previously used custom-developed scripts for data masking and validating the masked data with typical test scenarios. Business testers had to be trained on using the masked data since, in many cases, they were accustomed to finding “their” clients in test environments. A central test data management team was also set up to guide project managers and testers in identifying the data necessary for testing and populating the test environments with this masked data. The assessment results continue to serve as a roadmap for further improvements, expanding the usage of masked test data throughout the bank.
SOURCE: this article is a part of Software Survey (Release 1/2023): Survey of Tools for Software Engineering, provided by United Innovations.
The United Innovations (UI) platform is a subsidiary of GFFT e.V., a non-profit society dedicated to research transfer. Its primary objective is to drive innovation in Germany, Europe, and beyond. With its extensive network, the platform is committed to achieving this goal.
United Innovations supports the innovation process in each topic area with the same offerings: the technology database TechL©, the surveys, the awards for evaluating new technical offerings, startups, and scientific prototypes, many events, and proofs of concept and launch projects.