The Client is an Ohio-based, innovative provider of equipment and services for data centers, with a portfolio of power, cooling, and IT infrastructure solutions and services that extends from the cloud to the edge of the network. Headquartered in Columbus, Ohio, the company has twenty thousand employees, more than twenty-five manufacturing and assembly facilities, and regional headquarters around the world. It is listed on the New York Stock Exchange (NYSE).
The scope of this project is to implement PII identification for structured and unstructured data across the Client's enterprise systems. In this GDPR audit and assessment project, more than three thousand five hundred applications and databases are in scope for PII identification, covering all domains across the globe, such as HRIS, PLM, Field Services, and Sales. The identified sensitive data must be reviewed and approved by the data citizens, and the approved sensitive attributes must then be identified in the ChainSys metadata management application. The sensitive data must be remediated through data masking to prevent data breaches and stay compliant. Finally, data governance procedures and policies need to be established for continuous PII monitoring and remediation.
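To illustrate what masking a sensitive attribute can look like, here is a minimal sketch. It assumes a simple character-masking policy that hides all but the last few alphanumeric characters and leaves separators in place; the function `mask_value` and its parameters are hypothetical and do not represent how dataZense actually masks data.

```python
def mask_value(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Hypothetical masking policy: replace every alphanumeric character except
    the last `keep_last` ones with `mask_char`; separators such as '-' are kept
    so the masked value retains its original shape."""
    total_alnum = sum(ch.isalnum() for ch in value)
    to_mask = max(total_alnum - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


print(mask_value("123-45-6789"))  # ***-**-6789
```

Keeping the shape of the value means downstream applications that validate formats can still process masked copies; a production masking policy would also need to decide whether masking must be reversible or consistent across systems.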
GDPR is a binding EU regulation with legal force comparable to a US federal law, and non-compliance can lead to fines of up to €20 million or 4% of annual global turnover, whichever is higher. Without GDPR compliance, the Client was at high risk of significant fines, data breaches, reputational damage, and potential loss of customers. The Client needed to complete the PII data assessment in a short period of time and faced the following major challenges:
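Under GDPR Article 83(5), the higher fine tier is capped at the greater of €20 million and 4% of total worldwide annual turnover. A small sketch of that arithmetic follows; the function name `max_gdpr_fine_eur` and the turnover figures are hypothetical, amounts are integer euros to avoid rounding, and the result is only the statutory ceiling, not a fine a regulator would actually impose.

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: int) -> int:
    """Ceiling of the higher GDPR fine tier: EUR 20 million or 4% of
    annual worldwide turnover, whichever is higher."""
    four_percent = annual_global_turnover_eur * 4 // 100  # integer euros
    return max(20_000_000, four_percent)


# Hypothetical turnover figures, not the Client's actual numbers.
print(max_gdpr_fine_eur(100_000_000))    # 20000000 (4% would be only 4 million)
print(max_gdpr_fine_eur(5_000_000_000))  # 200000000
```

For any company with annual turnover above €500 million, the 4% figure exceeds €20 million and becomes the binding ceiling.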
1. More than 3,500 databases needed to be scanned to identify PI data categories.
2. Complex business rules.
3. Scanning of password-protected files.
4. Power BI integration to build executive dashboards.
The Client's technical landscape consists of 30+ heterogeneous applications integrated with a Big Data environment made up of three Cloudera Hadoop clusters: Development, Production, and Disaster Recovery. The clusters are configured with LDAP, Kerberos, and Sentry for authentication, authorization, and access control. Reporting must be performed directly against the Production data lake, through the Silver and/or Gold layers only, using standard reporting tools.
ChainSys deployed dataZense, part of its Smart Data Platform, to scan 5,000 databases and identify the 90+ PII data elements confirmed by the business. Scanning was completed in six weeks, and the identified PII and non-PII results were shared with the business for review and decision-making.
The spreadsheets to be scanned came in many different formats, and bringing them into a uniform layout would have been difficult and time-consuming. Scanning was therefore made independent of layout: the columns in an Excel file can appear in any order, and the column order does not need to match across files.
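One common way to make a scan independent of column order is to address columns by header name instead of position. The sketch below assumes the sheet rows have already been read (for example with a spreadsheet library) into lists with the header row first; `REQUIRED_COLUMNS` and `rows_to_records` are hypothetical names, not part of dataZense.

```python
from typing import Dict, List

# Hypothetical canonical column names; the real file layouts are not specified.
REQUIRED_COLUMNS = ["database", "table", "column"]


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Map each data row to a dict keyed by normalized header name, so the
    physical column order in each spreadsheet does not matter."""
    header = [str(h).strip().lower() for h in rows[0]]
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Look up each required column's position in this particular file.
    index = {name: header.index(name) for name in REQUIRED_COLUMNS}
    return [{name: row[i] for name, i in index.items()} for row in rows[1:]]


# Two files with the same content but different column orders.
file_a = [["Database", "Table", "Column"], ["HR", "EMP", "SSN"]]
file_b = [["Column", " Database ", "Table"], ["SSN", "HR", "EMP"]]
print(rows_to_records(file_a) == rows_to_records(file_b))  # True
```

Normalizing header case and whitespace before the lookup absorbs minor formatting differences between files, while a missing required column still fails loudly instead of producing misaligned data.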
Personal Data Identification
Personal data elements are grouped into two categories: Critical and Confidential.
Critical data elements identify an individual directly, on their own, such as a Social Security number, national identifier, visa number, passport number, or employee number.
Confidential data elements can identify an individual when combined with other personal data elements, for example first name, last name, and date of birth.
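The two categories can be represented as a simple lookup table plus a rule for when a set of elements identifies a person. This is a sketch under assumptions: the element keys are hypothetical snake_case names for the examples given in the text (the full list of 90+ elements is not shown), and the threshold of two Confidential elements is an illustrative stand-in for "combined with other personal data elements".

```python
from enum import Enum
from typing import Iterable


class Category(Enum):
    CRITICAL = "Critical"
    CONFIDENTIAL = "Confidential"


# Hypothetical keys for the example elements named in the text.
PII_CATEGORIES = {
    "social_security_number": Category.CRITICAL,
    "national_identifier": Category.CRITICAL,
    "visa_number": Category.CRITICAL,
    "passport_number": Category.CRITICAL,
    "employee_number": Category.CRITICAL,
    "first_name": Category.CONFIDENTIAL,
    "last_name": Category.CONFIDENTIAL,
    "date_of_birth": Category.CONFIDENTIAL,
}


def can_identify(elements: Iterable[str], min_confidential: int = 2) -> bool:
    """True if the elements contain any Critical element, or at least
    `min_confidential` Confidential elements (an assumed threshold)."""
    cats = [PII_CATEGORIES[e] for e in set(elements) if e in PII_CATEGORIES]
    if Category.CRITICAL in cats:
        return True
    return cats.count(Category.CONFIDENTIAL) >= min_confidential


print(can_identify({"passport_number"}))             # True
print(can_identify({"first_name"}))                  # False
print(can_identify({"first_name", "date_of_birth"}))  # True
```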
GDPR Business Rules
Personal data are identified using predefined business rules and algorithms.
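The source does not describe the rules themselves. A common pattern for rule-based PII detection combines a column-name hint with a pattern test on sampled values; the sketch below shows that pattern only. The rule set, the `classify_column` function, and the 80% match threshold are all hypothetical and are not the rules used in this project.

```python
import re
from typing import Iterable, Optional

# Hypothetical rules: element -> (column-name hint, value pattern).
RULES = {
    "US_SSN": (re.compile(r"ssn|social", re.I), re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    "EMAIL": (re.compile(r"e_?mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")),
}


def classify_column(name: str, sample: Iterable[str], min_ratio: float = 0.8) -> Optional[str]:
    """Return the first PII element whose rule fires for this column, or None.
    A rule fires if at least `min_ratio` of the non-empty sampled values match
    its pattern, or if the column name matches its hint and at least one
    sampled value matches the pattern."""
    values = [v for v in sample if v]
    for element, (name_hint, value_pattern) in RULES.items():
        hits = sum(bool(value_pattern.match(v)) for v in values)
        ratio = hits / len(values) if values else 0.0
        if ratio >= min_ratio or (name_hint.search(name) and ratio > 0):
            return element
    return None


print(classify_column("ssn", ["123-45-6789", "987-65-4321"]))  # US_SSN
print(classify_column("contact", ["a@b.com", "c@d.org"]))      # EMAIL
print(classify_column("notes", ["hello", "world"]))            # None
```

Requiring value evidence, not just a suggestive column name, reduces false positives; the flagged results would then go to the business for review, as described above.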
Dashboards
The following dashboards show the count of databases by region, the count of tables by database, and the count of columns by database.
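Counts like these are aggregated from the scan inventory. The source lists Power BI integration for the executive dashboards; the sketch below only illustrates the aggregation logic, assuming each scanned column is available as a record with hypothetical fields `region`, `database`, `table`, and `column`.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class ScannedColumn(NamedTuple):
    """One column found by the scan (hypothetical inventory record)."""
    region: str
    database: str
    table: str
    column: str


def dashboard_counts(inventory: Iterable[ScannedColumn]) -> Dict[str, Counter]:
    rows = list(inventory)
    # Deduplicate first so each database and each table is counted once.
    db_pairs = {(r.region, r.database) for r in rows}
    table_pairs = {(r.database, r.table) for r in rows}
    return {
        "databases_by_region": Counter(region for region, _ in db_pairs),
        "tables_by_database": Counter(db for db, _ in table_pairs),
        "columns_by_database": Counter(r.database for r in rows),
    }


inventory = [
    ScannedColumn("EMEA", "HR", "EMP", "SSN"),
    ScannedColumn("EMEA", "HR", "EMP", "NAME"),
    ScannedColumn("EMEA", "HR", "DEPT", "ID"),
    ScannedColumn("APAC", "SALES", "ORD", "EMAIL"),
]
counts = dashboard_counts(inventory)
print(counts["tables_by_database"])  # Counter({'HR': 2, 'SALES': 1})
```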
dataZense: visualize, analyze, catalog, and scramble data for effective decision-making and security.