Privacy - Comprehensive Guide to Privacy

Infosys Responsible AI Toolkit - Explanation of Privacy Features

Overview

The Infosys Responsible AI Privacy module identifies and masks Personally Identifiable Information (PII) in several data formats: text, images, DICOM, video, and structured data. This helps protect data privacy and supports compliance with regulations such as GDPR and CCPA.

Key Features and Benefits

  • Comprehensive PII Detection: Identifies a wide range of PII, including names, addresses, social security numbers, credit card numbers, and more.

  • Advanced Masking Techniques: Employs various masking techniques to protect sensitive information, such as hashing, generalization, and encryption.

  • Customizable Policies: Allows organizations to define custom PII detection and masking rules based on specific requirements and compliance standards.

  • Scalability and Performance: Designed to handle large volumes of data efficiently, ensuring minimal impact on performance.

  • Integration with AI/ML Pipelines: Seamlessly integrates with AI and machine learning workflows to ensure privacy by design.

  • Regulatory Compliance: Helps organizations comply with data privacy regulations like GDPR, CCPA, and HIPAA.

Technical Details and Implementation

  • Presidio: Leverages the Presidio library for PII detection and masking.

  • NLP and Machine Learning: Utilizes advanced NLP techniques and machine learning models to improve accuracy and efficiency.

  • OCR and Computer Vision: Employs OCR and computer vision techniques to extract text from images and documents.

  • Differential Privacy: Applies differential privacy techniques to protect sensitive information in structured data.

  • Parallel Processing: Uses parallel processing to reduce processing time for large datasets.

Detailed Look At Our Privacy Modules

 Unstructured Text

List of Methods for Unstructured Text Data

Text Analyze

  • Description: Detects PII entities in the input text and returns a JSON report.

  • Request: Mandatory fields: input text. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: JSON report.

Text Anonymize

  • Description: Anonymizes the PII entities detected in the input text and returns the anonymized text.

  • Request: Mandatory fields: input text, fakeData. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: Anonymized text.

Encrypt

  • Description: Encrypts the PII data in the text using a key.

  • Request: Mandatory fields: input text, fakeData. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: JSON containing the encrypted text.

Decrypt

  • Description: Decrypts the encrypted PII in the text using the encryption key.

  • Request: Mandatory fields: JSON containing the encrypted text.

  • Response: Decrypted text.
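The analyze and anonymize flow above can be sketched roughly as follows. This is a minimal illustration only: it uses two regular expressions as a stand-in for the Presidio recognizers, and the function names analyze and anonymize, the report layout, and the placeholder format are assumptions made for this example, not the module's actual API. Only the parameter names piiEntitiesToBeRedacted and exclusionList come from the method list above.

```python
import json
import re

# Illustrative regex recognizers (assumption); the real module relies on Presidio.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def analyze(text, pii_entities_to_be_redacted=None, exclusion_list=None):
    """Return a report of detected entities, sorted by position in the text."""
    wanted = pii_entities_to_be_redacted or list(PATTERNS)
    excluded = set(exclusion_list or [])
    findings = []
    for entity_type in wanted:
        for match in PATTERNS[entity_type].finditer(text):
            if match.group() in excluded:
                continue  # values on the exclusion list are never reported
            findings.append({
                "type": entity_type,
                "start": match.start(),
                "end": match.end(),
                "text": match.group(),
            })
    return sorted(findings, key=lambda f: f["start"])


def anonymize(text, findings):
    """Replace each finding with a <TYPE> placeholder, right to left so offsets stay valid."""
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        text = text[:f["start"]] + f"<{f['type']}>" + text[f["end"]:]
    return text


sample = "Mail jane@example.com or help@example.com, SSN 123-45-6789."
report = analyze(sample, exclusion_list=["help@example.com"])
print(json.dumps(report, indent=2))
print(anonymize(sample, report))
```

Here the excluded address is left untouched while the other email address and the SSN are replaced by placeholders.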

 Image

List of Methods for Image Data

Image Masking

Working:

  1. Pre-processing: Extract the text from the image using OCR (Tesseract, EasyOCR, or Computer Vision).

  2. Process: Detect the PII entities in the extracted text using the methods listed below.

Image Analyze

  • Description: Detects PII entities in the text extracted from the image and returns the list of PII entities.

  • Request: Mandatory fields: image, OCR method, magnification, rotation flag. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: JSON report.

Image Anonymize

  • Description: Anonymizes the PII entities detected in the extracted text by drawing bounding boxes over the text in the image.

  • Request: Mandatory fields: image, OCR method, magnification, rotation flag. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: Base64 image.

Hashify

  • Description: Hashes the PII text present in an image that is mapped to an account. Returns JSON containing each hash value together with a key; the same key is printed on the corresponding bounding box in the response image.

  • Request: Mandatory fields: image, portfolio, account, OCR method, magnification, rotation flag. Optional fields: piiEntitiesToBeRedacted, exclusionList.

  • Response: JSON containing the image and the hash key-value pairs.
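As a rough sketch of the Hashify idea, the example below takes OCR output as a list of words with bounding boxes, hashes each word recognized as PII, and returns both the boxes to mask and the key-to-hash mapping. The OCR result shape, the hashify function, the pii_N key format, the salt argument, and the choice of SHA-256 are all assumptions for illustration; the source does not state which hash function or key scheme the module uses.

```python
import hashlib
import re

# Single illustrative recognizer (assumption); real detection is done on the OCR text by the module.
SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Hypothetical OCR output: (word, (left, top, width, height)) pairs.
ocr_words = [
    ("SSN:", (10, 10, 40, 12)),
    ("123-45-6789", (55, 10, 90, 12)),
    ("Visit", (10, 30, 35, 12)),
]


def hashify(words, salt):
    """Return the boxes to mask and a key -> hash map for every PII word."""
    boxes, hashes = [], {}
    for word, box in words:
        if SSN.match(word):
            key = f"pii_{len(hashes) + 1}"  # label that would be drawn on the box
            digest = hashlib.sha256((salt + word).encode("utf-8")).hexdigest()
            hashes[key] = digest
            boxes.append({"key": key, "box": box})
    return boxes, hashes


boxes, hashes = hashify(ocr_words, salt="demo-portfolio/demo-account")
print(boxes)
print(hashes)
```

Drawing the boxes and labels onto the image is left out; an imaging library would take the boxes list as input for that step.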

Dicom Masking

Working:

  1. Pre-processing: Extract the text from the medical image (DICOM image) using OCR.

  2. Process: Detect the PII entities in the extracted text using the method listed below.

Dicom Anonymize

  • Description: Anonymizes PII entities in medical X-ray images.

  • Request: Payload containing the .dcm file.

  • Response: Base64 .dcm file with the PII entities masked.

 Code

List of Methods for Source Code

Working:

Extract PII entities from the code using the StarPII model.

Code Anonymize

  • Description: Anonymizes PII entities in code text.

  • Request: Input code text.

  • Response: Anonymized code.

Codefile Anonymize

  • Description: Anonymizes PII entities in a code file.

  • Request: Input code file.

  • Response: Anonymized code file.

 

 Structured Text

List of Methods for Structured Text Data (Differential Privacy)

Working: Structured data is processed using the following differential privacy techniques.

  1. Suppression: Removes an entire column from the table.

  2. Generalization (Range): Converts numerical data into ranges.

  3. Noise: Adds Laplace noise to numerical values.

  4. Binary: Randomly flips values in a binary column to the opposite value.

Differential Privacy File

  • Description: Loads a CSV file and returns its column names, which can be used in the subsequent API call.

  • Request: Payload: CSV file.

  • Response: List of column names.

Differential Privacy Anonymize

  • Description: Applies differential privacy to the columns selected by name.

  • Request: Optional fields: Suppression, Range, Noise, Binary.

  • Response: Processed CSV.
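The four techniques can be illustrated with a small, self-contained sketch. The function names, the bucket width, and the sensitivity and epsilon values are assumptions chosen for the example and are not the module's parameters. Laplace noise is sampled here as the difference of two exponential draws, which is one standard way to generate it.

```python
import random


def suppress(rows, column):
    """Suppression: drop the column from every row."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def generalize(value, width):
    """Generalization: map an integer to the range (bucket) that contains it."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def laplace_noise(value, sensitivity, epsilon, rng):
    """Noise: add Laplace(0, sensitivity / epsilon) noise to a numeric value."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def randomized_flip(bit, flip_probability, rng):
    """Binary: flip a 0/1 value with the given probability."""
    return 1 - bit if rng.random() < flip_probability else bit


rng = random.Random(0)  # seeded only to make the demo repeatable
rows = [{"name": "A", "age": 37, "salary": 52000.0, "smoker": 1}]
rows = suppress(rows, "name")
for row in rows:
    row["age"] = generalize(row["age"], 10)
    row["salary"] = laplace_noise(row["salary"], sensitivity=1000, epsilon=0.5, rng=rng)
    row["smoker"] = randomized_flip(row["smoker"], 0.25, rng)
print(rows)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy.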

 Video

List of Methods for Video

Working:

  1. Preprocessing:

a.  Extract the frames from the video.

b.  Extract the text from each frame.

  2. Processing: Process the frames using sampling and threading to detect PII data.

Video Anonymize

  • Description: Detects and anonymizes PII entities in video frames by drawing bounding boxes.

  • Request: Mandatory fields: Image, OCR method, magnification, rotation flag. Optional fields: portfolio, account, piiEntitiesToBeRedacted, exclusionList.

  • Response: Processed base64 video.
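A minimal sketch of the sampling and threading step might look like this. It assumes the per-frame text has already been extracted, uses a single email regex as a stand-in detector, and lets each skipped frame reuse the result of the most recent sampled frame. That reuse rule, the process_video function, and the sample_every and workers parameters are assumptions for illustration; the source does not describe how the module treats unsampled frames.

```python
import re
from concurrent.futures import ThreadPoolExecutor

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def detect_pii(frame_text):
    """Stand-in for PII detection on the text of one frame."""
    return EMAIL.findall(frame_text)


def process_video(frame_texts, sample_every=2, workers=4):
    """Detect PII on every Nth frame in a thread pool and reuse results for skipped frames."""
    sampled = list(range(0, len(frame_texts), sample_every))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `sampled`.
        detections = list(pool.map(detect_pii, (frame_texts[i] for i in sampled)))
    by_frame = dict(zip(sampled, detections))
    # Each frame takes the detections of the latest sampled frame at or before it.
    return [by_frame[(i // sample_every) * sample_every] for i in range(len(frame_texts))]


frames = ["a@b.com", "a@b.com", "none", "none", "c@d.org"]
per_frame = process_video(frames)
print(per_frame)
```

Sampling trades accuracy for speed: PII that appears only on a skipped frame would be missed with this rule.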

 

 Document

List of Methods for Document Masking

PDF Masking

Working:

  1. Preprocessing: Using PyMuPDF:

a.  Extract text and images from the pages.

b.  Extract text from the images.

  2. Processing: Detect and anonymize PII data in the images and text using parallel processing (threading).

PDF Anonymize

  • Description: Detects and anonymizes PII entities in PDF pages, in text or image format.

  • Request: Mandatory fields: Image, OCR method. Optional fields: portfolio, account, exclusionList.

  • Response: Processed PDF file.

PPT Masking

Working:

  1. Preprocessing:

a. Extract text and images from the slides.

b. Extract text from the images.

c. Extract text from the tabular data.

  2. Processing: Detect and anonymize PII data in the images and text using parallel processing (threading).

PPT Anonymize

  • Description: Detects and anonymizes PII entities in PPT slides, in text, image, or tabular format.

  • Request: Mandatory fields: Image, OCR method. Optional fields: portfolio, account, exclusionList.

  • Response: Processed PPT file.

DOCX Masking

Working:

  1. Preprocessing:

a. Extract text and images from the pages.

b. Extract text from the images.

  2. Processing: Detect and anonymize PII data in the images and text using parallel processing (threading).

Docx Anonymize

  • Description: Detects and anonymizes PII entities in DOCX files, in text or image format.

  • Request: Mandatory fields: Image, OCR method. Optional fields: portfolio, account, exclusionList.

  • Response: Processed DOCX file.

Excel Masking

Working:

  1. Preprocessing: Extract the text from each cell.

  2. Processing: Detect and anonymize PII data in the text and write the result back to the cell.

Excel Anonymize

  • Description: Detects and anonymizes PII entities in Excel files.

  • Request: Excel file as the payload.

  • Response: Processed Excel file.

CSV Masking

Working:

  1. Preprocessing: Extract the columns and their data.

  2. Processing: Detect and anonymize PII data in the columns and replace the cell values using the batch anonymizer.

CSV Anonymize

  • Description: Detects and anonymizes PII entities in CSV files.

  • Request: CSV file as the payload.

  • Response: Processed CSV file.
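The column-wise CSV flow can be sketched with the standard library alone. In this illustration a single email regex stands in for the batch anonymizer, and the anonymize_csv function and the placeholder text are assumptions made for the example rather than the module's implementation.

```python
import csv
import io
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize_csv(csv_text):
    """Mask email addresses column by column and return the rewritten CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    rows = list(reader)
    for column in columns:
        # Process one column as a batch, then write the masked values back to the cells.
        masked = [EMAIL.sub("<EMAIL_ADDRESS>", row[column]) for row in rows]
        for row, value in zip(rows, masked):
            row[column] = value
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


source = "id,contact\n1,jane@example.com\n2,none\n"
print(anonymize_csv(source))
```

Working per column rather than per cell makes it easy to hand a whole column to a batch-oriented detector in one call.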

Json Masking

Working:

  1. Preprocessing: Extract the data from the JSON file and restructure it into the required format.

  2. Processing: Detect and anonymize PII data under the keys and replace it in the JSON using the batch anonymizer.

Json Anonymize

  • Description: Detects and anonymizes PII entities in JSON files.

  • Request: JSON file as the payload.

  • Response: Processed JSON file.
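One simple way to picture the JSON flow is a recursive walk that masks every string value and leaves the structure and keys intact. The anonymize_json function, the email regex, and the placeholder are assumptions for this sketch; the module's own restructuring step and batch anonymizer are not reproduced here.

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize_json(node):
    """Return a copy of the JSON value with email addresses masked in all string values."""
    if isinstance(node, dict):
        return {key: anonymize_json(value) for key, value in node.items()}
    if isinstance(node, list):
        return [anonymize_json(item) for item in node]
    if isinstance(node, str):
        return EMAIL.sub("<EMAIL_ADDRESS>", node)
    return node  # numbers, booleans, and null pass through unchanged


document = json.loads('{"user": {"email": "jane@example.com", "age": 30}, "cc": ["bob@example.org"]}')
masked = anonymize_json(document)
print(json.dumps(masked))
```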