Sensitive data lives everywhere in the organization, including databases, systems, documents, and apps. However, not all data stores are the same, which creates classification challenges for some automated solutions. OneTrust Data Discovery uses advanced machine learning (ML) and artificial intelligence (AI) to identify documents that cannot be classified using traditional pattern matching approaches. By determining a document's type based on its content and context, organizations can automatically apply the right governance policies to ensure data is used responsibly.
Eliminate manual effort and classify data using content and context
OneTrust Data Discovery goes beyond traditional pattern matching to intelligently scan a document and identify its type, such as a resume, passport, financial statement, or medical record. Machine learning helps save time by classifying data at scale, which minimizes manual intervention and increases accuracy.
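To make the contrast concrete, here is a minimal sketch of what traditional pattern matching looks like. The pattern names and regular expressions are hypothetical and simplified for illustration; they are not OneTrust's detectors.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def find_patterns(text):
    """Return {pattern_name: [matches]} for every pattern that hits."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits

form = "Applicant SSN: 123-45-6789"
resume = "Jane Doe\njane.doe@example.com\nExperience: 5 years as a data analyst"

form_hits = find_patterns(form)      # finds the SSN-shaped string
resume_hits = find_patterns(resume)  # finds the email address only
```

The second result shows the limitation: the pattern matcher finds an email address, but nothing in its output says the text is a resume. Recognizing the document type requires looking at content and context as a whole.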
Automatically apply retention, deletion, and data protection policies
Once data is classified, security teams can ensure it is protected and handled based on its classification and according to regulatory requirements. Using our improved classification and document identification, we can apply policies at the data level, such as ‘files containing PII,’ and at the document level, like ‘resumes’ or ‘financial reports.’
These improved classifications enable the application and enforcement of policies like retention, deletion, or quarantine. We can also apply access policies to different data or document types, for example ensuring that sensitive files or data are not shared with open access.
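As a rough sketch of how classification could drive policy, the example below maps document types and data labels to actions. All names, labels, and retention periods here are hypothetical placeholders, not OneTrust's policy model or recommended values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Policy:
    action: str                           # "retain", "delete", or "quarantine"
    retention_days: Optional[int] = None  # only meaningful for "retain"

# Hypothetical document-level and data-level policies.
DOCUMENT_POLICIES = {
    "resume": Policy("retain", retention_days=365),
    "financial_report": Policy("retain", retention_days=2555),
}
DATA_POLICIES = {"pii": Policy("quarantine")}
DEFAULT_POLICY = Policy("retain")

def resolve_policy(document_type, data_labels, open_access):
    """Pick a policy from the document type, data labels, and sharing state."""
    # Sensitive data that is openly shared takes priority over retention rules.
    if open_access:
        for label in data_labels:
            if label in DATA_POLICIES:
                return DATA_POLICIES[label]
    return DOCUMENT_POLICIES.get(document_type, DEFAULT_POLICY)

shared_resume = resolve_policy("resume", ["pii"], open_access=True)
private_resume = resolve_policy("resume", ["pii"], open_access=False)
```

In this sketch an openly shared resume containing PII is quarantined, while the same file with restricted access simply follows the retention rule for resumes.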
Applications of ML models
OneTrust Data Discovery employs a number of intelligent technologies and new techniques to help our customers better discover, control, and activate their data at scale.
Industries like legal, healthcare, and finance have large volumes of documents to process, so we use AI, natural language processing (NLP), and ML technology to automate document classification and categorize documents based on content. The algorithms learn from labeled data sets to recognize patterns and characteristics in text, which allows them to classify documents accurately and efficiently.
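The idea of learning from labeled data can be illustrated with a small multinomial Naive Bayes text classifier. This is a generic textbook technique chosen for illustration, not a description of the models OneTrust uses, and the training examples are made up.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def fit(self, labeled_docs):
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up labeled examples; a real training set would be far larger.
TRAINING = [
    ("work experience education skills references", "resume"),
    ("employment history skills university degree", "resume"),
    ("revenue net income balance sheet quarterly", "financial_statement"),
    ("assets liabilities cash flow revenue", "financial_statement"),
    ("patient diagnosis prescription treatment", "medical_record"),
    ("patient allergies medication blood pressure", "medical_record"),
]

model = NaiveBayesClassifier().fit(TRAINING)
label = model.predict("skills and education of the candidate")
```

With this toy training set, `label` comes out as `"resume"` because the words "skills" and "education" appear only in the resume examples. Production systems would rely on richer features and much more data, but the principle of learning word patterns per label is the same.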
A classic area where many solutions struggle is named entities. Consider the word “Savannah,” which could be a person’s name or a city in the U.S. state of Georgia. To help classify data appropriately, we have tuned spaCy's Named Entity Recognition (NER) model, a machine learning model that identifies and extracts named entities (people, organizations, locations) from unstructured text data. It can identify named entities in different languages, making it valuable for global customers.
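To show why context matters for a word like “Savannah,” here is a deliberately simple toy that counts cue words around a mention. It is a hand-written heuristic with hypothetical cue lists, not spaCy's statistical NER model and not how a tuned model works internally. It only illustrates that the surrounding words carry the signal.

```python
import re

# Hypothetical cue words; a trained NER model learns such signals from data.
PERSON_CUES = {"mr", "ms", "mrs", "dr", "said", "hired", "manager"}
LOCATION_CUES = {"in", "city", "georgia", "visit", "located", "near", "airport"}

def classify_mention(text, mention, window=3):
    """Guess PERSON or LOCATION for the first mention using nearby cue words."""
    lowered = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    target = mention.lower()
    if target not in lowered:
        return None
    i = lowered.index(target)
    # Look at up to `window` tokens on each side of the mention.
    context = lowered[max(0, i - window):i] + lowered[i + 1:i + 1 + window]
    person = sum(token in PERSON_CUES for token in context)
    location = sum(token in LOCATION_CUES for token in context)
    if person > location:
        return "PERSON"
    if location > person:
        return "LOCATION"
    return "UNKNOWN"

as_person = classify_mention("Dr. Savannah Lee was hired as manager", "Savannah")
as_place = classify_mention("The office is located in Savannah, Georgia", "Savannah")
```

The same string is labeled `"PERSON"` in the first sentence and `"LOCATION"` in the second, purely because of the words around it. A pattern that matches only the word itself cannot make that distinction.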
We have also developed new ways to use optical character recognition (OCR) machine learning models to extract characters from images, including printed or handwritten text, and convert them to machine-readable text. Thanks to the speed of our scanning technology, classification of PDFs and JPGs can be completed at scale.
Privacy by design is built into our AI and ML strategy
OneTrust has been using machine learning and AI for more than a year, and our models have been trained and used by privacy professionals. Our strategy has always been to use these and new technologies to better uncover, classify, protect, and encourage the responsible use of data across all enterprises.
We have built and deployed our technology with privacy by design, so that each customer’s model is their own, tailored to and trained on their own unique data and environment. Those models are never shared with anyone else.
Let us show you how it works — request a demo today.