The first part of a data loss prevention (DLP) implementation involves inventory. Of your data, that is — because, quite simply, you can’t protect it if you don’t know it’s there.
So the first thing DLP does is discover where your sensitive data resides. The right DLP capability can sift through file servers, databases, documents, email, and Web content to find sensitive data and tag it so it can be tracked wherever it goes.
Thanks to advanced detection technologies, DLP can accurately analyze both the content and the context of data, which helps make data leakage prevention truly affordable.
To identify sensitive data, DLP solutions rely on a DLP analysis engine that conducts deep content analysis based on central policies.
Among the techniques used are partial document matching and database fingerprinting (also called exact data matching), as well as rules-based, conceptual, and statistical analysis, predefined categories (such as PCI compliance), and various combinations of these. In addition, some DLP analysis engines implement vector machine learning techniques.
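To make two of these techniques a little more concrete, here is a minimal Python sketch, not any vendor's actual engine. It assumes a rules-based check for payment card numbers (a pattern match followed by the Luhn checksum) and a heavily simplified take on exact data matching, in which known sensitive values are stored as SHA-256 fingerprints and document tokens are compared against that index. All names here (`CARD_PATTERN`, `find_card_numbers`, `build_index`, `find_exact_matches`) are hypothetical and chosen purely for illustration.

```python
import hashlib
import re

# Assumed rule: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Rules-based detection: pattern match first, then checksum to cut false positives."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def fingerprint(value: str) -> str:
    """Normalize a value (drop punctuation and case) and hash it."""
    normalized = re.sub(r"\W+", "", value).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(sensitive_values) -> set:
    """Simplified exact data matching: keep only hashes of the known sensitive values."""
    return {fingerprint(v) for v in sensitive_values}


def find_exact_matches(text: str, index: set) -> list:
    """Return the tokens in text whose fingerprint appears in the index."""
    tokens = re.findall(r"[\w.@-]+", text)
    return [t for t in tokens if fingerprint(t) in index]


if __name__ == "__main__":
    sample = "Order paid with 4111 1111 1111 1111, contact jane.doe@example.com"
    index = build_index(["Jane.Doe@example.com"])
    print(find_card_numbers(sample))
    print(find_exact_matches(sample, index))
```

The checksum step illustrates why combining techniques matters: a bare 16-digit pattern would flag order numbers and tracking codes, while the Luhn check discards most of them. A production engine would typically match whole database rows rather than single values, which this sketch does not attempt.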
The stronger the analysis engine, the more accurate its data identification, which is a key factor in limiting both false positives and false negatives. DLP deployments should therefore be tested for accuracy, and both the deployments and the policies they enforce should be tuned to keep false positives and false negatives to a minimum.
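One way to put numbers on that kind of accuracy testing is to run a policy against a hand-labeled sample of documents and count the errors. The Python sketch below assumes you have two parallel lists of booleans: what the policy flagged, and what a human reviewer judged to be truly sensitive. The function name `evaluate_policy` and the choice of precision and recall as summary metrics are illustrative assumptions, not part of any particular DLP product.

```python
def evaluate_policy(flagged, truly_sensitive):
    """Compare policy verdicts against reviewer labels for the same documents."""
    if len(flagged) != len(truly_sensitive):
        raise ValueError("both lists must cover the same documents")

    tp = fp = fn = tn = 0
    for predicted, actual in zip(flagged, truly_sensitive):
        if predicted and actual:
            tp += 1  # sensitive and correctly flagged
        elif predicted and not actual:
            fp += 1  # false positive: harmless document flagged
        elif not predicted and actual:
            fn += 1  # false negative: sensitive document missed
        else:
            tn += 1  # harmless and correctly ignored

    # Precision: share of flagged documents that were really sensitive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of sensitive documents that the policy caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    return {
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    flagged = [True, True, False, False, True]
    truly_sensitive = [True, False, True, False, True]
    print(evaluate_policy(flagged, truly_sensitive))
```

Rerunning the same labeled sample after each policy change gives a simple before-and-after comparison, so you can see whether tightening a rule reduced false positives at the cost of more missed documents.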
Next time, I’ll describe what DLP technologies do once data identification is in hand.