Data Classification for the Masses
This System Administration, Networking and Security Institute (SANS) Internet Storm Center (ISC) blog (HTTPS Direct) has the title above and is a good introduction to the subject. The difficulty with Data Classification, and the associated Categorization of that Data, is that it is a precursor to Information Security. Information Security, in turn, has an implicit dependence on encryption whenever Data is not unclassified or public (two words which mean something similar in terms of Data Classification, but the first is used by Military Organizations and the second by Business in general). Unclassified or public Data can be released (or leaked) without undue incident for the Organization concerned.
Most Organizations probably don't do Data Classification very well, except where there is a legal or regulatory need to do it according to published standards. For example, an Organization may have a Patents department, a Regulatory Affairs department, or both, which classify Data because they must; the rest of that Organization will probably not have the incentive to do Data Classification well.
The main problem is with leaks of plain text Data, either inadvertent or deliberate. As an illustration of this, see Microsoft's Windows Information Protection (WIP) Introduction blog (HTTPS Direct). To quote from it (invoking fair use of copyright):
- 87% of Senior Managers admit to leaking Data to unmanaged Personal locations.
- 58% of people have leaked Data to the wrong person.
In these cases it is probable that the Data was in plain text. If the Data had been encrypted, the impact would likely have been lessened. The lesson from this is that Human behaviour, Data Classification and Information Security need to be considered as a whole, but usually this is not done. If people find the rules too burdensome, they will bend them. Only Military Organizations have the training and discipline to make the rules work (most of the time), and they have been doing it for a very long time (see William Slim's Address on Leadership in Management [HTTP Direct, not Secure], given to the Adelaide Division of the Australian Institute of Management on 04 Apr 1957 and republished in the Australian Army Journal in Jun 2003; search it for "thousands of years").
How can we lessen the impact of Data leakage (both within and without the Organization)?
If plain text is part of the Problem, then cipher text has to be part of the Solution. Anyone who has tried to get a Public Key Infrastructure (PKI) implemented in an Organization, or has even tried to get Senior Management to subscribe to whole Disk encryption for Notebook PCs (lest they be misplaced or stolen), knows that it is like herding cats (with apologies to cats). However, in the Scheme of Things (with apologies to the Internet of Things), the encryption approach might just be tenable, as long as the Data Classification Scheme is simple and can be supported with a clear, uncomplicated Policy and one Standard Operating Procedure (SOP; OK, maybe a couple). Of course, this is a Scheme for general Data Classification in support of a cipher text Information Security Policy (not a mandated Scheme, as in the examples of the Patents and Regulatory Affairs Departments).
What might this Scheme look like?
Well, the United States Computer Emergency Readiness Team (US-CERT) has a web page on the Traffic Light Protocol (HTTPS Direct, TLP), which defines the Data Classification levels of:
- Red - Restricted (or Secret). Company Budgets, Profit targets, Data governed by Regulation (Patents and Regulatory Affairs), Divestments, Mergers and Acquisitions.
- Amber - Confidential. Company Payroll, Department Projects and Email outside of the Business.
- Green - Private. Intra-Departmental Email.
- White - Public. Internet Web site.
Note: The examples given after each level are AllIncontext's summary of the TLP text, to indicate a General Data Classification Scheme for a Business.
Using Encryption to implement the Scheme
The Microsoft blog above indicates that unintentional Data Leakage is a problem in a Business. If the leak happens via Email, the solution might be to use Email encryption. However, because Email messages are isolated from other Data, such as Files and Backups, Email encryption tends to be done piecemeal in a Business (where the requirements demand it), and a full Public Key Infrastructure (PKI) implementation has the same problem (only Email messages are encrypted).
Email encryption is done in the following way (if X509 Public Key encryption is used):
- The Email plain text Data (Message and any Attachments) is put into a cipher text alternate view which uses the Cryptographic Message Syntax (CMS, which is a wrapper for the cipher text).
- The CMS wrapped Data is contained in a file called smime.p7m (every encrypted Email message uses this same Filename).
- The process of turning plain text into cipher text uses the recipient's X509 Public Certificate (containing the Public Key). This Public Key can only encrypt a limited amount of Data, so Encrypted Email is actually a hybrid system. A pseudo-random symmetric Key is generated for each Email message to be encrypted and this is used to encrypt the Message and any Attachments. Each recipient's Public Key then encrypts its own copy of this small symmetric Key. In this way, CMS allows more than one recipient to decrypt the same smime.p7m file.
- Each recipient of the encrypted Email message can decrypt the smime.p7m file with their own Private Key (The X509 Certificate uses a Private/Public Key pair. The Public one is given out, the Private one is kept secure on your PC, for example).
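The hybrid scheme described in the list above can be sketched in Python. This is a minimal illustration, assuming the third-party `cryptography` package: it encrypts the content once with AES-256-GCM and then wraps the content Key once per recipient with RSA-OAEP. It does not produce a real CMS structure or a .p7m file (a CMS implementation such as .Net's `EnvelopedCms` or OpenSSL's `cms` command does that); the dictionary layout and the recipient names are hypothetical.

```python
# Minimal sketch of hybrid ("envelope") encryption, the idea behind CMS.
# Assumes the third-party `cryptography` package; NOT a CMS/.p7m encoder.
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP with SHA-256, used only to wrap (encrypt) the small content Key.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def envelope_encrypt(plain: bytes, recipient_public_keys: dict) -> dict:
    """Encrypt `plain` once with a fresh AES Key, then wrap that Key per recipient."""
    content_key = AESGCM.generate_key(bit_length=256)  # new random Key per message
    nonce = os.urandom(12)  # 96-bit nonce, the usual size for AES-GCM
    ciphertext = AESGCM(content_key).encrypt(nonce, plain, None)
    wrapped_keys = {
        name: public_key.encrypt(content_key, OAEP)
        for name, public_key in recipient_public_keys.items()
    }
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_keys": wrapped_keys}


def envelope_decrypt(envelope: dict, name: str, private_key) -> bytes:
    """Unwrap the content Key with one recipient's Private Key, then decrypt."""
    content_key = private_key.decrypt(envelope["wrapped_keys"][name], OAEP)
    return AESGCM(content_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)


if __name__ == "__main__":
    alice = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    bob = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    env = envelope_encrypt(b"Q3 budget", {"alice": alice.public_key(), "bob": bob.public_key()})
    print(envelope_decrypt(env, "bob", bob))  # b'Q3 budget'
```

The content is encrypted only once, however many recipients there are; adding a recipient only adds one more wrapped copy of the 32-byte Key.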
So, if Email encryption is about the creation and transmission of an smime.p7m file, why can't this approach be used generally?
It can, and it is called bit bagging, a term used by Cryptographer Dr Peter Gutmann of the University of Auckland in New Zealand in his publication (HTTPS Direct, PDF file) entitled Everything you Never Wanted to Know about PKI but were Forced to Find Out. In Gutmann's considered view, there is no reason to use X509, in full, as a complex bit bagging Scheme: if you have a certificate management scheme that works, use it in your Business.
One possible Bit Bagging Scheme for the Encryption of Plain Text Data which supports Data Classification
If you have access to code that implements Cryptographic Message Syntax (CMS) functionality (for example, in Microsoft .Net), use it on plain text Data residing in any file to create a cipher text file with the .p7m file type. This is the point where your Data Classification Scheme comes into play. Consider sending a cipher text .p7m file to a recipient. This file can refer to the following X509 Public Certificates (which were used to encrypt the symmetric Key that encrypted the plain text Data):
- The recipient's X509 Public Certificate.
- The X509 Certificate created specifically to represent the Red, Amber or Green level of the Data Classification Scheme. (The White level of the Scheme is Public Data, so its plain text doesn't need to be encrypted.)
- Each of the above Certificates has a Data Owner. The Green Certificate might be owned by the Company Secretary. Note that if a recipient retires or leaves the Company, the Green Private Key can be used to decrypt the .p7m file, if required (see Note2).
- The naming convention for the above Certificates might be red_restricted@company_domain, amber_confidential@company_domain and company_secretary@company_domain. The first two Certificates might also have the Company Secretary as Data Owner, but they might have different Data Owners. For example, the Red Certificate Data Owner might be the Director or General Manager responsible for the Patents and Regulatory Affairs departments. The Amber Certificate Data Owner might be the Finance Director (if Amber is used for Payroll Data).
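Choosing which Certificates the symmetric Key is encrypted to can be sketched as a small lookup, assuming the naming convention in the list above. This is a hypothetical illustration: `Level`, `CLASSIFICATION_CERT` and `key_recipients` are names invented here, and `company_domain` is the same placeholder used above.

```python
from enum import Enum


class Level(Enum):
    """The four levels of the Scheme (after the Traffic Light Protocol)."""
    RED = "Restricted"
    AMBER = "Confidential"
    GREEN = "Private"
    WHITE = "Public"


# Hypothetical naming convention; "company_domain" is a placeholder.
CLASSIFICATION_CERT = {
    Level.RED: "red_restricted@company_domain",
    Level.AMBER: "amber_confidential@company_domain",
    Level.GREEN: "company_secretary@company_domain",
}


def key_recipients(level: Level, recipients: list[str]) -> list[str]:
    """Return the Certificate identities the symmetric Key is encrypted to.

    An empty list means White (Public) Data, which stays as plain text.
    """
    if level is Level.WHITE:
        return []
    # Every named recipient, plus the Classification Certificate for the
    # level, with duplicates removed while keeping the original order.
    return list(dict.fromkeys([*recipients, CLASSIFICATION_CERT[level]]))


print(key_recipients(Level.AMBER, ["jo@company_domain"]))
# ['jo@company_domain', 'amber_confidential@company_domain']
```

Because the Classification Certificate is always added, its Data Owner can recover the Data even when none of the named recipients is available.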
This encrypted Data can be sent normally in an Email message, placed on a Fileshare, or even stored in the Cloud, and is reasonably secure because it is encrypted. The recipient knows whether the Data is Restricted, Confidential or Private and can act on it as appropriate (if necessary referring to the Scheme Policy and Standard Operating Procedures [SOPs]). Encryption on its own protects confidentiality; Data integrity additionally needs a signature (a CMS SignedData structure) or an authenticated encryption mode, and Authenticity of the sender needs a signature. If more than one recipient needs to receive the Data, that is no problem: you just add as many recipient X509 Public Certificates into the CMS structure as you need. If the encrypted file is sent to the wrong person, it might be an embarrassment, but that recipient cannot do anything with the Data because their X509 Public Certificate is not one of those in the CMS structure (assuming they have one).
Note1: Public Key encryption is not a panacea. It will keep Data safe from casual perusal and might be safe against a concerted attack, but this cannot be guaranteed. In the case of the Red level of the Data Classification Scheme, a form of double encryption might be needed. This might use symmetric encryption (an Authenticated Encryption with Associated Data [AEAD] Scheme) before Public Key encryption. However, the Business then needs to ensure there is an appropriate Policy and Standard Operating Procedures (SOPs) to manage Symmetric Keys (managing Key exchange relatively painlessly is why Public Key encryption was created).
Note2: Each X509 Certificate has a validity period (the NotBefore and NotAfter dates). The Internet Engineering Task Force (IETF) stipulates, in RFC 5280 (section 4.1.2.5), that during the validity period the Certificate Authority (the Issuer of the Certificate) warrants that it will maintain information about the status of the Certificate. That is, if the Subject of a Certificate (for example, the person who has the Email Address of an Email encryption Certificate) asks for that Certificate to be revoked during the validity period, the Certificate Authority will revoke that Certificate Serial Number in a maintained Certificate Revocation List (CRL) or via the Online Certificate Status Protocol (OCSP). Generally, if a program uses the Public Certificate to encrypt Data, that program will only proceed with the encryption if the Public Certificate is valid, that is, between the NotBefore and NotAfter Dates. Suppose you have encrypted Data (which implies that the encryption took place during the validity period) and you want to decrypt it on a Date beyond the NotAfter Date of the Public Certificate. Should you, and indeed can you, decrypt the Data? The answer is yes, because the validity period only applies to the Public Certificate.
However, remember that in order to obtain your Public Certificate, you had to send a Certificate Signing Request (CSR) to the Certificate Authority (CA). The CSR contained a Subject (probably your Email address) and your Public Key, and this (along with the rest of the relevant Data in the CSR) was signed with your Private Key (in other words, a Hash was taken of the relevant Data and this Hash was signed with your Private Key to form the signature). The CA verifies your CSR by checking the signature with the Public Key in the CSR to recover the Hash, and then calculates a new Hash of the Data with the same Hash algorithm (no Key is involved in this step). If the received Hash matches the calculated Hash, the CA determines that the CSR was signed by the holder of the matching Private Key. Reminder: with RSA, each Key of the Key pair reverses the operation of the other, but a Hash is calculated by a Hash function (for example, SHA-256) and needs no Key. The convention is that the Private Key is kept safe and the Public Key is given to anyone who wants to send you encrypted Data. If that encrypted Data also needs a signature, the sender creates it by calculating the Hash over the Data and signing that Hash with the sender's own Private Key. On receipt, you decrypt the Data with your Private Key and then do what the CA did: you check the signature with the sender's Public Key to get the Hash, and calculate the Hash over the Data yourself. If the received Hash and calculated Hash match, you conclude that the Data is authentic and hasn't been tampered with.
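The sign-with-Private-Key, verify-with-Public-Key convention can be sketched with the third-party `cryptography` package (an assumption; any vetted library would do). RSA-PSS with SHA-256 is a choice made here for illustration, and `sign` and `is_authentic` are hypothetical helper names.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# RSA-PSS padding with SHA-256 (an illustrative choice of scheme).
PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)


def sign(private_key, data: bytes) -> bytes:
    # The library hashes `data` with SHA-256, then signs it with the Private Key.
    return private_key.sign(data, PSS, hashes.SHA256())


def is_authentic(public_key, data: bytes, signature: bytes) -> bool:
    # Verification uses the signer's Public Key and recomputes the Hash itself.
    try:
        public_key.verify(signature, data, PSS, hashes.SHA256())
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    sender = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    sig = sign(sender, b"amber payroll file")
    print(is_authentic(sender.public_key(), b"amber payroll file", sig))   # True
    print(is_authentic(sender.public_key(), b"amber payroll file!", sig))  # False
```

Anyone holding the sender's Public Certificate can check the signature, but only the holder of the sender's Private Key can produce it, which is what makes it evidence of authenticity.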
Which brings us back to the validity period of the Public Certificate. Since your Private Key was instrumental in getting the CA to sign the CSR to make your Public Certificate, the validity period might be deemed to say something about the authenticity of your Private Key. However, only you know whether the Private Key should be used to decrypt Data encrypted during the Public Certificate's validity period. All this might seem arcane, but remember that the encryption and decryption processes depend on third party code used by the programs that do the encryption and decryption. Why? Because creating cryptographic software primitives (the engine room of the encryption and decryption process) is specialized and best left to people skilled in doing it. This means that on a Windows Operating System (OS), you are reliant on Microsoft's cryptographic functions if you don't use something like the open source OpenSSL software. You might also depend on the OS for Certificate Storage. At any stage, the third party might decide to change functionality in subtle ways. For example, in Microsoft Outlook 2002, you could not decrypt an encrypted Email message beyond your Public Certificate's NotAfter Date. That was a design decision by the Outlook team at the time. An interesting question about decrypting old messages is on the StackExchange site (HTTP Direct, not Secure).
Note3: To clarify Note2, consider the use of a Code Signing Certificate. This is used with Time Stamping so that an EXE or DLL signed with this Certificate, and Time Stamped by a Third Party during the validity period, still has a valid Code Signing signature beyond the NotAfter date (otherwise EXEs and DLLs would need to be re-signed on a regular basis and replaced on every PC running them). In the case of Email encryption Certificates, hardly anyone revokes such a Certificate, and, as Dr Gutmann points out, all Certificate revocation schemes are broken (there have been CRL, OCSP and Certificate Pinning Schemes, and now we have the Certificate Transparency Scheme). Although the CMS structure allows for a PKCS9SigningTime property, it only records the time taken from the signer's own PC and is not attested by a Third Party. However, if the Sender's copy of an Email message containing a .p7m file is itself encrypted in the Email client Sent folder (also using the Company Secretary Certificate for the Private level in the Scheme), the sending time will be present in the message, and the encrypted Sent Email message can be backed up as appropriate. Once it is decrypted, as needed, with the Company Secretary Certificate, the various Dates (message Sent time and Recipient validity period) can be checked. The only uncertainty would be the actual date and time of the original encryption, which depends on the PC date and time setting, but this can be mitigated by encrypting, instead of the plain text Data file alone, a Zip file containing both a Network Time Protocol (NTP) Data structure and the plain text Data file. This Zip is then encrypted to the .p7m file (only a modicum of extra complexity). In this case, the risk of decrypting a backed up .p7m file a year or more after the original Certificate(s) validity period(s) is minimized.
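The Zip bundle described in Note3 can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the NTP seconds value is passed in (querying an NTP server is not shown), the member name `timestamp.json` and its JSON layout are hypothetical, and the returned bytes are what would then be encrypted to the .p7m file by your CMS code.

```python
import io
import json
import zipfile
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2_208_988_800


def bundle_with_time(filename: str, plain: bytes, ntp_seconds: int) -> bytes:
    """Zip a time record and the plain text Data file together; the returned
    bytes are what would then be encrypted to the .p7m file."""
    utc = datetime.fromtimestamp(ntp_seconds - NTP_TO_UNIX, tz=timezone.utc)
    record = {"ntp_seconds": ntp_seconds, "utc": utc.isoformat()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("timestamp.json", json.dumps(record))
        zf.writestr(filename, plain)
    return buffer.getvalue()


def recorded_time(bundle: bytes) -> datetime:
    """After decryption, read back the recorded time so it can be compared
    with the Certificate validity period(s)."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        record = json.loads(zf.read("timestamp.json"))
    return datetime.fromisoformat(record["utc"])


bundle = bundle_with_time("payroll.csv", b"name,amount\n", 3_913_056_000)
print(recorded_time(bundle))  # 2024-01-01 00:00:00+00:00
```

Because the time record sits inside the encrypted Zip, it cannot be altered without first decrypting the .p7m file.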