Last updated 22 Aug 2016

Data Classification for the Masses

This System Administration, Networking and Security Institute (SANS) Internet Storm Center (ISC) blog (HTTPS Direct) has the title above and is a good introduction to the subject. The difficulty with Data Classification, and the associated Categorization of that Data, is that it is a precursor to Information Security. The latter has an implicit dependence on encryption whenever Data is not unclassified or public (two words which mean something similar in terms of Data Classification, but the first is used by Military Organizations and the second by Business in general). Only unclassified or public Data can be released (or leaked) without undue incident for the Organization concerned.

Most Organizations probably don't do Data Classification very well, except where there is a legal or regulatory need to do it according to published standards. For example, if an Organization has a Patents department or a Regulatory Affairs department (or both), those departments will classify their Data because they must, but the rest of that Organization will probably not have the incentive to do Data Classification well. The main problem is with leaks of plain text Data, either inadvertently or deliberately. As an illustration of this, see Microsoft's Windows Information Protection (WIP) Introduction blog (HTTPS Direct). To quote from this (invoking fair use of copyright):
  • 87% of Senior Managers admit to leaking Data to unmanaged Personal locations.
  • 58% of people have leaked Data to the wrong person.
In these cases it is probable that the Data was in plain text. If the Data had been encrypted, the impact would likely have been lessened. The lesson from this is that Human behaviour, Data Classification and Information Security need to be considered as a whole, but usually this is not done. If people find the rules too burdensome, the rules will be bent. Only Military Organizations have the training and discipline to make the rules work (most of the time), and they have been doing it for a very long time (see William Slim's Address on Leadership in Management [HTTP Direct, not Secure], given to the Adelaide Division of the Australian Institute of Management on 04 Apr 1957 and republished in the Australian Army Journal in Jun 2003; search for "thousands of years").

How can we lessen the impact of Data leakage (both within and without the Organization)?

If plain text is part of the Problem, then cipher text has to be part of the Solution. Anyone who has tried to get a Public Key Infrastructure (PKI) implemented in an Organization, or has even tried to get Senior Management to subscribe to whole Disk encryption for Notebook PCs (lest they get misplaced or stolen), knows that it is like herding cats (with apologies to cats). However, in the Scheme of Things (with apologies to the Internet of Things), the encryption approach might just be tenable, as long as the Data Classification Scheme is simple and can be supported with a clear, uncomplicated Policy and one Standard Operating Procedure (SOP; OK, maybe a couple). Of course, this is a Scheme for general Data Classification in support of a cipher text Information Security Policy (not a mandated Scheme such as those used by the Patents and Regulatory Affairs Departments in the example above).

What might this Scheme look like?

Well, the United States Computer Emergency Readiness Team (US-CERT) has a web page on the Traffic Light Protocol (HTTPS Direct, TLP), which defines the Data Classification levels of:
  • Red - Restricted (or Secret). Company Budgets, Profit targets, Data governed by Regulation (Patents and Regulatory Affairs), Divestments, Mergers and Acquisitions.
  • Amber - Confidential. Company Payroll, Department Projects and Email outside of the Business.
  • Green - Private. Intra-Departmental Email.
  • White - Public. Internet Web site.
Note: The non bold text is AllIncontext's summary of the TLP text to indicate a General Data Classification Scheme for a Business.
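
The four levels above can be written down as a small lookup table, which is sometimes easier to keep consistent than a Policy document alone. The following is only a sketch: the names Level, Rule, SCHEME, needs_encryption and label are hypothetical, and the assumption that only White Data may stay in plain text is this sketch's reading of the Scheme, not part of the TLP itself.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    RED = "Restricted"
    AMBER = "Confidential"
    GREEN = "Private"
    WHITE = "Public"


@dataclass(frozen=True)
class Rule:
    examples: tuple  # typical Data at this level, taken from the list above
    encrypt: bool    # assumption: True when plain text must become cipher text


# Illustrative table only; a real Policy would decide the contents.
SCHEME = {
    Level.RED: Rule(("Company Budgets", "Profit targets", "Mergers and Acquisitions"), True),
    Level.AMBER: Rule(("Company Payroll", "Department Projects", "Email outside of the Business"), True),
    Level.GREEN: Rule(("Intra-Departmental Email",), True),
    Level.WHITE: Rule(("Internet Web site",), False),
}


def needs_encryption(level: Level) -> bool:
    """Return True when Data at this level should be encrypted."""
    return SCHEME[level].encrypt


def label(level: Level) -> str:
    """Build a human-readable marking such as 'Amber - Confidential'."""
    return f"{level.name.capitalize()} - {level.value}"
```

For example, label(Level.AMBER) gives the marking "Amber - Confidential", and needs_encryption(Level.WHITE) is False.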

Using Encryption to implement the Scheme

The Microsoft blog above indicates that unintentional Data Leakage is a problem in a Business. If this is done via Email, the solution might be to use Email encryption. However, isolating Email messages from other Data, such as in Files and Backups, usually means that Email encryption tends to be done piecemeal in a Business (where the requirements demand it), and a full Public Key Infrastructure (PKI) implementation has the same problem (only Email messages are encrypted).

Email encryption is done in the following way (if X509 Public Key encryption is used):
  • The Email plain text Data (Message and any Attachments) is put into a cipher text alternate view which uses the Cryptographic Message Syntax (CMS, which is a wrapper for the cipher text).
  • The CMS wrapped Data is contained in a file called smime.p7m (every encrypted Email message uses this same Filename).
  • The process of turning plain text into cipher text uses the recipient's X509 Public Certificate (containing the Public Key). This Public Key can only encrypt a limited amount of Data, so Encrypted Email is actually a hybrid system. A pseudo-random symmetric Key is generated for each Email message to be encrypted and this is used to encrypt the Message and any Attachments. The Public Key then encrypts this symmetric Key. In this way, CMS allows more than one recipient to decrypt the smime.p7m file.
  • Each recipient of the encrypted Email message can decrypt the smime.p7m file with their own Private Key (The X509 Certificate uses a Private/Public Key pair. The Public one is given out, the Private one is kept secure on your PC, for example).
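
The hybrid arrangement in the steps above (one symmetric Key per message, wrapped once per recipient) can be sketched with nothing but the Python standard library. This is a toy and must not be used to protect real Data: it uses textbook RSA without padding, key pairs built from well-known Mersenne primes, and a SHAKE-256 keystream in place of a real cipher. It is not CMS and does not produce an smime.p7m file. All function and variable names are hypothetical.

```python
import hashlib
import secrets

E = 65537  # commonly used RSA public exponent


def toy_keypair(p: int, q: int):
    """Build a textbook RSA key pair from two primes (no padding, NOT secure)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (n, E), (n, d)  # (Public Key, Private Key)


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream; the same call encrypts and decrypts."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(plaintext: bytes, public_keys: dict) -> dict:
    """Encrypt once with a fresh symmetric Key, then wrap that Key per recipient."""
    content_key = secrets.token_bytes(16)
    key_as_int = int.from_bytes(content_key, "big")
    wrapped = {name: pow(key_as_int, e, n) for name, (n, e) in public_keys.items()}
    return {"wrapped_keys": wrapped, "ciphertext": toy_cipher(content_key, plaintext)}


def unseal(envelope: dict, name: str, private_key) -> bytes:
    """Unwrap the symmetric Key with one recipient's Private Key and decrypt."""
    n, d = private_key
    key_as_int = pow(envelope["wrapped_keys"][name], d, n)
    return toy_cipher(key_as_int.to_bytes(16, "big"), envelope["ciphertext"])


# Well-known Mersenne primes stand in for real key generation.
alice_pub, alice_priv = toy_keypair(2**127 - 1, 2**89 - 1)
bob_pub, bob_priv = toy_keypair(2**107 - 1, 2**61 - 1)

envelope = seal(b"Quarterly figures", {"alice": alice_pub, "bob": bob_pub})
```

Both unseal(envelope, "alice", alice_priv) and unseal(envelope, "bob", bob_priv) recover the same plain text, because each wrapped entry holds the same symmetric Key. The message body is encrypted only once however many recipients there are.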
So, if Email encryption is about the creation and transmission of an smime.p7m file, why can't this approach be used generally? It can, and it is called bit bagging, a term used by Cryptographer Dr Peter Gutmann of the University of Auckland in New Zealand in his publication (HTTPS Direct, PDF file) entitled Everything you Never Wanted to know about PKI but were Forced to Find Out. In Gutmann's considered view, there is no reason to use X509, in full, as a complex bit bagging Scheme. If you have a certificate management scheme that works, use it in your Business.

One possible Bit Bagging Scheme for the Encryption of Plain Text Data which supports Data Classification

If you have access to code that implements Cryptographic Message Syntax (CMS) functionality (for example in Microsoft .Net), use it on plain text Data residing in any file to create a cipher text file with the .p7m file type. This is the point where your Data Classification Scheme comes into play. Consider sending a cipher text .p7m file to a recipient. This file can contain the following X509 Public Certificates (which were used to encrypt the symmetric Key that encrypted the plain text Data):
  • The recipient's X509 Public Certificate.
  • The X509 Certificate created specifically to represent the Red, Amber or Green Data Classification Scheme. (The White part of the Scheme is Public Data so doesn't need plain text to be encrypted).
  • Each of the above Certificates has a Data Owner. The Green Certificate might be owned by the Company Secretary. Note that if a recipient retires or leaves the Company, the Green Private Key can be used to decrypt the p7m file, if required (see Note2).
  • The naming convention for the above Certificates might be red_restricted@company_domain, amber_confidential@company_domain and company_secretary@company_domain. The first two Certificates might also have the Company Secretary as Data Owner, but they might have different Data Owners. For example, the Red Certificate Data Owner might be the Director or General Manager responsible for the Patents and Regulatory Affairs departments. The Amber Certificate Data Owner might be the Finance Director (if Amber is used for Payroll Data).
This encrypted Data can be sent normally using an Email message, or placed on a Fileshare, or even in the Cloud, and is reasonably secure because it is encrypted. The recipient knows whether the Data is Restricted, Confidential or Private and can act on it as appropriate (if necessary referring to the Scheme Policy and Standard Operating Procedures [SOPs]). Encryption on its own protects confidentiality; Data integrity and Authenticity additionally require the sender to sign the Data (which CMS also supports). If more than one recipient needs to receive the Data, that is no problem. You just add as many recipient X509 Public Certificates into the CMS structure as you need. If the encrypted file is sent to the wrong person, it might be an embarrassment, but that recipient cannot do anything with the Data because their X509 Public Certificate is not one of those in the CMS structure (assuming they have one), so their Private Key cannot recover the symmetric Key.
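
The choice of Certificates for one file can be reduced to a small rule: the named recipients plus the one Certificate that represents the classification level. The sketch below only builds the list of Certificate subjects; it does not perform any encryption. The function name recipients_for is hypothetical, company_domain is the placeholder used in the naming convention above, and mapping the Green (Private) level to the Company Secretary Certificate is an assumption drawn from that naming convention.

```python
# Assumed mapping from classification level to the Scheme Certificate subject.
CLASSIFICATION_CERTS = {
    "red": "red_restricted@company_domain",
    "amber": "amber_confidential@company_domain",
    "green": "company_secretary@company_domain",
}


def recipients_for(level: str, recipients: list) -> list:
    """Return the Certificate subjects to add to the CMS structure for one file."""
    level = level.lower()
    if level == "white":
        return []  # Public Data is not encrypted, so no recipients are needed
    if level not in CLASSIFICATION_CERTS:
        raise ValueError(f"unknown classification level: {level}")
    if not recipients:
        raise ValueError("at least one recipient is required")
    # Keep the given order, drop duplicates, and put the Scheme Certificate last.
    subjects = list(dict.fromkeys(recipients))
    scheme_cert = CLASSIFICATION_CERTS[level]
    if scheme_cert not in subjects:
        subjects.append(scheme_cert)
    return subjects
```

Because the Scheme Certificate is always added, the Data Owner for that level can still decrypt the file after an individual recipient has left the Company.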

Note1: Public Key encryption is not a panacea. It will keep Data safe from casual perusal and might be safe against a concerted attack, but this cannot be guaranteed. In the case of the Red level of the Data Classification Scheme, a form of double-encryption might be needed. This might use symmetric encryption (an Authenticated Encryption with Associated Data [AEAD] Scheme) before Public Key encryption. However, the Business then needs to ensure there is an appropriate Policy and Standard Operating Procedures (SOPs) to manage Symmetric Keys (this is why Public Key encryption was created, to manage Key exchange relatively painlessly).

Note2: Each X509 Certificate has a validity period (the NotBefore and NotAfter dates). The Internet Engineering Task Force (IETF) stipulates, in RFC 5280 (section 4.1.2.5), that during the validity period the Certificate Authority (the Issuer of the Certificate) warrants that the Certificate status will be maintained. That is, if a request to revoke that Certificate is made during the validity period by the Subject of that Certificate (for example the person who has the Email Address of an Email encryption Certificate), the Certificate Authority will revoke that Certificate Serial Number in a maintained Certificate Revocation List (CRL) or using the Online Certificate Status Protocol (OCSP). Generally, if a program uses the Public Certificate to encrypt Data, that program will only proceed with the encryption process if the Public Certificate is valid, that is, if the current Date is between the NotBefore and NotAfter Dates. Outside of these Dates, the program will not proceed with the encryption. Suppose you have encrypted Data, which implies that the process of encryption took place during validity, and you want to decrypt it on a Date beyond the NotAfter Date of the Public Certificate. Should you, and indeed can you, decrypt the Data? The answer is yes, because the validity period only applies to the Public Certificate.
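
The asymmetry between encrypting and decrypting can be written as two small checks. This is a sketch of one possible policy, not the behaviour of any particular product: the function names are hypothetical, and the rule that decryption is judged on the time of encryption rather than on today's Date is an assumption.

```python
from datetime import datetime, timezone


def within_validity(not_before: datetime, not_after: datetime, when: datetime) -> bool:
    """True when `when` falls inside the validity period (both ends inclusive)."""
    return not_before <= when <= not_after


def may_encrypt(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """Encrypting to a Public Certificate is only allowed while it is valid."""
    return within_validity(not_before, not_after, now)


def may_decrypt(not_before: datetime, not_after: datetime, encrypted_at: datetime) -> bool:
    """Assumed policy: decrypt if the Data was encrypted during validity.

    Today's Date plays no part, so old Data stays readable after NotAfter.
    """
    return within_validity(not_before, not_after, encrypted_at)


# Example validity period (hypothetical dates).
NOT_BEFORE = datetime(2016, 1, 1, tzinfo=timezone.utc)
NOT_AFTER = datetime(2017, 1, 1, tzinfo=timezone.utc)
```

With these dates, may_encrypt is False for any time in 2018, while may_decrypt is still True for Data encrypted on 22 Aug 2016, whenever the decryption is attempted.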

However, remember that in order to obtain your Public Certificate, you had to send a Certificate Signing Request (CSR) to the Certificate Authority (CA). The CSR contained a Subject (probably your Email address) and your Public Key, and this (along with the rest of the relevant Data in the CSR) was signed by your Private Key (in other words, a Hash was taken of the relevant Data and this Hash was encrypted with your Private Key to form the signature). The CA authenticates your CSR by decrypting your signature with your Public Key to get the Hash, and then calculates a new Hash over the Data itself (no Key is needed to calculate a Hash). If the received Hash matches the calculated Hash, the CA determines that the CSR has been authenticated. Reminder: with RSA, what one Key of the pair encrypts, only the other Key of the pair can decrypt. The convention is that the Private Key is kept safe and the Public Key is given to someone who wants to send you encrypted Data. If that encrypted Data also carries a signature, the sender creates it by calculating the Hash over the Data and then encrypting that Hash with the sender's own Private Key. On receipt, you decrypt the Data with your Private Key and then do what the CA did: you decrypt the signature with the sender's Public Key, get the Hash and then calculate the Hash over the Data. If the received Hash and calculated Hash match, you conclude that the Data is authentic and hasn't been tampered with.
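
The check the CA performs on a CSR (recover the Hash from the signature with the Public Key, then compare it with a freshly calculated Hash) can be sketched with the Python standard library. This is a toy for illustration only: it uses textbook RSA without padding and a key pair built from two well-known Mersenne primes, so it offers no security. The names sign, verify and digest_as_int are hypothetical.

```python
import hashlib

E = 65537                        # Public exponent
P, Q = 2**521 - 1, 2**607 - 1    # well-known Mersenne primes, illustration only
N = P * Q                        # modulus, part of both Keys
D = pow(E, -1, (P - 1) * (Q - 1))  # Private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """SHA-256 needs no Key; anyone can recalculate it from the Data."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign(data: bytes) -> int:
    """Signer: hash the Data, then transform the Hash with the Private Key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier: recover the Hash with the Public Key and compare with a fresh Hash."""
    return pow(signature, E, N) == digest_as_int(data)


csr_data = b"subject=someone@company_domain"  # hypothetical CSR content
csr_signature = sign(csr_data)
```

verify(csr_data, csr_signature) succeeds, while any change to the Data or to the signature makes the two Hashes differ and the check fail.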

Which brings us back to the validity period of the Public Certificate. Since your Private Key was instrumental in getting the CA to sign the CSR to make your Public Certificate, the validity period might be deemed to say something about the authenticity of your Private Key. However, only you know if the Private Key should be used to decrypt Data encrypted during the Public Certificate's validity period. All this might seem arcane, but remember that the encryption and decryption processes depend on third party code used by the programs that do the encryption and decryption. Why? Because creating cryptographic software primitives (the engine room of the encryption and decryption process) is specialized and best left to people skilled in doing it. This means that on a Windows Operating System (OS), you are reliant on Microsoft's cryptographic functions if you don't use something like the open source OpenSSL software. You might also depend on the OS for Certificate Storage. At any stage, the third party might decide to change functionality in subtle ways. For example, in Microsoft Outlook 2002, you could not decrypt an encrypted Email message beyond your Public Certificate's NotAfter Date. That was a design decision by the Outlook team at the time. An interesting question about decrypting old messages is on the StackExchange blog (HTTP Direct, not Secure).

Note3: To clarify Note2, consider the use of a Code Signing Certificate. This is used with Time Stamping so that an EXE, or DLL, signed with this Certificate and Time Stamped by a Third party during the validity period still has a valid Code Signing signature beyond the NotAfter date (otherwise EXEs and DLLs would need to be re-signed on a regular basis and replaced on every PC running that EXE or DLL). In the case of Email encryption Certificates, hardly anyone revokes such a Certificate, and as Dr Gutmann points out, all Certificate revocation schemes are broken (there have been CRL, OCSP and Certificate Pinning Schemes, and now we have the Certificate Transparency Scheme). Although the CMS structure allows for a PKCS9SigningTime property, it is not authenticated. However, if an Email message containing a .p7m file is itself encrypted from the Email client Sent folder and by the Sender (also using the Company Secretary Certificate for the Private level in the Scheme), the sending time will be present in the message and the Sender encrypted Sent Email message can be backed up as appropriate. Once decrypted as needed by the Company Secretary Certificate, the various Dates (message Sent time and Recipient validity period) can be checked. The only uncertainty would be the actual date and time of the original encryption, which depends on the PC date and time setting, but this can be mitigated by replacing the original plain text file with a Zip file containing both a Network Time Protocol (NTP) Data structure and the plain text Data file. This Zip is then encrypted to the .p7m file (only a modicum of extra complexity). In this case, the risk of decrypting a backed up .p7m file a year or more after the original Certificate(s) validity period(s) is minimized.
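
The Zip step might look like the following sketch, which builds the archive in memory ready to be handed to whatever CMS code performs the encryption. It assumes the NTP Data structure has already been obtained elsewhere and is passed in as raw bytes; the function name bundle_with_timestamp and the member name timestamp.ntp are hypothetical.

```python
import io
import zipfile


def bundle_with_timestamp(data_name: str, data: bytes, timestamp: bytes) -> bytes:
    """Return Zip bytes holding the plain text file plus a timestamp record."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(data_name, data)             # the plain text Data file
        archive.writestr("timestamp.ntp", timestamp)  # hypothetical member name
    return buffer.getvalue()


# Example with placeholder bytes standing in for a real NTP response.
zip_bytes = bundle_with_timestamp("report.txt", b"plain text Data", b"\x24\x01\x00\x00")
```

The resulting bytes, rather than the original file, are what would then be encrypted to the .p7m file, so the time record travels inside the cipher text with the Data it describes.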

AllIncontext Limited is registered in England, No 04624520. Registered office address: 12-14 High Street, Petersfield, Hampshire, GU32 3JG.
