Computer forensics

Should you be concerned about Metadata?

 

Data hidden within
To understand the dangers that may lurk within the documents we proudly produce daily and then freely distribute within the office environment, we first need to grasp the kinds of information one can unintentionally disclose to potential clients, third parties, outsiders or, even worse, competitors.

It is of benefit to have nicely prepared reports, presentations and fancy spreadsheets that contain valuable information meant for the eyes of your superiors and management colleagues. But have you given thought to what you have disclosed without realising? What you might have disregarded as unimportant detail in a specific document or spreadsheet could in fact give a competitive edge to others, especially in your competitors' hands. To keep things simple, throughout the article I will occasionally refer to Microsoft Office applications, namely Microsoft Word, Excel and PowerPoint, to highlight examples of metadata.

What is metadata?
If you search the internet, you will find several suggested definitions of metadata in general. One interpretation of metadata is "data about data" or "information about information", which is probably its most direct meaning.

Metadata is used to help us understand information. In the context of digital photography, any photograph taken will contain some sort of metadata, such as the time and date when the picture was taken and the shutter speed and aperture settings, to name a few examples. Although the metadata is held within the electronically stored file, the information may not necessarily be printed on the photograph, except possibly the date and time the picture was taken. Most of the information is therefore hidden, and it is this hidden information that is known as metadata.

In our daily office environments, vast numbers of documents are produced and all of them will contain some sort of metadata. Examples include the author, date and time the document was created, tracked changes, last printed time and date, etc. The list of metadata available varies between different document applications.
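As an illustration of how easily such properties can be read, here is a minimal Python sketch. It assumes the document is saved in one of the newer zip-based Office Open XML formats (.docx, .xlsx, .pptx), where these properties are stored in a part named docProps/core.xml; older binary .doc, .xls and .ppt files are stored differently and are not covered. The function name read_core_properties and the field labels are my own, chosen for illustration only.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative label -> element name inside core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}


def read_core_properties(source):
    """Return the core properties stored in a .docx/.xlsx/.pptx package.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for label, element_name in FIELDS.items():
        element = root.find(element_name, NS)
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling read_core_properties on a file path returns a dictionary containing only the properties that are actually present, so an empty result means the core properties part is missing or blank, not that the file is free of all other metadata.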

Metadata risks
Whenever a new document is created, whether you like it or not, the application (Microsoft Office) starts to log data about data. However, not all metadata is automatically created; some requires the user to manually turn on a particular function. For example, if you were to review a document created by your colleague, you may want to have the tracked changes function turned on so that after your review, your colleague can easily identify any changes within the original document or view it in its "native format".

Now that we have a basic understanding of metadata and the types that are available, what dangers lurk within that may embarrass the business if a document is mistakenly distributed in its native format? Probably the most talked about metadata is "tracked changes", a function that is manually turned on or off. It would be a disadvantage and a great loss of face if a client discovered that the terms and conditions had changed significantly, possibly because of new amendments in the law with regard to specific regulations. In one possible scenario, the client finds the changes to the revised terms and conditions within the metadata, but the tracked wording is inconsistent and appears alongside the name of the company the document was previously sent to. Left confused about how to interpret the terms and conditions, the client may perceive the business as unprofessional and as failing to meet expectations.
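To show how little effort it takes to recover tracked changes, the following Python sketch lists the insertions and deletions recorded in a Word document. It assumes a .docx file, in which tracked insertions and deletions are stored as w:ins and w:del elements inside word/document.xml. The function name list_tracked_changes is hypothetical, and changes held elsewhere in the package (headers, footers, comments) are not examined.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in the {uri}tag form ElementTree expects.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_tracked_changes(source):
    """Return (kind, author, text) tuples for tracked changes in a .docx body.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    changes = []
    # Inserted text is held in w:t elements, deleted text in w:delText elements.
    for kind, tag, text_tag in (("inserted", "ins", "t"), ("deleted", "del", "delText")):
        for change in root.iter(W + tag):
            text = "".join(node.text or "" for node in change.iter(W + text_tag))
            changes.append((kind, change.get(W + "author", "unknown"), text))
    return changes
```

An empty list suggests no tracked changes remain in the document body, which is the result you would hope for before a document leaves the business.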

Taking a look at Excel, at first glance it doesn't seem very different from Word in terms of the metadata provided. However, more is uncovered when you open the spreadsheet in its native format on screen. A cell can reveal the formula used to calculate its value, columns or rows that were purposely hidden can be unhidden, and information written outside the printable area becomes visible. Thus, in comparison, far more information is disclosed when the spreadsheet is provided in its native format than in a normal printed copy of the document.
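The same kind of check can be sketched for a spreadsheet. The Python example below assumes an .xlsx file, in which sheet visibility is recorded in xl/workbook.xml and hidden columns, hidden rows and cell formulae are recorded in the worksheet parts under xl/worksheets/. The function name inspect_workbook and the keys of its report are my own. Note that one col element may describe a range of adjacent columns, so the sketch counts hidden column ranges, not individual columns.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace, in the {uri}tag form ElementTree expects.
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _is_hidden(element):
    """True when an element carries hidden="1" (or "true")."""
    return element.get("hidden") in ("1", "true")


def inspect_workbook(source):
    """Summarise hidden sheets, hidden columns/rows and formulae in an .xlsx file."""
    report = {
        "hidden_sheets": [],
        "hidden_column_ranges": 0,
        "hidden_rows": 0,
        "formulas": [],
    }
    with zipfile.ZipFile(source) as package:
        workbook = ET.fromstring(package.read("xl/workbook.xml"))
        for sheet in workbook.iter(S + "sheet"):
            if sheet.get("state") in ("hidden", "veryHidden"):
                report["hidden_sheets"].append(sheet.get("name"))
        for name in package.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            sheet_xml = ET.fromstring(package.read(name))
            report["hidden_column_ranges"] += sum(
                1 for col in sheet_xml.iter(S + "col") if _is_hidden(col)
            )
            report["hidden_rows"] += sum(
                1 for row in sheet_xml.iter(S + "row") if _is_hidden(row)
            )
            for cell in sheet_xml.iter(S + "c"):
                formula = cell.find(S + "f")
                if formula is not None and formula.text:
                    # Record the cell reference (e.g. "A1") with its formula text.
                    report["formulas"].append((cell.get("r"), formula.text))
    return report
```

A report showing hidden sheets or formulae is not necessarily a problem, but it is a prompt to decide whether the recipient should be able to see them.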

On the other hand, metadata is an advantage to many office environments in that it is used to organise, sort and provide keyword search facilities to trace specific documents within a document library. Those who choose to use metadata may find it useful for this purpose, but careful consideration is required in choosing the right types of metadata to be used for searching.

How to minimise the risks
So what are the options for businesses that do not want to disclose sensitive metadata to other parties? There are a variety of steps that can be taken to minimise the risk of confidential information being disclosed. However, your choice of options should be made after an analysis of your documents in their native format, prior to their disclosure to others. One simple and effective way of reducing risk is to avoid sending potentially sensitive documents electronically.

Those of you who still wish to forward documents in their native format may wish to consider the following options to help reduce the amount of available metadata to an acceptable level:

• convert the documents to portable document format (PDF);
• convert the documents to a text file (TXT) or rich text format (RTF);
• manually remove as much metadata as possible prior to sending the documents; and
• introduce metadata removal tools to remove particular types of metadata.
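As a rough idea of what a metadata removal tool does, the Python sketch below copies a document and empties one category of metadata on the way. It assumes a zip-based Office Open XML file (.docx, .xlsx, .pptx) and clears only the core properties held in docProps/core.xml, such as author and last printed date. Tracked changes, comments, hidden content and the other property parts are left untouched, so this is an illustration of the principle and not a complete cleansing solution. The function name strip_core_properties is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

# Keep the conventional "cp" prefix when the emptied part is written back out.
ET.register_namespace("cp", CP_NS)


def strip_core_properties(source, destination):
    """Copy an Office Open XML package, emptying its core-properties part.

    `source` and `destination` are file paths or binary file-like objects.
    Every other part of the package is copied unchanged.
    """
    with zipfile.ZipFile(source) as original, zipfile.ZipFile(
        destination, "w", zipfile.ZIP_DEFLATED
    ) as cleaned:
        for item in original.infolist():
            data = original.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                # Remove every property element but keep the empty container.
                for child in list(root):
                    root.remove(child)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            cleaned.writestr(item, data)
```

Writing to a new file instead of overwriting the original is a deliberate choice here: the business keeps its own copy with full metadata for internal search and audit, while only the cleaned copy is sent out.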

No matter which method you choose, it is important to lay down effective policies and procedures that reduce the risks of disclosing documents in their native format while at the same time minimising unintentional leakages of confidential information to others through poor management of metadata. These policies and procedures may include:

• training in the types of metadata available;
• identifying any potential metadata risks;
• developing a metadata awareness programme;
• emphasising the distribution of documents in an alternative format; and
• implementing an automatic cleansing solution for documents prior to distribution.
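A very simple building block for such a procedure is a check that flags documents about to be sent in a native format, so that they can be converted or cleansed first. The Python sketch below is only an assumption of how that might look: the list of extensions treated as native formats and the function name attachments_needing_review are hypothetical and would need to reflect the organisation's own policy.

```python
from pathlib import PurePath

# Hypothetical policy: extensions treated as native formats that need a
# metadata review before they are distributed outside the business.
NATIVE_FORMATS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def attachments_needing_review(filenames):
    """Return the file names whose extension marks them as a native format."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() in NATIVE_FORMATS
    ]
```

For example, given the names "Terms.DOCX", "report.pdf" and "budget.xls", the function returns "Terms.DOCX" and "budget.xls". A check based on file names alone is easy to bypass, so it complements training and awareness and does not replace them.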

When developing effective policies and procedures to deal with metadata, an organisation must understand the potential risks underlying particular metadata disclosures by evaluating the types of information that can be made available. In such circumstances, the organisation must consider different measures to reduce the impact of potential threats, mitigate those risks through risk analysis, internal audit reviews and effective compliance programmes, and take proactive steps to prevent the disclosure of particular types of metadata.

Strategy for metadata
The goal of an effective strategy for metadata is to formulate a set of predefined instructions to be used under a given set of circumstances, based on information collected within the business from those who may release documents externally. If required, the strategy can still be reviewed on a case-by-case basis. The overall strategy should take into consideration the business, technical and legal factors that may influence how the business classifies metadata as good or bad. The strategy should respond to the needs and specific situation of the business: there is no one right solution that fits all organisations.

In seeking out the right policies and procedures to deal with potential leakages of confidential information through metadata, the business should consider stringent steps to assess and minimise the risks attached to distributing documents in their native formats. Most importantly, the organisation should strike a balance between real-life scenarios and possible scenarios when defining an effective programme for metadata.

Matthew Chu
Forensic and Investigation Services
matthew.chu@gthk.com.hk

 
