Metadata is data about data, not the data itself. If you are asked to define metadata, that's the answer. For example, the number of words in a document is metadata: data about the data. If you perform data mining on your data warehouse, your conclusions are metadata: data about the data. Any summary generated from the data is also metadata. An XML file has some header data that defines the characteristics of the data that follows. The header data is metadata, and it is also the schema. Metadata is an abstraction about the data. For a practical illustration, see Phil Factor's "Exploring SQL Server table metadata with SSMS and TSQL."
Microsoft Office document property values (such as "Authors" and "Last Saved By") are frequently referred to as metadata. Properties are data about the Office document, stored in the document. Revision information and comments within Microsoft Word documents are also referred to as metadata. Comments and revision notes are data stored in support of the comments and revision history features of Microsoft Office. Document properties, revision information, and comments are not "data about data" in the abstract sense described earlier; they are additional data within the document.
Consider information gathered about the clothing in your closet, such as sizes, styles, colors, and quantities. This would be "data about clothing." Contrast this with the fabric care instructions sewn into a garment ("wash in cold water, tumble dry"). Fabric care instructions could also be considered "data about clothing." Yet we can easily distinguish the abstract, statistical information from the specific instruction information. We could use a single term, such as "meta-clothing-data," to refer to both types of information, but doing so would invite confusion where none previously existed. We can distinguish between a wardrobe inventory and fabric care instructions, but "data about data" turns such distinctions into a muddle.
Metadata has acquired an expanded definition as "hidden" data, although the data isn't well hidden. These data elements (revisions and authors, for example) are easy to forget about and easy to neglect to review. If you forward a document outside your company, it is your responsibility to review it, including the "easy to neglect" data. That "easy to neglect" data is what is referred to as metadata.
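To make the "easy to neglect" data concrete, the sketch below is a minimal Python example, assuming the document is a Word file in the Office Open XML (.docx) format, which is a ZIP archive of XML parts. It reads the author fields from `docProps/core.xml`, counts comments in `word/comments.xml`, and counts tracked insertions and deletions (`w:ins` and `w:del` elements) in `word/document.xml`. The function name `review_docx` is hypothetical, and the check is deliberately incomplete: it ignores other places such data can live, such as headers, footers, `docProps/app.xml`, and custom properties.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OOXML core properties and WordprocessingML.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def review_docx(source):
    """Report some easy-to-neglect data in a .docx file.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper; not an exhaustive metadata scrubber.
    """
    report = {
        "creator": None,
        "last_modified_by": None,
        "comments": 0,
        "insertions": 0,
        "deletions": 0,
    }
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())

        # Document properties such as "Authors" and "Last Saved By".
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            report["creator"] = core.findtext("dc:creator", namespaces=NS)
            report["last_modified_by"] = core.findtext("cp:lastModifiedBy", namespaces=NS)

        # Reviewer comments are stored in a separate part.
        if "word/comments.xml" in names:
            comments = ET.fromstring(zf.read("word/comments.xml"))
            report["comments"] = len(comments.findall("w:comment", NS))

        # Tracked changes in the main body: inserted and deleted runs.
        if "word/document.xml" in names:
            doc = ET.fromstring(zf.read("word/document.xml"))
            report["insertions"] = len(doc.findall(".//w:ins", NS))
            report["deletions"] = len(doc.findall(".//w:del", NS))

    return report
```

Calling `review_docx("report.docx")` (a hypothetical file name) before forwarding a document returns a small dictionary you can inspect; non-empty author fields or non-zero counts are a prompt to review or remove that data.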
Applications other than Microsoft Office also contain easily neglected information that may lead to inadvertent disclosure; for an example, read about metadata in OpenOffice.org.
Metadata (n.): 1. data about data, not the data itself; 2. data that is easy to neglect to review.
When speaking with the legal community, recognize that the ABA has adopted the second definition. You can review "Metadata Ethics Opinions Around the U.S." and compare it with the Arizona Supreme Court decision in David Lake v. City of Phoenix, et al. (CV-09-0036-PR) and with Williams v. Sprint/United Mgmt. Co., 230 F.R.D. 640, 652 (D. Kan. 2005).
Arizona Supreme Court:
metadata is an inherent part of an electronic document.
Arizona Bar Association:
a lawyer who receives an electronic communication may not examine it for the purpose of discovering the metadata embedded in it.
Other bar associations have disagreed with Arizona.
Confusion over the term "metadata" and its appropriate scope has produced diverging opinions on a matter of fact. "Just semantics?" Hardly.
FOCA is a tool for extracting information during the footprinting and fingerprinting phases of a penetration test. It helps auditors extract and analyze metadata, hidden information, and lost data from published files. FOCA version 2 adds tools to scan internal domains using PTR scanning, to recognize software through installation paths, and more. The goal of FOCA is to discover as much information as possible automatically, starting from a public domain name.