Managing file formats is a central concern in digital preservation, because the formats you keep determine whether content can still be opened, validated, and migrated in the future. A full listing of recommended formats from the Library of Congress is available online. The Library of Congress’s recommended formats are based on seven sustainability factors:
- Disclosure – complete specifications, and tools for validating the technical integrity of the format, exist and are accessible. You can find out exactly how information is encoded as bits and bytes.
- Adoption – the format is widely used. If everyone is using it, tools will be available for migration and emulation.
- Transparency – the format is easy to analyze with basic tools, for example by reading it in a plain text editor. Ideally, the information is not encrypted or compressed.
- Self-Documentation – the format allows you to add metadata directly to the record. You don’t have to have a program or a database to find out what the record is.
- External Dependencies – How much hardware or software do you need to access the format? The less specialized hardware or software you need, the better.
- Impact of Patents – patents could make it harder to open or migrate formats. This is less of a worry with formats that are widely adopted.
- Technical Protection Mechanisms – mechanisms such as encryption or copy protection should not prevent a repository from preserving the content. Formats should also not be tied to a particular vendor or program, and should remain accessible regardless of the system to which they were originally uploaded.
The main content types are images, video, audio, and text. Formats commonly recommended for these types of material, several of which are published as ISO standards (for example PDF/A and JPEG 2000), include:
- PDF/A
- Plain Text
- XML
- TIFF
- JPEG2000
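As a rough illustration of how the formats listed above can be told apart, the sketch below checks a file's leading bytes against well-known signatures. The function name `sniff_format` and the signature table are hypothetical choices for this example, not part of any Library of Congress tool. Note that a signature only shows that a file is a PDF; confirming PDF/A conformance requires a dedicated validator.

```python
# Illustrative signature table (an assumption of this sketch, not an official registry).
SIGNATURES = [
    (b"%PDF-", "PDF"),                                   # PDF and PDF/A share this header
    (b"II*\x00", "TIFF"),                                # little-endian TIFF
    (b"MM\x00*", "TIFF"),                                # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
]


def sniff_format(data: bytes) -> str:
    """Guess a format from the raw bytes of a file (hypothetical helper)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name

    # Skip a UTF-8 byte order mark before looking for an XML declaration.
    body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    if body.lstrip().startswith(b"<?xml"):
        return "XML"

    # Treat anything that decodes as UTF-8 and has no NUL bytes as plain text.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "Plain text"
        except UnicodeDecodeError:
            pass
    return "unknown"
```

In practice you would pass in the contents of a file, for example `sniff_format(open(path, "rb").read())`. The plain-text check is a heuristic and assumes the whole file (not a truncated prefix) is supplied.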
If you hold several different file formats and multiple versions of those formats (for example, numerous versions of PDF, Word, and image formats), your digital preservation strategy should mitigate the effects of format obsolescence and proliferation. Strategies include file migration, emulation, normalization, and adopting an institutional policy of accepting only certain file formats.
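A first practical step toward such a policy is usually an inventory of what you actually hold. The following is a minimal sketch, assuming a hypothetical policy expressed as a set of preferred file extensions; the names `PREFERRED`, `inventory`, and `migration_candidates` are invented for this example. Extensions are only a hint, so a real workflow would pair this with proper format identification.

```python
from collections import Counter
from pathlib import Path

# Hypothetical institutional policy: extensions accepted for long-term storage.
# A ".pdf" extension does not by itself mean the file is PDF/A.
PREFERRED = {".pdf", ".txt", ".xml", ".tif", ".tiff", ".jp2"}


def inventory(root):
    """Count the files under root by lowercased extension."""
    return Counter(
        p.suffix.lower() for p in Path(root).rglob("*") if p.is_file()
    )


def migration_candidates(counts, preferred=PREFERRED):
    """Return the extensions (with counts) that fall outside the policy."""
    return {ext: n for ext, n in counts.items() if ext not in preferred}
```

Calling `migration_candidates(inventory("/path/to/collection"))` would list the extensions that are candidates for migration or normalization, along with how many files each one covers.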