Portable Document Format (PDF) is an open standard for document exchange. Adobe Systems created the file format in 1993, and it is used extensively to represent documents independently of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.
PDF was originally a proprietary format controlled by Adobe, and was officially released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008.
The file extension is .pdf. Adobe Systems developed the format to make documents easy to share on the Internet and easy to print. A document converted to PDF retains its original layout, and the format supports encryption and digital signatures for sharing and publishing information on the Internet.
Because the PDF specification is open, anyone can develop software for creating, managing, accessing, or viewing PDF documents. Adobe offers a free viewer for PDF files, known as Adobe Reader.
Adobe soon started distributing its Acrobat Reader (now Adobe Reader) program at no cost and continued to support the original PDF format, which eventually became the de facto standard for printable documents on the web.
A PDF file consists primarily of objects of the following types:
- Boolean values, representing true or false
- Numbers, both integer and real
- Strings
- Names
- Arrays, ordered collections of objects
- Dictionaries, collections of objects indexed by names
- Streams, usually containing large amounts of data
- The null object
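To make these object types concrete, the following is a minimal sketch of how they look when written out in PDF syntax. It covers the listed types along with integers, strings, and names, which PDF also defines and which dictionaries need for their keys. The names `Name`, `to_pdf`, and `stream` are hypothetical helpers chosen for illustration, not part of any PDF library, and the sketch ignores details such as real numbers, name escaping, and stream filters.

```python
class Name(str):
    """Marks a Python string as a PDF name (written as /Name)."""


def to_pdf(value):
    """Serialize a Python value into simplified PDF object syntax."""
    if value is None:
        return "null"                       # the null object
    if isinstance(value, bool):             # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Name):
        return "/" + value
    if isinstance(value, str):              # literal string, with delimiters escaped
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(value, list):             # array: ordered collection of objects
        return "[" + " ".join(to_pdf(v) for v in value) + "]"
    if isinstance(value, dict):             # dictionary: objects indexed by names
        items = " ".join("/" + key + " " + to_pdf(val) for key, val in value.items())
        return "<< " + items + " >>"
    raise TypeError("unsupported type: " + type(value).__name__)


def stream(data: bytes) -> bytes:
    """Wrap raw bytes as a stream: a dictionary giving the length, then the data."""
    header = to_pdf({"Length": len(data)}).encode("ascii")
    return header + b"\nstream\n" + data + b"\nendstream"


page = {"Type": Name("Page"), "Rotate": 0, "Hidden": False, "Kids": [1, None]}
print(to_pdf(page))
```

In a real file each such object is additionally numbered and wrapped in `obj` and `endobj` keywords so that other objects can refer to it; the sketch leaves that out.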
Layouts of PDF files
A PDF file can be laid out in one of two ways:
- Non-linear (not “optimized”)
- Linear (“optimized”)
Non-linear PDF files (those that are not “optimized”) consume less disk space than their linear counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the PDF file.
Linear PDF files (also called “optimized” or “web optimized” PDF files) are constructed in a manner that enables them to be read in a Web browser plugin without waiting for the entire file to download, since they are written to disk in a linear fashion.
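As a rough illustration, a linear file can usually be recognized by a linearization dictionary, marked with the name `/Linearized`, near the very start of the file. The sketch below assumes that this dictionary sits within the first 1024 bytes and that a plain byte search is good enough; the function name `looks_linearized` is hypothetical, and the check is a heuristic rather than a validation of the file's layout.

```python
import re


def looks_linearized(data: bytes) -> bool:
    """Heuristic: is there a /Linearized entry in the first 1024 bytes?"""
    head = data[:1024]
    if b"%PDF-" not in head:                # not a PDF header at all
        return False
    # The linearization dictionary carries "/Linearized" followed by a version number.
    return re.search(rb"/Linearized\s+\d", head) is not None


sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(looks_linearized(sample))
```

For a file on disk, reading only the first 1024 bytes in binary mode and passing them to the function is sufficient. A file that was linearized and later updated incrementally may still carry the marker even though its layout is no longer strictly linear.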
Security and Signatures
A PDF file may be encrypted for security or digitally signed for authentication. The standard security provided by Acrobat PDF consists of two methods, each with its own password: a “user password” and an “owner password”. The user password protects the document against opening. The owner password governs operations that should remain restricted even after the document is decrypted: printing, copying text and graphics out of the document, modifying the document, and adding or modifying text notes and AcroForm fields. However, every restriction other than the document-open password is trivially circumvented by widely available “PDF cracking” software, including free online tools. Once circumvented, the restrictions no longer let the author control what can and cannot be done with the PDF file after distribution. Adobe Acrobat displays a warning to this effect when such restrictions are applied while creating or editing PDF files.
Even without removing the password, most freeware or open source PDF readers ignore the digital rights management “protections” and allow the user to print or copy excerpts of the text as if the document were not password protected.
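The reason a reader can simply ignore these restrictions is that they are stored as flags rather than enforced by the encryption itself. In the standard security scheme the flags live in a signed 32-bit integer (the `P` entry of the encryption dictionary). The sketch below decodes that integer; the bit positions follow the permission table of the PDF specification as best I recall it, and the names `PERMISSION_BITS` and `decode_permissions` are hypothetical.

```python
# Bit positions are 1-based, counted from the least significant bit.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations and form fields",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "print at high quality",
}


def decode_permissions(p: int) -> dict:
    """Map a signed 32-bit permission value to {operation: allowed}."""
    bits = p & 0xFFFFFFFF                   # view the signed value as unsigned
    return {
        name: bool(bits & (1 << (position - 1)))
        for position, name in PERMISSION_BITS.items()
    }


for operation, allowed in decode_permissions(-3900).items():
    print(operation, "->", "allowed" if allowed else "restricted")
```

A conforming reader consults these flags and disables the matching menu items. Software that skips the check can perform every operation once the document has been decrypted, which is why the restrictions offer little real protection.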
Accessibility
PDF files can be created specifically to be accessible to people with disabilities. Current PDF file formats can include tags (XML), text equivalents, captions, audio descriptions, and so on. Some software can produce tagged PDFs automatically, but this feature is not always enabled by default. Leading screen readers, including JAWS, Window-Eyes, Hal, and Kurzweil 1000 and 3000, can read tagged PDFs; current versions of the Acrobat and Acrobat Reader programs can also read PDFs aloud. Moreover, tagged PDFs can be re-flowed and magnified for readers with visual impairments. Problems remain with adding tags to older PDFs and to those generated from scanned documents. In these cases, accessibility tags and re-flowing are unavailable and must be created either manually or with OCR techniques. These processes are themselves inaccessible to some disabled people. The PDF/UA (PDF/Universal Accessibility) Committee, an activity of AIIM, is working on a specification for PDF accessibility based on ISO 32000.
One of the significant challenges with PDF accessibility is that Adobe Acrobat is capable of presenting PDF documents in three distinct views, which can be inconsistent with each other for a variety of reasons. Acrobat offers the following three views:
- the physical view,
- the tags view, and
- the content view.
The physical view is displayed and printed (what most people consider a PDF document). The tags view is what screen readers read (useful for people with poor eyesight). The content view is displayed when Acrobat’s “Reflow” tool is invoked.
There is much confusion over whether Acrobat’s “Reflow” tool is relevant to accessibility, since it does not process the tags tree and thus cannot correctly represent content that flows between pages. Some feel that the Reflow tool is useful for people with mobility disabilities, and thus take the view that for a PDF document to be accessible, Acrobat’s three views must be consistent with each other. In reality, the “content view”, and thus Acrobat’s Reflow tool (as of Acrobat X), displays the PDF file’s rendering order rather than its logical order, and is thus irrelevant to accessibility.
Viruses and exploits
PDF attachments carrying viruses were first discovered in 2001. The virus, named “OUTLOOK.PDFWorm” or “Peachy”, used Microsoft Outlook to send itself as an attachment to an Adobe PDF file. It was activated with Adobe Acrobat, but not with Acrobat Reader.
On March 30, 2010, security researcher Didier Stevens reported an Adobe Reader and Foxit Reader exploit that runs a malicious executable if the user allows it to launch when asked.
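Exploits of this kind rely on PDF features that start actions, such as launch actions and embedded JavaScript, and these are identified in the file by well-known names. A first triage step, sketched below under simplifying assumptions, is to count those names in the raw bytes. The list `RISKY_NAMES` and the function `count_risky_names` are hypothetical, and the scan is deliberately naive: it will miss names hidden inside compressed streams or written with hex escapes, and a non-zero count does not by itself mean a file is malicious.

```python
import re

# PDF names commonly associated with automatic or external actions.
RISKY_NAMES = (b"/Launch", b"/JavaScript", b"/JS", b"/OpenAction")


def count_risky_names(data: bytes) -> dict:
    """Count whole-name occurrences of each risky name in raw PDF bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead stops "/JS" from also matching inside "/JavaScript".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


sample = b"<< /S /Launch /F (cmd.exe) >>\n<< /OpenAction 5 0 R >>"
print(count_risky_names(sample))
```

A file with launch or JavaScript entries deserves a closer look in a sandboxed viewer before it is opened normally.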
Usage restrictions and monitoring
PDFs may be encrypted so that a password is needed to view or edit the contents. The PDF Reference defines both 40-bit and 128-bit encryption, both making use of a complex system of RC4 and MD5. The PDF Reference also defines ways in which third parties can define their own encryption systems for use in PDF.
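To give a sense of how RC4 and MD5 work together, the sketch below implements the RC4 stream cipher and a per-object key derivation in which MD5 mixes the file's encryption key with an object's number and generation. The derivation reflects my reading of the RC4-based scheme in the PDF Reference and should be treated as an assumption; deriving the file key from the password is a longer procedure and is omitted. The function names `rc4` and `object_key` are hypothetical.

```python
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call both encrypts and decrypts."""
    state = list(range(256))
    j = 0
    for i in range(256):                    # key-scheduling phase
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    out = bytearray()
    i = j = 0
    for byte in data:                       # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def object_key(file_key: bytes, obj_num: int, gen_num: int) -> bytes:
    """Assumed derivation: MD5 over file key + 3 bytes of object number + 2 of generation."""
    material = (
        file_key
        + (obj_num & 0xFFFFFF).to_bytes(3, "little")
        + (gen_num & 0xFFFF).to_bytes(2, "little")
    )
    # Key length grows by 5 bytes, capped at 16 (the MD5 digest size).
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]


file_key = bytes(5)                         # placeholder 40-bit key, all zeros
key = object_key(file_key, obj_num=12, gen_num=0)
ciphertext = rc4(key, b"Hello, PDF")
print(rc4(key, ciphertext))                 # decrypting restores the plaintext
```

Because every object gets its own key, identical strings in different objects encrypt to different bytes. Both RC4 and MD5 are considered weak by modern standards, so the sketch is meant for understanding the format, not for protecting new documents.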
PDF files may also contain embedded DRM restrictions that provide further controls that limit copying, editing or printing. The restrictions on copying, editing, or printing depend on the reader software to obey them, so the security they provide is limited.
The PDF Reference gives the technical details. Like HTML files, PDF files may submit information to a web server. This could be used to track the IP address of the client PC, a process known as phoning home. Since update 7.0.5 of Acrobat Reader, the user is notified via a dialogue box that the author of the file is auditing usage of the file, and is offered the option of continuing.
Through its LiveCycle Policy Server product, Adobe provides a method to set security policies on specific documents. A policy can require a user to authenticate, and can limit the timeframe in which a document can be accessed or the amount of time it can be opened while offline. Once a PDF document is tied to a policy server and a specific policy, that policy can be changed or revoked by the owner. This controls documents that are otherwise “in the wild.” Each document open and close event can also be tracked by the policy server. Policy servers can be set up privately, or Adobe’s public service, offered through Adobe Online Services, can be used. As with other forms of DRM, adherence to these policies and restrictions may or may not be enforced by the reader software being used.
SysInfoTools PDF Repair software recovers as much data as possible from corrupt PDF files and can repair multiple PDF files in a single run.
SysInfoTools PDF Manager software helps users manage their PDF files in four ways: it can split a large PDF file into several smaller files, protect PDF files with a password against unauthorized users, remove restrictions such as those on copying, printing, and editing, and merge several PDF files into a single file.
SysInfoTools PDF Image Extractor software extracts all types of images, pictures, and graphics from one or more PDF files. The software also converts PDF files into JPEG, BMP, PNG, and GIF formats without causing any damage.