PDFBox in action

In this article I try out PDFBox, having first covered the basics of the PDF Specification. A good place to start is the PDFBox documentation (http://pdfbox.apache.org/docs/1.8.9/javadocs/). The class you will be most interested in is PDDocument (http://pdfbox.apache.org/docs/1.8.9/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html). Think of a PDDocument as an in-memory representation of the PDF document that you are working on. It lets you interact with the PDF and gives you access to its inner parts. The class has numerous methods, but we will start with a few.

For now I will be brief and provide only basic information. Hopefully I will be able to elaborate later.

PDDocument load(String filename) - This static method is where you give the file name of the PDF that you want to work on. It parses the file and returns a PDDocument for it.

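PDDocumentCatalog getDocumentCatalog() - This returns the document catalog, the root of the document's object hierarchy, from which the pages and other document-level structures are reached.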

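COSDocument getDocument() - This returns the raw, low-level (COS) representation of the whole document, not just of the catalog. It is the object layer that sits underneath the PDDocument.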
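
To see how these three methods fit together, here is a minimal sketch. It assumes PDFBox 1.8.x is on the classpath (later major versions changed the load methods), and the class name PdfBoxBasics, the method countPages and the default file name sample.pdf are made up for illustration. Treat it as a starting point rather than a definitive way of using the library.

```java
import java.io.IOException;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;

// Hypothetical helper class; assumes PDFBox 1.8.x on the classpath.
public class PdfBoxBasics {

    // Loads the PDF, prints a few facts about it and returns its page count.
    static int countPages(String filename) throws IOException {
        // Static factory method: parses the file into a PDDocument.
        PDDocument document = PDDocument.load(filename);
        try {
            // High-level view: the catalog is the root of the document structure.
            PDDocumentCatalog catalog = document.getDocumentCatalog();
            System.out.println("Pages in catalog: " + catalog.getAllPages().size());

            // Low-level view: the COS objects underneath the PDDocument.
            COSDocument cosDocument = document.getDocument();
            System.out.println("PDF version: " + cosDocument.getVersion());

            return document.getNumberOfPages();
        } finally {
            // Always release the resources held by the document.
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.pdf" is a placeholder; pass a real path as the first argument.
        String filename = args.length > 0 ? args[0] : "sample.pdf";
        System.out.println("Number of pages: " + countPages(filename));
    }
}
```

Closing the document in a finally block matters because load keeps resources open until close() is called.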

To be continued.