PDF files contain text, images, and other content elements that are combined to make up an electronic document. In addition, a set of instructions, known as the structure, defines the logic that binds the content together. The structure defines the correct reading order of a PDF file.

In a PDF file, the structure is expressed as tags. These tags form a hierarchy of containers that describe the semantics of the content inside them, and they are stored in an invisible layer behind the visible PDF content. A well-structured (well-tagged) PDF document helps screen-reading applications read the content easily. Furthermore, tagged PDFs make a document accessible to visually impaired individuals, who can understand the content of a PDF document using a screen-reading application. Imagine using your computer with the screen turned off and you’ll get some idea of how important logical text flow is, and of how important a well-structured, tagged PDF document is for people who need screen readers to read it.
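
To make the idea of a tag hierarchy concrete, here is a minimal plain-Java sketch of a tag tree and the reading order derived from it. It does not use Aspose.PDF or any PDF library. The class and method names (TagTreeSketch, Tag, readingOrder, sampleDocument) are hypothetical and exist only for illustration. The tag names (Document, H1, Sect, P, Figure) follow the standard PDF structure types, but the model is deliberately simplified and is not how a PDF stores its structure internally.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: a simplified tag tree, not a real PDF structure tree.
class TagTreeSketch {

    /** One node of the tree: a tag name, optional text, and child tags. */
    static final class Tag {
        final String name;
        final String text; // null for pure containers such as Document or Sect
        final List<Tag> children = new ArrayList<>();

        Tag(String name, String text) {
            this.name = name;
            this.text = text;
        }

        /** Appends a child and returns this tag so calls can be chained. */
        Tag add(Tag child) {
            children.add(child);
            return this;
        }
    }

    /** Walks the tree depth-first, which yields the logical reading order. */
    static List<String> readingOrder(Tag root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Tag tag, List<String> out) {
        if (tag.text != null) {
            out.add(tag.name + ": " + tag.text);
        }
        for (Tag child : tag.children) {
            collect(child, out);
        }
    }

    /** Builds a small hypothetical document: a heading, then a section with a paragraph and a figure. */
    static Tag sampleDocument() {
        return new Tag("Document", null)
                .add(new Tag("H1", "Annual Report"))
                .add(new Tag("Sect", null)
                        .add(new Tag("P", "Revenue grew this year."))
                        .add(new Tag("Figure", "Alt text: bar chart of revenue")));
    }

    public static void main(String[] args) {
        for (String line : readingOrder(sampleDocument())) {
            System.out.println(line);
        }
    }
}
```

In this sketch the reading order depends only on the tree, not on where the content sits on the page. That is the property a screen reader relies on: it follows the tags, so a wrong or missing hierarchy yields a wrong or missing reading order.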

Tags can be generated automatically for a PDF file using Acrobat Reader, but the results are reliable only when the document is a very simple one. For anything more complex, automated tagging generally does not produce correct results. To understand automated tagging, recall a quote from the movie Forrest Gump:

“My mama always said, life is like a box of chocolates, you never know what you’re going to get.”

Here, life is the user, the box is the PDF document, and the chocolates are the tags: you never know which tags you are going to get from automated tagging.

This is where the Aspose.PDF for Java API comes in handy. The API offers functionality to create Tagged PDFs from scratch with high fidelity. The ITaggedContent interface in the API helps define the content properly and is the entry point for creating Tagged PDF documents.

Aspose.PDF TaggedPDF Features

The following documentation articles demonstrate how to create Tagged PDFs using Aspose.PDF for Java:

With Aspose.PDF for Java, you can also get the content and structure of Tagged PDFs. The following articles show this functionality along with sample code snippets: