Add and Search Hidden Text in PDF Documents using Java

As part of our continuous improvement process, Aspose.PDF for Java 11.4.0 has been released. It adds the capability to add or search for hidden text in PDF documents. Furthermore, PDF file objects can now be used after calling the processParagraphs() method, so that further manipulation can be performed on them (this was not possible in earlier releases).

Add and Search Hidden Text in PDF using Java

To add hidden text, pass true to the TextState.setInvisible(…) method. TextFragmentAbsorber finds the text that matches a pattern (if one is specified). Please note that hidden text is only invisible to the end user viewing the document in PDF reading software (e.g. Acrobat Reader). There are several ways to make text invisible in a PDF, and we have implemented one of those techniques. Text added through this approach can be found using the TextFragmentAbsorber class, but we cannot guarantee that hidden text added by third-party applications can be found in the same way. If you encounter any issue, please share the resource file so that we can investigate the scenario further.

Public API changes
The following methods are added:

  • com.aspose.pdf.TextFragmentState.isInvisible()
  • com.aspose.pdf.TextFragmentState.setInvisible(boolean)
  • com.aspose.pdf.TextState.isInvisible()
  • com.aspose.pdf.TextState.setInvisible(boolean)

 // Assumes: import com.aspose.pdf.*;
 // Create document with hidden text
 com.aspose.pdf.Document doc = new com.aspose.pdf.Document();
 // add page to pages collection of PDF file
 Page page = doc.getPages().add();
 // create TextFragment instances
 TextFragment frag1 = new TextFragment("This is common text.");
 TextFragment frag2 = new TextFragment("This is invisible text.");
 // set text property - invisible
 frag2.getTextState().setInvisible(true);
 // add TextFragments to paragraphs collection of page instance
 page.getParagraphs().add(frag1);
 page.getParagraphs().add(frag2);
 // save the PDF document
 doc.save("c:/pdftest/HiddenText_output.pdf");

 // Search text in the document
 doc = new com.aspose.pdf.Document("c:/pdftest/HiddenText_output.pdf");
 // create TextFragmentAbsorber instance
 TextFragmentAbsorber absorber = new TextFragmentAbsorber();
 // get text content from first page of PDF file
 doc.getPages().get_Item(1).accept(absorber);
 // iterate through TextFragment inside TextFragments collection
 for (TextFragment fragment : (Iterable<TextFragment>) absorber.getTextFragments()) {
     // display extracted TextFragments and their related properties
     System.out.println("Text = " + fragment.getText() + ", on pos = " + fragment.getPosition().toString()
             + " and invisibility = " + fragment.getTextState().isInvisible());
 }
 // dispose Document object
 doc.dispose();
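
As noted above, there are several ways to make text invisible in a PDF. One technique defined by the PDF specification is text rendering mode 3 (the "3 Tr" operator), which neither fills nor strokes the glyphs. The plain-Java sketch below builds such a content-stream fragment by hand, purely as an illustration. It does not use Aspose.PDF, and whether setInvisible(…) relies on this particular technique is an assumption. The InvisibleTextSketch class, the textObject helper and the /F1 font resource name are hypothetical.

```java
import java.util.Locale;

// Illustrative only: builds a raw PDF content-stream fragment by hand.
// This is NOT Aspose.PDF code; all names here are hypothetical.
final class InvisibleTextSketch {

    // PDF text rendering mode 0 fills the glyphs (normal, visible text).
    static final int RENDER_MODE_FILL = 0;
    // PDF text rendering mode 3 neither fills nor strokes the glyphs (invisible text).
    static final int RENDER_MODE_INVISIBLE = 3;

    /** Returns content-stream operators that show text at (x, y) using font resource /F1. */
    static String textObject(String text, double x, double y, double fontSize, boolean invisible) {
        int mode = invisible ? RENDER_MODE_INVISIBLE : RENDER_MODE_FILL;
        // Escape the characters that are special inside a PDF literal string.
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        return String.format(Locale.ROOT,
                "BT /F1 %.0f Tf %d Tr %.0f %.0f Td (%s) Tj ET",
                fontSize, mode, x, y, escaped);
    }

    public static void main(String[] args) {
        System.out.println(textObject("This is common text.", 72, 720, 12, false));
        System.out.println(textObject("This is invisible text.", 72, 700, 12, true));
    }
}
```

Text written this way stays in the content stream, which is why a text search can still find it even though a viewer draws nothing.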

Using Page Object after processParagraphs() Call

The processParagraphs() method was introduced to process the objects placed inside a PDF file. If you need page count information during PDF generation, this method can be used, as it processes the file objects and makes that information available. In earlier releases, once this method had been called, the file objects could not be accessed any further: it was not possible to add a new object to an existing Page instance, and you had to create a new Page instance to hold it. Starting with this release, you can use the same Page objects even after calling the processParagraphs() method.

 // Assumes: import com.aspose.pdf.*;
 // instantiate Document object
 Document document = new Document();
 // add page to pages collection of PDF file
 Page page1 = document.getPages().add();

 // create a loop to add 5 tables, each holding one TextFragment
 for (int i = 1; i <= 5; i++) {
     // create table object
     Table table1 = new com.aspose.pdf.Table();
     // set width for Table columns (the width value here is illustrative)
     table1.setColumnWidths("100");
     // render table in new page
     table1.setInNewPage(true);
     // create a row object and add it to rows collection of table instance
     com.aspose.pdf.Row row1 = table1.getRows().add();
     // create TextFragment
     TextFragment tf = new TextFragment("part" + i);
     // add cell to cells collection of row instance
     com.aspose.pdf.Cell cell1 = row1.getCells().add();
     // add text fragment to paragraphs collection of table cell
     cell1.getParagraphs().add(tf);
     // add table to paragraphs collection of page object
     page1.getParagraphs().add(table1);

     // process paragraph objects inside PDF
     document.processParagraphs();
     // get page count information from PDF file
     System.out.println("Number of pages in PDF = " + document.getPages().size());
 }
 // save resultant file
 document.save("c:/PageCount_out.pdf");

Rotate Method in FreeTextAnnotation Class

New getRotate() and setRotate(…) methods have been added to the FreeTextAnnotation class, making it possible to change an annotation's orientation.

Public API changes
The following methods are added:

  • com.aspose.pdf.FreeTextAnnotation.getRotate()
  • com.aspose.pdf.FreeTextAnnotation.setRotate(int)
  • com.aspose.pdf.Rectangle.rotateAngle(int)

 // Assumes: import com.aspose.pdf.*;
 // instantiate Document object
 Document document = new Document();
 // add page to pages collection of PDF file
 Page page1 = document.getPages().add();
 // create FreeTextAnnotation instance
 FreeTextAnnotation annotation = new FreeTextAnnotation(document.getPages().get_Item(1), new com.aspose.pdf.Rectangle(50, 600, 250, 630),
         new DefaultAppearance("Helvetica", 16, java.awt.Color.RED));
 // set contents for FreeTextAnnotation (example text)
 annotation.setContents("Rotated annotation");
 // set rotating angle for Annotation (example angle)
 annotation.setRotate(90);
 // create Rectangle object as per Annotation dimensions
 Rectangle rect = annotation.getRect();
 // rotate Rectangle object
 rect.rotateAngle(annotation.getRotate());
 // set border color for Annotation as Red
 annotation.setColor(com.aspose.pdf.Color.getRed());
 // set border information for Annotation instance
 annotation.setBorder(new Border(annotation));
 // set border width information for annotation instance (example width)
 annotation.getBorder().setWidth(1);
 // add annotation to first page of PDF file
 page1.getAnnotations().add(annotation);
 // save resultant file
 document.save("c:/pdftest/Annotation_Rotate.pdf");
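
The example above rotates the annotation's rectangle as well as the annotation itself. The reason is geometric: a box that is 200 units wide and 30 units high needs a 30 by 200 area once its content is turned by 90 degrees. The plain-Java sketch below shows that geometry only. It is not the com.aspose.pdf.Rectangle implementation: the RotatedBoxSketch and Box names are hypothetical, and keeping the lower-left corner fixed is an assumption made for illustration.

```java
import java.util.Locale;

// Illustrative only: a plain-Java stand-in, NOT the com.aspose.pdf.Rectangle class.
final class RotatedBoxSketch {

    /** Axis-aligned box given by its lower-left (llx, lly) and upper-right (urx, ury) corners. */
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        double width()  { return urx - llx; }
        double height() { return ury - lly; }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "[%.0f, %.0f, %.0f, %.0f]", llx, lly, urx, ury);
        }
    }

    /**
     * Returns the box needed to hold the content after it is rotated by the given angle.
     * Only multiples of 90 degrees are handled, and the lower-left corner is kept
     * fixed (an assumption made for this sketch).
     */
    static Box rotate(Box box, int angle) {
        if (angle % 90 != 0) {
            throw new IllegalArgumentException("angle must be a multiple of 90: " + angle);
        }
        int quarterTurns = Math.floorMod(angle / 90, 4);
        // An odd number of quarter turns swaps width and height.
        boolean swap = quarterTurns % 2 == 1;
        double w = swap ? box.height() : box.width();
        double h = swap ? box.width() : box.height();
        return new Box(box.llx, box.lly, box.llx + w, box.lly + h);
    }

    public static void main(String[] args) {
        Box original = new Box(50, 600, 250, 630);
        System.out.println(original + " rotated by 90 -> " + rotate(original, 90));
    }
}
```

With the rectangle used in the annotation example, (50, 600, 250, 630), a 90 degree rotation in this sketch yields (50, 600, 80, 800): the width and height trade places.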

Miscellaneous Fixes

In addition to the enhancements and features discussed above, there have been specific improvements to text extraction from PDF, image placement inside PDF, PDF to HTML, HTML to PDF, PDF to PDF/A, XPS to PDF, PDF to image, PDF to TIFF and TIFF to PDF conversion, and PDF printing. Please download and try the latest release of Aspose.PDF for Java 11.4.0.