Nayyer Shahbaz, August 24, 2015

PDF to PDF/A-3a conversion, Create PDF/A-3a and attach XML, Remove or manipulate tables in existing PDF file with Aspose.Pdf for Java 10.6.0


In every new release, we closely analyze our customers' requirements and attend to even minor details, so that the features we ship produce high-quality output and spare you from writing large amounts of code. All of this can be accomplished with a single API instead of numerous components or software packages. A new release, Aspose.Pdf for Java 10.6.0, has been published with new features and enhancements that extend the API's PDF creation and manipulation capabilities and produce resultant files with great fidelity. Ease of use, extensive documentation and free technical support are some of the salient features of our APIs, and we always strive to meet our customers' expectations because we believe customer satisfaction is how our quality is measured. We take responsibility for the harder parts and provide APIs with out-of-the-box features that can generate the output with just a couple of lines of code.

PDF to PDF/A-3 with compliance-level (3a, 3b)

PDF to PDF/A conversion and PDF/A compliance validation have been supported by our API for quite some time, and we periodically enhance these features. The following lines of code convert a PDF file to the PDF/A-3a compliant format.

// Path of the source PDF
String inFile = "d:\\input.pdf";
// Open document
Document doc = new Document(inFile);
// Convert to PDF/A-3a compliant document (the first argument is the conversion log path)
doc.convert("d:\\file.log", PdfFormat.PDF_A_3A, ConvertErrorAction.Delete);
// Save resultant document
doc.save("d:\\output.pdf");

Create PDF/A-3 and attach XML file

Aspose.Pdf for Java can convert PDF files to PDF/A format, and it also supports adding files as attachments to a PDF document. If you need to attach files to a PDF/A compliant document, we recommend using the PDF_A_3A value from the com.aspose.pdf.PdfFormat enumeration because, according to a post in the Adobe community, PDF/A-3a is the format that allows any file format to be attached to a PDF/A compliant file. However, once the file is attached, you should convert the document to PDF/A-3a again in order to fix the metadata. Please take a look at the following code snippet.

// instantiate Document instance
Document doc = new Document();
// add page to PDF file
doc.getPages().add();
// load XML file
FileSpecification fileSpecification = new FileSpecification(myDir + "attachment.xml", "Sample xml file");
// Add attachment to document's attachment collection
doc.getEmbeddedFiles().add(fileSpecification);	    
// perform PDF/A_3a conversion
doc.convert(myDir + "log.xml", PdfFormat.PDF_A_3A/*or PDF_A_3B*/, ConvertErrorAction.Delete);
// save final PDF file
doc.save(myDir+"attached_PDFA_3A.pdf");
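
Since the attachment in the snippet above is an XML file, it may be worth confirming that the file is well-formed before attaching it. The following is a minimal sketch that uses only the JDK's built-in XML parser. The class name XmlAttachmentCheck and its methods are hypothetical helpers for illustration and are not part of Aspose.Pdf for Java.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (an assumption for illustration, not an Aspose.Pdf class)
final class XmlAttachmentCheck {

    // Returns true when the stream parses as well-formed XML, false otherwise
    static boolean isWellFormed(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the parser to apply its secure-processing limits
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(in);
            return true;
        } catch (SAXException e) {
            // The parser rejected the content, so it is not well-formed
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload for XML held in memory
    static boolean isWellFormed(String xml) {
        return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<invoice><total>10</total></invoice>")); // true
        System.out.println(isWellFormed("<invoice><total>10</invoice>"));         // false
    }
}
```

For a file on disk, a stream obtained from java.nio.file.Files.newInputStream can be passed to the first overload before the FileSpecification is created.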

Manipulate tables in existing PDF document

One of the earliest features supported by Aspose.Pdf for Java is working with tables: it provides great support for adding tables to PDF files generated from scratch or to existing PDF files, and you can also create tables dynamically and place them inside a PDF file. Starting with this release, a new feature for searching and parsing simple tables that already exist in a PDF document is provided by a new class, com.aspose.pdf.TableAbsorber. Its usage is very similar to that of the existing TextFragmentAbsorber class.

//load existing PDF file
Document pdfDocument = new Document("c:/pdftest/table.pdf");
// Create TableAbsorber object to find tables
com.aspose.pdf.TableAbsorber absorber = new com.aspose.pdf.TableAbsorber();

// Visit first page with absorber
absorber.visit(pdfDocument.getPages().get_Item(1));

// Access the first table on the page, its first row and first cell, and the first text fragment in that cell
TextFragment fragment = absorber.getTableList().get_Item(0)
				.getRowList().get_Item(0)
				.getCellList().get_Item(0)
				.getTextFragments().get_Item(1);

// Change text of the first text fragment in the cell
fragment.setText("Hello World !");
// save updated document
pdfDocument.save("c:/Table_Manipulated.pdf");

The following related features are not yet implemented:

  • One customer has requested fetching data based on table blocks or borders (as given in the diagram), and on colors as well.
  • TableAbsorber currently cannot recognize table cell background color. We expect to make this improvement in the future, and a separate ticket has already been created in our issue tracking system.
  • Another customer wants to get the contents of a column in a table. TableAbsorber currently cannot recognize tables without borders; conversion to XLS works well in such cases, but it is only a workaround. An enhancement ticket has been logged to improve TableAbsorber for such table types.
  • A customer wants to update a table in an existing PDF dynamically, including deleting and inserting rows. This request is difficult to implement, and the current implementation of TableAbsorber cannot fulfill it.
  • If you need to read the text property of com.aspose.pdf.cells or BaseParagraph types (these types are designed for adding new content to the page), you must cast BaseParagraph to one of its inherited types. For example, the following code may help:
    for (com.aspose.pdf.Row row : (Iterable<com.aspose.pdf.Row>) table.getRows())
    {
        // Paragraphs are held as the base type, so check the actual type before casting
        Object paragraph = row.getCells().get_Item(1).getParagraphs().get_Item(1);
        if (paragraph instanceof com.aspose.pdf.TextFragment)
        {
            String text = ((com.aspose.pdf.TextFragment) paragraph).getText();
        }
    }
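
The cast described in the last point above can be guarded with a type check, so that a paragraph of another inherited type is skipped rather than causing a ClassCastException. The sketch below shows the pattern with plain Java. BaseItem, TextItem and ImageItem are hypothetical stand-ins for BaseParagraph and its inherited types and are not Aspose.Pdf classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types (assumptions for illustration, not Aspose.Pdf classes)
final class SafeCastSketch {

    // Plays the role of a base paragraph type
    static class BaseItem { }

    // Plays the role of an inherited type that carries text
    static class TextItem extends BaseItem {
        private final String text;
        TextItem(String text) { this.text = text; }
        String getText() { return text; }
    }

    // Plays the role of an inherited type without text
    static class ImageItem extends BaseItem { }

    // Collects text only from items that really are TextItem instances
    static List<String> collectText(List<? extends BaseItem> items) {
        List<String> result = new ArrayList<>();
        for (BaseItem item : items) {
            // instanceof is false for null and for other subtypes, so the cast is safe
            if (item instanceof TextItem) {
                result.add(((TextItem) item).getText());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<BaseItem> items = Arrays.<BaseItem>asList(
                new TextItem("Name"), new ImageItem(), new TextItem("Price"));
        System.out.println(collectText(items)); // [Name, Price]
    }
}
```

Checking the type first also removes the need for a separate null check after the cast.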
    

Remove tables from existing PDF

Aspose.Pdf for Java can insert or create a table inside a PDF document while it is being generated from scratch, and it can also add a table object to any existing PDF document. You can also manipulate tables in an existing PDF by updating the contents of existing table cells. You may, however, come across a requirement to remove table objects from an existing PDF document. To remove a table, use the TableAbsorber class to get hold of the tables in the existing PDF, replace the table cell contents with empty strings, and then redact the page region occupied by the table in order to remove its border. The following code snippet shows the steps to delete a table from an existing PDF document.

com.aspose.pdf.facades.PdfAnnotationEditor editor = new com.aspose.pdf.facades.PdfAnnotationEditor();
editor.bindPdf(myDir+"table2.pdf");

// Create TableAbsorber object to find tables
TableAbsorber absorber = new TableAbsorber();

// Visit first page with absorber
absorber.visit(editor.getDocument().getPages().get_Item(1));

// Getting the table rectangle
com.aspose.pdf.Rectangle rect = absorber.getTableList().get_Item(0).getRectangle();

// clear text for the table
for (AbsorbedRow row : absorber.getTableList().get_Item(0).getRowList()) {
    for (AbsorbedCell cell : row.getCellList()) {
        for (Object fragment : cell.getTextFragments()) {
            ((TextFragment)fragment).setText("");
        }
    }
}

// Expand the rectangle by one unit on each side so that the border is covered as well
rect.setLLX(rect.getLLX()-1);
rect.setLLY(rect.getLLY()-1);
rect.setURX(rect.getURX()+1);
rect.setURY(rect.getURY()+1);


editor.redactArea(1, rect, java.awt.Color.WHITE);
editor.save(myDir+"out_table_deleted.pdf");
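
The four setter calls in the snippet above grow the table rectangle by one unit on every side before it is redacted. As a minimal sketch of that arithmetic in plain Java, the hypothetical PageRegion class below (not an Aspose.Pdf type) keeps the lower-left and upper-right corners and returns an expanded copy, so the margin lives in one place.

```java
// Hypothetical value type (an assumption for illustration, not com.aspose.pdf.Rectangle)
final class PageRegion {
    final double llx; // lower-left x
    final double lly; // lower-left y
    final double urx; // upper-right x
    final double ury; // upper-right y

    PageRegion(double llx, double lly, double urx, double ury) {
        if (llx > urx || lly > ury) {
            throw new IllegalArgumentException("lower-left corner must not exceed upper-right corner");
        }
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    // Returns a copy grown by the given non-negative margin on every side
    PageRegion expandedBy(double margin) {
        if (margin < 0) {
            throw new IllegalArgumentException("margin must not be negative");
        }
        return new PageRegion(llx - margin, lly - margin, urx + margin, ury + margin);
    }

    // True when the other region lies entirely inside this one
    boolean contains(PageRegion other) {
        return llx <= other.llx && lly <= other.lly && urx >= other.urx && ury >= other.ury;
    }

    public static void main(String[] args) {
        PageRegion table = new PageRegion(100, 200, 300, 400);
        PageRegion redaction = table.expandedBy(1);
        System.out.println(redaction.llx + " " + redaction.lly + " " + redaction.urx + " " + redaction.ury);
        System.out.println(redaction.contains(table)); // true
    }
}
```

If a table is drawn with a thicker border, a larger margin may be needed for the redacted area to cover it.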

Miscellaneous fixes

In addition to the enhancements and features discussed above, there have been specific improvements to the PDF to HTML and HTML to PDF conversion features, with better support for HTML5. This release also improves PCL to PDF, SVG to PDF, PDF to Excel, PDF to DOC, PDF to TIFF and TIFF to PDF conversion, conversion of PDF to PDF/A compliant documents, text replacement, filling of a signature field with an image, flattening of PDF, rendering of PDF to XPS format, FloatingBox rendering, FootNote and EndNote, and rendering of non-English (specifically Arabic) content. Please download and try the latest release of Aspose.Pdf for Java 10.6.0.
