PDF is one of the most commonly used formats for sharing documents with third parties. Its popularity comes from the fact that a PDF renders consistently across platforms, regardless of the hardware or software in use. In some cases, however, you need to convert a PDF into an editable document format, and Word DOC or DOCX is usually the first choice. This article shows how to automate that conversion by converting PDF to Word programmatically in Java.
In this article, you will learn how to:
- Convert PDF to DOC using Java.
- Convert PDF to DOCX using Java.
- Convert PDF to Word (DOC/DOCX) with additional options.
API for PDF to Word Conversion in Java
Aspose.PDF for Java is a PDF manipulation API that provides easy ways to convert PDF files to a variety of other formats, including Word (DOC/DOCX). You can download the API’s JAR file and add it to your project, or reference it using the following Maven configuration:
<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>19.12</version>
</dependency>
Convert PDF to DOC using Java
Once you have referenced Aspose.PDF for Java in your application, you can convert any PDF document to DOC format in a couple of lines of code. The following are the steps required to perform this conversion.
- Create an instance of the Document class and initialize it with the input PDF file’s path.
- Call the Document.save() method with the output DOC file’s name and SaveFormat.Doc as arguments.
The following code sample shows how to convert PDF to DOC in Java.
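The two steps above can be sketched roughly as follows. This is a minimal example under stated assumptions rather than a definitive implementation: the class name PdfToDoc, the convert helper, and the file names input.pdf and output.doc are placeholders chosen for illustration, and the Aspose.PDF for Java JAR is assumed to be on the classpath.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;

public class PdfToDoc {

    // Hypothetical helper: loads the PDF at pdfPath and saves it as a DOC file at docPath.
    static void convert(String pdfPath, String docPath) {
        // Step 1: create a Document instance from the input PDF file's path.
        Document document = new Document(pdfPath);
        try {
            // Step 2: save the document, passing the output name and SaveFormat.Doc.
            document.save(docPath, SaveFormat.Doc);
        } finally {
            // Release the resources held by the loaded document.
            document.close();
        }
    }

    public static void main(String[] args) {
        // Placeholder file names; replace them with real paths.
        convert("input.pdf", "output.doc");
    }
}
```

Wrapping the save call in try/finally is a design choice of this sketch, so the document is closed even if the conversion fails.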
[Figure: input PDF document]
[Figure: output Word document]
Convert PDF to DOCX using Java
DOCX is the newer, well-known format for Word documents. In contrast to the binary DOC format, a DOCX file is a ZIP package of XML files. To convert a PDF to DOCX, pass SaveFormat.DocX as the format argument to the Document.save() method.
The following code sample shows how to convert PDF to DOCX in Java.
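A minimal sketch of the DOCX conversion is given here. The only difference from DOC output is the SaveFormat.DocX argument. The class name PdfToDocx, the convert helper, and the file names are hypothetical placeholders, and the Aspose.PDF for Java JAR is assumed to be on the classpath.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;

public class PdfToDocx {

    // Hypothetical helper: loads the PDF at pdfPath and saves it as a DOCX file at docxPath.
    static void convert(String pdfPath, String docxPath) {
        // Load the input PDF.
        Document document = new Document(pdfPath);
        try {
            // SaveFormat.DocX tells the API to write DOCX output.
            document.save(docxPath, SaveFormat.DocX);
        } finally {
            // Release the resources held by the loaded document.
            document.close();
        }
    }

    public static void main(String[] args) {
        // Placeholder file names; replace them with real paths.
        convert("input.pdf", "output.docx");
    }
}
```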
Additional Options for PDF to Word Conversion
Aspose.PDF for Java also provides additional options for PDF to Word conversion, such as the output format, the image resolution, and the distance between text lines. The DocSaveOptions class is used for this purpose, and it offers the following options:
- setFormat(int value) – To set the output format (Doc, Docx, etc.).
- setAddReturnToLineEnd(boolean value) – To add paragraph or line breaks at line ends.
- setImageResolutionX(int value) – To set the X resolution for the images.
- setImageResolutionY(int value) – To set the Y resolution for the images.
- setMaxDistanceBetweenTextLines(float value) – To set the maximum distance between text lines that are grouped into one paragraph.
- setMode(int value) – To set the recognition mode.
- setRecognizeBullets(boolean value) – To turn bullet recognition on or off.
- setRelativeHorizontalProximity(float value) – To set the width of space between different text elements in the input PDF file.
The following code sample shows how to use the DocSaveOptions class in PDF to DOCX conversion using Java.
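One way to apply these options is sketched here. The setters come from the list above. The constants DocSaveOptions.DocFormat.DocX and DocSaveOptions.RecognitionMode.Flow, as well as the proximity value 2.5f, follow Aspose's published examples as best recalled and should be checked against the API version you use. The class name, the convert helper, and the file names are hypothetical placeholders.

```java
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

public class PdfToDocxWithOptions {

    // Hypothetical helper: converts the PDF at pdfPath to DOCX using DocSaveOptions.
    static void convert(String pdfPath, String docxPath) {
        // Load the input PDF.
        Document document = new Document(pdfPath);
        try {
            DocSaveOptions saveOptions = new DocSaveOptions();
            // Write DOCX rather than DOC (constant name assumed from Aspose's examples).
            saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
            // Use the Flow recognition mode (constant name assumed from Aspose's examples).
            saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
            // Illustrative value for the space between text elements; tune it per document.
            saveOptions.setRelativeHorizontalProximity(2.5f);
            // Switch bullet recognition on.
            saveOptions.setRecognizeBullets(true);
            // Save with the options object instead of a SaveFormat value.
            document.save(docxPath, saveOptions);
        } finally {
            // Release the resources held by the loaded document.
            document.close();
        }
    }

    public static void main(String[] args) {
        // Placeholder file names; replace them with real paths.
        convert("input.pdf", "output.docx");
    }
}
```

The remaining setters, such as setImageResolutionX and setImageResolutionY, can be called on the same saveOptions object before saving.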