Convert PDF to DOCX in Java


PDF (Portable Document Format) files are widely popular because they preserve document fidelity when viewed on any platform. They can encapsulate almost any type of data, including text, tables, raster and vector graphics, video, and audio, and they support a wide range of formatting features. Another reason for their popularity is that they can be opened in most modern browsers, such as Chrome, Safari, and Firefox (in some cases via extensions or plug-ins), so you do not need to install a dedicated viewing application.

However, editing or updating existing PDF files requires a PDF processing application installed on your system, which costs both time and licensing fees. Furthermore, official documents are often produced in MS Word formats (DOCX, DOC, etc.) because MS Word files are easy to modify. So in this article, we discuss how to convert PDF files to DOCX in the Java programming language.

MS Word processing REST API

Aspose.Words Cloud is our award-winning REST-based API for creating, editing, and rendering MS Word files to other supported formats. It can also load PDF documents and save the output in MS Word formats, including DOCX, DOC, DOCM, DOTX, and DOTM.

Convert PDF to DOCX using the cURL command

Our Cloud API follows REST principles, so it is easily accessible through the cURL command in the terminal. However, the APIs are only accessible to authorized users, so you first need to generate a JWT access token. To do this, visit the Aspose.Cloud dashboard. If you have a GitHub or Google account, simply sign up with it. Otherwise, click the Create a new Account button and provide the required information.

Now log in to the dashboard, expand the Applications section, and scroll down to the Client Credentials section to see the Client ID and Client Secret details. The next step is to create a JWT access token from these credentials so that the APIs can be accessed through the terminal.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=c235e685-1aab-4cda-a95b-54afd63eb87f&client_secret=b8da4ee37494f2ef8da3c727f3a0acb9" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"

For more information, please visit How to Obtain JWT token using a Client ID and Client Secret key.
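
The same token request can also be sketched in Java with the JDK's built-in java.net.http classes (Java 11 or later). This is only a minimal sketch and not the SDK's own mechanism: the class TokenRequestSketch and its helper methods are hypothetical names, the credentials are placeholders, and the access_token field name is assumed from the usual OAuth 2.0 response layout. The request is built but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any Aspose SDK.
final class TokenRequestSketch {
    // Token endpoint taken from the cURL command above.
    static final String TOKEN_URL = "https://api.aspose.cloud/connect/token";

    /** Builds the form-encoded body for the client_credentials grant. */
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
    }

    /** Builds (but does not send) the POST request for a JWT access token. */
    static HttpRequest buildRequest(String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(TOKEN_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clientId, clientSecret)))
                .build();
    }

    /** Pulls the token out of the JSON response; the field name is an assumption. */
    static String extractToken(String json) {
        Matcher m = Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no access_token field in response");
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // Placeholder credentials; use the values from your own dashboard.
        HttpRequest request = buildRequest("my-client-id", "my-client-secret");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually obtain a token, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and hand the response body to extractToken. A real application would normally parse the response with a JSON library rather than a regular expression.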

Given below is the cURL command to convert a PDF file that is already stored in Cloud storage to DOCX.

curl -v "https://api.aspose.cloud/v4.0/words/demo.pdf/saveAs" \
-X PUT \
-d '{"SaveFormat":"docx", "FileName":"Converted.docx"}' \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer <jwt token>"
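
For comparison, the conversion call can be sketched in plain Java with java.net.http (Java 11 or later) instead of cURL. This is a rough sketch under stated assumptions, not the SDK's API: SaveAsRequestSketch and its methods are hypothetical names, the token value is a placeholder, and the JSON is assembled by hand on the assumption that file names contain no quotes or backslashes. The request is built but not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any Aspose SDK.
final class SaveAsRequestSketch {
    // Base URL taken from the cURL command above.
    static final String BASE_URL = "https://api.aspose.cloud/v4.0/words/";

    /** JSON body naming the target format and the output file. */
    static String saveOptionsJson(String saveFormat, String outputName) {
        // Assumes both values are free of quotes and backslashes.
        return "{\"SaveFormat\":\"" + saveFormat + "\",\"FileName\":\"" + outputName + "\"}";
    }

    /** Builds (but does not send) the PUT request that converts a stored file. */
    static HttpRequest buildRequest(String inputName, String outputName, String jwtToken) {
        // URLEncoder targets form data and writes spaces as '+';
        // a URL path segment needs %20 instead.
        String segment = URLEncoder.encode(inputName, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(BASE_URL + segment + "/saveAs"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .header("Authorization", "Bearer " + jwtToken)
                .PUT(HttpRequest.BodyPublishers.ofString(saveOptionsJson("docx", outputName)))
                .build();
    }

    public static void main(String[] args) {
        // "<jwt token>" is a placeholder for a real access token.
        HttpRequest request = buildRequest("demo.pdf", "Converted.docx", "<jwt token>");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request works the same way as for any java.net.http call: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).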

Convert PDF to DOCX in Java

To use the Java Cloud SDK, the first step is to install Aspose.Words Cloud SDK for Java. The Cloud SDK is available for download from Maven and GitHub. The details below show how to reference the SDK JAR in a Maven build project.

Add the following repository and dependency to your pom.xml file.

<repositories>
    <repository>
        <id>aspose-cloud</id>
        <name>artifact.aspose-cloud-releases</name>
        <url>http://artifact.aspose.cloud/repo</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words-cloud</artifactId>
        <version>21.4.0</version>
    </dependency>
</dependencies>

The aspose-words-cloud-21.4.0.jar appears under the Maven Dependencies folder.

Pic 1: Aspose.Words Cloud SDK for Java referenced in the project.

Given below are the steps to load a PDF document and convert it to DOCX format using the Java programming language:

  • First, specify the clientID and clientSecret details
  • Second, create an object of the ApiClient class, passing ClientID and ClientSecret as arguments to the constructor. The third argument is baseUrl, which by default points to https://api.aspose.cloud/
  • Third, create an object of the WordsApi class, which accepts the ApiClient object as an argument to its constructor
  • Next, create an instance of the SaveOptionsData class
  • Specify the output file format using the SaveOptionsData.saveFormat(…) method
  • Specify the name of the resultant file using the SaveOptionsData.fileName(…) method
  • Then create an object of the SaveAsRequest class, providing the input file name and the SaveOptionsData instance as arguments
  • Finally, call the wordsApi.saveAs(…) method with the SaveAsRequest instance as its argument to start the conversion

The sample files used in above examples can be downloaded from:

Conclusion

In this article, we have discussed simple steps to convert PDF files to DOCX format, both through cURL commands in the terminal and through the Java programming language. Our Cloud SDKs are open source, so the complete source code can be downloaded from GitHub. We also recommend visiting the product Documentation for details on the other features offered by the API.

Last but not least, your feedback is very important to us. Please feel free to contact us through our Support Forums.