Convert PDF to Word using Python

PDF is one of the most widely used file formats for sharing information. It is popular because it preserves document fidelity across platforms and devices (desktop, mobile, etc.). However, editing a PDF file requires specialized applications. When a document needs a large number of updates, converting the PDF file to a Word document is a viable solution, and for bulk conversion, a programming SDK is an effective approach. In this article, we discuss the conversion of PDF to Word using the Python SDK.

Word Processing API

Aspose.Words Cloud is our award-winning REST-based API for creating, editing, and transforming Word files to HTML, JPEG, PNG, and other supported file formats. It can also load PDF documents and render them to MS Word (DOCX, DOC, DOT, RTF, DOCM) or OpenDocument (ODT, OTT) formats. No third-party software download or installation is required, because the entire conversion is performed by our document processing engine in the Cloud. To implement the document conversion within a Python application, use Aspose.Words Cloud SDK for Python, which is a wrapper around the Cloud API.

Installation

The SDK is available for download from PyPI and GitHub. Execute the following command in a terminal to install the SDK:

pip install aspose-words-cloud
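
After installation, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name sdk_installed is our own, and asposewordscloud is assumed to be the import name of the installed package.

```python
import importlib.util


def sdk_installed(module_name="asposewordscloud"):
    """Return True if the given top-level module can be found by this interpreter."""
    # find_spec returns None when no importable module with that name exists.
    return importlib.util.find_spec(module_name) is not None


print("Aspose.Words Cloud SDK installed:", sdk_installed())
```

If this reports False, the SDK was probably installed into a different Python environment than the one running the script.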

PyCharm IDE

If you are using PyCharm IDE, you may directly add the SDK as a dependency in your project.

File -> Settings -> Project -> Python Interpreter -> asposewordscloud

Image 1: PyCharm settings option.
Image 2: Aspose.Words Python package.

Convert PDF to Word in Python

Please follow the instructions below to perform the conversion of the PDF file to Word format.

  • First, create an ApiClient object, passing the Client ID and Client Secret as arguments
  • Secondly, create an instance of WordsApi, passing the ApiClient instance as an argument
  • Thirdly, upload the PDF file to Cloud storage using an UploadFileRequest(..) object
  • Now create a SaveOptionsData object in which we define docx as the export format
  • Next, create an instance of SaveAsRequest, which takes the PDF file name and the SaveOptionsData object as arguments
  • Finally, call the save_as(..) method of the WordsApi class to perform the conversion operation
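
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: it assumes an SDK release in which WordsApi accepts the Client ID and Client Secret directly (so the separate ApiClient step is folded into the constructor) and in which SaveOptionsData accepts save_format and file_name. Constructor signatures have changed between SDK releases, so please check them against the version you installed. The function name convert_pdf_to_docx and its optional sdk parameter are our own; the latter exists only so the call sequence can be exercised without network access.

```python
import os


def convert_pdf_to_docx(client_id, client_secret, local_pdf,
                        result_name="Converted.docx", sdk=None):
    """Upload a local PDF to Cloud storage and convert it to DOCX there."""
    if sdk is None:
        # Imported lazily so this file loads even before the SDK is installed.
        import asposewordscloud as sdk
        import asposewordscloud.models.requests  # noqa: F401 (makes sdk.models.requests available)

    requests = sdk.models.requests
    remote_name = os.path.basename(local_pdf)

    # Steps 1-2: create the API object from the client credentials
    # (assumption: WordsApi takes the credentials directly in this release).
    words_api = sdk.WordsApi(client_id, client_secret)

    # Step 3: upload the source PDF to Cloud storage.
    with open(local_pdf, "rb") as pdf:
        upload = requests.UploadFileRequest(file_content=pdf, path=remote_name)
        words_api.upload_file(upload)

    # Step 4: define docx as the export format
    # (assumption: newer releases may expect a format-specific class instead).
    save_options = sdk.SaveOptionsData(save_format="docx", file_name=result_name)

    # Steps 5-6: build the request and perform the conversion.
    request = requests.SaveAsRequest(name=remote_name,
                                     save_options_data=save_options)
    return words_api.save_as(request)
```

A call would look like convert_pdf_to_docx("<Client ID>", "<Client Secret>", "awesome_table_in_pdf.pdf"), with the credentials taken from the Aspose.Cloud dashboard.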

The sample files used in this example can be downloaded from awesome_table_in_pdf.pdf and Converted.docx.

PDF to Word using cURL Command

Like other REST APIs, Aspose.Words Cloud can also be accessed via cURL commands. Before accessing the API, we need to generate a JWT access token based on the Client Credentials available on the Aspose.Cloud dashboard. Please execute the following cURL command to generate the JWT access token.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=4ccf1790-accc-41e9-8d18-a78dbb2ed1aa&client_secret=caac6e3d4a4724b2feb53f4e460eade3" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
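
For readers who prefer to stay in Python, the same token request can be sketched with the standard library alone. The function names build_token_request and fetch_token are our own, and reading the token from an access_token field assumes the usual OAuth 2.0 client-credentials response shape.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build the POST request that the cURL command above sends."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_token(client_id, client_secret):
    """Send the request and return the JWT string (requires network access)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        # Assumption: the JSON response carries the token under "access_token".
        return json.load(resp)["access_token"]
```

Building the request separately from sending it makes it easy to inspect the URL, body, and headers before any credentials leave the machine.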

Now we can use the following command to convert PDF files available in Cloud storage to Word format. In the following command, we have used the -o parameter to save output on the local drive.

curl -X GET "https://api.aspose.cloud/v4.0/words/awesome_table_in_pdf.pdf?format=docx" \
-H  "accept: application/octet-stream" \
-H  "Authorization: Bearer <JWT Token>" \
-o Converted.docx

To save the output Word document directly in Cloud storage instead, please use the following command. Note the outPath request parameter.

curl -X GET "https://api.aspose.cloud/v4.0/words/awesome_table_in_pdf.pdf?format=docx&outPath=newResultant.docx" \
-H  "accept: application/octet-stream" \
-H  "Authorization: Bearer <JWT Token>"
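
The two conversion commands above differ only in the outPath parameter, so both can be expressed with one small Python helper. This is a sketch using only the standard library; the names build_convert_request and download_converted are our own, and the token argument is the JWT obtained earlier.

```python
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.aspose.cloud/v4.0/words"


def build_convert_request(name, token, fmt="docx", out_path=None):
    """Build the GET request that converts a stored document to another format."""
    params = {"format": fmt}
    if out_path is not None:
        # With outPath set, the result is saved in Cloud storage.
        params["outPath"] = out_path
    url = f"{BASE_URL}/{urllib.parse.quote(name)}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/octet-stream",
            "Authorization": f"Bearer {token}",
        },
    )


def download_converted(name, token, local_path, fmt="docx"):
    """Run the conversion and save the result locally (requires network access)."""
    request = build_convert_request(name, token, fmt)
    with urllib.request.urlopen(request) as resp, open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, download_converted("awesome_table_in_pdf.pdf", token, "Converted.docx") mirrors the command that uses -o, while build_convert_request("awesome_table_in_pdf.pdf", token, out_path="newResultant.docx") mirrors the command that stores the result in the Cloud.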

Conclusion

In this article, we have explored the capabilities of Aspose.Words Cloud for converting PDF to Word format. To test the API, you may access it directly within a web browser using the Swagger interface. Furthermore, the Cloud SDK is developed under the MIT license, so its complete source code is available in the GitHub repository.

If you encounter any issues while using the API or have any related queries, please contact us via the free product support forum.

Related Articles

We recommend visiting the following links to learn more about