Convert PDF to Excel in Python – PDF to XLSX in Python

PDF format is quite popular for document sharing over the internet because it preserves document formatting on any platform, and document fidelity is not compromised even across different versions of PDF reader software. However, editing a PDF file requires specific applications such as Adobe Acrobat, and some of them are quite expensive. Also, if the PDF file contains computational data, manually copying all the content and building a spreadsheet from scratch is cumbersome. A viable solution is therefore to convert PDF files to Excel format.

Excel files remain one of the most widely used formats for sharing computational data around the world. Spreadsheets are also commonly used to store and organize data such as revenue, payroll, and accounting information, and they let users perform calculations on this data and produce graphs and charts. In this article, we are going to discuss in detail how to convert PDF files to Excel format in Python and with cURL commands.

PDF Manipulation API

Aspose.PDF Cloud is specifically designed to provide PDF file creation and manipulation capabilities. All document processing, including conversion, is performed in the cloud, so no third-party software needs to be downloaded or installed. It also enables you to render a PDF file to HTML, EPUB, XLSX, DOCX, PPTX, and many other supported file formats. To further facilitate our customers, we have created programming language wrappers (SDKs) around the Cloud API so that you get all the benefits of document processing right within the language of your choice.

Installation

In this article, we are going to discuss the conversion of PDF files to Excel in Python; therefore, we first need to install Aspose.PDF Cloud SDK for Python. It is available for download via PIP and from the GitHub repository. Execute the following command in the terminal/command prompt to install the latest version of the SDK on your system.

pip install asposepdfcloud
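To confirm that pip installed the SDK into the interpreter you are actually using, a quick check with Python's standard importlib.metadata module is enough. This is a minimal sketch: it only reads the installed package metadata for asposepdfcloud and does not contact the Cloud API. The helper name installed_sdk_version is our own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_sdk_version(package: str = "asposepdfcloud") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this interpreter's environment.
        return None


if __name__ == "__main__":
    sdk_version = installed_sdk_version()
    if sdk_version is None:
        print("asposepdfcloud is not installed; run: pip install asposepdfcloud")
    else:
        print(f"asposepdfcloud {sdk_version} is installed")
```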

MS Visual Studio

If you prefer to add the reference directly to your Python project within the Visual Studio IDE, search for asposepdfcloud as a package in the Python Environments window. Follow the numbered steps in the image below to complete the installation process.

Image 1:- Aspose.PDF Cloud SDK for Python package.

PyCharm

PyCharm is a popular IDE for Python development. In this section, we are going to discuss how to install the SDK through the PyCharm settings on the Windows platform.

  • Click the File menu and select the Settings… menu item.
Image 2:- PyCharm Settings menu item.
  • Expand the Project tree from the left and select the Python Interpreter option.
  • Click the + (plus) sign in the right section and enter asposepdfcloud in the search field of the Available Packages dialog.
  • Now click the Install Package button.
Image 3:- Aspose.PDF Cloud for Python package.

Once the SDK is installed, a success message is displayed.

Image 4:- Success message once Aspose.PDF Cloud for Python is installed.

Aspose.Cloud Dashboard

To get started with the Cloud APIs, we need to create an account on the Aspose.Cloud dashboard. If you have a GitHub or Google account, simply sign up with it; otherwise, click the Create a new Account button and provide the required information. Then log in to the dashboard, expand the Applications section, and scroll down to the Client Credentials section to see the Client ID and Client Secret details.
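Rather than pasting the Client ID and Client Secret into source files, you may prefer to read them from environment variables. The sketch below assumes two variable names of our own choosing, ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET; they are hypothetical and are not read by the SDK on its own.

```python
import os
from typing import NamedTuple


class ClientCredentials(NamedTuple):
    client_id: str
    client_secret: str


# Hypothetical variable names chosen for this example; the SDK does not read them itself.
ID_VAR = "ASPOSE_CLIENT_ID"
SECRET_VAR = "ASPOSE_CLIENT_SECRET"


def load_credentials() -> ClientCredentials:
    """Read the Client ID and Client Secret copied from the Aspose.Cloud dashboard."""
    missing = [name for name in (ID_VAR, SECRET_VAR) if not os.environ.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return ClientCredentials(os.environ[ID_VAR], os.environ[SECRET_VAR])
```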

Image 5:- Client credentials on Aspose.Cloud dashboard.

PDF to Excel in Python

Please follow the instructions below to convert a PDF file to an Excel workbook (XLSX) using Python. Please note that the conversion expects the input PDF to already be available in cloud storage.

  • First, create an instance of the ApiClient class, providing the Client ID and Client Secret as arguments
  • Second, create an instance of the PdfApi class, passing the ApiClient object as its argument
  • Next, specify the names of the input PDF file and the resultant XLSX file
  • Finally, call the put_pdf_in_storage_to_xlsx(..) method, which takes the input PDF file name, the resultant XLSX file name, and an optional parameter to generate uniform worksheets.
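The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive sample: it assumes that ApiClient takes the Client Secret as app_key and the Client ID as app_sid, and that put_pdf_in_storage_to_xlsx accepts a uniform_worksheets keyword, so please verify both against the asposepdfcloud version you have installed. The helper name convert_pdf_to_xlsx is our own.

```python
def convert_pdf_to_xlsx(client_id: str, client_secret: str,
                        input_pdf: str = "awesomeTable.pdf",
                        output_xlsx: str = "Resultant.xlsx"):
    """Convert a PDF already in cloud storage to XLSX, saving the result to storage."""
    # SDK imports are kept inside the function so this file can be imported
    # even before asposepdfcloud is installed.
    from asposepdfcloud.api_client import ApiClient
    from asposepdfcloud.apis.pdf_api import PdfApi
    from asposepdfcloud.rest import ApiException

    # Step 1: create the ApiClient from your dashboard credentials
    # (assumption: app_key is the Client Secret and app_sid is the Client ID).
    api_client = ApiClient(app_key=client_secret, app_sid=client_id)

    # Step 2: create the PdfApi object on top of the ApiClient.
    pdf_api = PdfApi(api_client)

    try:
        # Steps 3 and 4: pass the input PDF name and the resultant XLSX name,
        # and request uniform worksheets (assumed keyword argument).
        return pdf_api.put_pdf_in_storage_to_xlsx(
            input_pdf, output_xlsx, uniform_worksheets=True
        )
    except ApiException as error:
        # Report the API error and re-raise it so the caller can handle it.
        print(f"Exception while calling PdfApi: {error}")
        raise
```

For example, convert_pdf_to_xlsx("<Client ID>", "<Client Secret>") would convert awesomeTable.pdf and write Resultant.xlsx to the same cloud storage.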
Image 6:- PDF to Excel conversion preview.

The sample files used in this example, awesomeTable.pdf and Resultant.xlsx, are available for download.

Convert PDF to XLSX using cURL Command

The REST APIs can also be accessed via cURL commands. Because cURL is available on virtually every platform, you can call the APIs directly from a command-line terminal. In this section, we are going to discuss how to convert a PDF file to XLSX format using cURL commands.

The first step is to generate a JSON Web Token (JWT) based on your individual client credentials shown on the Aspose.Cloud dashboard. This step is mandatory because our APIs are accessible only to registered users. Execute the following command to generate the JWT token.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
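If you would rather request the token from Python than from a terminal, the same client_credentials request can be made with the standard library. This sketch mirrors the cURL command above; it assumes the JSON response carries the token in an access_token field, as is conventional for OAuth 2.0 token endpoints. build_token_request and fetch_token are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the same POST request as the cURL command (form-encoded body)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the token (assumes an 'access_token' field)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as response:
        payload = json.load(response)
    return payload["access_token"]
```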

Once you have the JWT token, execute the following command to perform the conversion. It converts awesomeTable.pdf from cloud storage and saves the result as Converted.xlsx with uniform worksheets.

curl -v -X PUT "https://api.aspose.cloud/v3.0/pdf/awesomeTable.pdf/convert/xlsx?outPath=Converted.xlsx&uniformWorksheets=true" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"
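The same conversion call can also be issued from Python. This is a sketch that simply reproduces the cURL request with urllib: a PUT to the /v3.0/pdf/{name}/convert/xlsx route with outPath and uniformWorksheets as query parameters and the JWT as a Bearer token. build_convert_request and convert_in_storage are hypothetical helper names.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def build_convert_request(token: str, name: str, out_path: str,
                          uniform_worksheets: bool = True) -> urllib.request.Request:
    """Build the PUT request from the cURL command: convert `name` in storage to XLSX."""
    query = urllib.parse.urlencode({
        "outPath": out_path,
        "uniformWorksheets": str(uniform_worksheets).lower(),
    })
    url = f"{API_BASE}/{urllib.parse.quote(name)}/convert/xlsx?{query}"
    # No request body is sent, matching the cURL command.
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )


def convert_in_storage(token: str, name: str, out_path: str) -> int:
    """Send the request; the XLSX is written to cloud storage at out_path."""
    with urllib.request.urlopen(build_convert_request(token, name, out_path)) as response:
        return response.status
```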

Conclusion

In this article, we have discussed how to convert PDF files to Excel format programmatically in Python and through cURL commands. The conversion preserves even minor details, including table structure and character encoding. Furthermore, if you want to convert a PDF file to XLSX format and receive the resultant file in the response content, try the GetPdfInStorageToXlsx API.
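As a rough sketch of that alternative, the request below assumes GetPdfInStorageToXlsx is exposed as a GET on the same /v3.0/pdf/{name}/convert/xlsx route, accepts the same uniformWorksheets parameter, and returns the workbook bytes in the response body instead of writing them to storage; please check these assumptions against the API reference. build_download_request and download_xlsx are hypothetical helper names.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def build_download_request(token: str, name: str,
                           uniform_worksheets: bool = True) -> urllib.request.Request:
    """GET request that returns the converted workbook in the response body (assumed route)."""
    query = urllib.parse.urlencode({"uniformWorksheets": str(uniform_worksheets).lower()})
    url = f"{API_BASE}/{urllib.parse.quote(name)}/convert/xlsx?{query}"
    return urllib.request.Request(url, method="GET",
                                  headers={"Authorization": f"Bearer {token}"})


def download_xlsx(token: str, name: str, local_path: str) -> int:
    """Save the returned XLSX bytes to local_path and return the number of bytes written."""
    with urllib.request.urlopen(build_download_request(token, name)) as response:
        data = response.read()
    with open(local_path, "wb") as handle:
        handle.write(data)
    return len(data)
```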

Please note that since our Cloud SDKs are released under the MIT license, their complete source code is available for free download from GitHub. Should you have any related queries or encounter any issues while using our APIs, please feel free to contact us via the free customer support forum.
