A quick and easy approach to extracting pages from Word documents using the Python SDK.

Split Word Document | Extract Pages from Word Document as a Separate File

In document management, you often need to divide a Word document or extract specific sections from it. Whether you are dealing with extensive research papers, comprehensive reports, or lengthy manuscripts, breaking them down into more manageable parts by hand can be time-consuming and challenging. In this article, we explore how to achieve this with the Aspose.Words Cloud SDK for Python, so you can streamline your document management tasks and work more efficiently.

Word Processing API

Aspose.Words Cloud is our dedicated solution for processing MS Word (DOCX, DOC, DOT, RTF, DOCM) and OpenDocument (ODT, OTT) files. No third-party software or MS Office automation is needed: you simply call the REST APIs. Because the APIs are REST-based, you can access them from any platform, including desktop, web, and mobile apps. Within the scope of this article, we discuss how to split the pages of a Word file into individual Word documents. The API also lets you customize the split operation, for example splitting every page, odd and even pages, by a number of pages, or by a page range.

To make things easier for our customers, we have created the Aspose.Words Cloud SDK for Python, a wrapper around the Cloud API that lets you use all of its Word document processing features from your favorite programming language. Before proceeding, the first step is to install the SDK on your local system. It is available for download from PyPI (via pip) and GitHub. Run the following command in a terminal to install the SDK:

pip install aspose-words-cloud
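
If you want to confirm from Python that the package is installed, and see which version pip resolved, a minimal standard-library check is shown below. It assumes Python 3.8 or later (for importlib.metadata); the helper name installed_version is our own illustration and is not part of the SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "aspose-words-cloud") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("aspose-words-cloud is not installed; run: pip install aspose-words-cloud")
    else:
        print(f"aspose-words-cloud {version} is installed")
```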

If you use Visual Studio as your IDE, you can add the SDK to your project directly.

Click View -> Other Windows -> Python Environments, as shown below.

Image 1:- Python Environment menu option.

Enter aspose-words-cloud in the Packages field of the Python Environments window, then click the Install aspose-words-cloud (21.11.0) link. The version number may differ, depending on the latest release. See the image below.

Image 2:- aspose-words-cloud python package.

Split Pages in Word Document using Python

Follow the instructions below to split all the pages of a Word document into separate files. The document is processed from cloud storage, so the steps include uploading it first.

  • First, initialize a WordsApi object, passing your Client ID and Client Secret as arguments.
  • Second, specify the name of the input Word file, the output format, the name of the resulting file, and whether the output should be zipped.
  • Upload the input Word document to cloud storage using an UploadFileRequest object.
  • Create an instance of SplitDocumentRequest, passing the details defined in the second step.
  • Finally, call the split_document(…) method of the WordsApi class to split the document. The resulting files are saved in the mapped cloud storage.
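
The steps above can be sketched as follows. This is a minimal sketch, not an official sample: it assumes the SDK's WordsApi, UploadFileRequest and SplitDocumentRequest accept the keyword names client_id, client_secret, file_content, path, name, format, dest_file_name and zip_output, so please check them against the SDK reference for the version you installed. The function name split_every_page is hypothetical, and the SDK is imported inside the function so the module can be loaded without it.

```python
import os


def split_every_page(client_id, client_secret, local_path,
                     output_format="docx", dest_file_name="Split-File",
                     zip_output=False):
    """Upload a local Word file and split it into one document per page.

    Minimal sketch assuming the aspose-words-cloud SDK; it is imported inside
    the function so this module can be loaded without the package installed.
    """
    import asposewordscloud
    from asposewordscloud.models.requests import (
        SplitDocumentRequest,
        UploadFileRequest,
    )

    # Step 1: initialize WordsApi with the Client ID and Client Secret.
    words_api = asposewordscloud.WordsApi(client_id=client_id,
                                          client_secret=client_secret)

    # Step 2: the input name in cloud storage; output format, output name
    # and the zip flag come from the function arguments.
    remote_name = os.path.basename(local_path)

    # Step 3: upload the input document to cloud storage.
    with open(local_path, "rb") as file_content:
        words_api.upload_file(
            UploadFileRequest(file_content=file_content, path=remote_name))

    # Step 4: build the split request; no page range is given, so the
    # whole document is split.
    split_request = SplitDocumentRequest(name=remote_name,
                                         format=output_format,
                                         dest_file_name=dest_file_name,
                                         zip_output=zip_output)

    # Step 5: run the split; the resulting files are saved in cloud storage.
    return words_api.split_document(split_request)
```

For example, split_every_page("<Client ID>", "<Client Secret>", "input.docx") would upload input.docx and ask the service to write one DOCX per page to your cloud storage.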

Image 3:- Preview of Document Split operation.

Split Document based on Selected Pages

In this section, we discuss how to split a document based on selected pages and save the output as a ZIP archive. The steps are almost the same as above, except that we also specify the start page (from), the end page (to), and set the zip output parameter to true.
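
A sketch of the page-range variant, assuming you already have an initialized WordsApi object and the document is already in cloud storage. We assume the SDK exposes the start page as _from (because from is a reserved word in Python) alongside to and zip_output; please verify these names against your SDK version. split_page_range is a hypothetical helper name.

```python
def split_page_range(words_api, remote_name, first_page, last_page,
                     output_format="docx", dest_file_name="Split-File"):
    """Split pages first_page..last_page of a document already in cloud
    storage and request the output as a ZIP archive.

    Assumes words_api is an initialized asposewordscloud.WordsApi instance.
    """
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")

    # Imported lazily so this module loads without the SDK installed.
    from asposewordscloud.models.requests import SplitDocumentRequest

    request = SplitDocumentRequest(
        name=remote_name,
        format=output_format,
        dest_file_name=dest_file_name,
        _from=first_page,   # start page; assumed spelled _from since "from" is a keyword
        to=last_page,       # end page of the range
        zip_output=True,    # ask for the resulting files as one ZIP archive
    )
    return words_api.split_document(request)
```

For example, split_page_range(words_api, "source.doc", 2, 4) requests the same page range as the cURL example later in this article, but zipped.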

Image 4:- Preview of Document Split operation for selected pages.

Extract Pages from Word Document using cURL Commands

Like other REST APIs, Aspose.Words Cloud can also be accessed with cURL commands from the command line. Before proceeding, we first need to generate a JWT access token from the client credentials.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=<Client ID>&client_secret=<Client Secret>" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
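
The same token request can be made from Python with only the standard library. This is a sketch that mirrors the cURL command; it assumes the JSON reply carries the token in an access_token field, as is usual for OAuth 2.0 client-credentials responses. The function names build_token_request and fetch_access_token are our own.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a POST request equivalent to the cURL token command."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the access_token field of the JSON reply
    (field name assumed from the OAuth 2.0 client-credentials format)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as response:
        return json.load(response)["access_token"]
```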

Once the token has been generated, run the following command to extract pages 2 to 4 from the Word document as DOCX files and save the output in cloud storage.

curl -v -X PUT "https://api.aspose.cloud/v4.0/words/source.doc/split?format=DOCX&destFileName=Split-File&from=2&to=4&zipOutput=false" \
-H  "accept: application/json" \
-H  "Authorization: Bearer <JWT Token>"
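
For reference, the same split call can be made from Python's standard library with a token obtained as above. This sketch mirrors the query parameters of the cURL command (format, destFileName, from, to, zipOutput) and, like the cURL command, sends no request body. The helper names build_split_request and split_document are hypothetical, and omitting from and to is assumed to split all pages.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v4.0/words"


def build_split_request(token, name, output_format="DOCX",
                        dest_file_name="Split-File", first_page=None,
                        last_page=None, zip_output=False):
    """Build a PUT request equivalent to the cURL split command."""
    query = {"format": output_format, "destFileName": dest_file_name}
    if first_page is not None:
        query["from"] = first_page
    if last_page is not None:
        query["to"] = last_page
    query["zipOutput"] = "true" if zip_output else "false"
    url = f"{API_BASE}/{urllib.parse.quote(name)}/split?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(
        url,
        data=b"",  # empty body, matching the cURL command
        method="PUT",
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
    )


def split_document(token, name, **options):
    """Send the split request and return the decoded JSON response."""
    with urllib.request.urlopen(build_split_request(token, name, **options)) as response:
        return json.load(response)
```

For example, split_document(token, "source.doc", first_page=2, last_page=4) sends the same request as the cURL command above.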

Conclusion

In this article, we explored how to create a document splitter that splits a Word document into individual page files using the Python SDK. Depending on your requirements, you can use the Python SDK or extract pages from a Word document with cURL commands. We believe in collective growth and collaboration, so our SDKs are released under the MIT license and their complete source code is available for download on GitHub. You are free to download and modify the code to suit your requirements. If you encounter any issues or have further questions, please feel free to contact us via the free product support forum.

We recommend you visit the following links to learn more about: