PDF to TXT Converter

How to convert PDF to TXT using Java

A PDF file typically contains text, images, headings, annotations and other elements. Because the format preserves document layout across platforms (desktop, mobile and so on), it is widely used for sharing information over the internet. However, we may need to extract the text content of a PDF document for further processing. In this article, we discuss in detail how to extract text from a PDF using the Java Cloud SDK. Once the operation completes, the output is saved in TXT format.

PDF to TXT Conversion API

Aspose.PDF Cloud SDK for Java is our award-winning REST API solution for creating, editing and converting PDF to JPG, XPS, HTML, DOCX and other supported formats. To add PDF text extraction to a Java application, add the following details to the pom.xml of a Maven build project.

<repositories> 
    <repository>
        <id>aspose-cloud</id>
        <name>artifact.aspose-cloud-releases</name>
        <url>https://artifact.aspose.cloud/repo</url>
    </repository>   
</repositories>

<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-pdf-cloud</artifactId>
        <version>21.11.0</version>
    </dependency>
</dependencies>

After installing the SDK, the next important step is to create a free account on Aspose Cloud. Log in with the newly created account and look up or create the Client ID and Client Secret on the Cloud Dashboard. These details are required in the following sections.
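
The listings in this article embed the credentials directly in the code for brevity. A minimal sketch of reading them from the environment instead is shown below. The variable names ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET and the CloudCredentials class are assumptions made for illustration; they are not names defined by the SDK.

```java
import java.util.Map;

// Hypothetical helper: looks up the Aspose Cloud credentials in a map of
// environment variables so they stay out of source control.
final class CloudCredentials {
    final String clientId;
    final String clientSecret;

    private CloudCredentials(String clientId, String clientSecret) {
        this.clientId = clientId;
        this.clientSecret = clientSecret;
    }

    // ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET are assumed names, not SDK conventions.
    static CloudCredentials fromEnvironment(Map<String, String> env) {
        return new CloudCredentials(
                require(env, "ASPOSE_CLIENT_ID"),
                require(env, "ASPOSE_CLIENT_SECRET"));
    }

    // Returns the trimmed value, or fails with a message naming the missing variable.
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            CloudCredentials credentials = fromEnvironment(System.getenv());
            System.out.println("Loaded credentials for client " + credentials.clientId);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

The two fields can then be passed to the PdfApi constructor in place of the string literals.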

PDF to Text in Java

Follow the steps below to convert PDF to text using the Java Cloud SDK. The input PDF is uploaded to cloud storage, and after a successful conversion the extracted text is written to a local TXT file.

  • First, create a PdfApi object, passing the Client ID and Client Secret as arguments.
  • Second, load the input PDF file using a File instance.
  • Upload the input PDF to cloud storage using the uploadFile(…) method.
  • Create an integer variable holding the PDF page number to extract text from, and Double values for the lower-left and upper-right corners of the rectangular page region from which text is extracted.
  • Finally, call the getPageText(…) method to retrieve the text content from the input PDF.
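
The rectangle in the fourth step is expressed in PDF page coordinates, where the origin is the lower-left corner of the page and one unit is 1/72 inch. The sketch below uses a hypothetical PageRect class (not part of the SDK) to convert a page size in millimetres to points and to check whether a chosen rectangle spans the whole page.

```java
// Hypothetical value class for an extraction rectangle measured in PDF points (1/72 inch).
final class PageRect {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    final double llx, lly, urx, ury;

    PageRect(double llx, double lly, double urx, double ury) {
        if (urx <= llx || ury <= lly) {
            throw new IllegalArgumentException("Upper-right corner must lie above and right of lower-left");
        }
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    // Converts millimetres to PDF points.
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // Rectangle spanning a whole page of the given physical size, origin at lower-left.
    static PageRect fullPageMm(double widthMm, double heightMm) {
        return new PageRect(0, 0, mmToPoints(widthMm), mmToPoints(heightMm));
    }

    // True when this rectangle fully contains the other one.
    boolean covers(PageRect other) {
        return llx <= other.llx && lly <= other.lly && urx >= other.urx && ury >= other.ury;
    }

    public static void main(String[] args) {
        PageRect a4 = fullPageMm(210, 297);
        PageRect example = new PageRect(0, 0, 800, 800);
        System.out.println("A4 in points: " + a4.urx + " x " + a4.ury);
        System.out.println("800 x 800 rectangle covers A4: " + example.covers(a4));
    }
}
```

With the 800 by 800 rectangle used in this article, a US Letter page (612 by 792 points) is fully covered, while the top strip of an A4 page (about 595 by 842 points) falls outside the rectangle.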
try
    {
    // Get ClientID and ClientSecret from https://dashboard.aspose.cloud/
    String clientId = "bb959721-5780-4be6-be35-ff5c3a6aa4a2";
    String clientSecret = "4d84d5f6584160cbd91dba1fe145db14";
	  
    // create an instance of PdfApi
    PdfApi pdfApi = new PdfApi(clientSecret,clientId);

    // name of the input PDF document
    String inputFile = "marketing.pdf";

    // reference the input PDF file on the local drive
    File file = new File("//Users//"+inputFile);

    // upload the PDF to cloud storage under the same name that getPageText(...) reads below
    pdfApi.uploadFile(inputFile, file, null);

    // page number of the PDF to extract text from
    int pageNumber = 1;

    // X-coordinate of the lower-left corner
    Double LLX = 0.0;
    // Y-coordinate of the lower-left corner
    Double LLY = 0.0;
    // X-coordinate of the upper-right corner
    Double URX = 800.0;
    // Y-coordinate of the upper-right corner
    Double URY = 800.0;
	       
    // call the API to extract text from the PDF page
    TextRectsResponse response = pdfApi.getPageText(inputFile, pageNumber, LLX, LLY, URX, URY, null, null, true, null, "default");
    
    // writer for the resulting TXT file
    FileWriter myWriter = new FileWriter("filename.txt");

    // iterate over the individual text occurrences in the response
    for(int counter=0; counter < response.getTextOccurrences().getList().size(); counter++)
    {
        // write the text content to the TXT file
        myWriter.write(response.getTextOccurrences().getList().get(counter).getText());
    }

    // close the TXT file writer
    myWriter.close();
    
    System.out.println("Text successfully extracted from PDF !");
    }catch(Exception ex)
    {
	      System.out.println(ex);
    }

Image 1: PDF to TXT conversion preview

The sample PDF file used in the above example and the resulting text file can be downloaded from marketing.pdf and extracted.txt.
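
The Java listing above writes the extracted fragments one after another with no separator, and closes the writer only when every write succeeds. A minimal alternative sketch for the file-writing step, using only the JDK, is shown below. A plain List<String> stands in for the texts returned by getText(), and the TextFragmentWriter class and its method are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for saving extracted text fragments to a TXT file.
final class TextFragmentWriter {

    // Writes each non-null fragment on its own line as UTF-8.
    // try-with-resources closes the writer even if a write fails.
    static void writeFragments(Path target, List<String> fragments) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String fragment : fragments) {
                if (fragment == null) {
                    continue; // skip empty slots rather than writing "null"
                }
                writer.write(fragment);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("extracted", ".txt");
        writeFragments(target, Arrays.asList("First fragment", "Second fragment"));
        System.out.println("Wrote " + Files.readAllLines(target, StandardCharsets.UTF_8).size()
                + " lines to " + target);
    }
}
```

Whether one fragment per line is the right layout depends on how the text occurrences are split, so the separator is a choice to adjust rather than a rule.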

Extract Text from PDF Using cURL Commands

REST APIs can be accessed easily through cURL commands, so in this section we look at how to extract text content from a PDF using cURL. As a prerequisite, we first need to generate a JWT access token (based on the client credentials) by running the following command.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=bb959721-5780-4be6-be35-ff5c3a6aa4a2&client_secret=4d84d5f6584160cbd91dba1fe145db14" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
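
For readers who prefer to stay in Java, the same token request can be sketched with the JDK's java.net.http API (Java 11 or later). The TokenRequest class is a hypothetical name, and the access_token field is an assumption based on the usual OAuth 2.0 client-credentials response; the regular expression is a stand-in for a proper JSON parser.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the cURL token request above.
final class TokenRequest {
    static final String TOKEN_URL = "https://api.aspose.cloud/connect/token";

    // Builds the form-encoded body, escaping the credential values.
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
                + "&client_secret=" + URLEncoder.encode(clientSecret, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) the POST request with the same headers as the cURL command.
    static HttpRequest build(String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(TOKEN_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clientId, clientSecret)))
                .build();
    }

    // Assumption: the JSON response carries the token in an "access_token" string field.
    static String extractAccessToken(String json) {
        Matcher m = Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No access_token field in response");
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        HttpRequest request = build("<client id>", "<client secret>");
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```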

Once we have the JWT token, we need to run the following command to extract all text occurrences in the PDF document.

curl -v -X GET "https://api.aspose.cloud/v3.0/pdf/input.pdf/text?splitRects=true&LLX=0&LLY=0&URX=800&URY=800" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"
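
The same GET request can be assembled in Java with the JDK's java.net.http API (Java 11 or later). This is a sketch under assumptions: TextRequest is a hypothetical class, the query parameters are copied from the cURL command above, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpRequest;

// Hypothetical helper mirroring the cURL text-extraction request above.
final class TextRequest {

    // Renders whole numbers without a decimal point (800.0 becomes "800").
    private static String number(double value) {
        return value == Math.rint(value) ? String.valueOf((long) value) : String.valueOf(value);
    }

    // Builds the request URI; the multi-argument URI constructor escapes
    // characters such as spaces in the file name.
    static URI textUri(String fileName, double llx, double lly, double urx, double ury)
            throws URISyntaxException {
        String query = "splitRects=true"
                + "&LLX=" + number(llx)
                + "&LLY=" + number(lly)
                + "&URX=" + number(urx)
                + "&URY=" + number(ury);
        return new URI("https", "api.aspose.cloud", "/v3.0/pdf/" + fileName + "/text", query, null);
    }

    // Builds (but does not send) the GET request with the bearer token attached.
    static HttpRequest build(String fileName, String jwtToken,
                             double llx, double lly, double urx, double ury)
            throws URISyntaxException {
        return HttpRequest.newBuilder(textUri(fileName, llx, lly, urx, ury))
                .header("Accept", "application/json")
                .header("Authorization", "Bearer " + jwtToken)
                .GET()
                .build();
    }

    public static void main(String[] args) throws URISyntaxException {
        HttpRequest request = build("input.pdf", "<JWT Token>", 0, 0, 800, 800);
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```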

Conclusion

This article explained in detail how to convert PDF to TXT using the Java Cloud SDK. We also explored extracting text from a PDF using cURL commands. With the flexibility to move between the pages of a PDF, we keep control over where content is extracted from. We strongly recommend exploring the product documentation to learn more about the other features the Java Cloud API offers. Also, because all of our SDKs are published under the MIT license, you may consider downloading the complete source code from GitHub and modifying it to suit your requirements. If you run into any issue, you may consider approaching us for a quick resolution through the product support forum.

Related Articles

Please visit the following links to learn more about: