PDF OCR

PDF files are widely used on the internet for sharing data and information. They are popular because they preserve document fidelity when viewed on any platform. However, we have no control over the source, and some files are shared in scanned form. Sometimes you capture an image as a PDF and later need to extract the content from the file. A practical solution is to perform OCR and extract the text. If you also need to keep the result after OCR, saving it as a searchable (text-based) PDF is a viable option. In this article, we discuss the steps for converting a scanned PDF to a text PDF using Python.

PDF OCR API

Aspose.PDF Cloud SDK for Python is a wrapper around Aspose.PDF Cloud. It lets you use all of its PDF processing capabilities within a Python application, without Adobe Acrobat or any other application. The first step in using the SDK is installation; it is available for download from PIP and a GitHub repository. Run the following command in a terminal / command prompt to install the latest version of the SDK on the system.

 pip install asposepdfcloud

MS Visual Studio

You can also add the reference directly to your Python project in Visual Studio. Search for asposepdfcloud as a package under the Python environment window. Follow the numbered steps in the image below to complete the installation.

Aspose.PDF Cloud Python

Image 1:- Aspose.PDF Cloud SDK for Python package.

Aspose.Cloud Dashboard

Since our APIs are accessible only to authorized users, the next step is to create an account on the Aspose.Cloud dashboard. If you have a GitHub or Google account, simply sign up with it; otherwise, click the Create a new Account button and provide the required information. Then log in to the dashboard with your credentials, expand the Applications section, and scroll down to the Client Credentials section to see the Client ID and Client Secret.

Client credentials

Image 2:- Client credentials on the Aspose.Cloud dashboard.
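
Hard-coding the Client ID and Client Secret in source files makes them easy to leak. The following is a minimal sketch of one alternative, assuming the credentials are exported as environment variables. The variable names ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET and the helper load_credentials are hypothetical names chosen for this illustration; the SDK does not define or read them.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are an assumption made for this sketch; the SDK
    does not look them up by itself.
    """
    env = os.environ if env is None else env
    client_id = env.get("ASPOSE_CLIENT_ID", "").strip()
    client_secret = env.get("ASPOSE_CLIENT_SECRET", "").strip()
    if not client_id or not client_secret:
        # Fail early with a clear message instead of a confusing API error.
        raise RuntimeError(
            "Set ASPOSE_CLIENT_ID and ASPOSE_CLIENT_SECRET before running."
        )
    return client_id, client_secret
```

The returned pair can then be passed to whatever code needs the credentials, which keeps the secret out of version control.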

Image PDF to Searchable PDF in Python

Follow the steps below to perform OCR on a scanned PDF document and save it as a searchable PDF. These steps let us build a free online OCR workflow using Python.

  • First, create an instance of the ApiClient class, providing the Client ID and Client Secret as arguments.
  • Second, create an instance of the PdfApi class, which takes the ApiClient object as an input argument.
  • Now call the put_searchable_document(..) method of the PdfApi class, which takes the input PDF name and an optional parameter indicating the language of the OCR engine.

# Import paths assume the package layout of the asposepdfcloud SDK
import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi
from asposepdfcloud.rest import ApiException

def ocrPDF():
    try:
        # Client credentials
        client_secret = "406b404b2df649611e508bbcfcd2a77f"
        client_id = "88d1cda8-b12c-4a80-b1ad-c85ac483c5c5"

        # initialize the API client using the client credentials
        pdf_api_client = asposepdfcloud.api_client.ApiClient(client_secret, client_id)

        # create a PdfApi instance, passing the ApiClient as an argument
        pdf_api = PdfApi(pdf_api_client)

        # input PDF file name (must already exist in cloud storage)
        input_file = 'image-based-pdf-sample.pdf'

        # call the API to perform OCR and save the output in cloud storage
        response = pdf_api.put_searchable_document(name=input_file, lang='eng')

        # print a message in the console (optional)
        print('Image PDF successfully converted to Text PDF !')
    except ApiException as e:
        print("Exception while calling PdfApi: {0}".format(e))
        # read the attributes defensively, since they may not be set on every exception
        print("Code: " + str(getattr(e, "code", "n/a")))
        print("Message: " + str(getattr(e, "message", "n/a")))

PDF OCR preview

Image 3:- Preview of the PDF OCR operation.

In the image above, the left part shows the scanned input PDF file and the right part shows a preview of the resulting text-based PDF. The sample files used in the above example can be downloaded from image-based-pdf-sample.pdf and OCR-Result.pdf.

OCR Online using cURL Command

REST APIs can also be accessed via cURL commands, and since our Cloud APIs are built on REST architecture, we can use cURL commands to perform PDF OCR online. However, before proceeding with the conversion, we need to generate a JSON Web Token (JWT) based on the client credentials shown on the Aspose.Cloud dashboard. This is mandatory because our APIs are accessible only to registered users. Run the following command to generate the JWT token.

curl -v "https://api.aspose.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=88d1cda8-b12c-4a80-b1ad-c85ac483c5c5&client_secret=406b404b2df649611e508bbcfcd2a77f" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
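
The same token request can be sketched in Python with only the standard library. This is an illustration under stated assumptions rather than part of the SDK: the helper names build_token_request and extract_token are hypothetical, and the reply is assumed to carry the token in the standard OAuth 2.0 field "access_token".

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.aspose.cloud/connect/token"


def build_token_request(client_id, client_secret):
    """Build (but do not send) the POST request for a JWT access token."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def extract_token(response_body):
    """Pull the token out of the JSON reply.

    Assumes the standard OAuth 2.0 field name "access_token".
    """
    return json.loads(response_body)["access_token"]
```

To actually send the request, pass the result of build_token_request(...) to urllib.request.urlopen and hand the response body to extract_token.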

Once we have the JWT token, run the following command to perform OCR and save the output in the same cloud storage.

curl -v -X PUT "https://api.aspose.cloud/v3.0/pdf/image-based-pdf-sample.pdf/ocr" \
-H  "accept: application/json" \
-H  "authorization: Bearer <JWT Token>"
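
For readers who prefer to stay in Python, the OCR call can be sketched with the standard library as well. The helper name build_ocr_request is hypothetical. Passing lang as a query parameter is an assumption based on the SDK's lang argument; the cURL command above does not show it, so it is left out unless explicitly requested.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.aspose.cloud/v3.0/pdf"


def build_ocr_request(file_name, jwt_token, lang=None):
    """Build (but do not send) the PUT request that runs OCR on a stored PDF.

    `lang` is optional; sending it as a query parameter is an assumption
    based on the SDK's `lang` argument, not something the cURL example shows.
    """
    # Percent-encode the file name so spaces and similar characters are safe in the path.
    url = "{0}/{1}/ocr".format(API_BASE, urllib.parse.quote(file_name, safe=""))
    if lang:
        url += "?" + urllib.parse.urlencode({"lang": lang})
    return urllib.request.Request(
        url,
        method="PUT",
        headers={
            "Accept": "application/json",
            "Authorization": "Bearer " + jwt_token,
        },
    )
```

Sending the request is then a matter of calling urllib.request.urlopen on the returned object, with the JWT obtained in the previous step.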

Conclusion

In this article, we discussed the steps to convert an image PDF to a searchable PDF using a Python code snippet. We also explored how to perform OCR online using cURL commands. Since our Cloud SDKs are developed under the MIT license, you can download the complete code from GitHub and update it to suit your requirements. We highly recommend exploring the Developer Guide to learn more about the other features currently offered by the Cloud API.

If you have any related questions or run into any issue while using our APIs, please feel free to contact us via the free customer support forum.

Related articles

We also recommend visiting the following articles to learn more about