I am trying to automate a numeric CAPTCHA using Tesseract OCR on macOS (Monterey), but I am getting the error below.
Reason:
java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (darwin/libtesseract.dylib) not found in resource path ([file:/Users/xyz/Katalon%20Studio/TestProject/bin/keyword/, file:/Users/xyz/Katalon%20Studio/TestProject/Keywords/, file:/Users/xyz/Katalon%20Studio/TestProject/bin/listener/,....])
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:277)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:403)
at com.sun.jna.Library$Handler.<init>(Library.java:147)
at com.sun.jna.Native.loadLibrary(Native.java:502)
at com.sun.jna.Native.loadLibrary(Native.java:481)
at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:223)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)
The same code works fine on my Mac in IntelliJ and on Windows. The issue occurs only in Katalon Studio on the Mac. Can anyone please help me solve this?
I have referred to "java - Tess4j unsatisfied link error on mac OS X - Stack Overflow" and added the darwin/libtesseract.dylib file to the Maven repo. After adding it, Tesseract OCR works fine in IntelliJ, but I still get the UnsatisfiedLinkError in Katalon.
I do not understand what you mean by "into the Maven repo". Do you mean "under the $HOME/.m2 folder"?
I guess your project in IntelliJ is configured to refer to the Maven local repository, so your code works there.
However, Katalon Studio does not understand Maven at all.
Katalon assumes that you manually place every external resource under the Drivers folder.
Perhaps it is the reason why your code does not work in Katalon Studio.
Thanks for your information @kazurayam . I meant the Maven local repository.
I manually added the libtesseract.dylib file under the Drivers folder of the Katalon project. Now I get this error: java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Can't obtain InputStream for darwin/libtesseract.dylib.
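One thing that may be worth trying here: JNA also searches the directories named in the jna.library.path system property, so pointing that property at a directory that really contains libtesseract.dylib can avoid the classpath lookup for darwin/libtesseract.dylib. Below is a minimal sketch, not a confirmed fix for Katalon. The class name TesseractLibraryLocator and the two Homebrew directories are assumptions for illustration; adjust them to wherever Tesseract is installed on your machine.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class TesseractLibraryLocator {
    // Assumption: typical Homebrew library directories (Apple Silicon first, then Intel).
    static final List<String> DEFAULT_DIRS =
            Arrays.asList("/opt/homebrew/lib", "/usr/local/lib");

    /** Returns the first directory in dirs that contains a regular file named fileName. */
    static Optional<String> findLibraryDir(List<String> dirs, String fileName) {
        for (String dir : dirs) {
            if (new File(dir, fileName).isFile()) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    /** Sets jna.library.path when the library is found; returns whether it was set. */
    static boolean configureJna(List<String> dirs) {
        Optional<String> dir = findLibraryDir(dirs, "libtesseract.dylib");
        dir.ifPresent(d -> System.setProperty("jna.library.path", d));
        return dir.isPresent();
    }

    public static void main(String[] args) {
        boolean found = configureJna(DEFAULT_DIRS);
        System.out.println(found
                ? "jna.library.path=" + System.getProperty("jna.library.path")
                : "libtesseract.dylib not found in " + DEFAULT_DIRS);
    }
}
```

The property has to be set before the first Tess4J call, because the stack trace shows the native library being loaded in the static initializer of TessAPI.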
Thank you @kazurayam for letting me know. I have already asked our product team to help. @mohit.kumar will look into the details that @hemalatha.mani provided, and we will assist shortly. Thanks for your continued support.
Hi @hemalatha.mani, thank you for your follow-up. @mohit.kumar has tried to reproduce the issue but has had no luck so far. We have escalated this to our Product team for more information. We'll update you once we have some progress. Best.
Finally, I was able to successfully read the content from your PDF file (tested on both macOS and Windows). Please follow the steps below to achieve the same:
Use case:
If the user is unable to extract content using the PDF plugin, an alternative approach is to convert the PDF into images, apply Optical Character Recognition (OCR) to extract text, save the extracted content into a text file, and then read and verify the expected values. Below is the implementation in code.
Step-by-Step Guide to Implement OCR for PDF Content Extraction
Convert PDF to Images: Render each page of the PDF as an image.
Apply OCR: Use OCR to extract text from these images.
Save Extracted Text: Save the extracted text into a .txt file.
Read and Verify: Read the text file and verify the expected values.
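The last two steps above (save the extracted text, then read and verify it) can be sketched roughly as follows. This is an illustration only, not the code from the original post. The class OcrTextVerifier and its method names are hypothetical, and the OCR step is passed in as a plain function so the sketch runs without Tesseract installed; the stub in main stands in for a real OCR call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class OcrTextVerifier {
    /** Applies the OCR function to each page image and joins the results, one page per line. */
    static String extractText(List<Path> pageImages, Function<Path, String> ocr) {
        StringBuilder sb = new StringBuilder();
        for (Path image : pageImages) {
            sb.append(ocr.apply(image).trim()).append(System.lineSeparator());
        }
        return sb.toString();
    }

    /** Writes the extracted text to the target .txt file as UTF-8. */
    static Path saveText(String text, Path target) throws IOException {
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the text file back and returns the expected values that do not appear in it. */
    static List<String> findMissing(Path textFile, List<String> expected) throws IOException {
        String content = new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
        List<String> missing = new ArrayList<>();
        for (String value : expected) {
            if (!content.contains(value)) {
                missing.add(value);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Stub standing in for a real OCR call on a rendered page image.
        Function<Path, String> stubOcr = image -> "Order total: 42.00";
        Path out = Files.createTempFile("extracted", ".txt");
        saveText(extractText(Arrays.asList(Paths.get("page-1.png")), stubOcr), out);
        System.out.println("Missing values: " + findMissing(out, Arrays.asList("42.00", "Refund")));
    }
}
```

In a real test case the stub would be replaced by a call into your OCR library, and the test would fail whenever findMissing returns a non-empty list.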
Implementation in Code
Download and Set Up Tesseract: Download and install Tesseract from here.
Install PDF Utility: Install the PDF utility from here.
Build a Test Case: Use the code provided below to build your test case.