Tesseract OCR fails on Mac with UnsatisfiedLinkError

I am trying to automate a numeric captcha using Tesseract OCR on macOS (Monterey), but I get the error below.

Reason:
java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (darwin/libtesseract.dylib) not found in resource path ([file:/Users/xyz/Katalon%20Studio/TestProject/bin/keyword/, file:/Users/xyz/Katalon%20Studio/TestProject/Keywords/, file:/Users/xyz/Katalon%20Studio/TestProject/bin/listener/,....])
	at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:277)
	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:403)
	at com.sun.jna.Library$Handler.<init>(Library.java:147)
	at com.sun.jna.Native.loadLibrary(Native.java:502)
	at com.sun.jna.Native.loadLibrary(Native.java:481)
	at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
	at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
	at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:223)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)

Java Code

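import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import com.kms.katalon.core.configuration.RunConfiguration;

Tesseract instance = new Tesseract();
instance.setDatapath(RunConfiguration.getProjectDir() + "/tessdata");
String imagePath = RunConfiguration.getProjectDir() + "/Captcha.png";
String captcha = instance.doOCR(new File(imagePath));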

The same code works fine in IntelliJ on my Mac and on Windows. The issue occurs only in Katalon Studio on Mac. Can anyone please help me solve this?

Hi @hemalatha.mani

Thanks for using Katalon Studio. I hope you are doing well.

Could you please send me your application under test so that I can give it a try?

You will have to add “Tesseract” to Katalon Library Management before using it.

I found a thread titled "Unable to load library ‘tesseract’: Native library (darwin/libtesseract.dylib)".

Have you installed Tesseract on Mac?

Hi @mohit.kumar ,

Thanks for your response.

Yes, I have added all the required Tesseract JAR files in Katalon Library Management, but I still get the same error.

I have attached the test project for your reference. The Tesseract JAR files are attached separately; please import them into Katalon.

DemoProject.zip (22.4 MB)

jar_files (1).zip (17.3 MB)

Hi @kazurayam ,

Yes, I have installed Tesseract on my Mac.

I followed the Stack Overflow question “Tess4j unsatisfied link error on mac OS X” and added the darwin/libtesseract.dylib file to the Maven repo. After that, Tesseract OCR works fine in IntelliJ, but I still get the UnsatisfiedLinkError in Katalon.

I do not understand what you mean by “into the Maven repo”. Do you mean under the $HOME/.m2 folder?

I guess your project in IntelliJ is configured to refer to the Maven local repository, which is why your code works there.

However, Katalon Studio does not understand Maven at all.
Katalon assumes that you place every external resource under the Drivers folder manually.
Perhaps that is why your code does not work in Katalon Studio.
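
A possible workaround, not confirmed in this thread, is to stop relying on the resource path and tell JNA directly which directory holds the native library, through the jna.library.path system property. Below is a minimal sketch under a few assumptions: Tesseract was installed with Homebrew, so libtesseract.dylib sits in /opt/homebrew/lib (Apple Silicon) or /usr/local/lib (Intel), and the class and method names (TesseractLibraryPath, findLibraryDir, configureJna) are hypothetical helpers, not part of Tess4J or Katalon. I have not verified it against the Tess4J version in the stack trace above; some versions adjust jna.library.path themselves when they load.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class TesseractLibraryPath {

    // Assumed Homebrew library directories; adjust them to wherever
    // libtesseract.dylib actually lives on your machine.
    static final List<Path> CANDIDATE_DIRS = Arrays.asList(
            Paths.get("/opt/homebrew/lib"),  // Homebrew on Apple Silicon
            Paths.get("/usr/local/lib"));    // Homebrew on Intel Macs

    /** Returns the first candidate directory that contains the given file. */
    static Optional<Path> findLibraryDir(List<Path> candidates, String fileName) {
        return candidates.stream()
                .filter(dir -> Files.isRegularFile(dir.resolve(fileName)))
                .findFirst();
    }

    /**
     * Sets jna.library.path to the first directory that contains the file.
     * Returns false and leaves the property untouched if none does.
     * Call this before the first Tess4J class is used.
     */
    static boolean configureJna(List<Path> candidates, String fileName) {
        Optional<Path> dir = findLibraryDir(candidates, fileName);
        dir.ifPresent(d -> System.setProperty("jna.library.path", d.toString()));
        return dir.isPresent();
    }

    public static void main(String[] args) {
        boolean found = configureJna(CANDIDATE_DIRS, "libtesseract.dylib");
        System.out.println(found
                ? "jna.library.path = " + System.getProperty("jna.library.path")
                : "libtesseract.dylib not found in the candidate directories");
    }
}
```

In a Katalon keyword, the call to configureJna would have to run before new Tesseract(), because the stack trace shows the native library being loaded in the static initializer of TessAPI.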

Thanks for the information, @kazurayam. I meant the Maven local repository.

I manually added the libtesseract.dylib file to the Drivers folder of the Katalon project. Now I get a different error: java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Can't obtain InputStream for darwin/libtesseract.dylib.

You should raise an official support request to the Katalon developer team.

If you are a paying customer, you can raise a support request at

https://katalonsupport.force.com/katalonhelpcenter/s/article/How-to-submit-a-Support-Case

If you are not a paying customer, ask @vu.tran to address this. He might be able to do something for you.

Thank you @kazurayam for letting me know. I have already asked our product team to help. @mohit.kumar will investigate the details that @hemalatha.mani provided, and we will assist shortly. Thanks for your continued support.

Hi @vu.tran ,

Please let me know if there is any possible fix for this issue.

Hi @hemalatha.mani, thanks for your follow-up. @mohit.kumar has tried to reproduce the issue, but with no luck so far. We have escalated this to our Product team for more information. We’ll update you once we have some progress. Best.

Finally, I was able to read the content from your PDF file (tested on both Mac and Windows). Please follow the steps below to do the same:

Use case:
If the user is unable to extract content using the PDF plugin, an alternative approach is to convert the PDF into images, apply Optical Character Recognition (OCR) to extract the text, save the extracted content into a text file, and then read it and verify the expected values. Below is the implementation in code.

Step-by-Step Guide to Implement OCR for PDF Content Extraction

  1. Convert PDF to Images: Render each page of the PDF as an image.
  2. Apply OCR: Use OCR to extract text from these images.
  3. Save Extracted Text: Save the extracted text into a .txt file.
  4. Read and Verify: Read the text file and verify the expected values.

Implementation in Code

  1. Download and Set Up Tesseract: Download and install Tesseract from here.
  2. Install PDF Utility: Install the PDF utility from here.
  3. Build a Test Case: Use the code provided below to build your test case.

import com.kms.katalon.core.util.ConsoleCommandBuilder

// Remove the temp folder left over from a previous run
def folderPath = new File('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp')
if (folderPath.exists()) {
    folderPath.deleteDir()
    println('Folder deleted successfully')
} else {
    println('Folder does not exist')
}

// Render the pages of the PDF as images
CustomKeywords.'com.kms.katalon.keyword.pdf.PDF.saveAllPagesAsImages'('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\pdf_validaton.pdf')

// Run the Tesseract CLI on one page image; it writes the recognized text to output.txt
ConsoleCommandBuilder.create('tesseract "C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\pdf_validaton_2.png" "C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\output"').workingDir(
    'C:\\Program Files (x86)\\Tesseract-OCR').redirectError().execSync()

// Read the OCR output back
def filePath = new File('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\output.txt')
String content
if (filePath.exists()) {
    content = filePath.text
    println("File content:$content")
} else {
    println('File does not exist')
}

// Verify the expected values
assert content.contains("FY3319")
assert content.contains("778P")
assert content.toLowerCase().contains("mohit")
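
The script above hard-codes Windows paths and passes the whole command as one quoted string. For anyone who wants to run the same OCR step on Mac as well, here is a rough sketch in plain Java that calls the tesseract command line with ProcessBuilder. The assumptions: the tesseract executable is on the PATH, and the class and method names (TesseractCli, buildCommand, ocr) are hypothetical helpers that exist in neither Katalon nor Tess4J. It relies only on the basic CLI form `tesseract <image> <outputbase>`, which writes the text to `<outputbase>.txt`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class TesseractCli {

    /** Builds the argument list for: tesseract <image> <outputBase>. */
    static List<String> buildCommand(Path image, Path outputBase) {
        // Each argument is a separate list element, so paths that contain
        // spaces (e.g. "Katalon Studio") need no manual quoting.
        return Arrays.asList("tesseract", image.toString(), outputBase.toString());
    }

    /** Runs the CLI and returns the text Tesseract wrote to <outputBase>.txt. */
    static String ocr(Path image, Path outputBase) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, outputBase))
                .inheritIO()   // show Tesseract's own messages in the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        Path textFile = Paths.get(outputBase.toString() + ".txt");
        return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: TesseractCli <image> <outputBase>");
            return;
        }
        String content = ocr(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("File content: " + content);
    }
}
```

Passing the arguments as a list instead of one command string is a deliberate choice here: it avoids the nested quotes that the Windows paths with spaces needed above.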
