Tesseract OCR is getting failed in Mac with UnsatisfiedLinkError

hemalatha.mani · September 12, 2022, 12:50pm

I am trying to automate a numeric captcha using Tesseract OCR on macOS (Monterey), but I am getting the error below.

Reason:
java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (darwin/libtesseract.dylib) not found in resource path ([file:/Users/xyz/Katalon%20Studio/TestProject/bin/keyword/, file:/Users/xyz/Katalon%20Studio/TestProject/Keywords/, file:/Users/xyz/Katalon%20Studio/TestProject/bin/listener/,....])
	at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:277)
	at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:403)
	at com.sun.jna.Library$Handler.<init>(Library.java:147)
	at com.sun.jna.Native.loadLibrary(Native.java:502)
	at com.sun.jna.Native.loadLibrary(Native.java:481)
	at net.sourceforge.tess4j.util.LoadLibs.getTessAPIInstance(LoadLibs.java:85)
	at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:42)
	at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:427)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:223)
	at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:195)

Java Code

import com.kms.katalon.core.configuration.RunConfiguration;
import net.sourceforge.tess4j.Tesseract;

Tesseract instance = new Tesseract();
instance.setDatapath(RunConfiguration.getProjectDir() + "/tessdata");
String imagePath = RunConfiguration.getProjectDir() + "/Captcha.png";
String captcha = instance.doOCR(new File(imagePath));

The same code works fine on my Mac in IntelliJ and on Windows. The issue occurs only in Katalon Studio on Mac. Can anyone please help me solve this?

mohit.kumar · September 14, 2022, 9:02am

Hi @hemalatha.mani

Thanks for using Katalon Studio. I hope you are doing well.

Could you please send me your application under test so that I can give it a try?

You will have to add “Tesseract” to Katalon's Library Management before using it.

kazurayam · September 14, 2022, 11:51am

I found a thread titled "Unable to load library 'tesseract': Native library (darwin/libtesseract.dylib)".

Have you installed Tesseract on Mac?

hemalatha.mani · September 14, 2022, 1:40pm

Hi @mohit.kumar ,

Thanks for your response.

Yes, I have added all the required Tesseract JAR files in Katalon's Library Management, but I am still facing the same error.

I have attached the test project for your reference. The Tesseract JAR files are attached separately; kindly import them into Katalon.

DemoProject.zip (22.4 MB)

jar_files (1).zip (17.3 MB)

hemalatha.mani · September 14, 2022, 1:46pm

Hi @kazurayam ,

Yes, I have installed Tesseract on my Mac.

I referred to the Stack Overflow question "Tess4j unsatisfied link error on mac OS X" and added the darwin/libtesseract.dylib file to the Maven repo. After that, Tesseract OCR works fine in IntelliJ, but I still get the UnsatisfiedLinkError in Katalon.

kazurayam · September 14, 2022, 2:35pm

I do not understand what you mean by “into the Maven repo”. Do you mean under the $HOME/.m2 folder?

I guess your project in IntelliJ is configured to refer to the Maven local repository, which is why your code works there.

However, Katalon Studio does not understand Maven at all.
Katalon expects you to place every external resource under the Drivers folder manually.
Perhaps that is why your code does not work in Katalon Studio.
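
One thing that might be worth trying, as a sketch under assumptions rather than a confirmed fix: Tess4J loads the native library through JNA, and JNA searches the directories listed in the `jna.library.path` system property. If Tesseract was installed with Homebrew, libtesseract.dylib usually sits in /opt/homebrew/lib (Apple Silicon) or /usr/local/lib (Intel). Those two paths and the helper class `NativeLibLocator` are assumptions for illustration and are not part of Tess4J or Katalon.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: not part of Tess4J or Katalon.
class NativeLibLocator {

    // Assumed Homebrew library directories; adjust to your installation.
    static final List<String> CANDIDATE_DIRS =
            Arrays.asList("/opt/homebrew/lib", "/usr/local/lib");

    // Returns the first directory that contains an entry with the given name.
    static Optional<String> findLibDir(List<String> dirs, String libFileName) {
        for (String dir : dirs) {
            if (new File(dir, libFileName).exists()) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    // Points JNA at the directory holding libtesseract.dylib, if one is found.
    // Call this before the first Tesseract call, because the native
    // library is loaded only once per JVM.
    static boolean configureJna() {
        Optional<String> dir = findLibDir(CANDIDATE_DIRS, "libtesseract.dylib");
        dir.ifPresent(d -> System.setProperty("jna.library.path", d));
        return dir.isPresent();
    }

    public static void main(String[] args) {
        System.out.println(configureJna()
                ? "jna.library.path=" + System.getProperty("jna.library.path")
                : "libtesseract.dylib not found in the assumed directories");
    }
}
```

In a Katalon test case, the equivalent would be a single `System.setProperty("jna.library.path", ...)` line placed before `new Tesseract()`. Whether Katalon's class loader then picks the library up has not been verified in this thread.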

hemalatha.mani · September 14, 2022, 5:38pm

Thanks for your information @kazurayam . I meant the Maven local repository.

I manually added the libtesseract.dylib file under the Drivers folder of the Katalon project. Now I get a different error: java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Can't obtain InputStream for darwin/libtesseract.dylib

kazurayam · September 14, 2022, 8:29pm

You should raise an official support request to the Katalon developer team.

If you are a paying customer, you can raise a support request at

https://katalonsupport.force.com/katalonhelpcenter/s/article/How-to-submit-a-Support-Case

If you are not a paying customer, ask @vu.tran to address this. He might be able to do something for you.

vu.tran · September 15, 2022, 4:01am

Thank you @kazurayam for letting me know. I already asked our product team to help. @mohit.kumar will investigate more details that @hemalatha.mani provided and we will assist shortly. Thanks for your continued support.

hemalatha.mani · September 25, 2022, 2:59pm

Hi @vu.tran ,

Please let me know if there is any possible fix for this issue.

vu.tran · September 26, 2022, 5:33am

Hi @hemalatha.mani, thanks for your follow-up. @mohit.kumar has tried to reproduce the issue, but with no luck so far. We have escalated this to our Product team for more information. We’ll update you once we have some progress. Best.

mohit.kumar · July 28, 2024, 4:56pm

Finally, I was able to read the content from your PDF file (tested on both Mac and Windows). Please follow the steps below to achieve the same:

Use case:
If the PDF plugin cannot extract the content, an alternative approach is to convert the PDF into images, apply Optical Character Recognition (OCR) to extract the text, save the extracted text to a text file, and then read that file and verify the expected values. The implementation is given below.

Step-by-Step Guide to Implement OCR for PDF Content Extraction

1. Convert PDF to Images: Render each page of the PDF as an image.
2. Apply OCR: Use OCR to extract text from these images.
3. Save Extracted Text: Save the extracted text into a .txt file.
4. Read and Verify: Read the text file and verify the expected values.

Implementation in Code

1. Download and Set Up Tesseract: Download and install Tesseract from here.
2. Install PDF Utility: Install the PDF utility from here.
3. Build a Test Case: Use the code provided below to build your test case.

import com.kms.katalon.core.util.ConsoleCommandBuilder

// Remove output left over from a previous run
def folderPath = new File('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp')
if (folderPath.exists()) {
    folderPath.deleteDir()
    println('Folder deleted successfully')
} else {
    println('Folder does not exist')
}

// Save each page of the PDF as an image
CustomKeywords.'com.kms.katalon.keyword.pdf.PDF.saveAllPagesAsImages'('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\pdf_validaton.pdf')

// Run Tesseract on one page image; the recognized text is written to output.txt
ConsoleCommandBuilder.create('tesseract "C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\pdf_validaton_2.png" "C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\output"').workingDir(
    'C:\\Program Files (x86)\\Tesseract-OCR').redirectError().execSync()

// Read the extracted text
def filePath = new File('C:\\Users\\Mohit\\Katalon Studio\\PDF Test\\Data Files\\temp\\output.txt')
String content
if (filePath.exists()) {
    content = filePath.text
    println("File content:$content")
} else {
    println('File does not exist')
}

// Verify the expected values
assert content.contains("FY3319")
assert content.contains("778P")
assert content.toLowerCase().contains("mohit")
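
The script above hard-codes Windows paths, so it will not run unchanged on the Mac this thread is about. As a rough, platform-neutral sketch of the OCR step only, the Java code below calls the `tesseract` command-line tool through `ProcessBuilder` instead of `ConsoleCommandBuilder`. It assumes `tesseract` is on the PATH; the class name `TesseractCli` and the way the paths are passed in are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of Katalon or Tesseract.
class TesseractCli {

    // Builds "tesseract <image> <outputBase>".
    // Tesseract appends ".txt" to the output base on its own.
    static List<String> buildCommand(Path image, Path outputBase) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString());
    }

    // Runs Tesseract on one image and returns the recognized text.
    static String ocr(Path image, Path outputBase) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, outputBase))
                .inheritIO() // show Tesseract's own messages in the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        Path textFile = Paths.get(outputBase.toString() + ".txt");
        return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: TesseractCli <image> <outputBase>");
            return;
        }
        String content = ocr(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("File content: " + content);
    }
}
```

Passing the arguments as a list avoids quoting problems with folder names that contain spaces, such as "Katalon Studio" or "Data Files".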
