Capture screenshot and read the text in it for comparison

prabhag · August 2, 2023, 10:23am

Hi Team,

Can anyone help me with this verification?

I want to take a screenshot of a PDF and then convert it into a text file to compare the data.

How can we achieve this in Katalon Studio?

anon46315158 · August 2, 2023, 3:08pm

The first thing you have to ask yourself is ‘can this be done programmatically?’
Once you have the answer, you can ask how this can be done with Katalon.

kazurayam · August 2, 2023, 8:57pm

You want to recognize text in an image.

See the following thread, which explains how difficult it is.

You should abandon your idea.

kazurayam · August 2, 2023, 9:08pm

You may try transforming a PDF file into text using the PDFBox library.

Using PDFBox, you will get the text (or HTML, if you use its HTML exporter), but the content will be horribly complicated for humans and your test programs to read and recognize. You may find it impossible to consume programmatically “to compare the data”.

PDF is a file format for humans to read; it is not designed to be consumed by another program.

Again, I would say, you should give up your idea of testing the content of a PDF file. Don’t try to do it. You will never be successful.

See also

prabhag · August 3, 2023, 4:37am

Thanks for your suggestion @kazurayam. I generally follow your solutions and appreciate your help in resolving issues really quickly.

However, coming to my problem: I tried reading data directly from the PDF using a Java function, and it worked for a couple of PDFs.

But the one I am trying now is a secured PDF, and when I pass its URL to the function below, it cannot identify the text and gives the error shown in the screenshot.

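```groovy
// Reads the file at the given URL byte by byte and echoes it to the console.
// Note: this prints the raw file bytes; it does not decode the PDF text.
def ReadPDFFile2(String PDFURL) {
    URL TestURL = new URL(PDFURL)
    BufferedInputStream TestFile = new BufferedInputStream(TestURL.openStream())
    int i
    while ((i = TestFile.read()) != -1) {
        System.out.print((char) i)
    }
    TestFile.close()
    return "TestText"
}
```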

So it is not possible for me to read data directly from this PDF, and I need to find an alternative way.

dineshh · August 3, 2023, 6:02am

Instead of reading the data from the PDF, try comparing it against the API that delivers the content to the PDF.
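A minimal sketch of that idea in Java 11+, assuming the data behind the PDF is served by an HTTP endpoint (the URL and the class and method names below are hypothetical): fetch the response with `java.net.http.HttpClient` and check it for the expected value. A real test would parse the JSON and compare individual fields instead of doing a substring check.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ApiCheck {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Fetch the raw response body from the (hypothetical) endpoint that
    // supplies the data which is later rendered into the PDF
    static String fetchBody(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Naive check: does the response mention the expected value at all?
    static boolean containsValue(String body, String expected) {
        return body.contains(expected);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; replace with the API your application uses
        String body = fetchBody("https://example.com/api/invoices/123");
        System.out.println(containsValue(body, "42.00"));
    }
}
```

Checking the API keeps the test independent of how the PDF is laid out or protected.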

kazurayam · August 3, 2023, 7:23am

I am surprised to read this. What do you, @prabhag, mean by saying “it worked”?

A PDF file is not text; most probably it contains a lot of bytes which you cannot safely cast to characters. But you wrote that you were successful.

Could you show us an example of the successful output to System.out? I have no idea what it would look like.
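To illustrate why a byte-by-byte dump like `ReadPDFFile2` does not show the text: page content in a PDF is typically stored in streams compressed with the `/FlateDecode` filter, which is zlib compression. The sketch below uses only `java.util.zip` on a made-up content stream (the class and method names are hypothetical). The raw compressed bytes do not contain the text, while inflating them recovers the operators. Even then, the text sits inside operators such as `Tj` and may use font-specific encodings, which is why a real parser such as PDFBox's `PDFTextStripper` is needed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // Compress with zlib, the format behind the PDF /FlateDecode filter
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a complete zlib stream back to the original bytes
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the data is truncated or needs a dictionary
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("Truncated or unsupported zlib data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Mimics ReadPDFFile2: cast every byte to a char without any decoding
    static String castBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up page content: PDF text operators, one line per text row
        String content = "BT /F1 12 Tf 72 712 Td (Invoice total: 42.00) Tj ET\n"
                + "BT /F1 12 Tf 72 698 Td (Invoice total: 42.00) Tj ET\n";
        byte[] stored = deflate(content.getBytes(StandardCharsets.US_ASCII));

        // What a byte-by-byte dump sees: compressed data, not the text
        System.out.println(castBytes(stored).contains("Invoice total"));
        // What a PDF parser sees after applying the /FlateDecode filter
        System.out.println(new String(inflate(stored), StandardCharsets.US_ASCII));
    }
}
```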

prabhag · August 3, 2023, 7:53am

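Here is the keyword I use to read the PDF:

```groovy
import com.kms.katalon.core.annotation.Keyword
import org.apache.pdfbox.pdmodel.PDDocument
// PDFBox 2.x package; in PDFBox 1.x the class is org.apache.pdfbox.util.PDFTextStripper
import org.apache.pdfbox.text.PDFTextStripper

@Keyword
def ReadPDF(String PDFURL) {
    URL TestURL = new URL(PDFURL)
    BufferedInputStream bis = new BufferedInputStream(TestURL.openStream())
    // Parse the PDF structure instead of dumping its raw bytes
    PDDocument doc = PDDocument.load(bis)
    // Extract the text of all pages as one String
    String pdfText = new PDFTextStripper().getText(doc)
    doc.close()
    bis.close()
    return pdfText
}
```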

On top of this, I have my own keyword that verifies the text as per the requirements.
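A verification keyword built on `ReadPDF` could look something like the following sketch. It is plain Java with hypothetical names (`normalize`, `valueAfterLabel`) and a made-up label format, not the poster's actual keyword. It collapses the line breaks and repeated spaces that `PDFTextStripper` output tends to contain, then pulls out the value that follows a label with a regular expression so it can be compared with the expected value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfTextCheck {

    // Collapse all runs of whitespace (including line breaks) into single spaces
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Return the first whitespace-delimited token after the label, or null if absent
    static String valueAfterLabel(String text, String label) {
        Pattern p = Pattern.compile(Pattern.quote(normalize(label)) + "\\s*(\\S+)");
        Matcher m = p.matcher(normalize(text));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample of extracted text with layout-induced line breaks
        String extracted = "ACME Ltd\r\nInvoice   total:\n 42.00\nThank you";
        System.out.println(valueAfterLabel(extracted, "Invoice total:")); // prints 42.00
    }
}
```

Normalizing first makes the comparison robust against line wrapping in the PDF, which otherwise breaks exact string matches.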

anon46315158 · August 3, 2023, 10:45am

So, you already have your solution (which is not ‘take a screenshot’, and is a better approach).
Now, what do you expect from this community?
To decrypt a secured PDF?
The word ‘Secure’ says it all.
To achieve this, you may have to download the PDF, manipulate it to remove the protection (google how this can be done), and then process it further.
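A sketch of the first step, assuming Java 11+: download the PDF to a local file and check whether it is likely protected before handing it to a library that can open it with the password. The class and method names, URL, and file name are hypothetical. The `/Encrypt` check is only a heuristic: an encrypted PDF's trailer (or cross-reference stream dictionary) carries an `/Encrypt` entry, which is stored uncompressed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PdfDownload {

    // Copy the resource behind the URL to a local file for later processing
    static Path download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // A well-formed PDF file starts with the "%PDF-" header
    static boolean looksLikePdf(byte[] bytes) {
        return bytes.length >= 5
                && new String(bytes, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    // Heuristic only: look for the /Encrypt key anywhere in the raw bytes
    // (ISO-8859-1 maps each byte to exactly one char)
    static boolean probablyEncrypted(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL and file name; replace with the real ones
        Path local = download("https://example.com/report.pdf", Path.of("report.pdf"));
        byte[] bytes = Files.readAllBytes(local);
        System.out.println("PDF? " + looksLikePdf(bytes) + ", encrypted? " + probablyEncrypted(bytes));
    }
}
```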

bharathi.a · August 8, 2023, 10:53am

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

// Uses the PDFBox 2.x API; in PDFBox 3.x, load with Loader.loadPDF(file, password)
public class Decrypt_pdf {

    public static void main(String[] args) throws IOException {
        // Select a file for the decryption operation
        File file = new File("D:\\Bluetooth\\Encrypted.pdf");
        // Hypothetical output name: write to a separate file, because PDFBox may
        // still read from the source while saving, so overwriting it is unsafe
        File output = new File("D:\\Bluetooth\\Decrypted.pdf");

        // Load the PDF file with its password; try-with-resources closes it
        try (PDDocument pdd = PDDocument.load(file, "12345")) {
            // Remove all security from the PDF file
            pdd.setAllSecurityToBeRemoved(true);
            // Save the decrypted copy
            pdd.save(output);
        }

        System.out.println("Decryption Done...");
    }
}
```

Try this to decrypt the secured PDF file, and if you have further questions, follow this:
