Capture a screenshot and read the text in it for comparison

Hi Team,

Can anyone help me with this verification?

I want to take a screenshot of a PDF and then convert it into a text file so that I can compare the data.

How can we achieve this in Katalon Studio?

1 Like

The first thing you have to ask yourself is ‘can this be done programmatically?’
Once you have the answer, you can ask how it can be done with Katalon.

1 Like

You want to recognize text in an image.

See the following thread which explains how difficult it is.

You should abandon your idea.

1 Like

You may try converting the PDF file into text using the PDFBox library.

Using PDFBox, you will get text in HTML syntax, but the content will be horribly complicated for humans and for your test programs to read and recognize. You would find yourself unable to consume it programmatically “to compare the data”.

PDF is a file format for humans to read; it is not designed to be consumed by another program.

Again, I would say, you should give up your idea of testing the content of a PDF file. Don’t try to do it. You will never be successful.

See also

2 Likes

Thanks for your suggestion, @kazurayam. I generally follow your solutions, and I appreciate your help in resolving issues so quickly.

However, coming to my problem: I tried reading data directly from a PDF using a Java function, and it worked for a couple of PDFs.

But the one I am trying now is a secured PDF. When I pass its URL to the function below, it cannot identify the text and gives the error shown in the screenshot.

def ReadPDFFile2(String PDFURL) {
    URL TestURL = new URL(PDFURL);
    BufferedInputStream TestFile = new BufferedInputStream(TestURL.openStream());
    int i;
    // Print every raw byte of the response as a character
    while ((i = TestFile.read()) != -1) {
        System.out.print((char) i);
    }
    TestFile.close();
    return "TestText"
}

So I am not able to read the data directly from this PDF and need to look for an alternative way.

1 Like

Instead of reading the data from the PDF, try comparing against the API that delivers the content to the PDF.

1 Like

I am surprised reading this. What do you, @prabhag, mean by saying “it worked”?

A PDF file is not text; most probably it contains a lot of bytes which you cannot safely cast to characters. But you wrote that you were successful.
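To make the point about bytes concrete, here is a minimal sketch in plain Java. It assumes the whole file has already been read into a byte array. The class and method names (`PdfHeaderCheck`, `looksLikePdf`, `countNonTextBytes`) are made up for this illustration and are not part of Katalon or PDFBox. The first method checks for the `%PDF-` marker that a PDF file starts with, and the second counts bytes that are neither printable ASCII nor common whitespace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Katalon or PDFBox API.
class PdfHeaderCheck {

    // Returns true when the data begins with the ASCII marker "%PDF-".
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Counts bytes that are neither printable ASCII nor tab, LF or CR.
    static int countNonTextBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean printable = v >= 0x20 && v <= 0x7E;
            boolean whitespace = v == '\t' || v == '\n' || v == '\r';
            if (!printable && !whitespace) {
                count++;
            }
        }
        return count;
    }
}
```

If `countNonTextBytes` reports a large number for a file, printing its bytes with `(char)` casts will mostly produce unreadable output rather than the visible text of the document.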

Could you show us an example of the successful output to System.out? I have no idea what it would look like.

1 Like

Here is the Keyword to read the PDF:
@Keyword
def ReadPDF(String PDFURL) {
    // PDDocument and PDFTextStripper come from the Apache PDFBox library
    URL TestURL = new URL(PDFURL)
    BufferedInputStream bis = new BufferedInputStream(TestURL.openStream())
    // Parse the stream as a PDF document and extract its text
    PDDocument doc = PDDocument.load(bis)
    String pdfText = new PDFTextStripper().getText(doc)
    doc.close()
    bis.close()
    return pdfText
}

From this, I have my own keyword to verify the text as required.
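The verification keyword itself is not shown in this thread. As a rough sketch of what such a check could look like, the plain Java below compares extracted text with an expected phrase after collapsing whitespace, since text pulled out of a PDF often has line breaks in unexpected places. The names `PdfTextCompare`, `normalize` and `containsText` are hypothetical and chosen only for this example.

```java
// Hypothetical helper for illustration only; not the poster's actual keyword.
class PdfTextCompare {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns true when the expected phrase occurs in the extracted text,
    // ignoring differences in whitespace and line breaks.
    static boolean containsText(String extracted, String expected) {
        return normalize(extracted).contains(normalize(expected));
    }
}
```

In a test, the string returned by a text-extraction keyword would be passed as `extracted`, and the phrase you expect to see in the PDF as `expected`.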

1 Like

So, you already have your solution (which is not ‘take a screenshot’, and is a better approach).
Now, what do you expect from this community?
To decrypt a secured PDF?
The word ‘Secure’ says it all :slight_smile:
To achieve this, you may have to download the PDF, manipulate it to remove the protection (google how this can be achieved), and then process it further.
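The download step can be sketched with the Java standard library alone. This is only an outline under some assumptions: the URL is reachable without authentication, and the class and method names (`PdfDownload`, `downloadToTempFile`) are invented for this example. It copies the response bytes unchanged into a temporary file, which can then be handed to whatever tool removes the protection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only.
class PdfDownload {

    // Copies the bytes behind the given URL into a new temporary file
    // and returns the path of that file.
    static Path downloadToTempFile(String pdfUrl) throws IOException {
        Path target = Files.createTempFile("downloaded-", ".pdf");
        try (InputStream in = URI.create(pdfUrl).toURL().openStream()) {
            // The temp file already exists (it is empty), so overwrite it
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Working on a local copy also means the file on the server is never touched while you experiment.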

2 Likes

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;

public class Decrypt_pdf {

    public static void main(String[] args) throws IOException {
        // select a file for Decryption operation
        File file = new File("D:\\Bluetooth\\Encrypted.pdf");

        // Load the PDF file
        PDDocument pdd = PDDocument.load(file, "12345");

        // removing all security from PDF file
        pdd.setAllSecurityToBeRemoved(true);

        // Save the PDF file
        pdd.save(file);

        // Close the PDF file
        pdd.close();

        System.out.println("Decryption Done...");
    }
}

Try this to decrypt the secured PDF file, and if you have further questions, follow this