Hi Team,
Can anyone help me with this verification?
I want to take a screenshot of a PDF and then convert it into a text file to compare the data.
How can we achieve this in Katalon Studio?
The first thing you have to ask yourself is "can this be done programmatically?"
Once you have the answer, you can ask how it can be done with Katalon.
You want to recognize text in an image.
See the following thread, which explains how difficult that is.
You should abandon your idea.
You may try transforming a PDF file into text using the PDFBox library.
Using PDFBox, you will get text in HTML syntax, but the content will be horribly complicated for humans and for your test programs to read and recognize. You would find yourself unable to consume it programmatically "to compare the data".
PDF is a file format for humans to read; it is not designed to be consumed by another program.
Again, I would say you should give up the idea of testing the content of a PDF file. Don't try to do it. You will never be successful.
See also
Thanks for your suggestion @kazurayam. I generally follow your solutions and appreciate your help in resolving issues so quickly.
However, coming to my problem: I tried reading data directly from the PDF using a Java function, and it worked for a couple of PDFs.
But the one I am trying now is a secured PDF, and when I pass its URL to the function below, it cannot identify the text and gives the error shown in the screenshot.
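// Reads the raw bytes behind the URL and prints each byte as a char.
// It does not parse the PDF, so compressed or encrypted content prints as garbage.
def ReadPDFFile2(String PDFURL) {
	URL TestURL = new URL(PDFURL)
	BufferedInputStream TestFile = new BufferedInputStream(TestURL.openStream())
	int i
	while ((i = TestFile.read()) != -1) {
		System.out.print((char) i)
	}
	TestFile.close()
	return "TestText"
}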
So it is not possible for me to read data directly from the PDF, and I need to find an alternative.
Instead of reading the data from the PDF, try comparing it with the API that delivers the content to the PDF.
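Before looking for an alternative, it may help to confirm that the file really is encrypted. A minimal sketch in plain Java, with no PDF library; the class and method names are hypothetical. It relies on the fact that an encrypted PDF carries an /Encrypt entry in its trailer (or in the cross-reference stream dictionary of newer files), and both are stored uncompressed. This is only a byte-search heuristic, not a PDF parser.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: heuristic check for an encrypted ("secured") PDF.
class PdfEncryptionCheck {

    // Looks for the /Encrypt key anywhere in the raw bytes. The trailer and the
    // xref stream dictionary are not compressed, so the key is normally visible.
    // Heuristic only: the same byte sequence could, in principle, occur elsewhere.
    static boolean looksEncrypted(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a local copy of the PDF
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Looks encrypted: " + looksEncrypted(data));
    }
}
```

If this reports true, plain text extraction will fail until the protection is removed with the document's password.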
I am surprised to read this. What do you, @prabhag, mean by saying "it worked"?
A PDF file is not text; most probably it contains a lot of bytes which you cannot safely cast to characters. But you wrote that you were successful.
Could you show us an example of the successful output to System.out? I have no idea what it would look like.
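The point about casting bytes to characters can be checked without any PDF library. A minimal sketch in plain Java; the class and method names are hypothetical. It tests for the "%PDF-" header that PDF files normally start with and measures the share of bytes outside printable ASCII. A high share suggests that printing each byte as a char, as ReadPDFFile2 does, will produce unreadable output. A low share does not mean the text is readable: even uncompressed page content is a sequence of drawing operators, not plain sentences.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: rough look at why byte-to-char printing of a PDF is unreliable.
class PdfByteInspector {

    // PDF files normally begin with the ASCII header "%PDF-" (e.g. "%PDF-1.7").
    static boolean hasPdfHeader(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Fraction of bytes that are not printable ASCII, tab, CR or LF.
    // Compressed streams and embedded fonts push this value up.
    static double nonTextRatio(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        int nonText = 0;
        for (byte b : data) {
            int v = b & 0xFF;
            boolean printable = (v >= 0x20 && v <= 0x7E) || v == '\t' || v == '\r' || v == '\n';
            if (!printable) {
                nonText++;
            }
        }
        return (double) nonText / data.length;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a local PDF file
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("PDF header: " + hasPdfHeader(data));
        System.out.printf("Non-text bytes: %.1f%%%n", nonTextRatio(data) * 100);
    }
}
```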
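Here is the keyword to read a PDF:
import com.kms.katalon.core.annotation.Keyword
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper   // PDFBox 2.x package name

@Keyword
def ReadPDF(String PDFURL) {
	URL TestURL = new URL(PDFURL)
	BufferedInputStream bis = new BufferedInputStream(TestURL.openStream())
	PDDocument doc = PDDocument.load(bis)
	try {
		// Extract the plain text of all pages
		return new PDFTextStripper().getText(doc)
	} finally {
		// Close both even if text extraction throws
		doc.close()
		bis.close()
	}
}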
Based on this, I have my own keyword to verify the text as required.
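The verification keyword itself is not shown in the thread. As an illustration only, here is a minimal sketch of one common approach, assuming the input is the string returned by ReadPDF and the expected phrases come from the test requirement; the class, method names and sample strings are hypothetical. Whitespace is collapsed first, because extracted PDF text often breaks lines in different places than the expected text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking text extracted from a PDF (e.g. by ReadPDF).
class PdfTextCheck {

    // Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the expected phrases that do NOT appear in the extracted text.
    static List<String> missingPhrases(String extractedText, List<String> expected) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative sample data, not taken from a real PDF
        String extracted = "Invoice No: 1234\nTotal   amount:\n 99.50 EUR";
        List<String> expected = List.of("Invoice No: 1234", "Total amount: 99.50 EUR", "Due date");
        System.out.println("Missing: " + missingPhrases(extracted, expected)); // prints "Missing: [Due date]"
    }
}
```

Reporting the missing phrases, rather than a single true/false, makes a failed check easier to diagnose.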
So, you already have your solution (which is not "take a screenshot" and is a better approach).
Now, what do you expect from this community?
To decrypt a secured PDF?
The word "Secure" says it all.
To achieve this, you may have to download the PDF, manipulate it to remove the protection (search online for how this can be done), and then process it further.
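The download step needs nothing beyond the JDK. A minimal sketch, assuming the PDF URL can be opened without extra authentication (cookies or login headers are not handled here); the class and method names are hypothetical. It only saves the bytes to disk; removing the protection is a separate step done with a PDF library and the document's password.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: save whatever the URL serves into a local file.
class PdfDownloader {

    static Path download(String pdfUrl, Path target) throws IOException {
        try (InputStream in = URI.create(pdfUrl).toURL().openStream()) {
            // REPLACE_EXISTING overwrites a copy left over from a previous run
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: PDF URL, args[1]: local file name
        Path saved = download(args[0], Paths.get(args[1]));
        System.out.println("Saved " + Files.size(saved) + " bytes to " + saved);
    }
}
```

Working from a local file also lets the same PDF be opened several times (decrypt, then extract) without downloading it again.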
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

public class Decrypt_pdf {

    public static void main(String[] args) throws IOException {
        // Select the encrypted file for the decryption operation
        File file = new File("D:\\Bluetooth\\Encrypted.pdf");
        // Hypothetical output path: saving to a separate file avoids overwriting
        // the file PDFBox is still reading from
        File output = new File("D:\\Bluetooth\\Decrypted.pdf");

        // Load the PDF file with its password (PDFBox 2.x API);
        // try-with-resources closes the document
        try (PDDocument pdd = PDDocument.load(file, "12345")) {
            // Remove all security from the PDF file
            pdd.setAllSecurityToBeRemoved(true);
            // Save the decrypted copy
            pdd.save(output);
        }
        System.out.println("Decryption Done...");
    }
}
Try this to decrypt the secured PDF file,
and if you have further questions, follow this: