Cannot verify "PDF" spawned web pages

Hi All,

“Spy Web” and / or “Record Web” is not working on “PDF” spawned web pages (Affects: Chrome/Firefox/IE).

Q1: Is there a way to use “Spy Web” or “Record Web” on “PDF” spawned web pages that I am not seeing?

I also checked the Katalon documentation for a verify.PDF keyword but did not find one. Maybe one can be added?

Q2: Would anyone have an example of custom verify.PDF keyword that could be used for “PDF” verification?

When you open a .pdf URL, e.g. http://192.168.1.2/yourFile.pdf, the browser acts as a PDF viewer instead of rendering an actual web page. There are no web elements to capture, so you can’t “spy” or “record” anything on the PDF.

For ‘Verify PDF’ keyword, please refer to this guide to build your custom keyword : http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/

Hi Vinh,

Thank-you for your quick response & link to “PDFBox”.

I have a few installation & usage questions for using the API in Katalon.

Would it be okay if I sent my questions to you directly via email?

Once I have figured out how to get “PDFBox” working I will update this ticket so others can use the steps if needed.

You can send your email to my account email if that works too.

Hi Vinh,
Please disregard my request for your email (I am guessing you’re pretty busy). I have created a text file with my questions. When time allows please review my questions.

Cheers,
Dave

ReadPDF.txt

Hi Dave,

Please find below my answers:
Q1: You need to import that library into your project: https://docs.katalon.com/display/KD/External+Libraries

Q2: Please see above solution as this is the action you need for this question

Q3: Your code is not incorrect. You just need to add the @Keyword annotation to it, so it will be:

@Keyword
def ReadPDF(String PDFURL)
{
	URL TestURL = new URL(PDFURL);
	BufferedInputStream TestFile = new BufferedInputStream(TestURL.openStream());
	PDFParser TestPDF = new PDFParser(TestFile); //getting "Groovy:unable to resolve class PDFParser" error displays (might be due to not knowing where to reference JAR files in Katalon?)
	TestPDF.parse();
	String TestText = new PDFTextStripper().getText(TestPDF.getPDDocument()); //getting "Groovy:unable to resolve class PDFParser error" (same as above)
	Assert.assertTrue(TestText.contains("Open the setting.xml, you can see it is like this"));
}

Q4: After the above custom keyword, just call it and pass in your PDF URL as an input of this keyword :slight_smile:

Regards

Hi Vinh,

Thanks for your help :wink:

I am now seeing the following error when trying to use the custom readPDF () keyword:
*See attached files

Test Cases/Check PDFs FAILED because (of) groovy.lang.GroovyRuntimeException:
Could not find matching constructor for: org.apache.pdfbox.pdfparser.PDFParser(java.io.BufferedInputStream)
tools.readPDFs.ReadPDF:30
tools.readPDFs.invokeMethod:0
Test Cases/Check PDFs.run:31

ReadPDF02.txt

Read_PDF.png

Read_PDF_Test_Case.png

You don’t need to declare ‘public class’ in your custom keyword, just remove it and use method declaration instead.

Vinh Nguyen said:

You don’t need to declare ‘public class’ in your custom keyword, just remove it and use method declaration instead.

Katalon Studio creates that class exactly as the OP shows it. Are you saying we should remove the class declaration completely?

Hi Vinh,

First let me say thanks so much for your help; I’ve learned a lot so far…

I removed the ‘public class’ as you recommended, but I am still seeing the “matching constructor” error. I am not sure what I am doing wrong. Have you been able to get this to work?

I created a brand new simple project & did as you recommended:

  1. Added the custom @Keyword, (see attached keyword screen shot)

  2. Added test case that calls the custom keyword (see attached testCase screen shot)

Result: Test Cases/Check PDFs FAILED because (of) groovy.lang.GroovyRuntimeException: Could not find matching constructor for: org.apache.pdfbox.pdfparser.PDFParser(java.io.BufferedInputStream)

tools.pdfReader.ReadPDF:27

Test Cases/Check PDFs.run:35

Keyword.png

testCase.png

Hi Vinh,
I did some more research and was able to get the pdfbox to work.
When I print the result to the console I am NOW seeing the PDF contents.

Keyword used:

package tools
import org.apache.pdfbox.pdfparser.PDFParser
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import com.kms.katalon.core.annotation.Keyword
@Keyword
def ReadPDF(String PDFURL)
{
URL TestURL = new URL(PDFURL);
BufferedInputStream bis = new BufferedInputStream(TestURL.openStream());
PDDocument doc = PDDocument.load(bis);
String pdfText = new PDFTextStripper().getText(doc);
doc.close();
bis.close();
println(pdfText);
}

However I am not having any luck “Verifying Text Present” using “Assert.assertTrue(pdfText.contains(‘Open the setting.xml, you can see it is like this’))”

How should I validate the text “Open the setting.xml, you can see it is like this” exists in the read in PDF?

import static com.kms.katalon.core.checkpoint.CheckpointFactory.findCheckpoint
import static com.kms.katalon.core.testcase.TestCaseFactory.findTestCase
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
import static com.kms.katalon.core.testobject.ObjectRepository.findTestObject
import com.kms.katalon.core.checkpoint.Checkpoint as Checkpoint
import com.kms.katalon.core.checkpoint.CheckpointFactory as CheckpointFactory
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as MobileBuiltInKeywords
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as Mobile
import com.kms.katalon.core.model.FailureHandling as FailureHandling
import com.kms.katalon.core.testcase.TestCase as TestCase
import com.kms.katalon.core.testcase.TestCaseFactory as TestCaseFactory
import com.kms.katalon.core.testdata.TestData as TestData
import com.kms.katalon.core.testdata.TestDataFactory as TestDataFactory
import com.kms.katalon.core.testobject.ObjectRepository as ObjectRepository
import com.kms.katalon.core.testobject.TestObject as TestObject
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WSBuiltInKeywords
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WS
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUiBuiltInKeywords
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI
import internal.GlobalVariable as GlobalVariable
import org.openqa.selenium.Keys as Keys
import tools.pdfReader2 as pdfReader2
WebUI.openBrowser('')
WebUI.navigateToUrl('http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/')
WebUI.waitForPageLoad(3)
WebUI.click(findTestObject('PDFchecks/Page_Selenium WebDriver Read PDF Co/a_this location'))
WebUI.waitForPageLoad(3)
CustomKeywords.'tools.pdfReader2.ReadPDF'('http://www.axmag.com/download/pdfurl-guide.pdf')
String Assert = ''
Assert.assertTrue(pdfText.contains('Open the setting.xml, you can see it is like this'))

TestCase.png

This now works with the following (although I am wondering how to validate text from the “PDF” in the actual test case)…

Thanks Vinh for pointing me in the right direction to solve this issue :wink:

package tools
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
import org.testng.Assert
import com.kms.katalon.core.annotation.Keyword
//######################################
//ADDING the ReadPDF() CUSTOM KEYWORD TO ANY PROJECT
//First add the pdfbox JAR file to the test project
//1. Open Apache PDFBox | Download
//2. Download the latest pdfbox JAR file (e.g., pdfbox-2.0.8.jar)
//3. Start Katalon & select “Project > Settings > External Libraries”
//4. Click Add & add the pdfbox-2.0.8.jar libraries to the project & click Apply
//5. Add the following ReadPDF() method custom keyword
//6. After adding the custom keyword press CTRL + SHIFT + O to add the needed libraries
//######################################
//USAGE:
// Call the keyword & pass the PDF URL as the keyword’s input
// Use Assert.assertTrue() to verify the results on the PDF
//######################################

@Keyword
def ReadPDF(String PDFURL)
{
URL TestURL = new URL(PDFURL);
BufferedInputStream bis = new BufferedInputStream(TestURL.openStream());
PDDocument doc = PDDocument.load(bis);
String pdfText = new PDFTextStripper().getText(doc);
doc.close();
bis.close();
println(pdfText);
Assert.assertTrue(pdfText.contains("Open the setting.xml, you can see it is like this:"));
Assert.assertTrue(pdfText.contains("Please add the following sentence in setting.xml before"));
Assert.assertTrue(pdfText.contains("You can see that I have modified the setting.xml, and if open the file in IE, it is like this:"));
println "PDF IS GOOD TO GO…\r";
}

TestCase:

WebUI.openBrowser('')
WebUI.navigateToUrl('http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/')
WebUI.click(findTestObject('99_RPDF/Page_Selenium WebDriver Read PDF Co (1)/a_this location'))
CustomKeywords.'tools.pdfReader.ReadPDF'('http://www.axmag.com/download/pdfurl-guide.pdf')
WebUiBuiltInKeywords.closeBrowser()
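
On the open question above (validating PDF text from the test case rather than inside the keyword): the test case cannot see pdfText because it is a local variable of ReadPDF(). One option, offered only as a sketch, is to have the keyword return the extracted text and then check the expected phrases in the caller. The helper below is plain Java with no PDFBox dependency; the class name PdfTextCheck and the method missingPhrases are hypothetical names for illustration, not part of Katalon or PDFBox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which expected phrases are absent from text
// that was already extracted from a PDF (e.g. text returned by a keyword).
class PdfTextCheck {

    // Returns the phrases from 'expected' that do NOT occur in 'pdfText'.
    static List<String> missingPhrases(String pdfText, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!pdfText.contains(phrase)) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in for the text a keyword would return after PDFTextStripper.getText(doc).
        String pdfText = "Open the setting.xml, you can see it is like this:\nmore text";

        List<String> missing = missingPhrases(pdfText, Arrays.asList(
                "Open the setting.xml, you can see it is like this",
                "a phrase that is not in the document"));

        // An empty list means every expected phrase was found.
        System.out.println("Missing phrases: " + missing);
    }
}
```

With this shape, the test case would assert that the returned list is empty, and a failure message can name exactly which phrase was not found.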

Hi friend,
I am getting the below error while validating the pdf file. Could you please help me out how to resolve this.
Test Cases/Working_Folder/pdfReader FAILED because (of) org.codehaus.groovy.runtime.InvokerInvocationException: java.lang.NoClassDefFoundError: org/apache/fontbox/FontBoxFont
Thanks,
Abhishek

Hi Abhishek,
Did you try replicating my example exactly as above (see 03/09/2018)? It will open http://www.testingdiaries.com/selenium-webdriver-read-pdf-content/ & read the PDF file. Your results will display in the Katalon Console.

Hi Abhishek,
You can get a copy of my example here…

1 Open https://github.com/Tigger99/ReadPdf-Documents

2 Click on “ReadPDF documents.rar”

3 Click “Download” & save to the local system

4 Right click on the file & select “Extract Here”

5 Open the “ReadPDF documents” folder from Katalon

6 Once loaded select Test Cases > & open Check PDF

7 Run the test & then click on the Console view

8 Open Keywords > tools > pdfReader.groovy to view what is being validated

Hi

first of all thank you very much

When I do what you indicate, I get the error “java.security.cert.CertificateException: No subject alternative DNS name matching”.

(In the project I have enabled bypass certificate validation)

Hi Carlos,
Sorry for the late response… I am not sure why you are seeing that error.
When you use the sample project does that work?
The DNS error may be due to some setup on your side.
Any other Katalon folks have any ideas?

Hi Dave, it is a good solution, but my PDF is behind Basic authorization and I receive a 401 status. Where can I send the user/password?

For Basic authorization, add an Authenticator in the keyword:

@Keyword
def ReadPDF(String PDFURL)
{
Authenticator.setDefault(new Authenticator() {
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication("user", "password".toCharArray());
}
});
URL TestURL = new URL(PDFURL);
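
Authenticator.setDefault() changes the credentials for every connection in the JVM. If that is too broad, an alternative (a sketch, not something confirmed in this thread) is to send the Basic Authorization header on the one connection that fetches the PDF. The class name BasicAuthPdf and both method names are hypothetical; "user" and "password" are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: opens a URL with a per-request HTTP Basic Authorization header.
class BasicAuthPdf {

    // Builds the header value "Basic <base64(user:password)>".
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    // Returns a stream of the PDF bytes; the caller is responsible for closing it.
    static InputStream openWithBasicAuth(String pdfUrl, String user, String password)
            throws IOException {
        URLConnection connection = new URL(pdfUrl).openConnection();
        connection.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return connection.getInputStream();
    }

    public static void main(String[] args) {
        // Only prints the header; no network call is made here.
        System.out.println(basicAuthHeader("user", "password"));
    }
}
```

The stream returned by openWithBasicAuth() could then be wrapped in a BufferedInputStream and passed to PDDocument.load() in place of TestURL.openStream().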

How do we read the PDF if it has been downloaded into a local directory?
I tried 'C:\\folder\sample.pdf' (and 'file:///C:/Users/sample.pdf') and this error appears:

java.net.MalformedURLException: unknown protocol
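
A likely cause: new URL() treats everything before the first colon as the protocol, so a bare Windows path such as C:\... is read as protocol "c". One way around it, sketched here under the assumption that the keyword keeps taking a URL, is to build the file: URL from the path with java.nio instead of typing it by hand. The class name LocalPdfUrl and method toFileUrl are hypothetical, and "sample.pdf" is a placeholder path.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;

// Hypothetical helper: turns a local file path into a well-formed file: URL.
class LocalPdfUrl {

    static URL toFileUrl(String localPath) {
        try {
            // toUri() handles drive letters, separators and escaping for the current OS.
            return Paths.get(localPath).toAbsolutePath().toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + localPath, e);
        }
    }

    public static void main(String[] args) {
        // The file does not need to exist for the conversion itself.
        System.out.println(toFileUrl("sample.pdf"));
    }
}
```

Passing toFileUrl(path).toString() to ReadPDF() should then work the same way as an http URL. Alternatively, PDFBox 2.x also provides PDDocument.load(File), which avoids URLs altogether for local files.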