Handle in-browser opened PDF files

There is a PDF file, which is opened in browser, but I am unable to save or print it. How could I scroll it or take screenshots from it?

Hi,

don’t get your point why will take screenshots from .pdf file
is this what you will need, read the pdf file context?

TestCase
import static com.kms.katalon.core.checkpoint.CheckpointFactory.findCheckpoint
import static com.kms.katalon.core.testcase.TestCaseFactory.findTestCase
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
import static com.kms.katalon.core.testobject.ObjectRepository.findTestObject
import com.kms.katalon.core.checkpoint.Checkpoint as Checkpoint
import com.kms.katalon.core.cucumber.keyword.CucumberBuiltinKeywords as CucumberKW
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as Mobile
import com.kms.katalon.core.model.FailureHandling as FailureHandling
import com.kms.katalon.core.testcase.TestCase as TestCase
import com.kms.katalon.core.testdata.TestData as TestData
import com.kms.katalon.core.testobject.TestObject as TestObject
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WS
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI
import internal.GlobalVariable as GlobalVariable

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();
driver.get("http://www.vandevenbv.nl/dynamics/modules/SFIL0200/view.php?fil_Id=5515")
String url = driver.getCurrentUrl()

def pdf = CustomKeywords.'com.pdf.reader.ReadPdfFromBrowser.PdfReaderUtil'(url, driver)

def lines = pdf.split("\\r?\\n");
for (String line : lines) {
	System.out.println(line);
}

Keyword
package com.pdf.reader

import static com.kms.katalon.core.checkpoint.CheckpointFactory.findCheckpoint
import static com.kms.katalon.core.testcase.TestCaseFactory.findTestCase
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
import static com.kms.katalon.core.testobject.ObjectRepository.findTestObject

import com.kms.katalon.core.annotation.Keyword
import com.kms.katalon.core.checkpoint.Checkpoint
import com.kms.katalon.core.cucumber.keyword.CucumberBuiltinKeywords as CucumberKW
import com.kms.katalon.core.mobile.keyword.MobileBuiltInKeywords as Mobile
import com.kms.katalon.core.model.FailureHandling
import com.kms.katalon.core.testcase.TestCase
import com.kms.katalon.core.testdata.TestData
import com.kms.katalon.core.testobject.TestObject
import com.kms.katalon.core.webservice.keyword.WSBuiltInKeywords as WS
import com.kms.katalon.core.webui.keyword.WebUiBuiltInKeywords as WebUI

import internal.GlobalVariable

import java.io.BufferedInputStream;
import java.io.File;
import java.io.RandomAccessFile;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ReadPdfFromBrowser {

	PDDocument pdDoc;

	@Keyword
	public String PdfReaderUtil(String html, WebDriver driver){

		String pdfFileInText = "";

		Thread.sleep(5000);
		URL url = new URL(html);
		BufferedInputStream fileToParse = new BufferedInputStream(
				url.openStream());

		pdDoc = PDDocument.load(fileToParse);
		pdDoc.getClass();

		if (!pdDoc.isEncrypted()) {

			PDFTextStripperByArea stripper = new PDFTextStripperByArea();
			stripper.setSortByPosition(true);

			PDFTextStripper tStripper = new PDFTextStripper();

			pdfFileInText = tStripper.getText(pdDoc);

		}
		driver.close();
		return pdfFileInText;
	}
}
2 Likes

Thanks a lot for your answer, that is what I am looking for!

It is possible to take also screenshots from pages? It would be intput for a text recognizer service.

First of all I get error message for some files:

Test Cases/print FAILED.
Reason:
org.codehaus.groovy.runtime.InvokerInvocationException: java.io.IOException: Error: End-of-File, expected line
at com.pdf.reader.ReadPdfFromBrowser.invokeMethod(ReadPdfFromBrowser.groovy)
at com.kms.katalon.core.main.CustomKeywordDelegatingMetaClass.invokeStaticMethod(CustomKeywordDelegatingMetaClass.java:50)
at print.run(print:10)
at com.kms.katalon.core.main.ScriptEngine.run(ScriptEngine.java:194)
at com.kms.katalon.core.main.ScriptEngine.runScriptAsRawText(ScriptEngine.java:119)
at com.kms.katalon.core.main.TestCaseExecutor.runScript(TestCaseExecutor.java:337)
at com.kms.katalon.core.main.TestCaseExecutor.doExecute(TestCaseExecutor.java:328)
at com.kms.katalon.core.main.TestCaseExecutor.processExecutionPhase(TestCaseExecutor.java:307)
at com.kms.katalon.core.main.TestCaseExecutor.accessMainPhase(TestCaseExecutor.java:299)
at com.kms.katalon.core.main.TestCaseExecutor.execute(TestCaseExecutor.java:233)
at com.kms.katalon.core.main.TestCaseMain.runTestCase(TestCaseMain.java:114)
at com.kms.katalon.core.main.TestCaseMain.runTestCase(TestCaseMain.java:105)
at com.kms.katalon.core.main.TestCaseMain$runTestCase$0.call(Unknown Source)
at TempTestCase1578224744400.run(TempTestCase1578224744400.groovy:23)
Caused by: java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1124)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2603)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2574)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122)
at org.apache.pdfbox.pdmodel.PDDocument$load.call(Unknown Source)
at com.pdf.reader.ReadPdfFromBrowser.PdfReaderUtil(ReadPdfFromBrowser.groovy:46)
… 14 more

Sample pdf, as I tried to reproduce original pdf document: https://gofile.io/?c=WYPqpZ