Is it possible to read a PDF file from a URL, e.g. "http://site.com/thispdf"?

Didit_Setiawan · April 18, 2018, 6:39am

I just want to ask: is it possible to read a PDF document when the link is
“http://site.com/thispdf”

instead of

This PDF is generated automatically, so it is not saved or stored on web storage.
If anyone has a solution, please answer my question. Thank you.

Marek_Melocik · April 18, 2018, 7:53am

Hi Didit,

I am afraid you will need a third-party Java library to read PDF files.

Do you want to download it and parse/read it afterwards? Or what do you want to do with the file?

Didit_Setiawan · April 18, 2018, 8:36am

I am already using a third-party Java library, PDFBox.
The problem is that I cannot open the stream (BufferedInputStream), because the server detects unauthorized access (it requires an authentication header). In the other case, using HttpURLConnection, we can set a property like httpConn.setRequestProperty("Authorization", basicAuth) to pass the authentication header. Is there any option for openStream() like setRequestProperty() on openConnection()?
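
For reference, `URL.openStream()` is documented as shorthand for `openConnection().getInputStream()`, so it offers no place to set request headers. The usual route is to call `openConnection()` yourself, set the header, and then ask the connection for its input stream. Below is a minimal sketch of that idea using only the JDK. The class name `AuthenticatedPdfFetch`, both method names, and the credentials are hypothetical placeholders, and Basic authentication is assumed only because the question mentions a `basicAuth` value.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AuthenticatedPdfFetch {

    // Builds the value of a Basic "Authorization" header.
    // The user name and password are placeholders for whatever the server expects.
    public static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // Opens the URL with the Authorization header set and returns the body as a stream.
    // The caller is responsible for closing the returned stream.
    public static InputStream openWithAuth(String pdfUrl, String authHeader) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pdfUrl).openConnection();
        conn.setRequestProperty("Authorization", authHeader);
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            conn.disconnect();
            throw new IOException("Unexpected HTTP status " + status + " for " + pdfUrl);
        }
        return new BufferedInputStream(conn.getInputStream());
    }
}
```

The stream returned by `openWithAuth` could then be handed to PDFBox in place of the one produced by `openStream()`. Whether the server accepts Basic credentials or needs a different header (a token or a session cookie, for example) depends on the application under test.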

mydung.nguyen9192 · March 20, 2019, 4:21pm

I have the same problem. Do you have any solution?

Timo_Kuisma · March 20, 2019, 5:17pm

Hi,

Are you able to save it to your PC?
You can try using the browser preferences for that. Which browser are you using?

I'm not sure, but something like this:

https://www.toolsqa.com/selenium-webdriver/how-to-download-files-using-selenium/

Timo_Kuisma · March 22, 2019, 3:38pm

Hi,
Disable the Chrome PDF viewer option in the Chrome settings.

Testcase

//create a download folder, then set chrome options
File folder = new File(UUID.randomUUID().toString())
folder.mkdir()
driver = setChromeOptions(folder)
DriverFactory.changeWebDriver(driver)

//verify pdf
driver.get('https://pressbooks.com/sample-books/')
driver.findElement(By.xpath("//article[@id='post-2344']/div/ul/li/a")).click()
def url = WebUI.getUrl()
println url
String pdfContent = CustomKeywords.'readPdfFile.verifyPdfContent.readPdfFileVerify'(url)
Assert.assertTrue(pdfContent.contains('The PressBooks version of The Metamorphosis, by Franz Kafka.'))

public WebDriver setChromeOptions(File folder){
	
	ChromeOptions optionsBeta = new ChromeOptions();
	String downloadPath = folder.getAbsolutePath()
	//String downloadsPath = System.getProperty("user.home") + "/Downloads";
	println ("downloadpath "+downloadPath)
	
	Map<String, Object> chromePrefs = new HashMap<String, Object>()
	chromePrefs.put("profile.default_content_settings.popups", 0);
	chromePrefs.put("download.default_directory", downloadPath)
	chromePrefs.put("download.prompt_for_download", false)
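	// "plugins.plugins_disabled" is a list preference used by older Chrome versions
	chromePrefs.put("plugins.plugins_disabled", ["Chrome PDF Viewer"])
	// newer Chrome versions use this preference to download PDFs instead of previewing them
	chromePrefs.put("plugins.always_open_pdf_externally", true)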
	
	optionsBeta.setExperimentalOption("prefs", chromePrefs)
	DesiredCapabilities cap = DesiredCapabilities.chrome()
	cap.setCapability(ChromeOptions.CAPABILITY, optionsBeta)
	
	System.setProperty("webdriver.chrome.driver", DriverFactory.getChromeDriverPath())
	WebDriver driver = new ChromeDriver(cap);
	return driver
}

Keyword

	@Keyword
	public String readPdfFileVerify(String pdfUrl){

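		// fetch the PDF from the URL and extract its text with PDFBox
		URL testUrl = new URL(pdfUrl);
		BufferedInputStream bis = new BufferedInputStream(testUrl.openStream());
		try {
			PDDocument doc = PDDocument.load(bis);
			try {
				String pdfText = new PDFTextStripper().getText(doc);
				println(pdfText);
				return pdfText;
			} finally {
				doc.close();
			}
		} finally {
			// close the stream even if loading or text extraction fails
			bis.close();
		}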

	}

Timo_Kuisma · March 22, 2019, 3:43pm

Hi,

First, disable the PDF viewer option in the Chrome settings.

Testcase

File folder

folder = new File(UUID.randomUUID().toString())
folder.mkdir()

//set chrome options
driver = setChromeOptions(folder)
DriverFactory.changeWebDriver(driver)

//verify pdf
driver.get('https://pressbooks.com/sample-books/')
driver.findElement(By.xpath("//article[@id='post-2344']/div/ul/li/a")).click()
def url = WebUI.getUrl()
println url
String pdfContent = CustomKeywords.'readPdfFile.verifyPdfContent.readPdfFileVerify'(url)
Assert.assertTrue(pdfContent.contains('The PressBooks version of The Metamorphosis, by Franz Kafka.'))

public WebDriver setChromeOptions(File folder){
	
	ChromeOptions optionsBeta = new ChromeOptions();
	String downloadPath = folder.getAbsolutePath()
	println ("downloadpath "+downloadPath)
	
	Map<String, Object> chromePrefs = new HashMap<String, Object>()
	chromePrefs.put("profile.default_content_settings.popups", 0);
	chromePrefs.put("download.default_directory", downloadPath)
	chromePrefs.put("download.prompt_for_download", false)
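	// "plugins.plugins_disabled" is a list preference used by older Chrome versions
	chromePrefs.put("plugins.plugins_disabled", ["Chrome PDF Viewer"])
	// newer Chrome versions use this preference to download PDFs instead of previewing them
	chromePrefs.put("plugins.always_open_pdf_externally", true)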
	
	optionsBeta.setExperimentalOption("prefs", chromePrefs)
	DesiredCapabilities cap = DesiredCapabilities.chrome()
	cap.setCapability(ChromeOptions.CAPABILITY, optionsBeta)
	
	System.setProperty("webdriver.chrome.driver", DriverFactory.getChromeDriverPath())
	WebDriver driver = new ChromeDriver(cap);
	return driver
}

Keyword

	@Keyword
	public String readPdfFileVerify(String pdfUrl){

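		// fetch the PDF from the URL and extract its text with PDFBox
		URL testUrl = new URL(pdfUrl);
		BufferedInputStream bis = new BufferedInputStream(testUrl.openStream());
		try {
			PDDocument doc = PDDocument.load(bis);
			try {
				String pdfText = new PDFTextStripper().getText(doc);
				println(pdfText);
				return pdfText;
			} finally {
				doc.close();
			}
		} finally {
			// close the stream even if loading or text extraction fails
			bis.close();
		}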

	}
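
The Chrome preferences above send downloads to `folder`, but the keyword reads the PDF from the URL rather than from disk. If a test instead wants to use the downloaded file, it generally has to wait until Chrome has finished writing it, since Chrome keeps an in-progress download under a temporary `.crdownload` name. Here is a rough sketch of such a wait using only the JDK. The class name `DownloadWaiter`, the method name `waitForPdf`, and the 250 ms polling interval are hypothetical choices, not part of Katalon or Selenium.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DownloadWaiter {

    // Polls the folder until a file matching *.pdf exists or the timeout expires.
    // Files still being downloaded carry a .crdownload suffix, so the glob skips them.
    public static Path waitForPdf(Path folder, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.pdf")) {
                for (Path file : files) {
                    return file; // first finished PDF found
                }
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IOException(
                        "No PDF appeared in " + folder + " within " + timeoutMillis + " ms");
            }
            Thread.sleep(250); // polling interval, chosen arbitrarily
        }
    }
}
```

A test case could call something like `DownloadWaiter.waitForPdf(folder.toPath(), 30000)` after the click and pass the resulting file to PDFBox. The sketch returns the first match it sees, so it assumes the folder is empty before the download starts, which holds for the freshly created random folder in the test case above.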
