Read a PDF whose URL starts with blob:

I want to read the content of a PDF, but I am getting the error "unknown protocol: blob".
URL: blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf

My code:

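String validatePDFContent(String PDFURL) throws IOException {
    println("--------------------------------PDF reading started----------------------------")
    // This line throws "unknown protocol: blob" when PDFURL starts with blob:
    URL TestURL = new URL(PDFURL);
    // openStream() returns an InputStream; openConnection() returns a URLConnection, which cannot be wrapped directly
    BufferedInputStream bis = new BufferedInputStream(TestURL.openStream());
    PDDocument doc = PDDocument.load(bis);
    PDFTextStripper pdfStripper = new PDFTextStripper();
    String pdfText = pdfStripper.getText(doc);
    println(pdfText);
    Assert.assertTrue(pdfText.contains("Member ID"))
    doc.close();
    bis.close();
    println("Done")
    // the method is declared to return a String, so return the extracted text
    return pdfText
}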

hi,

Something is wrong with your URL.
Check it before using it.

See the URL in the screenshot below.

hi,

I am not familiar with a URL like the one you have to use:
blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf

What is blob:?

Can you use the URL without blob:?

Here is a description of what a blob is

What you can try is to download the PDF file first and then read it from the download folder.

hi,

If you are able to download the file, then you can read it this way:

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

	// load the downloaded PDF from disk
	PDDocument document = PDDocument.load(new File("C:\\Users\\xxx\\Desktop\\BLOB\\49f93860-4d86-49c3-b6ed-003b1d41dda9.pdf"))
	try {
		if (!document.isEncrypted()) {
			PDFTextStripper tStripper = new PDFTextStripper();
			// sort text by position on the page (the original set this on an unused stripper)
			tStripper.setSortByPosition(true);
			String pdfFileInText = tStripper.getText(document);
			// split into lines
			String[] lines = pdfFileInText.split("\\r?\\n");
			for (String line : lines) {
				System.out.println(line);
			}
		}
	} finally {
		// release the file handle
		document.close();
	}

@Timo_Kuisma

Our requirement is to read the PDF directly in the browser, not to download it and read it from disk.

Please go through the link below to understand what a blob URL is: https://stackoverflow.com/questions/30864573/what-is-a-blob-url-and-why-it-is-used

Let me know if you have any solution for reading this blob URL PDF.

hello boy,

My suggestion was only to read the .pdf file from disk; it was not meant as the solution!!!
Try using this library:
Apache PDFBox

hi,
Yes, Apache PDFBox is able to read directly from the browser.

Maybe this will help you,
but please do not say anymore that I do not understand what a blob is and how it is implemented :)

TESTCASE

def pdf = CustomKeywords.'com.pdf.reader.ReadPdfFromBrowser.PdfReaderUtil'()

def lines = pdf.split("\\r?\\n");
for (String line : lines) {
	System.out.println(line);
}

KEYWORD

package com.pdf.reader

import java.io.BufferedInputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import com.kms.katalon.core.annotation.Keyword;

public class ReadPdfFromBrowser {

	WebDriver driver;
	PDDocument pdDoc;

	@Keyword
	public String PdfReaderUtil(){

		String pdfFileInText = "";

		driver = new ChromeDriver();
		try {
			driver.get("http://www.vandevenbv.nl/dynamics/modules/SFIL0200/view.php?fil_Id=5515");

			// crude fixed wait for the PDF viewer to finish loading
			Thread.sleep(5000);

			// fetch the PDF again over the URL the browser ended up on
			URL url = new URL(driver.getCurrentUrl());
			BufferedInputStream fileToParse = new BufferedInputStream(
					url.openStream());

			pdDoc = PDDocument.load(fileToParse);

			if (!pdDoc.isEncrypted()) {
				PDFTextStripper tStripper = new PDFTextStripper();
				// sort text by position on the page
				tStripper.setSortByPosition(true);
				pdfFileInText = tStripper.getText(pdDoc);
			}
			pdDoc.close();
			fileToParse.close();
		} finally {
			// always end the browser session, even if reading fails
			driver.quit();
		}
		return pdfFileInText;
	}
}

Hi,
The above code works fine for reading a PDF from a regular PDF URL,
but in my case the PDF URL starts with blob:.
See the screenshot below for the error. I passed my URL into your code.

hi,

Could you show what you are sending as the URL parameter?

We are sending the PDF URL as the parameter. See the URL below:

blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf

Did you find a solution for the blob URL?

The URL scheme blob: is not supported by default by the java.net.URL class bundled in OpenJDK.
If you want to use blob:, you need to configure the java.net.URL class to recognize it.
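To illustrate the point, here is a minimal sketch that checks whether java.net.URL accepts a given scheme on a default JVM. The class name SchemeCheck, the method isParsable, and the example.com URLs are placeholders I made up for the illustration; they are not from the original posts.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: reports whether java.net.URL can parse a URL string
// with the protocol handlers currently available in this JVM.
final class SchemeCheck {

    public static boolean isParsable(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // For a scheme without a handler the message reads "unknown protocol: <scheme>"
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder URLs; constructing a URL does not open a network connection
        System.out.println(isParsable("https://example.com/file.pdf"));
        System.out.println(isParsable("blob:https://example.com/61416417-e048-43ce-9bf7-2f0d3c1adb49"));
    }
}
```

The failure happens in the URL constructor itself, before any stream is opened, which matches the "unknown protocol: blob" error reported above.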

The URL scheme data: is not supported either. I once tried configuring the java.net.URL class to recognize the data: scheme, and it worked. The following post describes what I did:

The same procedure should apply to the blob: scheme as well.
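As a rough sketch of what such a configuration might look like for blob:, assuming the standard URL.setURLStreamHandlerFactory hook is used: the names BlobUrlSupport and BlobHandler and the example.com URL are hypothetical, and this is not the code from the linked post. Note two limits. The factory can be set only once per JVM. And, as far as I know, registering a handler only makes java.net.URL accept the scheme; a blob: URL refers to data held in the memory of the browser that created it, so the handler here has nothing to fetch and simply reports that.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Hypothetical helper that teaches java.net.URL to parse the blob: scheme.
final class BlobUrlSupport {

    // Handler for blob: URLs. It allows parsing but cannot fetch any content,
    // because the blob data exists only inside the browser.
    static final class BlobHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("blob: content is only available inside the browser: " + u);
        }
    }

    private static boolean registered = false;

    // Registers the factory once; a second call to setURLStreamHandlerFactory would throw an Error.
    public static synchronized void register() {
        if (registered) {
            return;
        }
        URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
            @Override
            public URLStreamHandler createURLStreamHandler(String protocol) {
                // Returning null lets the JVM fall back to its built-in handlers
                return "blob".equals(protocol) ? new BlobHandler() : null;
            }
        });
        registered = true;
    }

    public static void main(String[] args) throws Exception {
        register();
        // Placeholder URL; parsing now succeeds instead of throwing MalformedURLException
        URL url = new URL("blob:https://example.com/61416417-e048-43ce-9bf7-2f0d3c1adb49");
        System.out.println(url.getProtocol());
    }
}
```

With this in place, new URL("blob:...") no longer fails, but url.openStream() still throws an IOException, so the PDF bytes would have to come from the browser by some other route before PDFBox can read them.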

Followup.

Sample code is available here: