Read a PDF whose URL starts with blob:

I want to read the content of a PDF, but I am getting the error "unknown protocol: blob".
URL: blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf

My code:

String validatePDFContent(String PDFURL) throws IOException {
	println "--------------------------------PDF reading started----------------------------"
	// new URL(...) throws "unknown protocol: blob" when PDFURL starts with blob:
	URL TestURL = new URL(PDFURL);
	// openStream() returns the InputStream; openConnection() only returns a URLConnection
	BufferedInputStream bis = new BufferedInputStream(TestURL.openStream());
	PDDocument doc = PDDocument.load(bis);
	PDFTextStripper pdfStripper = new PDFTextStripper();
	String pdfText = pdfStripper.getText(doc);
	println(pdfText);
	Assert.assertTrue(pdfText.contains("Member ID"))
	doc.close();
	bis.close();
	println("Done")
	return pdfText
}

Hi,

Something is wrong with your URL.
Check it before using it.

See the URL in the screenshot below.

Hi,

I have no idea about URLs like the one you have to use:
blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf

What is blob:?

Can you use the URL without blob:?

Here it is described what a blob is.

What you can try is to download the PDF file first and then read it from the download folder.
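
For context on the error reported at the top of the thread: java.net.URL only accepts protocols it has a handler for (http, https, file and so on), and blob is not one of them, so the constructor fails before any connection is attempted. A minimal sketch that reproduces this follows. The class name BlobUrlCheck, the method parseError and the example.com URLs are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class BlobUrlCheck {

    // Returns the message of the MalformedURLException that java.net.URL
    // throws for the given string, or null if the string parses fine.
    static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical URLs; only the scheme matters here.
        System.out.println(parseError("blob:https://example.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf"));
        System.out.println(parseError("https://example.com/file.pdf"));
    }
}
```

Stripping the blob: prefix does not help either: a blob URL is created by the page itself and only resolves inside the browser that created it, so there is nothing at the remaining https address for Java to download.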

Hi,

If you are able to download the file, you can read it like this:

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

	// Load the downloaded PDF from disk
	PDDocument document = PDDocument.load(new File("C:\\Users\\xxx\\Desktop\\BLOB\\49f93860-4d86-49c3-b6ed-003b1d41dda9.pdf"))
	try {
		if (!document.isEncrypted()) {
			PDFTextStripper tStripper = new PDFTextStripper();
			String pdfFileInText = tStripper.getText(document);
			// split the extracted text into lines
			String[] lines = pdfFileInText.split("\\r?\\n");
			for (String line : lines) {
				System.out.println(line);
			}
		}
	} finally {
		// release the file handle even if extraction fails
		document.close();
	}

@Timo_Kuisma

Our requirement is to read the PDF directly in the browser, not to download it and then read it.

Please go through the link below to understand what a blob URL is: https://stackoverflow.com/questions/30864573/what-is-a-blob-url-and-why-it-is-used

And let me know if you have any solution for reading a PDF behind this blob URL.

hello boy,

My suggestion was only about reading the .pdf file from disk; it was not meant as the full solution!
Try using this library:
Apache PDFBox

Hi,
Yes, Apache PDFBox is able to read a PDF directly from the URL shown in the browser.

Maybe this will help you.
But please do not say again that I do not understand what a blob is and how it is implemented :)

TESTCASE

def pdf = CustomKeywords.'com.pdf.reader.ReadPdfFromBrowser.PdfReaderUtil'()

def lines = pdf.split("\\r?\\n");
for (String line : lines) {
	System.out.println(line);
}

KEYWORD

package com.pdf.reader

import java.io.BufferedInputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import com.kms.katalon.core.annotation.Keyword;

public class ReadPdfFromBrowser {

	@Keyword
	public String PdfReaderUtil() {

		String pdfFileInText = "";
		WebDriver driver = new ChromeDriver();

		try {
			driver.get("http://www.vandevenbv.nl/dynamics/modules/SFIL0200/view.php?fil_Id=5515");

			// crude fixed wait for the PDF page to load
			Thread.sleep(5000);

			// works for http(s) URLs only; java.net.URL rejects blob: URLs
			URL url = new URL(driver.getCurrentUrl());
			BufferedInputStream fileToParse = new BufferedInputStream(url.openStream());

			try {
				PDDocument pdDoc = PDDocument.load(fileToParse);
				try {
					if (!pdDoc.isEncrypted()) {
						PDFTextStripper tStripper = new PDFTextStripper();
						pdfFileInText = tStripper.getText(pdDoc);
					}
				} finally {
					pdDoc.close();
				}
			} finally {
				fileToParse.close();
			}
		} finally {
			// end the browser session even if reading fails
			driver.quit();
		}
		return pdfFileInText;
	}
}

Hi,
The above code works fine for reading a PDF from a regular PDF URL,
but in my case the PDF URL starts with blob:.
See the screenshot below for the error. I passed my URL to your code.

Hi,

Could you show what you are sending as the URL parameter?

We are sending the PDF URL as the parameter. See the URL below:

blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf
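
One direction that fits the "read in the browser, no download" requirement is to let the page itself fetch the blob URL with JavaScript and hand the bytes back to the test as a base64 data URL. The following is only a sketch under assumptions, not a verified solution for this application: the class name BlobPdfFetch and its members are made up, and it assumes the WebDriver session is still on the page that created the blob URL. The Selenium and PDFBox calls are left in comments because they need a live browser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BlobPdfFetch {

    // JavaScript meant for JavascriptExecutor.executeAsyncScript(script, blobUrl).
    // It fetches the blob URL inside the page and returns its content as a
    // data URL ("data:application/pdf;base64,....") through the async callback.
    static final String FETCH_BLOB_JS =
        "var blobUrl = arguments[0];"
        + "var done = arguments[arguments.length - 1];"
        + "fetch(blobUrl)"
        + "  .then(function (response) { return response.blob(); })"
        + "  .then(function (blob) {"
        + "    var reader = new FileReader();"
        + "    reader.onloadend = function () { done(reader.result); };"
        + "    reader.readAsDataURL(blob);"
        + "  })"
        + "  .catch(function (error) { done('ERROR: ' + error); });";

    // Strips the "data:...;base64," prefix and decodes the rest to raw bytes.
    static byte[] decodeDataUrl(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        if (!dataUrl.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("Not a data URL: " + dataUrl);
        }
        return Base64.getDecoder().decode(dataUrl.substring(comma + 1));
    }

    // Cheap sanity check: PDF files start with the ASCII marker "%PDF-".
    static boolean looksLikePdf(byte[] bytes) {
        return bytes.length >= 5
            && new String(bytes, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    public static void main(String[] args) {
        // In a real test (assumption: "driver" is on the page owning the blob):
        //   String dataUrl = (String) ((JavascriptExecutor) driver)
        //           .executeAsyncScript(FETCH_BLOB_JS, blobUrl);
        //   byte[] pdfBytes = decodeDataUrl(dataUrl);
        //   PDDocument doc = PDDocument.load(new ByteArrayInputStream(pdfBytes));
        // Here a tiny fake payload stands in for the browser result.
        String fake = "data:application/pdf;base64,"
            + Base64.getEncoder().encodeToString("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(decodeDataUrl(fake)));
    }
}
```

The decoded bytes can then go through the same PDFTextStripper code as in the keyword above, so only the way the PDF is obtained changes.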