I want to read the content of a PDF, but I am getting the error "unknown protocol: blob".
URL: blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf
My code:
String validatePDFContent(String PDFURL) throws IOException {
    println "--------------------------------PDF reading started----------------------------"
    URL TestURL = new URL(PDFURL);
    // openStream() returns the InputStream that BufferedInputStream expects
    BufferedInputStream bis = new BufferedInputStream(TestURL.openStream());
    PDDocument doc = PDDocument.load(bis);
    PDFTextStripper pdfStripper = new PDFTextStripper();
    String pdfText = pdfStripper.getText(doc);
    println(pdfText);
    Assert.assertTrue(pdfText.contains("Member ID"))
    doc.close();
    bis.close();
    println("Done")
    // The method is declared to return a String, so return the extracted text
    return pdfText
}
If you are able to download the file, you can read it like this:
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper
// Load the PDF that was downloaded to disk
PDDocument document = PDDocument.load(new File("C:\\Users\\xxx\\Desktop\\BLOB\\49f93860-4d86-49c3-b6ed-003b1d41dda9.pdf"))
try {
    if (!document.isEncrypted()) {
        PDFTextStripper tStripper = new PDFTextStripper();
        String pdfFileInText = tStripper.getText(document);
        // split the extracted text into lines
        String[] lines = pdfFileInText.split("\\r?\\n");
        for (String line : lines) {
            System.out.println(line);
        }
    }
} finally {
    // release the file handle even if text extraction fails
    document.close();
}
Hi,
The above code works fine for reading a PDF when it is given a regular PDF URL,
but in my case the PDF URL starts with blob:.
See the screenshot below for the error. I passed my URL to your code.
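For what it is worth, the error can be reproduced without PDFBox. A blob: URL points at an object held in the browser's memory, and java.net.URL has no handler for that scheme, so the constructor fails before any connection is attempted. The sketch below is only an illustration under assumptions: the class name BlobUrlCheck and both helper methods are hypothetical, and example.com stands in for the masked host from the question. It checks for the blob: prefix up front, so a test could fall back to downloading the file through the browser and reading it from disk.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class, not part of PDFBox or any test framework.
class BlobUrlCheck {

    /** Returns true when the address uses the browser-only blob: scheme. */
    static boolean isBlobUrl(String address) {
        if (address == null) {
            return false;
        }
        String trimmed = address.trim();
        // Case-insensitive comparison of the first five characters with "blob:"
        return trimmed.regionMatches(true, 0, "blob:", 0, 5);
    }

    /** Returns null when java.net.URL accepts the address, otherwise the exception message. */
    static String urlError(String address) {
        try {
            new URL(address);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Placeholder host; the real host is masked in the question.
        String address = "blob:https://example.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf";
        if (isBlobUrl(address)) {
            System.out.println("blob: URL, download the file in the browser and read it from disk instead");
        }
        System.out.println("java.net.URL says: " + urlError(address));
    }
}
```

In a test, isBlobUrl could be called before new URL(PDFURL) so that the blob case is routed to the file-based approach from the earlier reply instead of throwing.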