I want to read the content of a PDF, but I am getting the error "unknown protocol: blob".
URL: blob:https://*URL.com/61416417-e048-43ce-9bf7-2f0d3c1adb49.pdf
My code:
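import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

String validatePDFContent(String PDFURL) throws IOException {
    println("--------------------------------PDF reading started----------------------------")
    // new URL(...) is where "unknown protocol: blob" is thrown for a blob: URL
    URL testURL = new URL(PDFURL)
    // openStream() returns an InputStream; openConnection() returns a URLConnection,
    // which cannot be wrapped in a BufferedInputStream directly
    BufferedInputStream bis = new BufferedInputStream(testURL.openStream())
    PDDocument doc = PDDocument.load(bis)
    PDFTextStripper pdfStripper = new PDFTextStripper()
    String pdfText = pdfStripper.getText(doc)
    println(pdfText)
    Assert.assertTrue(pdfText.contains("Member ID"))
    doc.close()
    bis.close()
    println("Done")
    // the method is declared to return String, so return the extracted text
    return pdfText
}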
If you are able to download the file, you can read it like this:
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.text.PDFTextStripper

PDDocument document = PDDocument.load(new File("C:\\Users\\xxx\\Desktop\\BLOB\\49f93860-4d86-49c3-b6ed-003b1d41dda9.pdf"))
try {
    // text extraction is skipped for encrypted documents
    if (!document.isEncrypted()) {
        PDFTextStripper tStripper = new PDFTextStripper()
        String pdfFileInText = tStripper.getText(document)
        // split the extracted text into lines on line breaks
        String[] lines = pdfFileInText.split("\\r?\\n")
        for (String line : lines) {
            System.out.println(line)
        }
    }
} finally {
    // release the file handle even if extraction fails
    document.close()
}
Hi,
The above code works fine for reading a PDF from a regular PDF URL,
but in my case the PDF URL starts with blob:.
See the screenshot below for the error. I passed my URL to your code.
The blob: URL scheme is not supported by default by the java.net.URL class bundled with OpenJDK.
If you want to use blob:, you need to configure the java.net.URL class to recognize it.
The data: URL scheme is not supported either. I once configured the java.net.URL class to recognize the data: scheme, and it worked. The following post describes what I did:
The same procedure should apply to the blob: scheme as well.
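
For illustration, here is a minimal sketch of what "configuring java.net.URL" for a new scheme can look like, using the standard URL.setURLStreamHandlerFactory hook. The class name BlobUrlSupport, the BLOBS map and the install() method are hypothetical names made up for this example; they are not from the linked post. Note one assumption in particular: a blob: URL points at an object held in memory by the browser that created it, so Java cannot fetch it over the network. The sketch therefore assumes the PDF bytes have already been obtained some other way (for example from a downloaded file) and registered under the blob: URL string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: teaches java.net.URL to accept the blob: scheme.
final class BlobUrlSupport {

    // Assumption: the test code fills this map itself (blob URL string -> PDF bytes).
    static final Map<String, byte[]> BLOBS = new ConcurrentHashMap<>();

    private static boolean installed = false;

    // Handler that serves bytes from the in-memory map instead of the network.
    static final class BlobHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) {
            return new URLConnection(u) {
                @Override
                public void connect() {
                    connected = true; // nothing to connect to; data is local
                }

                @Override
                public InputStream getInputStream() throws IOException {
                    byte[] data = BLOBS.get(getURL().toString());
                    if (data == null) {
                        throw new IOException("No bytes registered for " + getURL());
                    }
                    return new ByteArrayInputStream(data);
                }
            };
        }
    }

    // The JVM allows the factory to be set only once, hence the guard flag.
    static synchronized void install() {
        if (installed) {
            return;
        }
        // Returning null for other protocols keeps the built-in handlers (http, https, file, ...).
        URL.setURLStreamHandlerFactory(
                protocol -> "blob".equals(protocol) ? new BlobHandler() : null);
        installed = true;
    }
}
```

After calling BlobUrlSupport.install() once and putting the bytes into BlobUrlSupport.BLOBS, new URL("blob:https://...").openStream() should return a stream that can be passed to PDDocument.load(...). If another library in the same JVM has already set a URLStreamHandlerFactory, the call will fail, so this is only a starting point.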