How to find the inaccessible links on a webpage

Hi there,

I need to test 100-300 links on a webpage. I have written code that correctly finds the number of links on a webpage. Using the “verifyAllLinksOnCurrentPageAccessible” function, I get true when all links pass and false when any fail. However, when it fails, I want to know which links failed to get a response. I checked the Log Viewer, but it does not give these details.

Can anyone please help me with this ?

Below is the output after execution

The links shown in the above picture get a response code and some other information, but the URLs that failed are not shown.
One of the .har files has a 302 response code.

Sorry, I don’t have a straightforward answer. But maybe the following might help.

302, to give it its full name, is HTTP 302 Found. In practice, though, it typically means “found, but I was redirected to another URL”. That URL should be in the Location response header. More here.
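To illustrate the point above, here is a small sketch of finding where a 3xx response points. It is plain Java (Java 9+ for List.of and Map.entry) rather than Katalon Groovy, so it can be tried on its own. RedirectInfo and redirectTarget are hypothetical names, not a Katalon API. Headers are kept as a list of name/value pairs rather than a map because a response can repeat a header; the .har posted in this thread lists X-Frame-Options twice.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch: decide whether a response is a redirect and, if so, where it points.
class RedirectInfo {
    // Returns the Location header value for 3xx responses, or empty otherwise.
    // Header names are compared case-insensitively, as HTTP header names are.
    static Optional<String> redirectTarget(int status, List<Map.Entry<String, String>> headers) {
        if (status < 300 || status > 399) {
            return Optional.empty();
        }
        for (Map.Entry<String, String> header : headers) {
            if ("Location".equalsIgnoreCase(header.getKey())) {
                return Optional.of(header.getValue());
            }
        }
        return Optional.empty(); // e.g. 304 Not Modified carries no Location
    }

    public static void main(String[] args) {
        // Values copied from the .har entry posted in this thread.
        List<Map.Entry<String, String>> headers = List.of(
                Map.entry("X-Frame-Options", "SAMEORIGIN"),
                Map.entry("Location", "https://uat.example.com/ErrorPage.htm?IsError=1"));
        // prints https://uat.example.com/ErrorPage.htm?IsError=1
        System.out.println(redirectTarget(302, headers).orElse("(no redirect)"));
    }
}
```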

If there is no readily consumable API to get what you need…

  1. You could use metaprogramming to hijack the WebUI method(s) and handle the failures yourself. I think @kazurayam might be best placed to advise us here.

  2. The .har files are JSON. You could read them yourself and work out which requests passed or failed.
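Reading the .har files could look roughly like the sketch below. It is a naive plain-Java sketch (Java 8+). It scans the text with a regular expression instead of a real JSON parser. It assumes, as in the .har posted in this thread, that each entry's request "url" appears before its response "status". HarScan is a hypothetical name. Inside a Groovy test case, groovy.json.JsonSlurper would be the more robust way to read the same files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive sketch: pull (request URL, response status) pairs out of HAR text.
// ASSUMPTION: an entry's "url" key comes before its "status" key.
// Escaped quotes inside content.text could confuse this; a JSON parser would not.
class HarScan {
    private static final Pattern ENTRY = Pattern.compile(
            "\"url\"\\s*:\\s*\"([^\"]*)\"[\\s\\S]*?\"status\"\\s*:\\s*(-?\\d+)");

    // Each element is {url, status} as strings, in file order.
    static List<String[]> urlAndStatus(String harText) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = ENTRY.matcher(harText);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    // Keeps only the request URLs whose response status is not 200.
    static List<String> nonOkUrls(String harText) {
        List<String> bad = new ArrayList<>();
        for (String[] pair : urlAndStatus(harText)) {
            if (!"200".equals(pair[1])) {
                bad.add(pair[0] + " (HTTP " + pair[1] + ")");
            }
        }
        return bad;
    }

    // Usage: pass the paths of the .har files as command-line arguments.
    public static void main(String[] args) throws IOException {
        for (String file : args) {
            String text = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
            for (String url : nonOkUrls(text)) {
                System.out.println(file + ": " + url);
            }
        }
    }
}
```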

HTH.

@Russ_Thomas Thanks for your response!
@kazurayam
The example I have taken has a total of 94 links. The Log Viewer shows 15 .har files for them, and only one of those has the response code HTTP 302.
I also checked all the URLs manually: out of 94 URLs, 7 have issues.

Below is the response received. The URL https://uat.example.com/ErrorPage.htm?IsError=1 (example; name changed) is not one of the URLs that are failing.
{
  "log" : {
    "version" : "1.2",
    "pages" : [ ],
    "entries" : [ {
      "startedDateTime" : "2020-10-29T10:41:38.386Z",
      "request" : {
        "method" : "GET",
        "url" : "https://uat.example.com/learning/",
        "httpVersion" : "",
        "cookies" : [ ],
        "headers" : [ ],
        "queryString" : [ ],
        "headersSize" : 0,
        "bodySize" : 0,
        "comment" : ""
      },
      "response" : {
        "status" : 302,
        "statusText" : "",
        "httpVersion" : "",
        "cookies" : [ ],
        "headers" : [ {
          "name" : "X-Frame-Options",
          "value" : "SAMEORIGIN"
        }, {
          "name" : "X-Frame-Options",
          "value" : "SAMEORIGIN"
        }, {
          "name" : "Strict-Transport-Security",
          "value" : "max-age=16070400"
        }, {
          "name" : "#status#",
          "value" : "HTTP/1.1 302 Redirect"
        }, {
          "name" : "Content-Length",
          "value" : "173"
        }, {
          "name" : "Date",
          "value" : "Thu, 29 Oct 2020 10:41:38 GMT"
        }, {
          "name" : "Content-Type",
          "value" : "text/html; charset=UTF-8"
        }, {
          "name" : "Location",
          "value" : "https://uat.example.com/ErrorPage.htm?IsError=1"
        } ],
        "content" : {
          "size" : 173,
          "mimeType" : "text/html; charset=UTF-8",
          "text" : "Document Moved\n\nObject Moved\n\nThis document may be found <a HREF=\"https://uat.example.com/ErrorPage.htm?IsError=1\">here",
          "comment" : ""
        },
        "redirectURL" : "",
        "headersSize" : 235,
        "bodySize" : 173,
        "comment" : ""
      },
      "cache" : { },
      "timings" : {
        "comment" : "",
        "ssl" : -1,
        "connect" : 36,
        "receive" : 0,
        "wait" : 0,
        "blocked" : -1,
        "send" : 0,
        "dns" : -1
      },
      "comment" : "",
      "_katalonRequestInformation" : {
        "name" : "7",
        "testObjectId" : "Temporary RESTful request object",
        "harId" : "135a2ff9-f5a4-48eb-b089-2bd8b69a4dba",
        "reportFolder" : null
      },
      "time" : 36
    } ],
    "comment" : ""
  }
}

If I have to use metaprogramming here, could you please advise me on how to do it, since I am new to programming? An example would be appreciated!

I have made a GitHub project where you will find my sample code.

Please find TC1 and read the source code.

Points to note:

  1. You can mix Katalon’s WebUI keywords and WebService keywords in one test case.
  2. com.kms.katalon.core.testobject.ResponseObject implements various getter methods to read the HTTP response, e.g. getStatusCode() and getHeaderField(). Those methods enable you to inspect the response in detail.
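The same kind of inspection is possible outside Katalon. Below is a minimal plain-Java sketch using java.net.HttpURLConnection, whose getResponseCode() and getHeaderField() play the role that ResponseObject's getters play in my demo. Redirects are deliberately not followed, so a 302 stays visible instead of being replaced by its target's status. LinkProbe and the -1 convention for "no response" are illustrative assumptions, not part of Katalon.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Sketch: probe one URL and report its status code and Location header.
class LinkProbe {
    static final class Result {
        final int status;       // -1 means no HTTP response was obtained
        final String location;  // Location header value, or null if absent

        Result(int status, String location) {
            this.status = status;
            this.location = location;
        }

        @Override
        public String toString() {
            return location == null ? String.valueOf(status) : status + " -> " + location;
        }
    }

    static Result probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = new URL(url).openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return new Result(-1, null); // e.g. mailto: or file: links
            }
            conn = (HttpURLConnection) raw;
            conn.setInstanceFollowRedirects(false); // report the 302 itself, not its target
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            return new Result(status, conn.getHeaderField("Location"));
        } catch (IOException e) {
            // Malformed URL, DNS failure, refused connection, timeout, ...
            return new Result(-1, null);
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Usage: pass the URLs to check as command-line arguments.
    public static void main(String[] args) {
        for (String url : args) {
            System.out.println(url + " : " + probe(url, 10_000));
        }
    }
}
```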

I did not need Groovy’s Meta-programming technique at all.


Russ_Thomas suggested another approach: read the .har files as JSON to find the status codes and headers. That approach can achieve the same outcome as my demo.

@Sdhongadi

Both of Russ’s approaches are valid and will give similar outcomes, so try either one. However, you will need solid programming skills either way.

@kazurayam Thanks for your response! I will look into the code on GitHub and work on it, and will get back to you if I need any help.

Meanwhile, I tried the approach Russ_Thomas suggested and read the generated .har files, but they do not contain the URLs that are failing to get a response (as mentioned in the example above).

The example I have taken has a total of 94 links. The Log Viewer shows 15 .har files for them, and only one of those has the response code HTTP 302. Shouldn't it generate one .har file for each link?
I also checked all the URLs manually: out of 94 URLs, 7 have issues, yet only one .har file has a 302 status code. Does that mean only one .har file is generated for all the failures?

The response received is the same as in my earlier post. The URL https://uat.example.com/ErrorPage.htm?IsError=1 (example; name changed) is not one of the URLs that are failing.

Then what is the right URL? Don’t you know it?

@kazurayam
As I mentioned, I have found the incorrect URLs manually, but I want the script to list the URLs that have issues.

When I run my script, verifyAllLinksOnCurrentPageAccessible returns false because some of the URLs are broken. I also want to list all the broken URLs: I have 100-300 URLs and cannot check them manually every time, which is why I am writing the automation script.

You have not shared your test case code, so I can’t say anything meaningful about your test case.

All I can do is guess…

In my example, each invocation of WS.sendRequest() results in a .har file. The TC1 script makes 14 WS.sendRequest() calls and produces 14 .har files. I suppose that a .har file is created regardless of which HTTP response status code (200, 302, or any other) is returned.

You wrote that you got 15 .har files. This makes me guess that your test case actually made only 15 WS.sendRequest calls, even though you expect many more (94 files).

Why? — I don’t know. Only your source code will tell us.


You used the keyword WebUI Verify All Links On Current Page Accessible because you expected it to solve your problem auto-magically, right?

I think this keyword is poorly documented. The documentation does not describe how it behaves when one or more links are broken or respond with a status code other than 200 (OK). I do not understand how this keyword behaves at all.

I would never use this keyword to check the accessibility of the links in my target webpages and to find out which URL is broken or redirected. The approach I took in my sample project solves the problem, though you need to write a lengthy Groovy script.

@kazurayam
Here is the code I wrote

WebElement element = WebUiCommonHelper.findWebElement(findTestObject('Object Repository/Resources/Page_Identifi - Resources/allrequiredlinks'),30 )

List linkList = new ArrayList()

linkList= element.findElements(By.tagName("a"))

linkList.addAll(element.findElements(By.tagName("img")))

List finalList = new ArrayList()

for(WebElement E : linkList)
{
	if(E.getAttribute("href")!=null)
	{
		finalList.add(E)
	}
}
println("Total number of links on the Page: "+finalList.size())

boolean f = WebUI.verifyAllLinksOnCurrentPageAccessible(false, finalList)

if (f==1)
{
	println("All links on the page are accessible")
}
else
{
	println("Some of the links on the page are not accessible")
}

Could you enclose your code in triple back-ticks for code formatting?

sure!


WebElement  element = WebUiCommonHelper.findWebElement(findTestObject('Object Repository/Resources/Page_Identifi - Resources/allrequiredlinks'),30 )

List linkList = new ArrayList()

linkList= element.findElements(By.tagName("a"))

linkList.addAll(element.findElements(By.tagName("img")))

List finalList = new ArrayList()

for(WebElement E : linkList)
{
	if(E.getAttribute("href")!=null)
		{
			finalList.add(E)
			
		}
}
println("Total number of links on the Page: "+finalList.size())

boolean  f = WebUI.verifyAllLinksOnCurrentPageAccessible(false, finalList)

if (f==1)
{
	println("All links on the page are accessible")
}
else
{
	println("Some of the links on the page are not accessible")
}

Here you pass the finalList variable as the 2nd argument to the Verify All Links On Current Page Accessible keyword.

Please check the documentation carefully. The 2nd argument is described as excludedLinks. I suppose this is exactly the opposite of your intention. You want finalList to be included, don’t you?

The if (f==1) comparison looks odd, since f is a boolean. You should rather write:

boolean f = ...
if (f)

@kazurayam Thank you! I re-ran the script after incorporating the changes. Please find the details below.
It still produces only 15 .har files, and now the flag is true.

Could you please help me find the broken links on the page?

Please share your latest script source.

WebElement  element = WebUiCommonHelper.findWebElement(findTestObject('Object Repository/Resources/Page_Identifi - Resources/allrequiredlinks'),30 )

List linkList = new ArrayList()

linkList= element.findElements(By.tagName("a"))

linkList.addAll(element.findElements(By.tagName("img")))

List finalList = new ArrayList()

for(WebElement E : linkList)
{
	if(E.getAttribute("href")!=null)
		{
			finalList.add(E)
			
		}
}
println("Total number of links on the Page: "+finalList.size())

Boolean f = WebUI.verifyAllLinksOnCurrentPageAccessible(false, [])


if (f)
{
	println("All links on the page are accessible")
}
else
{
	println("Some of the links on the page are not accessible")
}

What do you mean by saying broken links ?

If a URL is responded with HTTP StatusCode 302, is it broken for you or not ?

I would regard a link as broken when an HTTP request for it is answered with any HTTP status code other than 200, or is not answered at all. I have already shown a solution that picks up such broken links; please have a look at it.
kazurayam/HowToFindMetadataOfLinksInAWebPage
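Under that definition, collecting the broken links is a simple loop over the hrefs. Below is a plain-Java sketch (Java 8+). findBroken and the -1 "no response" convention are hypothetical. statusOf stands for whatever actually fetches a URL, for example WS.sendRequest() plus ResponseObject.getStatusCode() in Katalon, or java.net.HttpURLConnection outside it. The lookup is passed in as a function so the loop can be tried without network access.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToIntFunction;

// Sketch: a link is broken if its status is anything other than 200,
// or if there was no response at all (represented here as -1).
class BrokenLinks {
    // Returns broken URL -> status, in page order, each distinct URL checked once.
    static Map<String, Integer> findBroken(Iterable<String> hrefs, ToIntFunction<String> statusOf) {
        Map<String, Integer> broken = new LinkedHashMap<>();
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null || !seen.add(href)) {
                continue; // skip elements without an href, and repeated hrefs
            }
            int status = statusOf.applyAsInt(href);
            if (status != 200) {
                broken.put(href, status);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Fake status lookup standing in for real HTTP requests.
        Map<String, Integer> fake = new HashMap<>();
        fake.put("https://example.com/ok", 200);
        fake.put("https://example.com/moved", 302);
        fake.put("https://example.com/missing", 404);
        Map<String, Integer> broken = findBroken(
                Arrays.asList("https://example.com/ok", "https://example.com/moved",
                        "https://example.com/ok", "https://example.com/missing",
                        "https://example.com/down"),
                url -> fake.getOrDefault(url, -1));
        // Prints the moved (302), missing (404) and down (no response) URLs.
        broken.forEach((url, status) -> System.out.println(
                (status < 0 ? "no response" : "HTTP " + status) + "  " + url));
    }
}
```

Treating 302 as broken follows the strict rule above. If redirects should count as acceptable, the status != 200 test is the one line to relax.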

@Sdhongadi

These 2 lines indicate that you want all of the <img> elements in the page to be checked. You naively assume that the keyword WebUI.verifyAllLinksOnCurrentPageAccessible includes <img> elements, but I doubt it.

I am not sure, though. The documentation does not say whether the keyword includes <img> or not. How about checking the source code? The source of the verifyAllLinksOnCurrentPageAccessible keyword is published at
https://github.com/katalon-studio/katalon-studio-testing-framework/blob/master/Include/scripts/groovy/com/kms/katalon/core/webui/keyword/builtin/VerifyAllLinksOnCurrentPageAccessibleKeyword.groovy. However, I could not find the portion of the source code that tells whether the keyword checks <img> elements. It seems that portion is unpublished, so I have no clue.

Just for your interest: