How To: Find all links on a page with PHP
While working on a project, I needed to find all links on a given page. The following code fetches a page by its URL and lists the href of every anchor (<a>) tag on it.
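// file_get_contents() needs a full URL with a scheme; '//website.com/...' would be read as a local path.
$html = file_get_contents( 'https://website.com/page-in-question' );
// Collect libxml warnings about messy real-world HTML instead of hiding them with @.
libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->loadHTML( $html );
libxml_clear_errors();
// Find every <a> inside <body> and print its href, one per line.
$xpath = new DOMXPath( $dom );
$hrefs = $xpath->evaluate( "/html/body//a" );
for ( $i = 0; $i < $hrefs->length; $i++ ) {
    $href = $hrefs->item( $i );
    $url = $href->getAttribute( 'href' );
    echo $url . PHP_EOL;
}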
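To reuse this logic, for example inside a plugin or a test, the XPath lookup can be wrapped in a function that takes an HTML string instead of fetching the page itself. This is a minimal sketch: the function name get_attribute_values() and its $tag and $attribute parameters are hypothetical, and unlike the loop above it skips elements that lack the attribute (such as <a name="top">). Passing 'img' and 'src' gives the image variant described later in this post.

```php
<?php
// Hypothetical helper: return the value of $attribute on every <$tag> inside <body>.
// The defaults reproduce the link finder: the href of every <a> element.
function get_attribute_values( string $html, string $tag = 'a', string $attribute = 'href' ): array {
    if ( trim( $html ) === '' ) {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    // Collect libxml parse warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors( true );
    $dom = new DOMDocument();
    $dom->loadHTML( $html );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $dom );
    $values = [];
    // [@attr] restricts the match to elements that actually carry the attribute.
    foreach ( $xpath->query( "/html/body//{$tag}[@{$attribute}]" ) as $node ) {
        $values[] = $node->getAttribute( $attribute );
    }
    return $values;
}
```

Usage, assuming the page is fetched as before: `$links = get_attribute_values( file_get_contents( 'https://website.com/page-in-question' ) );`. The $tag and $attribute values are interpolated straight into the XPath expression, so they should be fixed strings from your own code, not user input.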
Once we have the list of links, we can do whatever we need with it. In my case, I had to check whether any of the links on a WordPress page were broken, so I wrote a custom WordPress plugin for a client that first grabs the links on a page and then uses a wp_remote_get() call to check whether each link's response status is 200 (OK) or a 400-range error.
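A rough sketch of the WordPress side of such a check might look like the following. It is not the original plugin's code: the function names are hypothetical, it assumes the links are already absolute URLs, and it treats any status of 400 or above, or a failed request, as broken. wp_remote_get() and wp_remote_retrieve_response_code() are real WordPress HTTP API functions, so example_find_broken_links() only runs inside WordPress; example_is_broken_status() is plain PHP.

```php
<?php
// Hypothetical rule: status 0 (request failed) or any 4xx/5xx status counts as broken.
function example_is_broken_status( int $code ): bool {
    return $code === 0 || $code >= 400;
}

// Hypothetical plugin helper; requires WordPress for wp_remote_get() and friends.
// Returns an array mapping each broken URL to the status code it produced.
function example_find_broken_links( array $urls ): array {
    $broken = [];
    foreach ( $urls as $url ) {
        $response = wp_remote_get( $url );
        // wp_remote_retrieve_response_code() returns '' for a WP_Error, which casts to 0.
        $code = (int) wp_remote_retrieve_response_code( $response );
        if ( example_is_broken_status( $code ) ) {
            $broken[ $url ] = $code;
        }
    }
    return $broken;
}
```

Redirects (3xx) are not counted as broken here; by default wp_remote_get() follows a limited number of redirects, so the code it reports is usually that of the final destination.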
Another task was to find all images on a page, check their file sizes, and, if an image was too large, crop it and replace it with the new, smaller version.
We can modify the code above to grab all image URLs on a page instead. All we need to do is change the XPath query from:
$hrefs = $xpath->evaluate( "/html/body//a" );
to
$hrefs = $xpath->evaluate( "/html/body//img" );
and
$url = $href->getAttribute( 'href' );
to
$url = $href->getAttribute( 'src' );
and we will get the URL in the src attribute of every image on the page. Once I had the URLs, I used PHP's filesize() function to determine each image's size, and then wrote a script to crop the image, reduce its file size, and replace the original in the same location.
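The resizing step can be sketched with PHP's GD extension. Note that filesize() works on local paths, not HTTP URLs (PHP's http:// wrapper does not support stat()), so this sketch assumes the image URLs have already been mapped to files on disk. The 300 KB threshold, the 1600 px maximum width, JPEG-only handling, and quality 80 are all assumptions, and the sketch scales the image down instead of cropping it.

```php
<?php
// Hypothetical limits; tune them for your site.
const SHRINK_MAX_BYTES = 300 * 1024;
const SHRINK_MAX_WIDTH = 1600;

// New dimensions that fit within $maxWidth while keeping the aspect ratio.
function fit_dimensions( int $width, int $height, int $maxWidth ): array {
    if ( $width <= $maxWidth ) {
        return [ $width, $height ];
    }
    $ratio = $maxWidth / $width;
    return [ $maxWidth, max( 1, (int) round( $height * $ratio ) ) ];
}

// Shrink a local JPEG in place if it exceeds SHRINK_MAX_BYTES (requires GD).
// Returns true only if the file was rewritten.
function shrink_jpeg_if_large( string $path ): bool {
    clearstatcache( true, $path );
    $size = filesize( $path );
    if ( $size === false || $size <= SHRINK_MAX_BYTES ) {
        return false; // unreadable, or already small enough
    }
    $src = imagecreatefromjpeg( $path );
    if ( $src === false ) {
        return false;
    }
    [ $w, $h ] = fit_dimensions( imagesx( $src ), imagesy( $src ), SHRINK_MAX_WIDTH );
    $dst = imagescale( $src, $w, $h );
    // Overwrite the original file with a re-compressed copy at quality 80 (of 100).
    return $dst !== false && imagejpeg( $dst, $path, 80 );
}
```

For example, an oversized 3200×2400 photo would be rescaled to 1600×1200; an oversized image that is already narrow enough keeps its dimensions and is only re-compressed. Because the file is overwritten in place, it is worth keeping a backup of the originals.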
I hope this code will help you if you are working on a similar task.