This article shows how to scrape (fetch) the HTML content of any webpage in PHP. We will use Simple HTML DOM Parser to scrape the webpage content. You can download it from here: PHP Simple HTML DOM Parser – SourceForge
You need to download the HTML DOM Parser from SourceForge and include simple_html_dom.php in your PHP file.
Here’s the code to scrape the full content of a webpage:
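require('simple_html_dom.php');
// Website link to scrape
$website = 'http://www.example.com';
// Create DOM from URL or file; file_get_html() returns false on failure
$html = file_get_html($website);
if ($html === false) {
    die('Could not load ' . $website);
}
// Print webpage content
echo $html;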
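If the page is slow or unreachable, it can help to fetch the raw HTML yourself so that you control the timeout and can detect the failure. The following is only a minimal sketch using PHP's built-in file_get_contents(). The function name fetchHtml, the 10-second default and the user-agent string are assumptions made for illustration and are not part of Simple HTML DOM Parser.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM Parser):
// fetch the raw HTML of a URL, or return null if the request fails.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    // HTTP options only apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleScraper/1.0', // placeholder value
        ],
    ]);

    // file_get_contents() returns false on failure; @ hides the warning
    // so that the caller can handle the error itself.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}

$raw = fetchHtml('http://www.example.com');
if ($raw === null) {
    echo "Could not fetch the page\n";
} else {
    echo strlen($raw) . " bytes fetched\n";
}
```

The returned string can then be handed to the library's str_get_html() function, which builds the same kind of DOM object as file_get_html().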
In the example below, we scrape a webpage and find every div with a particular class. We then loop through the matching divs and fetch all the links (“a href”) present in them.
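require('simple_html_dom.php');
// Website link to scrape
$website = 'http://www.example.com';
// Create DOM from URL or file; file_get_html() returns false on failure
$html = file_get_html($website);
if ($html === false) {
    die('Could not load ' . $website);
}
// Find every div with class = 'xyz' (find() returns an array of elements)
$divData = $html->find('div[class=xyz]');
// Loop through the matching divs and grab all the links present in each
foreach ($divData as $div) {
    $links = $div->find('a');
    foreach ($links as $link) {
        $linkHref = $link->href;      // value of the href attribute
        $linkText = $link->plaintext; // visible link text
        echo $linkText . ' => ' . $linkHref . "\n";
    }
}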
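For comparison, the same idea (find the divs with a given class, then collect the links inside them) can be sketched without any external library. Note that this is a different technique from the one used in this article: it relies on PHP's built-in DOMDocument and DOMXPath classes rather than Simple HTML DOM Parser. The function name extractLinks and the sample HTML are made up for illustration, and the sketch assumes the class name is a plain word with no quotes in it.

```php
<?php
// Hypothetical helper: return href/text pairs for every link found inside
// a <div> whose class list contains $divClass.
// Assumes $divClass is a simple class name without quote characters.
function extractLinks(string $html, string $divClass): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole word, so "xyz" does not match "xyz2".
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $divClass
    );

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }

    return $links;
}

// Sample markup: only the links in the first div should be returned.
$sample = '<div class="xyz"><a href="/one">One</a> <a href="/two">Two</a></div>'
        . '<div class="other"><a href="/skip">Skip</a></div>';

print_r(extractLinks($sample, 'xyz'));
```

Returning an array instead of printing inside the loop makes it easier to reuse the links later, for example to store them or to fetch each linked page in turn.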
More documentation is available on the Simple HTML DOM Parser project site.
Hope this helps. Thanks.