PHP: Scrape Web Page Content using Simple HTML DOM Parser

This article shows how to scrape (fetch) the HTML content of any web page in PHP using Simple HTML DOM Parser. You can download it from here: PHP Simple HTML DOM Parser – SourceForge

Download the parser from SourceForge and include simple_html_dom.php in your PHP file.

Here’s the code to scrape the full content of a web page:


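<?php
require 'simple_html_dom.php';

// Website link to scrape
$website = 'http://www.example.com';

// Create DOM from URL or file; file_get_html() returns false on failure
$html = file_get_html($website);
if ($html === false) {
    exit('Could not load ' . $website);
}

// Print webpage content
echo $html;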
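
If the target server is slow or rejects requests without a User-Agent, it can help to fetch the page yourself and hand the string to the parser. The following is a minimal sketch, assuming PHP's built-in HTTP stream wrapper is enabled; fetchDom() and the User-Agent string are hypothetical names chosen for illustration, not part of the library.

```php
<?php
require_once 'simple_html_dom.php';

/**
 * Fetch a page with an explicit timeout and User-Agent, then parse it.
 * fetchDom() is a hypothetical helper, not part of Simple HTML DOM Parser.
 * Returns the parsed DOM object, or null if fetching or parsing fails.
 */
function fetchDom(string $url, int $timeoutSeconds = 10)
{
    // Options for PHP's built-in HTTP stream wrapper
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleScraper/1.0', // assumed value
        ],
    ]);

    // file_get_contents() returns false if the request fails
    $source = @file_get_contents($url, false, $context);
    if ($source === false || $source === '') {
        return null;
    }

    // str_get_html() parses an HTML string and returns false on failure
    $dom = str_get_html($source);
    return $dom === false ? null : $dom;
}

$html = fetchDom('http://www.example.com');
if ($html === null) {
    echo 'Could not fetch or parse the page.' . PHP_EOL;
} else {
    // find() with an index returns a single element, or null if none matched
    $title = $html->find('title', 0);
    echo ($title ? $title->plaintext : '(no title)') . PHP_EOL;
    $html->clear(); // free the memory held by the DOM tree
}
```

Returning null for every failure keeps the calling code to a single check. Whether a 10 second timeout suits your case is a judgment call.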

In the example below, we scrape a web page and find the div elements that have a particular class. We then loop through those divs and fetch the href and text of every link (a element) inside them.


<?php
require 'simple_html_dom.php';

// Website link to scrape
$website = 'http://www.example.com';

// Create DOM from URL or file; file_get_html() returns false on failure
$html = file_get_html($website);
if ($html === false) {
    exit('Could not load ' . $website);
}

// Find every div with class = 'xyz'
$divData = $html->find('div[class=xyz]');

// Loop through the matched divs and grab all the links present in them
foreach ($divData as $div) {
    $links = $div->find('a');
    foreach ($links as $link) {
        $linkHref = $link->href;       // value of the href attribute
        $linkText = $link->plaintext;  // visible link text
        echo $linkText . ' => ' . $linkHref . PHP_EOL;
    }
}
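
The nested loops above can also be written with a single descendant selector that collects the results into an array. This is only a sketch: collectLinks() is a hypothetical helper, the sample HTML is made up, and the class name is assumed to be a plain identifier that is safe to place in a selector.

```php
<?php
require_once 'simple_html_dom.php';

/**
 * Collect href/text pairs for links inside divs that have the given class.
 * collectLinks() is a hypothetical helper, not part of the library.
 */
function collectLinks(string $source, string $class): array
{
    // str_get_html() parses an HTML string and returns false on failure
    $dom = str_get_html($source);
    if ($dom === false) {
        return [];
    }

    $links = [];
    // Descendant selector: every <a> nested inside <div class="...">
    foreach ($dom->find('div.' . $class . ' a') as $link) {
        $href = $link->href;
        // Skip anchors that have no usable href value
        if (!is_string($href) || $href === '') {
            continue;
        }
        $links[] = ['href' => $href, 'text' => trim($link->plaintext)];
    }

    $dom->clear(); // free the memory held by the DOM tree
    return $links;
}

// Made-up sample markup for illustration
$sample = '<div class="xyz"><a href="/one">One</a> <a name="top">No href</a></div>'
        . '<div class="other"><a href="/two">Two</a></div>';

print_r(collectLinks($sample, 'xyz'));
```

Parsing from a string keeps the helper independent of how the page was fetched, so the same function works on a downloaded page or a saved file.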

More documentation is available here.

Hope this helps. Thanks.