Link and Content Ripper (Scraper) Utility Script Tutorial
Da Rippa. For scraping content. A webmaster's best friend.
Hi! Welcome to Da Rippa, a content and link ripping PHP utility script.
This is a must-have weapon in your arsenal of utility scripts for your websites.
If the content has been done before, why reinvent the wheel? Just rip it and share it!
You are providing a beneficial service to everyone.
Let us now get started.
This script is intended, as shown below, to be used as an include in the page where you want the
ripped content to be displayed. To show different content on multiple pages, make a copy of the
script for each page and change the script file name and the name of the content file it writes to.
To rip content, we look for the tags that enclose it
in the webpage's source code.
So we have to know which tags and coding format the webpage uses.
I suggest copying the source code from the website into a file first,
then going through the code to find the tags that enclose the content you want.
Hopefully the markup is consistent, but it could change if the site gets a new web designer,
for example by enclosing links in single quotes instead of double quotes.
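Since a site may quote its links with single or double quotes, a pattern that accepts either is less likely to break after a redesign. Here is a minimal sketch of that idea. The function name rippa_find_hrefs and the sample HTML are assumptions for illustration and are not taken from the Da Rippa script.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns every href value, whether it is quoted with ' or ".
function rippa_find_hrefs(string $html): array
{
    // (["\']) captures the opening quote; \1 requires the same quote to close.
    preg_match_all('/href\s*=\s*(["\'])(.*?)\1/i', $html, $matches);
    return $matches[2];
}

// Sample markup mixing both quoting styles.
$sample = '<a href="/news.html">News</a> <a href=\'about.html\'>About</a>';
print_r(rippa_find_hrefs($sample));
```

The backreference keeps a double-quoted link from being cut short by a single quote inside it, and the other way around.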
We then set the opening and closing tag variables in the rippa script for our preg_match_all pattern.
We set the webpage URL that curl will fetch the content from.
And we enable the setting that adds the site URL to the links if they are relative.
See the script comments for info on the link str_replace and on using a base URL in the header.
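To make these settings concrete, here is a rough sketch of how they could fit together. Every variable name, function name, tag and URL in it is an assumption for illustration. The real names and values are in the script itself.

```php
<?php
// All names and values below are assumptions for illustration,
// not the actual settings used in the Da Rippa script.
$open_tag   = '<div class="headlines">';  // tag that opens the wanted content
$close_tag  = '</div>';                   // tag that closes it
$source_url = 'http://www.example.com/';  // page to rip, also the base for relative links
$fix_links  = true;                       // add $source_url to relative links

// Return everything found between each opening/closing tag pair.
function rippa_extract(string $html, string $open, string $close): array
{
    // preg_quote() escapes regex characters in the tags; the s modifier lets . span newlines.
    $pattern = '/' . preg_quote($open, '/') . '(.*?)' . preg_quote($close, '/') . '/s';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Prefix root-relative links (href="/...") with the site URL using str_replace.
function rippa_absolute_links(string $html, string $base): string
{
    $base = rtrim($base, '/');
    return str_replace(
        array('href="/', "href='/"),
        array('href="' . $base . '/', "href='" . $base . '/'),
        $html
    );
}

$page = '<div class="headlines"><a href="/story1.html">Story 1</a></div>';
foreach (rippa_extract($page, $open_tag, $close_tag) as $chunk) {
    echo $fix_links ? rippa_absolute_links($chunk, $source_url) : $chunk, "\n";
}
```

This sketch only rewrites links that start with a slash. Links without a leading slash, or links starting with two slashes, would need extra handling, which is where a base URL in the header can be the simpler choice.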
Now the Magic
- We start the script by checking whether we have a text file with the content from the page we are ripping.
- If not, we rip the source code of the webpage of our choice using curl or fsockopen.
- If we already have a content file, we re-rip from the file.
- We check the last modified timestamp of the content file.
- If the file is older than our setting (1 week in the script settings),
  we rip the content again from the website and reload our text file.
- We can now rip the content and/or links from our text file.
- Using preg_match_all, we get our content array and process it.
- If we are ripping links, we str_replace href with href(link).
- If we are not ripping links, we just display the content.
- We echo the content in a styled division in your webpage.
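The file check and re-rip steps above can be sketched roughly as follows. The function names, the cache file name and the URL are assumptions for illustration and may differ from what the script really does. The fetch function is passed in as a parameter so the flow can be tried without going online.

```php
<?php
// Sketch of the cache-then-rip flow. Function names, the cache file name and
// the URL are assumptions for illustration, not taken from the Da Rippa script.

// True when the content file is missing or older than $max_age seconds.
function rippa_cache_is_stale(string $file, int $max_age): bool
{
    clearstatcache(); // make sure file_exists()/filemtime() see the current state
    return !file_exists($file) || (time() - filemtime($file)) > $max_age;
}

// Fetch a page with curl; returns false on failure.
function rippa_fetch(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
    return curl_exec($ch);
}

// Return the page source, re-ripping only when the content file is stale.
// $fetch is any callable that takes a URL and returns the page or false.
function rippa_get_source(string $url, string $file, int $max_age, callable $fetch): string
{
    if (rippa_cache_is_stale($file, $max_age)) {
        $html = $fetch($url);
        if ($html !== false && $html !== '') {
            file_put_contents($file, $html); // reload the text file
            return $html;
        }
    }
    // Fresh file, or the fetch failed: use whatever is on disk.
    return file_exists($file) ? file_get_contents($file) : '';
}

$one_week = 7 * 24 * 60 * 60; // 604800 seconds
// Example call (commented out so nothing is fetched when the sketch is loaded):
// $source = rippa_get_source('http://www.example.com/', 'ripped_content.txt', $one_week, 'rippa_fetch');
```

Falling back to the old file when the fetch fails means the page still shows the last ripped content if the source site is down.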
That is it for now. See the comments in the script, which explain it in detail.
Hope you like this tutorial example and find it useful! Happy Content Ripping!
Introducing DaRippa!
Please donate and help the handicapped.