jonemd (posted April 7, 2014):

Hi guys,

I'm a complete noob with an issue. I have created a bot that is starting to do what I want, but I've now come up against a brick wall that I can't get over despite looking for the answer everywhere.

The issue is this: I scrape a webpage for "fullhref", which returns all the links into a list as required. What I need to do now is copy the URLs that meet certain criteria to a new list. For example, the first list returns:

www.example.com
www.example.com/1
www.example.com/2
www.example.com/3
www.example.com/video/1
www.example.com/video/2
www.example.com/video/3
www.example.com/end

I now want to get the links that contain "www.example.com/video" into a new list.

I'm aware this is probably really basic and simple for most of you, but I'm struggling! If someone could help me, not only would I greatly appreciate it, but you might just save my laptop from heading out of the window!

Thanks,
Jonathan
ds062692 (posted April 7, 2014):

Loop through the list and check each list item for the conditions that you want. For example (not actual code):

Loop list total (scraped url list)
    Set current list item to next list item in scraped url list
    If current list item contains x, y, z
        Add current list item to new list
jonemd (posted April 7, 2014):

Thanks for the reply. While I understand the logic needed to solve this issue, I'm unsure how to code it (even in node view). I just can't find the right commands.
ds062692 (posted April 7, 2014):

add list to list(%urls, $scrape attribute(<href="">, "href"), "Delete", "Global")
loop($list total(%urls)) {
    set(#current list item, $next list item(%urls), "Global")
    if($contains(#current list item, "/video/")) {
        then {
            add item to list(%good urls, #current list item, "Delete", "Global")
        }
        else {
        }
    }
}
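For readers following along outside UBot Studio, the same loop-and-filter logic can be sketched in Python. This is only an illustration, not UBot code: the function name `filter_urls` is hypothetical, the sample list is the one from the original question, and it assumes the "Delete" option in the script above means that duplicate items are dropped.

```python
def filter_urls(urls, needle):
    """Return the URLs that contain `needle`, in order, without duplicates."""
    good = []
    for url in urls:
        # Keep the item only if it matches and is not already in the new list
        # (assumed to mirror the "Delete" duplicates option in the UBot script).
        if needle in url and url not in good:
            good.append(url)
    return good


# Sample data taken from the original question.
scraped = [
    "www.example.com",
    "www.example.com/1",
    "www.example.com/2",
    "www.example.com/3",
    "www.example.com/video/1",
    "www.example.com/video/2",
    "www.example.com/video/3",
    "www.example.com/end",
]

good_urls = filter_urls(scraped, "/video/")
print(good_urls)
```

The original list is left untouched, which matches the case where other URLs on it are still needed later.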
Steve (posted April 7, 2014):

You could also avoid having to do that by scraping only the URLs that you need. For example, if you needed only URLs with "/video/", you could try something with wildcards like this:

add list to list(%urls, $scrape attribute(<outerhtml=w"<a href=\"*/videos/*\"*>*</a>">, "outertext"), "Delete", "Global")
jonemd (posted April 8, 2014):

Hi guys, thanks to both of you for your replies. For the record, I ended up going with the first option, primarily because there are other URLs it would be handy to keep on that list.

Incidentally, I couldn't get your code to work, Steve. Is there anything I can post that would help you to help me?

Thanks again
Steve (posted April 8, 2014):

The page with the URLs you are trying to scrape would help.
jonemd (posted April 8, 2014):

As requested:

http://www.xvideos.com/c/Amateur-17

DISCLAIMER: this is most definitely NSFW. Please don't visit if you are easily offended by adult videos.

Thanks
Steve (posted April 8, 2014):

I'm guessing you want links to the video pages. The following worked for me:

add list to list(%urls, $scrape attribute(<outerhtml=w"<a href=\"/video*/*>*</a>">, "fullhref"), "Delete", "Global")
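The wildcard idea in the post above (match only links whose href looks like "/video*/*" instead of filtering afterwards) can be sketched in Python with the standard `fnmatch` module. This is a rough analogy rather than a description of how UBot evaluates its `w"..."` patterns: the helper name `match_hrefs` and the sample hrefs are made up for illustration, and `fnmatch` wildcards may not behave exactly like UBot's.

```python
from fnmatch import fnmatchcase


def match_hrefs(hrefs, pattern):
    """Return the hrefs that match a shell-style wildcard pattern."""
    # In fnmatch patterns, "*" matches any run of characters, including "/".
    return [href for href in hrefs if fnmatchcase(href, pattern)]


# Hypothetical href values, as they might appear on a listing page.
hrefs = [
    "/video1/first-clip",
    "/about",
    "/tags/new",
    "/video2/second-clip",
]

video_hrefs = match_hrefs(hrefs, "/video*/*")
print(video_hrefs)
```

Matching at scrape time keeps the list small, while the loop-based filter is the better fit when the full list of links is needed for something else.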
jonemd (posted April 8, 2014):

Thanks Steve, I really appreciate your help. I'll run the code tomorrow and let you know how it goes!
Steve (posted April 8, 2014):

No prob.