Bot-Factory 602 Posted April 6, 2014

Hello.

I need to scrape two attributes from an HTML text:

Name
href

I can get both with regex and xpath. That's not the problem. The problem is that sometimes the href is empty or doesn't exist at all.

So let's say the HTML text has:

10x the name attribute
7x href with a URL
2x href with an empty string
1x href tag is not there at all

When I now scrape the attributes into lists, the name list has 10 entries and the href list has 7. But I need to know what belongs to what, i.e. which name attribute is followed by which href. And if the href is empty or not there, I want to replace it with "" in my list.

How would you approach this? Scraping the data in two steps, maybe? Getting the innerhtml of the parent element and then searching for name and href in the result, so that each element is handled separately from the others?

Or is there another way to do that?

Thanks in advance for your help
Dan
UBotBuddy 331 Posted April 6, 2014

@Dan

How about this?

navigate("http://ubotsandbox.com/ubot-list-example-page-1.php", "Wait")
wait for browser event("Everything Loaded", "")
wait for element(<href="http://rickpowers.com/">, "", "Appear")
set(#var1, $scrape attribute(<id="MyExampleUsers">, "fullhref"), "Global")
if($contains(#var1, "ubotsandbox.com")) {
    then {
        set(#var1, $replace(#var1, "http://ubotsandbox.com/ubot-list-example-page-1.php", "No Link Found"), "Global")
    }
    else {
    }
}
set(#var2, $scrape attribute(<id="MyExampleUsers">, "innertext"), "Global")
clear list(%List1)
clear list(%List2)
add list to list(%List1, $list from text(#var1, " "), "Don\'t Delete", "Global")
add list to list(%List2, $list from text(#var2, " "), "Delete", "Global")
set(#var1, $nothing, "Global")
set(#var2, $nothing, "Global")
Bot-Factory 602 Posted April 6, 2014

In my case the same element appears 30 times on that page, so when I scrape the attribute, it returns 30 results. I'm using xpath for that because I'm not using the browser at all.

Once I have the 30 elements in a variable, I add them to a list, so I now have all the elements in a list.

I then loop through that list and extract the name and href attribute of each element. If href returns $nothing, I replace it with a placeholder.

Watching some of the scraping videos on the training site now. Good stuff by the way :-)

Dan
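For readers outside UBot, the loop described above can be sketched in Python. This is only an illustration under assumptions, not the script Dan actually used: the sample markup, the `a` tag, and the names `scrape_pairs` and `PLACEHOLDER` are all made up for the example, and `xml.etree.ElementTree` only accepts well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one element with a URL, one with an empty href,
# and one with no href attribute at all.
SAMPLE = """
<div>
  <a name="alpha" href="http://example.com/a">A</a>
  <a name="beta" href="">B</a>
  <a name="gamma">C</a>
</div>
"""

PLACEHOLDER = ""  # value stored when href is empty or missing


def scrape_pairs(markup):
    """Return one (name, href) tuple per element, in document order."""
    root = ET.fromstring(markup)
    pairs = []
    # Select whole elements first, then read both attributes from the
    # same element so they can never get out of step.
    for element in root.findall(".//a[@name]"):
        name = element.get("name")
        # get() returns None for a missing attribute; "or" also maps an
        # empty string to the placeholder.
        href = element.get("href") or PLACEHOLDER
        pairs.append((name, href))
    return pairs


pairs = scrape_pairs(SAMPLE)
names = [name for name, _ in pairs]
hrefs = [href for _, href in pairs]
```

Because both lists are built from the same per-element pass, `names` and `hrefs` always have the same length, and index i in one list belongs to index i in the other.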
kev123 132 Posted April 9, 2014

Dan, from the sounds of your question (correct me if I'm wrong), you're scraping several different fields where there's a variable number of them on the page, and you need each line to pair up.

There are a few ways to do it, but the easiest and most hassle-free is to scrape the parent element and then do an inner scrape of the elements you need. Because every field is read from inside its own parent, the fields of one record cannot be paired with those of another.
Bot-Factory 602 Posted April 9, 2014

That's exactly what I want to do. And as you said, the most reliable way is probably to do it in two steps: scrape the parent element into a list, and then run through that list to look for the elements I need.

Dan
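The two-step idea (parent first, then an inner scrape) can also be sketched with regular expressions, which the original question mentions as an option. This is a rough Python illustration under assumptions: the `<li class="user">` wrapper, the sample HTML, and the helper names are hypothetical, and regex matching of HTML is fragile on real pages with nested or oddly quoted markup.

```python
import re

# Hypothetical sample: each record lives in its own parent <li>.
SAMPLE_HTML = '''
<li class="user"><a name="alpha" href="http://example.com/a">A</a></li>
<li class="user"><a name="beta" href="">B</a></li>
<li class="user"><span name="gamma">C</span></li>
'''

# Step 1 pattern: capture the inner HTML of every parent element.
PARENT_RE = re.compile(r'<li class="user">(.*?)</li>', re.DOTALL)
# Step 2 patterns: applied to one parent's inner HTML at a time.
NAME_RE = re.compile(r'\bname="([^"]*)"')
HREF_RE = re.compile(r'\bhref="([^"]*)"')


def first_match(pattern, text, default=""):
    """Return the first captured value, or default if absent or empty."""
    match = pattern.search(text)
    return match.group(1) if match and match.group(1) else default


def scrape_rows(html):
    """Return one (name, href) tuple per parent element."""
    rows = []
    for inner in PARENT_RE.findall(html):
        rows.append((first_match(NAME_RE, inner), first_match(HREF_RE, inner)))
    return rows


rows = scrape_rows(SAMPLE_HTML)
```

Each tuple in `rows` comes from a single parent, so a missing or empty href simply becomes "" for that record instead of shifting the later entries.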