Why Am I Not Scraping Href Attribute

webtrend · March 26, 2015

Here is the relevant HTML:

<div class="mctitle"><a target="_blank" href="http://twicsy.com/" rel="nofollow">Celebrities That Go Braless</a></div>

The following uBot block gives me a list of 22 items that are all blank (the count of 22 is correct, but every item is empty):

$Scrape_attribute

Element to Scrape: <class="mctitle">

Attribute to Scrape: href

Any ideas?

deliter · March 26, 2015

set(#links,$scrape attribute(<target="_blank">,"href"),"Global")

I've just loaded that link on a blank HTML page. If the above code scrapes more than the links you need, maybe adjust the selector to combine target="_blank" with class= etc.

LordFrz · March 26, 2015

Try attribute 'fullhref'

webtrend · March 26, 2015

Thanks for the responses, guys.

target="_blank" does grab those links, but along with many others that I don't want. Adding a boolean AND condition with class="mctitle" does not give me any results.

fullhref gives me links, but I can't tell where it is pulling them from.

Looks like I will just have to use regex. Is there any other way around it?

Bot-Factory · March 26, 2015

Can you please share the full code, including the URL of the page where you are trying to scrape <class="mctitle">?

If you can share that, I'll take a look.

Dan

webtrend · March 27, 2015

Problem solved, please look at my next post.

Thanks, Dan, for your offer to help.

Edited March 27, 2015 by webtrend

LordFrz · March 27, 2015

I don't know. You might try scraping the innerhtml, then use a regular expression to grab the text between the start of the href and the closing quotation mark.
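The innerhtml-plus-regex idea above can be sketched outside uBot as follows. This is only an illustration in Python, assuming the inner HTML of one mctitle element has already been scraped into a string; the names `INNER_HTML`, `HREF_RE` and `hrefs_from_inner_html` are made up for the example and are not uBot functions.

```python
import re

# Inner HTML of one <div class="mctitle"> element, taken from the first post.
INNER_HTML = (
    '<a target="_blank" href="http://twicsy.com/" rel="nofollow">'
    "Celebrities That Go Braless</a>"
)

# Capture everything between href=" and the next double quote.
HREF_RE = re.compile(r'href="([^"]*)"')


def hrefs_from_inner_html(inner_html):
    """Return every double-quoted href value found in the given HTML string."""
    return HREF_RE.findall(inner_html)


print(hrefs_from_inner_html(INNER_HTML))
```

Note that the pattern returns every href in whatever text it is given, so running it over a whole page rather than one element at a time will pick up unrelated links.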

Edited March 27, 2015 by LordFrz

webtrend · March 27, 2015

^^^ What you are recommending works, but it scrapes a lot more URLs than there are items in the other lists. I want all 3 lists to be consistent, so that each numbered item on a given list corresponds to the same numbered item on the other lists.

I would like to stay within that class and scrape the href attribute of the class. The html document seems to be well structured so I don't understand why this simple command is not working.

webtrend · March 27, 2015

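Never mind, I figured out the solution.

You have to use the $element child block to reach the "a" tag inside the div, and then choose "href" as the attribute. The href belongs to the <a>, not to the <div class="mctitle"> itself, which is why scraping the class directly returned blank items.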

like this:

add list to list(%links,$scrape attribute($element child(<class="mctitle">),"href"),"Don\'t Delete","Global")
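For anyone wondering why the fix above works, here is a rough illustration in Python (standard library only, not uBot). It assumes the markup from the first post; `MctitleParser` and `scrape` are hypothetical names invented for this sketch. It reads the href from the div itself and from the div's first "a" child, and keeps one entry per div so the lists stay aligned.

```python
from html.parser import HTMLParser

SAMPLE = (
    '<div class="mctitle"><a target="_blank" href="http://twicsy.com/" '
    'rel="nofollow">Celebrities That Go Braless</a></div>'
)


class MctitleParser(HTMLParser):
    """Collect href from each class="mctitle" div and from its first <a> child."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0       # > 0 while inside a class="mctitle" div
        self.need_child = False  # True until the current div's first <a> is seen
        self.own_hrefs = []      # href read from the div itself
        self.child_hrefs = []    # href read from the div's first <a>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.div_depth:
            if tag == "div":
                self.div_depth += 1  # nested div inside the matched one
            elif tag == "a" and self.need_child:
                self.child_hrefs.append(attr.get("href") or "")
                self.need_child = False
        elif tag == "div" and "mctitle" in (attr.get("class") or "").split():
            self.div_depth = 1
            self.need_child = True
            # The div carries no href attribute, so this records a blank.
            self.own_hrefs.append(attr.get("href") or "")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
            if self.div_depth == 0 and self.need_child:
                # No <a> found: add a blank so both lists stay the same length.
                self.child_hrefs.append("")
                self.need_child = False


def scrape(html):
    parser = MctitleParser()
    parser.feed(html)
    return parser.own_hrefs, parser.child_hrefs


own, child = scrape(SAMPLE)
print(own)    # blank entry: the div has no href of its own
print(child)  # the link held by the child <a>
```

The element count matches in both lists, which is consistent with the original symptom of 22 blank items: the 22 divs were found, but none of them holds an href.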

Edited March 27, 2015 by webtrend
