UBot Underground


Hi, I'm having trouble with a scraping script. I'm trying to scrape a page for information and add the scraped information to a list. The input (what to scrape) comes from a UBot list, so to loop I create a variable that holds #totalrows and a #row variable that tells UBot which row to insert and then scrape. Run individually, this works fine. But when I try to implement it using thread and in new browser, the new thread's process does not increment #row, so it scrapes the first #row #totalrows times. I have read another post, http://www.ubotstudio.com/forum/index.php?/topic/15985-insert-into-mysql-table/, and tried to follow it, but I'm still having problems.

Here is my code. Can anyone help me find my bug?

Edited by cd1168

From a quick look at the code I see 2 problems:

First, you need to move "increment(#row)" out of "$scrapeFunction()", because you want to process the 2nd row as soon as the 1st thread has started. Therefore you need to execute the increment right after the 1st thread is started, not after a few other commands have executed.

Second, you need to pass #row into "$scrapeFunction()" as a parameter, to ensure that it doesn't change within that scope (inside that function/thread).
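
To illustrate both points outside of UBot, here is a rough sketch in Python (not UBot script). It assumes each thread works on its own copy of the row value, which is what the symptoms described suggest; I can't say that this is exactly how UBot handles variables internally. The names scrape, run_buggy and run_fixed are made up for the illustration, and scrape is only a stand-in that reports the row it was given.

```python
import threading


def scrape(row):
    """Hypothetical stand-in for the real scrape: just reports the row it was given."""
    return row


def run_buggy(total_rows):
    """The increment lives inside the worker, so it only touches the worker's copy."""
    scraped, lock = [], threading.Lock()
    row = 0

    def worker(my_row):
        with lock:
            scraped.append(scrape(my_row))
        my_row += 1  # changes the local copy only; the loop's row stays at 0

    threads = []
    for _ in range(total_rows):
        t = threading.Thread(target=worker, args=(row,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return scraped


def run_fixed(total_rows):
    """The loop owns the counter and hands each thread its row as a parameter."""
    scraped, lock = [], threading.Lock()
    row = 0

    def worker(my_row):
        with lock:
            scraped.append(scrape(my_row))

    threads = []
    for _ in range(total_rows):
        t = threading.Thread(target=worker, args=(row,))
        t.start()
        threads.append(t)
        row += 1  # increment right after the thread is started
    for t in threads:
        t.join()
    return sorted(scraped)  # threads may finish in any order


if __name__ == "__main__":
    print(run_buggy(4))
    print(run_fixed(4))
```

In the buggy version every thread is handed row 0, which matches "scrapes the first #row for #totalrows times". In the fixed version the loop does the counting and each thread only receives a value.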

 

PS: High-speed threading is also broken, as shown here: http://www.ubotstudio.com/forum/index.php?/topic/15122-must-read-threading-doesnt-work-as-expected-tested-in-v4/


Still a problem. Here is what I have; it only reads the 1st member of the list. Does it matter whether the define function sits above or below the place where I call it in the script? Should it be below? Again, thanks for your patience. I'm quite new to UBot (2-3 days) and a little overwhelmed at the moment.

 

define $scrapeFunction(#row) {
    allow flash("No")
    allow css("No")
    allow images("No")
    set visibility("invisible")
    navigate("https://xxx.com/vin/xx", "Wait")
    wait for element(<name="id">, "", "Appear")
    wait for element(<id="recaptcha_challenge_image">, "", "Appear")
    if($both($exists(<id="recaptcha_challenge_image">), $exists(<name="id">))) {
        then {
            set(#id, $table cell(&idsToProcess, #row, 1), "Global")
            add item to list(%ids, #id, "Delete", "Global")
            type text(<name="id">, #id, "Standard")
            type text(<name="recaptcha_response_field">, $solve captcha(<id="recaptcha_challenge_image">), "Standard")
            click(<value="Submit">, "Left Click", "No")
            wait(1)
            set(#vinScraped, $scrape attribute(<tagname="section">, "innertext"), "Global")
            add item to list(%idScraped, #idScraped, "Delete", "Global")
        }
    }
    return(%idScraped)
}

set(#VintoProcessCnt, $table total rows(&idsToProcess), "Global")
set(#row, 0, "Global")
loop($table total rows(&idsToProcess)) {
    thread {
        in new browser {
            type text(<about me textarea>, $scrapeFunction(#row), "Standard")
        }
        increment(#row)
    }
}

Edited by cd1168

As I said, you need to increment #row outside the thread, not inside.

Besides that, you need another define command around the part where you spawn a new thread in order for this to work.

Here is an example: your code stripped down, with some code added:

clear list(%rows)
define $scrapeFunction(#row) {
    return(#row)
}
set(#row, 0, "Global")
loop(10) {
    THREAD START(#row)
    increment(#row)
    wait(0.5)
}
define THREAD START(#row) {
    thread {
        add item to list(%rows, $scrapeFunction(#row), "Delete", "Global")
        wait(1)
    }
}

Here is a slightly more advanced example that similarly uses a THREAD START command: http://www.ubotstudio.com/forum/index.php?/topic/15441-free-plugin-threads-counter-ubot-v4-threading-fixed/
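
For anyone wondering why the extra define makes a difference, the closest analogy I know of is in Python (again, not UBot script, and I'm not claiming UBot works exactly like this internally). A thread body that reads a shared variable sees whatever value it holds when the thread finally runs, while a wrapper that takes the value as a parameter gives each thread its own copy. All names below (launch_without_wrapper, thread_start, launch_with_wrapper) are made up for the sketch, and the Event is only there to simulate threads that get going after the loop has finished.

```python
import threading


def launch_without_wrapper(count):
    """Every thread reads the one shared row variable when it finally runs."""
    seen, lock, go = [], threading.Lock(), threading.Event()
    row = 0

    def worker():
        go.wait()  # simulate threads that only get going after the loop is done
        with lock:
            seen.append(row)  # looks up the shared variable at this moment

    threads = []
    for _ in range(count):
        t = threading.Thread(target=worker)
        t.start()
        threads.append(t)
        row += 1
    go.set()
    for t in threads:
        t.join()
    return seen


def thread_start(row, seen, lock, go):
    """Wrapper playing the role of the extra define: row is its own parameter."""

    def worker():
        go.wait()
        with lock:
            seen.append(row)  # this row belongs to this one call of thread_start

    t = threading.Thread(target=worker)
    t.start()
    return t


def launch_with_wrapper(count):
    """Same loop, but each thread is spawned through the wrapper."""
    seen, lock, go = [], threading.Lock(), threading.Event()
    row = 0
    threads = []
    for _ in range(count):
        threads.append(thread_start(row, seen, lock, go))
        row += 1
    go.set()
    for t in threads:
        t.join()
    return sorted(seen)  # threads may finish in any order


if __name__ == "__main__":
    print(launch_without_wrapper(3))
    print(launch_with_wrapper(3))
```

Without the wrapper all three threads report the final value of the counter; with it, each thread keeps the row it was started with.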


Hi, and thank you for the guidance, but I still have a bug somewhere. It is not reading the correct row from the list that I need. I would appreciate any leads on where the problem is.

define $scrapeFunction(#row) {
    allow flash("No")
    allow css("No")
    allow images("No")
    set visibility("invisible")
    navigate("https://vccp.com", "Wait")
    wait for element(<name="id">, "", "Appear")
    wait for element(<id="recaptcha_challenge_image">, "", "Appear")
    if($both($exists(<id="recaptcha_challenge_image">), $exists(<name="id">))) {
        then {
            set(#VIN, $table cell(&idsToProcess, #row, 1), "Global")
            add item to list(%ids, #id, "Delete", "Global")
            type text(<name="id">, #id, "Standard")
            type text(<name="recaptcha_response_field">, $solve captcha(<id="recaptcha_challenge_image">), "Standard")
            click(<value="Submit">, "Left Click", "No")
            wait(1)
            set(#idScraped, $scrape attribute(<tagname="section">, "innertext"), "Global")
            add item to list(%idScraped, #idScraped, "Don\'t Delete", "Global")
        }
    }
    return(#row)
}

define THREAD START(#row) {
    thread {
        add item to list(%rows, $scrapeFunction(%idScraped), "Delete", "Global")
        wait(1)
    }
}

set(#idtoProcessCnt, $table total rows(&idsToProcess), "Global")
set(#row, 0, "Global")
loop(#idtoProcessCnt) {
    in new browser {
        THREAD START(#row)
        increment(#row)
        wait(0.5)
    }
}

Yes... but it doesn't scrape anything. There are no errors; it just sits there and doesn't finish.

comment("function to scrape")
define $scrapeFunction(#row) {
    allow flash("No")
    allow css("No")
    allow images("No")
    set visibility("invisible")
    navigate("www.com", "Wait")
    wait for element(<name="id">, "", "Appear")
    wait for element(<id="recaptcha_challenge_image">, "", "Appear")
    if($both($exists(<id="recaptcha_challenge_image">), $exists(<name="id">))) {
        then {
            set(#id, $table cell(&idsToProcess, #row, 1), "Global")
            add item to list(%ids, #id, "Delete", "Global")
            type text(<name="id">, #id, "Standard")
            type text(<name="recaptcha_response_field">, $solve captcha(<id="recaptcha_challenge_image">), "Standard")
            click(<value="Submit">, "Left Click", "No")
            wait(1)
            set(#idScraped, $scrape attribute(<tagname="section">, "innertext"), "Global")
            add item to list(%idScraped, #idScraped, "Don\'t Delete", "Global")
        }
    }
    return(#row)
}

define THREAD START(#row) {
    thread {
        add item to list(%rows, $scrapeFunction(%idScraped), "Delete", "Global")
        wait(1)
    }
}

set(#idtoProcessCnt, $table total rows(&idsToProcess), "Global")
set(#row, 0, "Global")
loop(#idtoProcessCnt) {
    in new browser {
        THREAD START(#row)
        increment(#row)
        wait(0.5)
    }
}