UBot Underground

Multi-Thread & HTTP GET



Google tends to frown on being hit too quickly when you use HTTP GET to scrape page source (for that matter, they don't like it when you use the browser either, but the ban seems to come faster with HTTP GET).

Anyway, I was wondering: if you hit them with all of the requests at exactly the same time, would that get you flagged just as quickly? Or, because the requests all arrive at once, would the run be fast enough to slip past the trigger?

So I was thinking about using multi-threading to grab the source of all the target pages at once.

It's a total of 40 pages, so I'd open 40 threads at once to grab the code.

Has anyone tried anything similar? If so, were the results still the same ban-hammer, or did you get away with it?
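For reference, the 40-thread plan above might look something like this in Python. This is a minimal sketch assuming plain `urllib` GETs, not anything UBot does internally. `fetch_page`, `grab_all` and any example.com URLs are hypothetical names, and the `fetch` parameter exists only so the fetching can be swapped out (for example, in a test).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
import urllib.request


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET: return the response body (the page source) as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def grab_all(
    urls: List[str],
    fetch: Callable[[str], str] = fetch_page,
    max_workers: int = 40,
) -> Dict[str, str]:
    """Fetch every URL on a worker thread and map URL -> page source."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`;
        # an exception raised by any fetch is re-raised here.
        bodies = list(pool.map(fetch, urls))
    return dict(zip(urls, bodies))
```

Calling `grab_all(urls)` with 40 URLs runs up to 40 GETs concurrently, one per worker thread. Keep in mind that the threads are only concurrent inside your own process. Whether that looks any different from Google's side is exactly what this thread is asking.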


That doesn't work. The detection technology is smart enough to catch that. The requests never actually arrive at the same time: you might send the TCP packets from multiple threads, but you still have a single internet connection, and all the packets go out over it one after another.

So, a typical TCP/IP host has multiple processes each needing to send and receive datagrams. All of them, however, must be sent using the same interface to the network, using the IP layer. This means that the data from all applications is “funneled down”, initially to the transport layer, where it is handled by either TCP or UDP. From there, messages pass to the device's IP layer, where they are packaged in IP datagrams and sent out over the internet to different destinations. The technical term for this is multiplexing. This term simply means combining, and its use here is a software analog to the way it is done with signals.
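To make the multiplexing point concrete, here is a toy Python simulation. It is an illustration, not a model of a real network stack. Forty "application" threads are released together by a `threading.Barrier`, but each has to hand its segment to one shared queue standing in for the single network interface. One transmitter thread then puts the segments on the "wire" one slot at a time. The `Segment` class, the source ports starting at 50000 and the slot counter are all illustrative assumptions; real kernels, NIC queues and TCP are far more involved.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    src_port: int  # identifies which application (socket) sent it
    payload: str


def multiplex(num_apps: int = 40) -> List[Tuple[int, Segment]]:
    """Toy model: many app threads share ONE outgoing interface.

    Returns the 'wire' as (transmission_slot, segment) pairs.
    """
    interface: queue.Queue = queue.Queue()  # the single shared interface
    wire: List[Tuple[int, Segment]] = []
    start = threading.Barrier(num_apps)  # release every app thread together

    def app(i: int) -> None:
        start.wait()
        # Hypothetical ephemeral source port per application.
        interface.put(Segment(src_port=50000 + i, payload=f"GET /page/{i}"))

    def transmitter() -> None:
        slot = 0
        while True:
            seg = interface.get()
            if seg is None:  # sentinel: no more traffic
                break
            wire.append((slot, seg))  # exactly one segment per slot
            slot += 1

    tx = threading.Thread(target=transmitter)
    tx.start()
    apps = [threading.Thread(target=app, args=(i,)) for i in range(num_apps)]
    for t in apps:
        t.start()
    for t in apps:
        t.join()
    interface.put(None)  # all apps are done, so the sentinel goes last
    tx.join()
    return wire


def demultiplex(wire: List[Tuple[int, Segment]]) -> Dict[int, List[str]]:
    """Receiving side: sort segments back out by source port."""
    by_port: Dict[int, List[str]] = {}
    for _, seg in wire:
        by_port.setdefault(seg.src_port, []).append(seg.payload)
    return by_port
```

`multiplex(40)` always returns 40 entries in slots 0 through 39. Even though all 40 threads start together, their segments still leave one per slot. Only the order of the ports changes from run to run. `demultiplex` shows how the port numbers let the other end separate the streams again.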

 

So the requests still go out one after another, and the pattern can still be detected.

Proxies are the only way to work around the issue you described. Well, solving the Google captcha might work as well, to some degree.

Dan

[Attached image: portsmultiplexing.png]
