Bot-Factory 602 Posted April 11, 2014

Hello. I'm working on a scraping bot (HTTP Post plugin), and I need to scrape a lot of entries. Scraping and extracting the data works fine, but I'm running into UBot performance and memory problems. Basically I run through multiple loops: the first run extracts 1,500 URLs into a list, and the next one extracts 20-40 URLs for every URL in that list, which results in 30,000-60,000 entries. Already after 5,000 entries it gets very sluggish...

How do you normally handle such large amounts of data? Do you store it on disk in multiple files? Should I save the lists into a SQLite DB and load the data from there?

Would love to hear your best-practice tips. Thanks in advance for your help.
Dan
kev123 132 Posted April 11, 2014

I go with SQLite for the saving and write a define that saves after a certain size. It's important to get this right, as it can slow down your bot. A simple way would be to save your initial scrape to a file using the advanced file commands (read part of file, append to file) once your second scrape reaches a certain size. I'm also halfway through a plugin for large tables, which will have all the features of normal tables but can handle a lot of data.
Bot-Factory 602 (Author) Posted April 11, 2014

> kev123: I go with SQLite for the saving and write a define that saves after a certain size. [...]

I'm currently playing around with SQLite. I need to write a define that writes a list of 500 entries into the database, but not all into one cell; I need each item as a separate row. When I loop through the list and do an INSERT INTO for each item, that's way too slow. Do you know a smarter way to get a huge list into the SQLite DB?

Can you share some more details about the plugin you are working on? The biggest challenge for me at the moment is memory consumption. The bot is already using about 500 MB, and it only has 3,000 entries in a list :-(

Dan
stanf 43 Posted April 11, 2014

Hurry kev. Hurry.
Bot-Factory 602 (Author) Posted April 11, 2014

OK, I can now add up to 500 entries (the SQLite compound-select limit) in one go. The SQLite syntax for that is:

INSERT INTO 'tablename'
SELECT 'data1' AS 'column1', 'data2' AS 'column2'
UNION SELECT 'data3', 'data4'
UNION SELECT 'data5', 'data6'
UNION SELECT 'data7', 'data8'

Here's my define to create that statement:

define CreateSQLAdd {
    set list position(%urls, 0)
    set(#sqlcommand, "INSERT INTO \'tablename\'", "Global")
    set(#sqlcommand, "{#sqlcommand} SELECT \'{$next list item(%urls)}\' AS \'urls\'", "Global")
    loop($subtract($list total(%urls), 1)) {
        set(#sqlcommand, "{#sqlcommand} UNION SELECT \'{$next list item(%urls)}\'", "Global")
    }
}

If you need to add more than 500 items, you have to split the list into batches.

Dan
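For readers outside UBot, the multi-row trick Dan describes can be reproduced with plain SQL. A minimal sketch using Python's sqlite3 module (the table and column names here are illustrative, not from the thread); note that UNION ALL is used instead of UNION so that duplicate URLs are not silently dropped, and that SQLite 3.7.11 and later also accept the simpler INSERT ... VALUES (...), (...), ... form:

```python
import sqlite3

# In-memory database for illustration; a real bot would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")

scraped = ["http://a.example", "http://b.example", "http://c.example"]

# Build one multi-row INSERT via the compound-SELECT trick, which works
# even on SQLite versions older than 3.7.11. UNION ALL (unlike UNION)
# keeps duplicate rows and skips the implicit de-duplication sort.
parts = ["SELECT ? AS url"] + ["UNION ALL SELECT ?"] * (len(scraped) - 1)
sql = "INSERT INTO urls " + " ".join(parts)
conn.execute(sql, scraped)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0])  # prints 3
```

Using placeholders (?) instead of splicing the scraped strings into the SQL text also avoids quoting problems when a URL contains an apostrophe.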
kev123 132 Posted April 11, 2014

Why not use the table insert command if you want to add over 500 in one go, and add your list as a row? It's lightning fast. I'm going to ask Aymen if the table insert could have options for update etc.

Regarding the plugin: it will have all the functions of normal tables (probably not at launch, as it seems people want it now), can hold stupid amounts of data without a hitch, and will be free. Just waiting on the key; the approver at support is due back from vacation. Any features, let me know.
Bot-Factory 602 (Author) Posted April 11, 2014

> kev123: Why not use the table insert command if you want to add over 500 in one go [...]

Hmm... the insert table command probably won't work. I have to add 200 thousand entries to the database, and the add table command can't update the database; it always overwrites it. So I would need to keep 200 thousand entries in a table, which would probably kill UBot :-)

So I scrape 500 entries, add them to a list, convert that to a SQL statement, add it to the DB, and clear the list. Then I start over. That's what I'm currently working on. Will let you know if it works once it's done :-)
kev123 132 Posted April 11, 2014

Sorry, I missed a bit of information: do it in chunks, inserting 1,000-2,000 rows at a time. It's what I do and it works very well if you're looking to get above 500 records at once.
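Kev's chunking advice maps directly onto parameterized batch inserts. A sketch in Python's sqlite3 (chunk size, table name, and schema are illustrative assumptions, not from the thread):

```python
import sqlite3

def insert_in_chunks(conn, rows, chunk_size=2000):
    """Insert rows in fixed-size batches so no single statement gets huge."""
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        conn.executemany("INSERT INTO urls (url) VALUES (?)", chunk)
        conn.commit()  # one commit per chunk bounds memory and journal size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")

# 5,000 fabricated rows, each a 1-tuple as executemany expects.
rows = [("http://example.com/page/%d" % n,) for n in range(5000)]
insert_in_chunks(conn, rows, chunk_size=2000)

print(conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0])  # prints 5000
```

executemany reuses one prepared statement for the whole chunk, so it sidesteps both the 500-term compound-select limit and the per-statement string-building cost.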
Bot-Factory 602 (Author) Posted April 11, 2014

> kev123: Regarding the plugin: it will have all the functions of normal tables [...]

How does your plugin behave when I add 50,000 entries to the table? How much memory is required for that? With the native UBot lists/tables that's almost impossible, at least inside UBot Studio; I haven't tested whether that changes when I compile the bot. It's my first bot where I need to process more than 1,000 URLs in one go :-/

The final goal is to extract 3 million URLs... and somehow store them so that they can be used later. Still not 100% sure how to approach that...

Dan
kev123 132 Posted April 11, 2014

I'll check memory now. I'll also maximise the process by adding to the table column in a loop rather than in one go.
Bot-Factory 602 (Author) Posted April 11, 2014

I just tested my bot with v5, and I must say it's much better in terms of memory management. v4 was already using 600 MB with 4,000 entries in a table. With v5 I just added 20,000 entries and it was still at 220 MB of RAM, and the UI was still very responsive.

The only thing you shouldn't do is open the table in the debugger with the plus sign. That will instantly kill UBot. Bäm, 1.2 GB of RAM and the app is frozen :-)

Dan
kev123 132 Posted April 11, 2014

50,000 was so minor I couldn't tell whether it was UBot or the table using the memory. I went to 500,000, and the memory while adding went to over a gig. I think this was because I was looping half a million times with no other actions apart from setting the table cells, which in any program (even UBot) is an unlikely workload and memory/CPU heavy. I carried out a memory clear, and currently, with half a million records inside, UBot is sitting at under 200 MB. I have carried out several actions in the browser and this hasn't increased it a lot.
Bot-Factory 602 (Author) Posted April 11, 2014

> kev123: 50,000 was so minor I couldn't tell whether it was UBot or the table using the memory. [...]

Very interesting. Would love to test that. Does the plugin support:

$table cell
$table total columns
$table total rows
set table cell
clear table

Those would be priority-1 features in my opinion. Followed by:

add list to table as row
add list to table as column

Nice work, Kev!
kev123 132 Posted April 11, 2014

Yeah, of course, all the standard stuff and anything people can think of. Two things to note: it doesn't show values in the debugger (the API doesn't allow this), and the table will be the size you specify, i.e. how many rows and columns. I could make it auto-calculate like UBot's table, but that would make it more bulky, and the whole point is storing large data. Obviously, when reading from a file you wouldn't need to specify the size.
Bot-Factory 602 (Author) Posted April 12, 2014

> kev123: Yeah, of course, all the standard stuff and anything people can think of. [...]

Sounds pretty cool, Kev. Would love to give it a try!
Dan
Code Docta (Nick C.) 638 Posted April 12, 2014

Dan,

Why are you putting it all into a list or table first? INSERT/UPDATE straight to the database. Use the DB as your list or table. You can go on forever.
Bot-Factory 602 (Author) Posted April 12, 2014

> Code Docta: Why are you putting it all into a list or table first? [...]

Yeah, that's what I'm doing now. But I don't want to save every single item directly to the SQLite database; that would be very slow for 5 million entries. I'm grouping them together: I scrape 250 items into a list, write them into the DB with one query, clear the list, and then I continue. That's working fine so far, but I'm still looking at other ways to optimize it.

Dan
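The scrape-buffer-flush loop Dan describes can be sketched as a small helper. This is a hypothetical illustration in Python's sqlite3 (the class, the flush threshold of 250, and the table schema are all invented for the example):

```python
import sqlite3

class UrlBuffer:
    """Collect scraped items in memory and flush them to SQLite in groups."""

    def __init__(self, conn, flush_at=250):
        self.conn = conn
        self.flush_at = flush_at
        self.items = []

    def add(self, url):
        self.items.append((url,))
        if len(self.items) >= self.flush_at:
            self.flush()

    def flush(self):
        if self.items:
            self.conn.executemany("INSERT INTO urls (url) VALUES (?)", self.items)
            self.conn.commit()
            self.items.clear()  # clear the in-memory list, as the bot does

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT)")

buf = UrlBuffer(conn, flush_at=250)
for n in range(1000):
    buf.add("http://example.com/%d" % n)
buf.flush()  # write out any final partial batch

print(conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0])  # prints 1000
```

The key property is that memory usage stays bounded by the flush threshold, no matter how many million URLs pass through.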
blumi40 222 Posted April 12, 2014

Dan, have you tried BEGIN TRANSACTION and COMMIT when inserting your data into SQLite? I'm not 100% sure, but I think SQLite turns the index off and rebuilds it after COMMIT, so your INSERTs should be much faster.
Bot-Factory 602 (Author) Posted April 12, 2014

> blumi40: Dan, have you tried BEGIN TRANSACTION and COMMIT when inserting your data into SQLite? [...]

Interesting. Thanks a lot. Will definitely check it out.
Dan
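blumi40's transaction tip is easy to try outside UBot. The speedup comes mainly from SQLite syncing the journal once per transaction instead of once per autocommitted statement, rather than from index rebuilding. A sketch in Python's sqlite3 (table name and row count are illustrative):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# control the transaction boundaries ourselves with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE urls (url TEXT)")

rows = [("http://example.com/%d" % n,) for n in range(10000)]

# One explicit transaction around all the inserts: BEGIN once,
# INSERT many times, COMMIT once.
conn.execute("BEGIN")
for row in rows:
    conn.execute("INSERT INTO urls (url) VALUES (?)", row)
conn.execute("COMMIT")

print(conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0])  # prints 10000
```

On an on-disk database the difference is dramatic, since without the explicit transaction each INSERT pays its own commit cost.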