Bot-Factory 602 Posted February 19, 2015 Report Share Posted February 19, 2015 Hi Guys, I have a SQL database with 2 tables. Table 1 = 5 Million URLsTable 2 = Account Data (7 Fields) Some of those URLs from Table 1 will be used by the accounts who are in Table2. It might be only 1 url or I might use 500 urls. I need a way to identify which URLs are used by what accounts. I thought about adding an additional table. Where I storeAccount in one field and the connected urls in another field. And then use a SQL statement like: SELECT l.id, l.value2.FROM [20090915_anti].t_left l3.WHERE l.value NOT IN4.(5.SELECT value6.FROM [20090915_anti].t_right r7.) Would you say that this is the most efficient (fastest) way of doing that? Thanks for your helpDan Quote Link to post Share on other sites
Pete_UK 9 Posted February 19, 2015 Report Share Posted February 19, 2015 I would add the account id field into the URL table and create a secondary index on that field. Then make the select statement using a left join. I'm presuming the account Id is a primary key in the Accounts table. Something like SELECT Table1.Id, Table2.urlFROM Table1LEFT JOIN Table2ON Table1.Id=Table2.Id; Quote Link to post Share on other sites
Pete_UK 9 Posted February 19, 2015 Report Share Posted February 19, 2015 that would select all records from account and the matching urls, if a url exists If you want only matching records then just an inner join is needed. SELECT Table1.Id, Table2.urlFROM Table1INNER JOIN Table2ON Table1.Id=Table2.Id; Quote Link to post Share on other sites
Bot-Factory 602 Posted February 19, 2015 Author Report Share Posted February 19, 2015 The goal is to filter out all the urls who are used at the moment. So you would add an additional id field into the url table?Wouldn't that mean it had to query a lot more stuff compared to when it's in an extra table?But hmmm I still run a query against the url table. So that's probably not a big difference. I thought that "not in" is faster than the join command? Or is that wrong? Quote Link to post Share on other sites
Pete_UK 9 Posted February 19, 2015 Report Share Posted February 19, 2015 not in can be faster but quite often you may get the same underlying plan. in my experience, adding the secondary index will make most tables that size fly. Quote Link to post Share on other sites
Bot-Factory 602 Posted February 19, 2015 Author Report Share Posted February 19, 2015 Awesome! Thanks a lot Pete! Will look into that. Dan Quote Link to post Share on other sites
meter 145 Posted February 19, 2015 Report Share Posted February 19, 2015 You want to have a user_id columns in URLs table. Each url will have an associated user_id. Then you can just do: SELECT * FROM urls WHERE user_id=1 This is the scheme I use for Termexplorer.com, and it's lightning fast across hundreds of millions of records. Like Pete said, even faster if you put a secondary index on user_id. Quote Link to post Share on other sites
Bot-Factory 602 Posted February 19, 2015 Author Report Share Posted February 19, 2015 You want to have a user_id columns in URLs table. Each url will have an associated user_id. Then you can just do: SELECT * FROM urls WHERE user_id=1 This is the scheme I use for Termexplorer.com, and it's lightning fast across hundreds of millions of records. Like Pete said, even faster if you put a secondary index on user_id.So I need a user_id in my accounts table as well. I should probably not use rowid for that right :-) Could I not just add a email column into the URLs table? The final query would look something like. I don't want urls who have an account assigned alreadyselect urls from profiles where xxx LIKE '%xxx%' and yyy like '%hhuhu%' and accounts = ''Then I need to update all of those urls and write the assigned account email address into a field. I also need to run:select urls from profiles where accounts = 'email@xxx.com' AND I have to reset them from time to time.UPDATE profiles SET accounts = '' WHERE url = 'www.google.com' I'm still a bit confused about what the fastes approach will be. And where I should add indexes. :-(The profiles table has 16 fields. And my LIKE queries could look into any of those and sometimes even 3-6 at the same time. Dan Quote Link to post Share on other sites
meter 145 Posted February 20, 2015 Report Share Posted February 20, 2015 Well users table should have a primary index (such as 'id' column, or rowid as you put it). It is entirely valid to use that as user_id in your urls tables. I think it would be bad to add email column to urls table. It really doesn't belong there Maybe you should describe your current db schema more, I'm afraid I had trouble following the last part of your post. -meter Quote Link to post Share on other sites
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.