Regarding Oracle's 1000 element limit in IN clause


Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
Hi all,

First of all, I think this question is probably off-topic on this list since it's not DBIx::Class specific, but since I do all my DB development with DBIx::Class I'd like to ask here first. The question is probably of interest to others anyway. Here it is:

I have read-only access to an Oracle schema. The user is limited, it can't create temporary tables, views, etc. Just access to some tables.

I have a big table BIGTABLE in this schema (millions of rows) which I want to filter by values in a specific field. The number of values to filter by is also big (~40,000), and they can't be obtained from other tables, i.e. I take them from another data source which is not the Oracle schema.

For this I tried the following query: SELECT * FROM BIGTABLE WHERE FIELD IN (...~40,000 values...). Oracle, though, has a 1000-element limit for IN clauses, so I have to break the query into some 40 smaller queries and then process the results afterwards (I call this the "slice technique" since I break the list into slices with splice :-). Apart from having to do N/1000 queries instead of 1 (and N is 40,000 now but it will surely grow, so more queries), I can't offload data processing to the SQL server (which of course would be good) and my app has to process the data itself.
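
For reference, a minimal sketch of the slice technique described above, using plain DBI with bind placeholders; the connection details and the load_filter_values() helper are made up for illustration:

  use strict;
  use warnings;
  use DBI;

  # Hypothetical connection and filter list; all names are illustrative only.
  my $dbh    = DBI->connect('dbi:Oracle:HUGEDB', 'ro_user', 'secret', { RaiseError => 1 });
  my @values = load_filter_values();   # the ~40,000 values from the other data source
  my @rows;

  # Oracle allows at most 1000 literals per IN list, so query in slices of 1000.
  while (my @slice = splice @values, 0, 1000) {
      my $placeholders = join ',', ('?') x @slice;
      my $sth = $dbh->prepare("SELECT * FROM BIGTABLE WHERE FIELD IN ($placeholders)");
      $sth->execute(@slice);
      push @rows, @{ $sth->fetchall_arrayref({}) };
  }
  # @rows now holds the union of all slices; further processing happens in Perl.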

My question is: has anyone managed to get past this limitation in any way?

One approach could be to load some temporary table with the filter data and then join BIGTABLE with the temp table, but as I said, my Oracle user is very limited and can't create temporary tables, much less load them with data.

I'd very much appreciate any idea for solving this. And again, accept my apologies if you feel this question is very offtopic.

Thanks in advance.
Regards
J.

--

Jorge González Villalonga
Director Técnico


DAIKON Integración y Desarrollo S.L.
Telf: (+34) 91 188 08 28
Fax: (+34) 91 632 65 42
www.daikon.es


Re: Regarding Oracle's 1000 element limit in IN clause

Martin J. Evans
On 22/10/2011 11:36, Jorge Gonzalez wrote:

> Hi all,
>
> First of all, I think this question is probably off-topic on this list
> since it's not DBIx::Class specific, but since I do all my DB
> development with DBIx::Class I'd like to ask here first. The question
> is probably of interest to others anyway. Here it is:
>
> I have read-only access to an Oracle schema. The user is limited, it
> can't create temporary tables, views, etc. Just access to some tables.
>
> I have a big table BIGTABLE in this schema (millions of rows) which I
> want to filter by values in a specific field. The number of values to
> filter by is also big (~40,000), and they can't be obtained from other
> tables, i.e. I take them from another data source which is not the
> Oracle schema.
>
Obviously if you could pick the values from somewhere else in the
database you could do

select * from table where column in (select column from another_table)

but it sounds like you cannot do this.

> For this I tried the following query: SELECT * FROM BIGTABLE WHERE
> FIELD IN (...~40,000 values...). Oracle, though, has a 1000-element
> limit for IN clauses, so I have to break the query into some 40 smaller
> queries and then process the results afterwards (I call this the
> "slice technique" since I break the list into slices with splice :-).
> Apart from having to do N/1000 queries instead of 1 (and N is 40,000
> now but it will surely grow, so more queries), I can't offload data
> processing to the SQL server (which of course would be good) and my
> app has to process the data itself.
>
It would be a long piece of SQL but you could get all the results in one
piece of SQL with:

select * from table where column in (1000 values) or column in (another
1000 values) ...
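
As a rough sketch of building that single statement with DBI bind placeholders (the $dbh handle and @values list are hypothetical, as before):

  use strict;
  use warnings;

  # Split the filter values into chunks below Oracle's 1000-element limit.
  my @chunks;
  push @chunks, [ splice @values, 0, 1000 ] while @values;

  # One IN (...) list per chunk, OR'ed together into a single WHERE clause.
  my $where = join "\n   OR ", map {
      'FIELD IN (' . join(',', ('?') x @$_) . ')'
  } @chunks;

  my $sth = $dbh->prepare("SELECT * FROM BIGTABLE WHERE $where");
  $sth->execute(map { @$_ } @chunks);
  my $rows = $sth->fetchall_arrayref({});

Whether such a statement is accepted in practice still depends on server-side limits, of course.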

> My question is: has anyone managed to get past this limitation in any way?
>
> One approach could be to load some temporary table with the filter
> data and then join BIGTABLE with the temp table, but as I said, my
> Oracle user is very limited and can't create temporary tables, much
> less load them with data.
>
Which is probably what I would have done - shame.

> I'd very much appreciate any idea for solving this. And again, accept
> my apologies if you feel this question is very offtopic.
>
> Thanks in advance.
> Regards
> J.
Martin


Re: Regarding Oracle's 1000 element limit in IN clause

Oleg Kostyuk-2
In reply to this post by Jorge Gonzalez
Hi Jorge and all,

To me, the obvious way to solve this is to use temporary tables. But you can't. Why? I also think the answer to that question is out of scope for this list, but to me it is the main point.

I don't think it's normal when a developer wants, and not just wants but needs, to use temporary tables and can't. Why should that be normal? You are trying to use a tool to solve your task, yet you are restricted from using it fully. It's like needing a hammer but being forbidden to touch it: how can you use it at all? Or like having to write a script in Perl without hashes, arrays and references. Great conditions, aren't they?

Obviously you can't create temporary tables because of restrictions from your DBAs, and obviously they made those restrictions to achieve some goal, most probably to save CPU and/or RAM on the DB server. Well, that's a great goal. But if they are so smart, maybe it would be better to power off the DB server? That would save a lot of CPU and RAM. Or, even better, let them solve your task, OK?

Try to understand: what you are trying to do is like driving a crooked nail. You can't keep doing it forever and think everything is fine. Or, if it is fine for you, then you are effectively a handicapped person, and nobody can ask a handicapped person to do what a non-handicapped person can do.

I'm sorry if I was too direct or careless in my wording. I didn't want to offend anybody; it's only down to my level of English.

I hope you can rethink your situation and solve this puzzle successfully.
Good luck!






--
Sincerely yours,
Oleg Kostyuk (CUB-UANIC)


Re: Regarding Oracle's 1000 element limit in IN clause

Darren Duncan
In reply to this post by Jorge Gonzalez
Jorge Gonzalez wrote:
> One approach could be to load some temporary table with the filter data
> and then join BIGTABLE with the temp table, but as I said, my Oracle
> user is very limited and can't create temporary tables, much less load
> them with data.

The best solution is to talk to the DBAs and get privileges to create/populate
etc. temporary tables (and not permanent tables).  In fact, I would argue that
anyone with SELECT privileges should have temporary table privileges too,
because temporary tables are just working memory for you and don't affect anyone
else, any more than, say, using subqueries or other SELECTs that involve collecting
a mass of temporary data.  There's no reason they shouldn't be able to grant you
this. -- Darren Duncan


Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
In reply to this post by Oleg Kostyuk-2
Hi all,

Setting aside the uncheerful response (I'm not a handicapped person; my Oracle username is a handicapped one), I'll explain the situation so that everybody can understand why I can't create temporary tables:

1. My customer is a SMALL company (=little negotiation power).
2. They have an agreement with a HUGE company (= big pressure power), which offers the SMALL company detailed access to their data warehouse. This is the Oracle access I'm using. It's a read-only user which can't do anything on the DB except query _some_ tables.
3. The SMALL company wants to cross-reference information from their systems (which we manage and to which we have full access) with the HUGE company's data warehouse.
4. The HUGE company has a stated policy regarding access to their data warehouse, which forbids creating temporary tables.
5. The HUGE company can't change their policy easily without breaking their internal quality audits. And they will NOT change it for the SMALL company. And there's NO way I can challenge those quality audits, since I'm an external party to them.
6. As a corollary to the previous points, it's pretty clear that the DB permissions are completely out of my control and my customer's. They depend entirely on a third party which won't allow us to change them. Full stop.

So:

A. I can't create temporary tables
B. I can't ask the DBAs to allow me to do it (well I can, but they'll ignore me)
C. I don't have the power to negotiate a change in the HUGE company's policy.

Regarding the comment "I don't think it's normal when a developer wants, and not just wants but needs, to use temporary tables and can't. Why should that be normal?" - Well, the concept of "normality" is pretty dependent on everyone's experience. What is not normal for you is just one of my daily situations. Sometimes you can bend things so that you can work better, and sometimes you can't, and it's you who has to bend to the circumstances to get the work done.

I apologize if my language seems a bit harsh.
Regards
J.



Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
In reply to this post by Martin J. Evans
On 22/10/11 13:17, Martin J. Evans wrote:

> It would be a long piece of SQL but you could get all the results in
> one piece of SQL with:
>
> select * from table where column in (1000 values) or column in
> (another 1000 values) ...
>

Thanks for this useful response, I'll try it. It came to mind that there
might be a limit on SQL query length, but although there was a 64 KB
limit in Oracle 9, there seems to be no limit in Oracle 10 and onwards
(checked in the OTN docs). And my Oracle DB is 10g...

I'll keep you informed :-)
Cheers and thanks again
J.




Re: Regarding Oracle's 1000 element limit in IN clause

Marc Logghe
In reply to this post by Jorge Gonzalez
I often use SQLite for this kind of purpose. Works like a charm.

On Mon, Oct 24, 2011 at 10:48 AM, Matija Grabnar <[hidden email]> wrote:
You have another possibility: create the temporary tables in another database. This database can be on your OWN server, so the HUGE company doesn't even have to know about it.

If you were to download the full resultset and do the searching in Perl you would in effect be doing the
same thing, except your database would be in your RAM, and you would be using Perl instead of SQL
to search for it. For large datasets (which this seems to be) a database on the local server is more efficient than doing it all in RAM.



Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
In reply to this post by Jorge Gonzalez
On 24/10/11 10:48, Matija Grabnar wrote:
> You have another possibility: create the temporary tables in another
> database. This database can be on your OWN server, so the HUGE company
> doesn't even have to know about it.
>

Yes, but I need to cross-reference information from the HUGE company database, so I
need to join the tables. I want to filter a table with millions of rows
(HUGE db) with a local list of values for a field (SMALL db). For this,
if I want to use temp tables I have two options:

A. Create a temp table in the HUGE database with the contents of the SMALL
database values. I'm not allowed to do this, as I said in my previous mails.

B. Create a temp table in the SMALL database with the contents of the HUGE
database table I want to filter. This would mean transferring a copy of a
table with _millions_ of rows locally, just to discard it afterwards.
Not very reasonable.

If I want to join the tables they need to be in the same schema (or at
least the same DB server), so I can do only two things: move the SMALL
table where the HUGE one is (not allowed), or move the HUGE one where
the SMALL one is (not reasonable).

> If you were to download the full resultset and do the searching in
> Perl you would in effect be doing the
> same thing, except your database would be in your RAM, and you would
> be using Perl instead of SQL
> to search for it. For large datasets (which this seems to be) a
> database on the local server is more efficient than doing it all in RAM.

The thing is, my server has enough RAM to slurp the resultset and then
search it (which is what I'm doing now). If enough RAM is available, no
disk-based SQL server can beat that, provided that efficient search
algorithms are used (I'm using the resultset to build several RAM-based
indexes, Perl hashes, before doing any searching).
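
For what it's worth, a tiny sketch of the kind of in-memory index meant here, with invented field names:

  # Build a hash index over the slurped resultset, keyed by the filter field.
  my %by_field;
  push @{ $by_field{ $_->{field} } }, $_ for @rows;    # @rows from the sliced queries

  # Each lookup is then a single hash access instead of a scan.
  my @matches = map { @{ $by_field{$_} || [] } } @filter_values;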

Thanks for the idea anyway.
Regards
J.



Re: Regarding Oracle's 1000 element limit in IN clause

Peter Rabbitson-2
On Mon, Oct 24, 2011 at 11:55:51AM +0200, Jorge Gonzalez wrote:

> Question is, my server has enough RAM to slurp the resultset and then  
> search (which is what I'm doing now). If enough RAM is available, no  
> disk-based SQL server can beat that, provided that efficient search  
> algorithms are used (I'm using the resultset to create several RAM based  
> indexes - perl hashes - before doing any searching).
>

Who said disk-based? my $dsn = 'dbi:SQLite::memory:' :)

Cheers
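
A minimal illustration of that idea, assuming DBD::SQLite is installed; table and column names are invented, and getting the Oracle rows into the local database is still left to one of the techniques discussed above:

  use strict;
  use warnings;
  use DBI;

  # Throwaway in-memory database; it disappears when $mem is disconnected.
  my $mem = DBI->connect('dbi:SQLite::memory:', '', '', { RaiseError => 1 });

  # The "temporary table" we are not allowed to create on the Oracle side.
  $mem->do('CREATE TABLE filter_values (v TEXT PRIMARY KEY)');
  my $ins = $mem->prepare('INSERT OR IGNORE INTO filter_values (v) VALUES (?)');
  $mem->begin_work;
  $ins->execute($_) for @values;       # the ~40,000 values from the other source
  $mem->commit;

  # Rows fetched from Oracle can then be loaded into a second local table
  # and joined with the filter values using plain SQL instead of Perl loops.
  $mem->do('CREATE TABLE big_rows (field TEXT, payload TEXT)');
  # ... bulk INSERTs into big_rows go here ...
  my $rows = $mem->selectall_arrayref(
      'SELECT b.* FROM big_rows b JOIN filter_values f ON f.v = b.field',
      { Slice => {} },
  );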


Re: Regarding Oracle's 1000 element limit in IN clause

Rob Kinyon
In reply to this post by Jorge Gonzalez
On Mon, Oct 24, 2011 at 07:46, Matija Grabnar <[hidden email]> wrote:

> On 10/24/2011 11:55 AM, Jorge Gonzalez wrote:
>>
>> B. Create a temp table in the SMALL database with the contents of the HUGE
>> database table I want to filter. This would mean transferring a copy of a
>> table with _millions_ of rows locally, just to discard it afterwards. Not
>> very reasonable.
>
> The reasonable solutions are the ones you aren't allowed to do. What is left
> are the unreasonable ones.
>>
>>  or move the HUGE one where the SMALL one is (not reasonable).
>
> Or cache PART of the HUGE one on your side. Which may or may not be
> reasonable, depending
> on exact details of your problem.

I think this suggestion makes a key point. If there is a relatively
small subset (relative to the total size of the warehouse, maybe even
up to 100GB), then maybe the right solution is to create a separate
service that does nothing but mirror this subset to your database.
Then, you can access those tables as if they were in your database,
without your application caring how they are populated.

The important design concept is to separate the service that maintains
the tables from the service(s) that *consume* the tables.
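
A bare-bones sketch of such a mirroring job, copying one table from the Oracle source into a local database with DBI; every connection detail, table and column name here is a placeholder:

  use strict;
  use warnings;
  use DBI;

  my $src = DBI->connect('dbi:Oracle:HUGEDB',    'ro_user', 'secret', { RaiseError => 1 });
  my $dst = DBI->connect('dbi:Pg:dbname=mirror', 'app',     'secret', { RaiseError => 1 });

  # Refresh the local copy in one transaction so consumers never see a half-load.
  $dst->begin_work;
  $dst->do('DELETE FROM bigtable_mirror');

  my $sel = $src->prepare('SELECT field, payload FROM BIGTABLE');
  my $ins = $dst->prepare('INSERT INTO bigtable_mirror (field, payload) VALUES (?, ?)');
  $sel->execute;
  while (my @row = $sel->fetchrow_array) {
      $ins->execute(@row);
  }
  $dst->commit;

Run from cron overnight, this keeps the consuming application pointed at bigtable_mirror without caring how it is populated.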

Rob


Re: Regarding Oracle's 1000 element limit in IN clause

Dave Howorth
There's another possibility that might be usable, which is to transform
the query set. I don't think Jorge has said what the datatype of the
filter set is, but perhaps it is numeric or alphabetic.

If numeric, it may be possible to replace some of the 40,000 individual
value tests by range tests. Even if that doesn't produce the precise
resultset wanted, it can be supplemented by additional queries to
retrieve more rows or by additional filtering to remove false hits.

Similarly, if the data type is alphabetic, it may be possible to combine
multiple filters using pattern-matching queries.

Those techniques should reduce the length of the query sent to the server.
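
As an illustration of the numeric variant, a sketch that collapses a list of integers into runs of consecutive values, which can then be expressed as BETWEEN tests instead of thousands of individual IN elements (purely hypothetical, and only a win when the values cluster):

  use strict;
  use warnings;

  # Turn a list of integers into [low, high] runs of consecutive values.
  sub collapse_ranges {
      my @sorted = sort { $a <=> $b } @_;
      my @ranges;
      for my $v (@sorted) {
          if (@ranges && $v <= $ranges[-1][1] + 1) {
              $ranges[-1][1] = $v if $v > $ranges[-1][1];   # extend run or skip duplicate
          }
          else {
              push @ranges, [ $v, $v ];                     # start a new run
          }
      }
      return @ranges;
  }

  my @ranges = collapse_ranges(@values);
  my $where  = join "\n   OR ", map {
      $_->[0] == $_->[1] ? "FIELD = $_->[0]"
                         : "FIELD BETWEEN $_->[0] AND $_->[1]"
  } @ranges;
  # $where can then be embedded in the SELECT (ideally with bind values
  # rather than interpolated literals).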


Of course the original suggestion of IN OR IN OR IN is much simpler to
implement, as is mirroring the source if that can be done without
performance, consistency or terms-of-business problems.

Cheers, Dave


Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
In reply to this post by Jorge Gonzalez
On 24/10/11 13:46, Matija Grabnar wrote:

>> Question is, my server has enough RAM to slurp the resultset and then
>> search (which is what I'm doing now). If enough RAM is available, no
>> disk-based SQL server can beat that, provided that efficient search
>> algorithms are used (I'm using the resultset to create several RAM
>> based indexes - perl hashes - before doing any searching).
>
> You would be surprised at what disk-based SQL servers can do,
> particularly when compared to programs written in an interpreted
> language. If your RAM-based searches ran fast enough for your needs, I
> don't think you'd be writing here. If they don't run fast enough for
> you, you can implement a quick benchmark to see how a DISK-based SQL
> server does.

I would not classify Perl as an interpreted language - at least not in
the traditional sense. My search function is the Perl hash lookup, which
I suspect is highly optimized. And I stand by my words: no disk-based
SQL server is going to beat an in-memory search, given that the data is
correctly indexed in both systems. All other factors being equal (or
similar), RAM access times and disk access times differ by several
orders of magnitude.

The queries run fast enough (for now), but they will become a problem
sometime in the future: instead of doing a fixed number of queries
regardless of the number of elements to filter by, I have to do one for
every 1000 elements: instead of a constant cost, I have a linear cost.
For now it's OK, but soon there will not be 40,000 elements to filter
but 200,000, and then I'll have to do 200 queries instead of 40. And
what about when I have to filter by 1,000,000 values?

In the long term, I suppose I'll end up mirroring parts of the data
warehouse locally, syncing them overnight, as someone has already
suggested.

Thanks to all that have responded.
BR
J.



Re: Regarding Oracle's 1000 element limit in IN clause

Alexander Hartmaier
Why don't you use the column IN (...) OR column IN (...) solution?
I'd like to see this as a patch for the Oracle SQLMaker.

Cheers, Alex
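
Not a patch, but a sketch of how those OR'ed chunks can already be expressed through the SQL::Abstract layer; the resultset and column names are hypothetical:

  # Assumes $rs is a DBIx::Class::ResultSet for BIGTABLE and @values the filter list.
  my @chunks;
  push @chunks, [ splice @values, 0, 1000 ] while @values;

  # An arrayref of conditions is OR'ed by SQL::Abstract, so this renders as
  #   field IN ( ... ) OR field IN ( ... ) OR ...
  my @rows = $rs->search(
      [ map { { field => { -in => $_ } } } @chunks ]
  )->all;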


*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*
T-Systems Austria GesmbH   Rennweg 97-99, 1030 Wien
Handelsgericht Wien, FN 79340b
*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*
Notice: This e-mail contains information that is confidential and may be privileged.
If you are not the intended recipient, please notify the sender and then
delete this e-mail immediately.
*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*


Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez

On 24/10/11 16:15, Alexander Hartmaier wrote:
> Why don't you use the column IN (...) OR column IN (...) solution?
> I'd like to see this as a patch for the Oracle SQLMaker.
>
> Cheers, Alex
>

I have it on my (urgent) TO-DO list. Promise I'll check back with the
results. Honest :-)

BR
J.



Re: Regarding Oracle's 1000 element limit in IN clause

Ben Tilly
In reply to this post by Alexander Hartmaier
An important note.  While this may work, performance is likely to be a
problem.  In my experience, when trying to match an IN clause, Oracle
goes through all of the values in turn to test the condition.  If there
are a lot of values, this is slow.  The 1000-value limit within Oracle
should be taken as a hint from Oracle that you're doing something
wrong.  They actively want you to use a temporary table with an index.

It is ugly and a lot more work, but performance should be
significantly better if you use a nested CASE statement that breaks it
into sections.  Something like this:

  1 = CASE
        WHEN your_field < 'b'
          THEN
            CASE
              WHEN your_field IN ('aaa', 'aab', ..., 'aaz')
              THEN 1
              ELSE 0
            END
        WHEN your_field < 'c'
          THEN
            CASE
              WHEN your_field IN ('baa', 'bab', ..., 'baz')
              THEN 1
              ELSE 0
            END
        ...
      END

Of course performance would be much better with a temporary table, but
bureaucracy has blocked that...
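
A rough generator for that shape, assuming a $dbh for quoting and a list of string values; all names are illustrative, and it presumes more than one chunk (a single chunk fits in a plain IN anyway):

  use strict;
  use warnings;

  my @sorted = sort @values;
  my @chunks;
  push @chunks, [ splice @sorted, 0, 500 ] while @sorted;

  my @whens;
  for my $i (0 .. $#chunks) {
      my $in = 'your_field IN ('
             . join(',', map { $dbh->quote($_) } @{ $chunks[$i] })
             . ')';
      my $inner = "CASE WHEN $in THEN 1 ELSE 0 END";
      if ($i < $#chunks) {
          # Upper bound for this section is the first value of the next chunk.
          my $bound = $dbh->quote( $chunks[$i + 1][0] );
          push @whens, "WHEN your_field < $bound THEN $inner";
      }
      else {
          push @whens, "ELSE $inner";    # last chunk needs no upper bound
      }
  }
  my $condition = "1 = CASE\n  " . join("\n  ", @whens) . "\nEND";
  # $condition can then be dropped into the WHERE clause of the SELECT.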



Re: Regarding Oracle's 1000 element limit in IN clause

Jorge Gonzalez
In reply to this post by Jorge Gonzalez
Hi all,

I tried the OR'ed IN approach suggested earlier in the thread. I have good news and bad news.

The good news is that Oracle seems to accept the construct (server version is 10g). I generated a query with some 50 OR'ed IN clauses of ~900 elements each, for a total of around 45,000 elements. The SQL query (as text) weighed about 400 kB. Big, but nothing serious, I thought.

The bad news (for me) is that Oracle rejected the query with an ORA-04031 error, "unable to allocate 399448 bytes of shared memory", which tells me that the DB admins (the same ones who won't allow me to use temp tables) probably don't allow me to run a 400 kB query either.

So for the short term, I'll have to stick with the 50 queries and local filtering. For the future, my customer has nearly approved a project to mirror the remote data warehouse locally so that we can do anything we want with it.

Thanks to all who replied and for the proposed ideas.

Best regards
J.

