You can translate the question and the replies:

HIVE "limit" delegation

Hi there, I was told that Denodo server 5.5 doesn't support the delegation of "limit" in HIVE. Specifically we use AWS hive (version1.1) and anytime a user does a "Select * from Table limit 100;", it spins up a Map/Reduce job, which in turn does a "Select * from table". Then denodo limits the data based on what the user enters. The issue is that most of our tables have millions of records... so it may pull back 88 million records to then limit to 1. The person I chatted with suggested making a new selection view with a variable ... in essence it would make a view which limited to like 1000 or something. That is a good suggestion, but not feasible to mange with hundreds of tables coming from HIVE. Also, when we run really complex queries, I am afraid it will filter out some records based on that "limit" its doing at the base view level. Does anyone have any suggestions on how to make better use of HIVE? I would really like to make that our default as now we are using PRESTO and it seems to be more unstable with Denodo than HIVE (It doesn't delegate anything). Any help would be greatly appreciated!! Thanks
user
01-03-2016 18:51:39 -0500
code

7 Answers

Hi! You are correct that Denodo does not support automatically delegating ‘LIMIT’ to Hive. The only way to have the ‘LIMIT’ be delegated would be the ‘Create base view from query’ option. If I understand your situation correctly, you want to be able to have the limit be delegated when running queries on the base view but want the ability to not have the number of rows limited when running queries on views that are built on top of the base view. Assuming you are using an interpolation variable in your base view, this functionality is easily achievable. When executing your base view, you can specify at runtime whichever limit you wish. Now, let’s say you create a derived view on top of your base view, and you do not want the limit to be utilized when executing this view. When creating this derived view, in the ‘Where Conditions’ tab set the value of the interpolation variable to be larger than the total number of tuples in the original base view. The ‘LIMIT’ will still be delegated, but its value will be too high to filter out any tuples. This will allow you to use the limit to filter out tuples when querying the base view, but not filter out tuples when querying a derived view built on top of that base view. If you decide to use this option, you’ll likely want to enable ‘Delegate SQL Sentence as Subquery.’ By doing this, VDP will be able to delegate more queries over the base view to the database. This option can be found in the Wrapper Source Configuration window of the advanced tab for your view. You can find more information about creating a base view from SQL query in the ‘Creating JDBC/ODBC Base Views from SQL Queries’ section of the VDP Admin Guide One thing to note is that ‘LIMIT’ is not the only option for restricting the number of tuples. Alternatively, you may want to simply utilize where conditions when executing your query rather than applying a limit. For example, if your view has an ‘id’ column, you could query it with the condition ‘WHERE id < 1000’ or something similar. This provides another means of restricting the number of tuples returned by your base view and could help to optimize your performance. Hope that helps!
Denodo Team
02-03-2016 21:00:11 -0500
code
Thanks for the answer... I do have it working correctly as you described above. One question... How would I limit "view" access to base views and only show the users the derived views? Meaning that I don't want to expose the base view that has the parameter because the user won't know to query it correctly (because they don't know what parameter I put in)... but the derived view works beautifully, so that would be the only thing I would want to expose to them. When I look at restricting access, I can only see it from a global VDB level.. can I get down to a table level for removing "read" access? Thanks!
user
14-03-2016 12:21:52 -0400
Hi! So I understand your question to mean that you want to restrict read access on a particular view. When assigning privileges for a particular user or role, click on the “Advanced privileges” dialog (the rightmost column in the window). This dialog lists the views and procedures of the database and for each one, you can select the appropriate check box to give the user privileges to execute read, write, insert, update and/or delete queries over a view/stored procedure. I also want to take a second to point out a function that you may find useful with the creation of this baseview. The interpolation function ExecuteIfIsNotNull() would provide you a way to only have the ‘LIMIT’ appear in your delegated sql sentence if you provide a value for your parameter. For example, using: SELECT * FROM table ^ExecuteIfIsNotNull("LIMIT ",@LIMITEXPRESSION,"") would execute the query with the LIMIT if a value for the parameter is provided, and would execute the query without the limit clause entirely (SELECT * FROM table) if the value for the parameter is left null. If you choose to utilize this, you’ll want to make sure the parameter is set to optional in the Search Methods window of your view. You can find more information about this function in the ‘Using the WHEREEXPRESSION Variable’ section of the VDP Admin Guide. Hope this helps!
Denodo Team
14-03-2016 19:57:54 -0400
code
Hey Denodo Team. Great suggestion!!! Thanks so much for the advice... I think this might solve our problem, if I could get it to work. :) I think I might be doing something wrong... let me outline what I did. 1. Double clicked on the data source in qustion (in this case AWS HIVE). 2. Clicked "Create base view from query" 3. Entered base table name and the following select statement: SELECT * FROM l3.pss_contacts ^ExecuteIfIsNotNull("LIMIT",@limitrows," ") 4. Click OK. It did work and asked me to put in the variable, for which I put '1000'. (I did not click "SQL Fragment". 5. Click OK. Everything got created and seemed to be OK. 6. I then edited the base view and chose OPT for the search method of "limitrows" The issue I face is that when I execute the base view, with a "limitrows = '1000'" Its not delegating to the data source. When I go look at the HIVE query monitor, the SQL statement being executed against it shows "Select * from l3.pss_contacts". Am I doing anything incorrect on this? Thanks!
user
15-03-2016 09:25:29 -0400
I think I just found it... its strange... so I changed it to: SELECT * FROM l3.pss_contacts ^ExecuteIfIsNotNull("LIMIT ",@LIMITROWS,"") Which just has no spaces in the string... here is what it does differently now: 1. It does delegate LIMIT command. 2. But oddly enough Denodo changed all the field names to have a "tableName" + "FieldName" + "_0" after each field. It didn't do that before. So a field which used to be called "agent_name" now shows "pss_contactsagent_name_0". Not a huge issue as I can rename each field, but is that expected? I would like to not have to rename all fields, from hundreds of tables. :) Thanks!
user
15-03-2016 10:25:18 -0400
Hi, First, I think the reason your initial query didn’t work properly is that you didn’t leave a space between LIMIT and the parameter ("LIMIT" vs "LIMIT "). For example, your query: SELECT * FROM l3.pss_contacts ^ExecuteIfIsNotNull("LIMIT",@limitrows," ") would return: SELECT * FROM l3.pss_contacts LIMIT5 if you provided a value of 5 for your parameter. Your second query corrected this. As for your second question, you’ll experience this behavior with HIVE datasources when you utilize ‘SELECT *’ with your above ‘base view from query’ statement. If you modify the above query to specify the selected columns, you will not face this issue. Hope that helps
Denodo Team
15-03-2016 18:32:47 -0400
code
Brilliant.. I should have thought of that second one. :) Great suggestions. Thanks!
user
15-03-2016 18:46:25 -0400
You must sign in to add an answer. If you do not have an account, you can register here