
JSON Parsing within string fields

Hi, we are in the process of standing Denodo up for our organization, and I am trying to find native functions that will work for JSON parsing. This is a bit different from the "flatten" and other abstractions you can do from VDP; rather, it is about manipulating strings of JSON which are already present within SQL Server or (in my case) Hive.

Background: We have several ETL processes that bring in full JSON dict files and store each JSON file as a record in our system. When looking at it from something like Hive, it's just a STRING-type field, but within the field it's all JSON. Hive, Presto, and Postgres all have native JSON functions such as get_json_object, to which you pass the field in question and the key path. The function then returns the value of that key and exposes the key as a column. An example looks like this:

SELECT DISTINCT
    get_json_object(TABLENAME.properties, '$.properties.actionid')        actionid,
    get_json_object(TABLENAME.properties, '$.properties.hashedmdn')       hashedmdn,
    get_json_object(TABLENAME.properties, '$.properties.hardwareid')      hardwareid,
    get_json_object(TABLENAME.properties, '$.properties.deviceostype')    deviceostype,
    get_json_object(TABLENAME.properties, '$.properties.deviceosversion') deviceosversion,
    get_json_object(TABLENAME.properties, '$.properties.appversion')      appversion,
    get_json_object(TABLENAME.properties, '$.properties.carriername')     carriername,
    "time" AS eventts,
    CAST("time" AS date) AS eventdate
FROM TABLENAME

So from the above, we can keep all the JSON in one single field and then parse out the events we need, on the fly.

Question: Can you do such a thing with Denodo (looking for the function name, if possible), or do we need to prep the data into a view within Hive, using the code above, before introspecting with Denodo? Thanks very much!!
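For readers who have not met get_json_object before, its behaviour can be sketched in plain Python. This is an illustrative approximation, not Hive's implementation, and the sample record below is invented:

```python
import json

def get_json_object(json_str, path):
    """Rough analogue of Hive's get_json_object: follow a '$.key1.key2'
    path into a JSON string and return the value, or None on any miss."""
    try:
        node = json.loads(json_str)
    except ValueError:
        return None          # malformed JSON yields NULL, not an error
    for key in path.lstrip("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

record = '{"properties": {"actionid": "a1", "deviceostype": "Android"}}'
print(get_json_object(record, "$.properties.actionid"))      # a1
print(get_json_object(record, "$.properties.carriername"))   # None
```

Each call pulls one keyed value out of the string, which is why one SELECT with several such calls can fan a single JSON column out into many columns.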
user
23-02-2016 13:33:55 -0500

3 Answers

Hi,

The best option is to create a JSON data source with the "From Variable" option (see the section "JSON Sources" of the Administration Guide for more information). Then make a join between the original base view (from the JDBC source) and this new view (from the JSON source), using the field with the JSON data from the first view as a parameter of the second view.

Denodo VDP also allows you (for JDBC/ODBC sources) to create a base view from a query. If you need to create a base view by querying the source directly, you can do so by clicking the "Create base view from query" button (see "Creating JDBC/ODBC Base Views from SQL Queries" in the Administration Guide).

Finally, although Denodo VDP does not provide a set of functions to process JSON, it allows you to create custom functions that extend the set of functions available in Virtual DataPort. You can find more information in the section "Developing Custom Functions" of the Developer Guide and in the section "Importing Extensions" of the Administration Guide.

Hope this helps!
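The join described above effectively invokes the JSON view once per row, with the JSON column of the JDBC view as its input parameter. A rough Python sketch of that behaviour, with made-up row data standing in for the Hive table:

```python
import json

# Hypothetical stand-in for the setup described above: each row of the
# JDBC base view carries a JSON string, and the "From Variable" JSON view
# parses that string into columns. The join then behaves like a per-row
# function call, which this loop imitates.
jdbc_rows = [
    {"eventdate": "2016-02-23", "properties": '{"properties": {"actionid": "a1"}}'},
    {"eventdate": "2016-02-24", "properties": '{"properties": {"actionid": "a2"}}'},
]

def json_view(variable_with_data):
    """Plays the role of the base view over the JSON 'From Variable' source."""
    return json.loads(variable_with_data).get("properties", {})

# The "join": every JDBC row is extended with the fields parsed
# from its own JSON string.
joined = [
    {"eventdate": row["eventdate"], **json_view(row["properties"])}
    for row in jdbc_rows
]
print(joined)
```

The key point is that the JSON view has no data of its own; it only gives each incoming string a relational shape.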
Denodo Team
24-02-2016 06:05:30 -0500
Thanks very much for the info! Very helpful. I read the instructions in the manual and followed your directions above, but the second I execute the new view, it runs over my 16 GB memory limit and fails. I don't think that should happen with this data set, so I suspect there is a cartesian join happening, or something else. I just want to be sure this solution will work, so I am going to outline the steps I took.

The data source (the JDBC one) is a table coming from Hive. There are two columns: "EventDate" (timestamp) and "Properties" (JSON string). There are about 10 million records in this table. They represent events that happened to someone's device, signified by the key/value in each JSON string, so each record is unique to that particular type of event for that unique person. I am not trying to pull down all 10 million; I was doing a limit 5 when I ran out of memory.

Steps:
1. In VDP I already have my connection to Hive via JDBC, and the table containing the two fields (EventDate, Properties) works great; I can query it from a SQL tool without any problems.
2. I created a new JSON data source called "DS_JsonMixpanel" with the data route "From Variable" and a variable name of "variable_with_data".
3. From this I created a base view of the JSON data source. I think this might be where I went wrong. For the "Variable_With_Data" variable I tried two different options: (1) a single JSON string sample from one record in my base table, and (2) just the syntax {"" : "" }. Both saved successfully.
4. Once I had the base view created, I made a join between the original base view (the JDBC one, with the two fields) and the new base view with the variable. I joined the "Properties" column from the original base view (which contains the JSON strings) to the "Variable_With_Data" field on the new base view and saved it.

Unfortunately this approach gives me that memory error, even when doing a limit 1.
Any suggestions would be GREATLY appreciated!! Thanks
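For what it's worth, the "single sample" option in step 3 can be pictured as fixing a schema from one example record and projecting every later string onto it. A Python sketch of the idea, with invented field names:

```python
import json

# Sketch of the "template" idea: the sample JSON given at view-creation
# time fixes the schema, and every incoming string is projected onto
# exactly those fields (extra keys dropped, missing keys -> None).
sample = '{"actionid": "", "hashedmdn": "", "deviceostype": ""}'
schema = list(json.loads(sample).keys())

def project(json_str, schema):
    data = json.loads(json_str)
    return {field: data.get(field) for field in schema}

row = project('{"actionid": "a1", "unexpected": "x"}', schema)
print(row)   # actionid kept, unexpected dropped, the rest None
```

With the empty `{"" : "" }` template, by contrast, there are no usable field names at all, so there is nothing for the join to project onto.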
user
24-02-2016 10:21:14 -0500
Hi!

Your steps are apparently correct, and at step 3 option 1 is the right one (use a single JSON string sample from one record in the base table). The first time you create the base view from the JSON data source you must input a valid JSON string, which is used as a template to create the fields for that base view. When you make the join, Denodo VDP maps the strings coming from the JDBC data source onto the schema created from the example you passed in step 3.

However, if your query is failing, there are several things you can do to find out what is going wrong.

First, inspect the tree view and the query plan and ensure that the configuration is correct. You can also check the execution trace after querying the view to verify how the join is being executed. Take into account that the join needs to load the whole JSON string in memory, even if the view only shows the fields you have selected from it. If that string is quite big, then maybe this is not the best approach.

Second, review the log at "<DENODO_HOME>/logs/vdp"; you may find extra information about the error there.

Finally, be sure that the check box "Stop query when the limit is reached" is selected. Otherwise, if only the check box "Limit rows" is selected, the extra results will not show up, but the query does not stop when the limit of results is reached (and consumes memory all the same).

Hope this helps!
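The difference between the two limit check boxes can be pictured with a toy Python sketch (a made-up row producer, not Denodo's actual execution engine): a display-only limit still pulls every row from the source, while stop-on-limit breaks out as soon as enough rows are shown.

```python
def producer(n):
    """Stands in for the source feeding rows to the query."""
    for i in range(n):
        yield i

def display_limit_only(rows, limit):
    """Like 'Limit rows' alone: shows few rows but consumes them all."""
    consumed, shown = 0, []
    for r in rows:               # keeps pulling rows past the limit
        consumed += 1
        if len(shown) < limit:
            shown.append(r)
    return shown, consumed

def stop_at_limit(rows, limit):
    """Like 'Stop query when the limit is reached': stops pulling early."""
    consumed, shown = 0, []
    for r in rows:
        consumed += 1
        shown.append(r)
        if len(shown) == limit:
            break                # no further rows are fetched
    return shown, consumed

print(display_limit_only(producer(1000), 5))   # 5 rows shown, 1000 consumed
print(stop_at_limit(producer(1000), 5))        # 5 rows shown, 5 consumed
```

Both calls show the same five rows; only the second keeps memory and source work proportional to the limit.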
Denodo Team
24-02-2016 11:18:16 -0500