Hi,
I'm trying to run a query on Denodo from AWS Glue. I connect to Denodo using denodo-vdp-jdbcdriver.jar, and I'm running Spark 2.4.3 on Glue. My query looks something like this:
sql = "select <columns> from <interface> where col1 = 'filter1'"
In spark.read I pass the query above through the query option, not the interface or the base view itself. I've read that a Denodo dialect needs to be registered, but I don't know whether that is necessary from Python; the only approach I could find from PySpark is sketched below.
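From what I can tell, a dialect can only be registered from PySpark through the py4j JVM gateway. This is just a sketch: JdbcDialects.registerDialect is Spark's own API, but the DenodoDialect class name is a placeholder, since I don't know whether Denodo actually ships such a class.

# Sketch only: registering a JDBC dialect from PySpark via the JVM gateway.
# org.apache.spark.sql.jdbc.JdbcDialects.registerDialect is a real Spark API;
# com.example.denodo.DenodoDialect is a HYPOTHETICAL class that would have to
# be on the driver classpath (e.g. in an extra jar supplied to the Glue job).
jvm = spark.sparkContext._jvm
jvm.org.apache.spark.sql.jdbc.JdbcDialects.registerDialect(
    jvm.com.example.denodo.DenodoDialect()
)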
My Spark read looks something like this:
df = (
    spark.read
    .format("jdbc")
    .option("url", ddl_endpoint)                     # Denodo VDP JDBC endpoint
    .option("driver", "com.denodo.vdp.jdbc.Driver")  # Denodo JDBC driver class
    .option("query", sql)                            # whole query is pushed down to Denodo
    .option("user", source_dbparams['user'])
    .option("password", source_dbparams['password'])
    .load()
)
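In case it matters, ddl_endpoint holds the Denodo JDBC URL, which as far as I know has the form jdbc:vdb://<host>:<port>/<database>.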
I get the error below. Any suggestions?
Lost task 0.0 in stage 3.0 (TID 2) (10.232.138.172 executor 7): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:296)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:210)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.sql.SQLException: unexpected exception; nested exception is:
**com.denodo.internal.o.a.r.netty.KeepAliveException: Channel time out exception. Channel not available at least since 2023-07-19T09:33:46.293Z**
at com.denodo.vdb.jdbcdriver.dao.DAOVDBProxy.generateSQLException(DAOVDBProxy.java:496)
at com.denodo.vdb.jdbcdriver.VDBJDBCAsyncResultSet.next(VDBJDBCAsyncResultSet.java:1801)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:348)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:334)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:277)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:286)
... 9 more
I read in one of the community questions that I can set KeepAliveTimeout, but I don't see it mentioned in the documentation either. Can I set it somehow, and will that solve the problem?
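Would something like this be the right way to pass it? This is just a sketch on my part: Spark forwards options it doesn't recognize to the JDBC driver as connection properties, but both the property name and the value below are guesses, not anything I found in the Denodo docs.

# Sketch, NOT confirmed against the Denodo docs: Spark passes unrecognized
# options through to the JDBC driver as connection properties, so IF the
# driver accepts a keepAliveTimeout property, it could be set like this.
df = (
    spark.read.format("jdbc")
    .option("url", ddl_endpoint)
    .option("driver", "com.denodo.vdp.jdbc.Driver")
    .option("query", sql)
    .option("user", source_dbparams['user'])
    .option("password", source_dbparams['password'])
    .option("keepAliveTimeout", "900000")  # hypothetical property name; value in ms is a guess
    .load()
)

If there is a supported URL parameter for this instead, a pointer to the correct name would also help.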