If I look at the Imapala Shell or Hue, I can write complicated enough IMPALA update statements for KUDU. E.g. update with sub-select and what not. Fine.
Looking at the old JDBC connection methods for, say, mySQL via SPARK / SCALA, there is not a lot of possibility to do a complicated update via such a connection, and that is understandable. However, with KUDU, I think the situation changes.
Looking at the documentation on KUDU - Apache KUDU - Developing Applications with Apache KUDU, the follwoing questions:
In SPARK KUDU Native Mode DML seems tied to a Data Frame approach with INSERT and UPSERT. What if I just want to write a free-format SQL DML statement like an UPDATE? I see that we can use Spark SQL to INSERT (treated as UPSERT by default) into a Kudu table. E.g.
sqlContext.sql(s"INSERT INTO TABLE $kuduTableName SELECT * FROM source_table")
What a bad documentation in this regard.
DML insert, update, ... possible via the "approach" below, some examples:
stmt.execute("update KUDU_1 set v = 'same value' where k in ('1', '4') ;")
stmt.execute("insert into KUDU_1 select concat(k, 'ABCDEF'), 'MASS INSERT' from KUDU_1 ;")
The only thing if using the corresponding stmt.executequery a Java resultset is returned which differs to the more standard approach of reading from JDBC sources and persisting the results. A little surprise here for me. Maybe 2 approaches needed, one for more regular selects and one work DML non-select. Not sure if that can be all in the same programme module. For another time. Yes it can.