Java/Scala multithreading on Spark?
I'm using Scala Spark to move data from a Hudi table into a Hive table. I've already maximized the Spark-side optimizations available to me (worker nodes, cluster size). The table is missing a lot of features that could improve speed, but it's managed by another team and many other teams read from it, so table-side changes aren't an option. What I haven't tried yet is multithreading from Scala.
Has anyone here used Java (or Scala) with Spark, applied JVM multithreading on the driver side, and seen a noticeable reduction in ETL duration?
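To clarify what I mean by multithreading: submitting independent jobs concurrently from driver threads, so Spark can schedule them in parallel instead of running them back-to-back. A minimal sketch of the pattern below, with the actual Spark action (e.g. reading one Hudi partition and writing it to Hive) replaced by a placeholder, and the partition names made up:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ParallelEtl {
    // Placeholder for one independent Spark action, e.g.
    // spark.read.format("hudi")...filter(partition)...write.saveAsTable(...)
    // Here it just returns a label so the example is runnable standalone.
    static String runEtl(String partition) {
        return "done:" + partition;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical partition keys; in practice these would come from
        // the Hudi table's partition listing.
        List<String> partitions = List.of("2024-01", "2024-02", "2024-03");

        // One thread per concurrent job; each submitted task would block
        // until its Spark action finishes, letting the cluster work on
        // several jobs at once (FAIR scheduling helps here).
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<String>> futures = partitions.stream()
                    .map(p -> pool.submit(() -> runEtl(p)))
                    .collect(Collectors.toList());
            for (Future<String> f : futures) {
                System.out.println(f.get()); // wait for each job to complete
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Note this only helps when the jobs are genuinely independent and the cluster has idle capacity; if a single job already saturates the executors, driver-side threads won't speed anything up.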