Apache Spark Core—Deep Dive—Proper Optimization
Daniel Tomes, Databricks

Optimizing Spark jobs through a true understanding of Spark Core. Learn: What is a partition? What is the difference between read, shuffle, and write partitions? How do you increase parallelism and decrease the number of output files? Where does shuffle data go between stages? What is the “right” size for your Spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease compute time?
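The talk walks through these questions in depth; as a minimal sketch of the kind of tuning it covers, the snippet below adjusts the shuffle partition count and coalesces output to fewer files. The input/output paths, partition counts, and column name are hypothetical, chosen only to illustrate the pattern.

```scala
import org.apache.spark.sql.SparkSession

object PartitionTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-tuning-sketch")
      .getOrCreate()

    // Shuffle (stage-boundary) partitions default to 200. A common
    // heuristic is total shuffle input divided by a target partition
    // size of roughly 128-200 MB; 480 here is a hypothetical value.
    spark.conf.set("spark.sql.shuffle.partitions", "480")

    // "/data/events" is a hypothetical input path.
    val events = spark.read.parquet("/data/events")

    // A wide transformation: groupBy forces a shuffle, so the result
    // has spark.sql.shuffle.partitions partitions.
    val daily = events.groupBy("event_date").count()

    // coalesce() narrows to fewer partitions without another shuffle,
    // so the write produces fewer, larger output files.
    daily.coalesce(8).write.mode("overwrite").parquet("/data/daily_counts")

    spark.stop()
  }
}
```

Note the design choice in the last step: repartition(8) would also yield eight output files but triggers a full shuffle, while coalesce(8) merges existing partitions and is cheaper when only shrinking the partition count.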