How I Fixed a Critical Spark Production Performance Issue (and Cut Runtime by 70%)
"The job didn't fail. It just... never finished." That was the worst part. No errors. No stack traces. Just a Spark job running forever in production: blocking downstream pipelines, delaying reports, and waking up on-call engineers at 2 AM. This is the story of how I diagnosed a real Spark performance issue in production and fixed it, not by adding more machines, but by understanding Spark properly.

