This document discusses scaling Apache Spark applications and some of the unintended consequences that can arise. It covers Spark's core abstractions, RDDs and DataFrames, for representing distributed data and computation. It explains how Spark's lazy evaluation model affects data reuse, and how its deterministic partitioning shapes the cost of wide operations such as groupByKey. It also discusses the challenges that arise from Spark's support for arbitrary user functions and from working with non-JVM languages such as Python.
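
Two of the points the summary touches on, recomputation under lazy evaluation and the shuffle cost of groupByKey, can be sketched in a few lines of Scala. The snippet below is a minimal illustration rather than an excerpt from the document; the object name, the local master setting, and the toy word list are assumptions made only to keep the example self-contained.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch (hypothetical names and data) of two behaviors mentioned above:
// 1. Lazy evaluation: an RDD is recomputed for every action unless it is cached.
// 2. Shuffle cost: reduceByKey combines values map-side, while groupByKey ships
//    every value across the network before aggregating.
object ScalingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scaling-sketch")
      .master("local[*]") // assumption: run locally for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: (word, 1) pairs built from a small in-memory list.
    val pairs = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))
      .map(word => (word, 1))

    // Without cache(), each action below would recompute `pairs` from scratch,
    // because transformations are evaluated lazily.
    pairs.cache()

    // Preferred: map-side combining keeps the shuffle small.
    val counts = pairs.reduceByKey(_ + _)
    counts.collect().foreach(println)

    // Works, but shuffles every (word, 1) record before summing.
    val grouped = pairs.groupByKey().mapValues(_.sum)
    grouped.collect().foreach(println)

    spark.stop()
  }
}
```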