This document discusses Hadoop's design and k-means clustering. It outlines how Hadoop achieves fault tolerance through task tracking and task replication, and describes Hadoop's data flow: input splitting, mapping, and reducing. It also covers optimizations such as combiners. Finally, it explains the k-means clustering algorithm and several approaches to implementing it in Hadoop, including iterative MapReduce and partitioning when the number of clusters is large.
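To make the iterative MapReduce approach concrete, here is a minimal in-memory sketch in plain Python. It does not use the Hadoop API. The function names (`map_phase`, `combine`, `reduce_phase`, `kmeans`) and the parameters `max_iters` and `tol` are hypothetical, chosen only for illustration. Each pass over the input splits plays the role of one MapReduce job: the mapper assigns points to the nearest centroid, the combiner pre-aggregates partial sums per split, and the reducer computes the new centroids.

```python
def nearest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split, centroids):
    # Mapper: emit (cluster index, (point, 1)) for each point in one input split.
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(pairs):
    # Combiner: pre-aggregate per-cluster vector sums and counts on the map side,
    # so that less data would need to be shuffled to the reducers.
    acc = {}
    for key, (vec, n) in pairs:
        if key in acc:
            s, c = acc[key]
            acc[key] = ([a + b for a, b in zip(s, vec)], c + n)
        else:
            acc[key] = (list(vec), n)
    return list(acc.items())


def reduce_phase(pairs, centroids):
    # Reducer: merge the partial sums per cluster and divide by the count.
    # A cluster that received no points keeps its previous centroid.
    merged = dict(combine(pairs))
    updated = []
    for i, old in enumerate(centroids):
        if i in merged:
            s, n = merged[i]
            updated.append(tuple(x / n for x in s))
        else:
            updated.append(tuple(old))
    return updated


def kmeans(splits, centroids, max_iters=20, tol=1e-9):
    # Driver: run one simulated MapReduce job per iteration until the
    # centroids stop moving (or max_iters is reached).
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        shuffled = []
        for split in splits:
            shuffled.extend(combine(map_phase(split, centroids)))
        updated = reduce_phase(shuffled, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c))
            for u, c in zip(updated, centroids)
        )
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

For example, `kmeans([[(0, 0), (0, 2)], [(10, 10), (10, 12)]], [(0, 0), (10, 10)])` treats the two inner lists as two input splits. In a real Hadoop deployment the driver loop would submit a new job per iteration and pass the current centroids to the mappers by some side channel; that part is assumed here and not shown.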