M4_DAR_part1. Module part 4 analytics with R

Data Analytics With R
Prof. Navyashree K S
Assistant Professor
Dept. of CSE (Data Science)
Sub code: BDS306C
Module 4

GRAPHICS USING R
Exploratory Data Analysis: Exploratory Data Analysis (EDA) is a visually
based method used to analyze datasets and summarize their main
characteristics. Its main goals are:
•Maximize Insight into a Data Set: Use summary statistics (mean, median,
mode, standard deviation) and visualizations (histograms, box plots) to get an
overall sense of the data distribution.
•Uncover Underlying Structure: Apply techniques like clustering (e.g.,
k-means, hierarchical clustering) and dimensionality reduction (e.g., PCA) to
identify patterns and groupings.
•Extract Important Variables: Use correlation matrices, feature importance
from models, or techniques like recursive feature elimination to identify which
variables contribute most to your target outcome.

•Detect Outliers and Anomalies: Visual methods (box plots, scatter plots)
and statistical tests (Z-scores, IQR) can help identify unusual observations
that might affect model performance.
•Test Underlying Assumptions: Check assumptions of statistical tests and
models using Q-Q plots, residual plots, and other diagnostic tools.
•Develop Parsimonious Models: Focus on simpler models that adequately
capture the data patterns, using techniques like regularization to avoid
overfitting.
•Determine Optimal Factor Settings: Use techniques like factorial design
or response surface methodology to explore the effects of different factors
on outcomes and optimize settings.
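
Several of the goals above can be tried out with a few lines of base R. The following is a minimal sketch, assuming the built-in mtcars dataset and an illustrative choice of variables (neither comes from the slides); it covers summary statistics, distribution plots, an IQR-based outlier check, a Q-Q plot, and a correlation matrix.

```r
# Assumption: the built-in mtcars dataset stands in for "your data".
data(mtcars)
mpg <- mtcars$mpg

# Maximize insight: summary statistics and distribution plots
summary(mpg)
sd(mpg)
hist(mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
boxplot(mpg, main = "Box plot of mpg", ylab = "Miles per gallon")

# Detect outliers: values beyond 1.5 * IQR from the quartiles
q <- quantile(mpg, c(0.25, 0.75))
fence <- 1.5 * IQR(mpg)
outliers <- mpg[mpg < q[1] - fence | mpg > q[2] + fence]

# Test assumptions: normal Q-Q plot with a reference line
qqnorm(mpg)
qqline(mpg)

# Extract important variables: correlation matrix of a few columns
cm <- cor(mtcars[, c("mpg", "wt", "hp", "disp")])
round(cm, 2)
```

Note that boxplot() computes its whiskers from hinges rather than from quantile(), so the points it flags can differ slightly from the outliers vector computed here.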

MAIN GRAPHICAL PACKAGES
Base Graphics:
•The simplest way to create plots in R.
•Good for quick and straightforward visualizations.
•Limited customization options and flexibility.
Example: plot(), hist(), boxplot().
Grid Graphics:
•Built on a more sophisticated framework compared to base graphics.
•Allows for more control over layout and placement of graphical elements.
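•Provides only low-level drawing primitives; it has no ready-made high-level plot
functions (such as scatter plots), so these must be built from primitives or
obtained through packages built on grid, such as lattice and ggplot2.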
Example: grid.newpage(), grid.rect().
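
As a small illustration of the grid primitives named above, the sketch below draws a rectangle and a text label on a fresh page. The positions, sizes, and colors are arbitrary values chosen for the example.

```r
library(grid)  # grid ships with R, so no installation is needed

grid.newpage()  # start a new, empty page

# Draw a rectangle centered on the page (coordinates are fractions of the page)
grid.rect(x = 0.5, y = 0.5, width = 0.8, height = 0.6,
          gp = gpar(col = "steelblue", fill = "lightyellow"))

# Add a text label in the middle of the rectangle
grid.text("Drawn with grid", x = 0.5, y = 0.5)
```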

Lattice Graphics:
•Designed for creating trellis graphs, which are particularly useful for multivariate
data.
•Supports conditioning, allowing you to create multiple panels based on factor
levels.
•More structured than base graphics and provides better handling of complex layouts.
Example: xyplot(), bwplot(), histogram().
ggplot2:
•Based on the "Grammar of Graphics," which provides a coherent way to describe
and build visualizations.
•Highly customizable and capable of creating complex multi-layered graphics.
•Supports various data types and allows for easy addition of aesthetic mappings (like
color, size, shape).
Example: ggplot(data, aes(x, y)) + geom_point(), geom_smooth(), facet_wrap().

PIE CHART
Creating a pie chart in R is straightforward using the pie() function.
Syntax: pie(x, labels, radius, main, col, clockwise)
Parameters
•x: A numeric vector representing the values for each slice of the pie.
•labels: A vector of descriptions for each slice.
•radius: The radius of the pie. The chart is drawn in a square box whose sides run from -1 to +1, so values up to 1 fit inside the box (default 0.8).
•main: The title of the pie chart.
•col: A color palette for the slices. You can use predefined palettes like rainbow()
or heat.colors().
•clockwise: A logical value (TRUE for clockwise, FALSE for anti-clockwise) to
control the direction of the slices.
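
The parameters above can be combined as in the following sketch. The sales figures and city names are hypothetical values made up for illustration; the percentage labels are an optional extra.

```r
# Hypothetical data: values and labels are invented for this example
sales  <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Build labels that include each slice's share of the total
pct <- round(100 * sales / sum(sales))
lbl <- paste0(cities, " (", pct, "%)")

pie(sales,
    labels = lbl,
    radius = 0.9,
    main = "Sales by city",
    col = rainbow(length(sales)),  # one color per slice
    clockwise = TRUE)
```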

To create a 3D pie chart in R, you can use the plotrix package, which provides the
pie3D() function. This function allows you to create a visually appealing 3D
representation of your data.
install.packages("plotrix") # Install the package
library(plotrix) # Load the package
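
A possible pie3D() call is sketched below, using hypothetical values and labels. The call is wrapped in a requireNamespace() check only so that the snippet also runs on a machine where plotrix has not been installed yet.

```r
# Hypothetical data: values and labels are invented for this example
slices <- c(21, 62, 10, 53)
lbls   <- c("London", "New York", "Singapore", "Mumbai")

if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::pie3D(slices,
                 labels = lbls,
                 explode = 0.1,   # pull the slices slightly apart
                 main = "3D pie chart of sales by city",
                 col = rainbow(length(slices)))
} else {
  message("plotrix is not installed; run install.packages(\"plotrix\") first")
}
```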

SCATTER PLOT
Scatter plots visualize the relationship between two continuous variables. In the
built-in "cars" dataset, a scatter plot shows how a car's speed relates to its
stopping distance.
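
A minimal sketch of a scatter plot for the cars dataset, with axis labels and a title added for readability (the label wording is an illustrative choice):

```r
# cars is a built-in data frame with columns speed (mph) and dist (ft)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speed vs. stopping distance")
```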

In plot(), the col argument sets the point color and pch sets the plotting symbol.
Using them to encode a grouping variable makes a scatter plot easier to read.
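
For example, points can be colored by group. The sketch below assumes the built-in mtcars dataset and colors each car by its number of cylinders; the specific colors and the symbol (pch = 19, a solid circle) are arbitrary choices.

```r
cyl     <- factor(mtcars$cyl)            # grouping variable: 4, 6, or 8 cylinders
palette <- c("red", "darkgreen", "blue") # one color per factor level
cols    <- palette[cyl]                  # indexing by a factor uses its integer codes

plot(mtcars$wt, mtcars$mpg,
     col = cols, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight, colored by cylinders")

legend("topright", legend = levels(cyl), col = palette, pch = 19,
       title = "Cylinders")
```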

The layout() function arranges multiple related plots in a single figure, which
makes it easier to compare different relationships in a dataset. For the
mtcars dataset, a 2 x 2 layout can show four relationships:
1. Weight vs. Miles Per Gallon (wt vs. mpg) - typically shows that heavier cars have lower mpg.
2. Weight vs. Displacement (wt vs. disp) - often shows that heavier cars have larger engines.
3. Miles Per Gallon vs. Displacement (mpg vs. disp) - usually indicates that larger engines tend to
have lower mpg.
4. Miles Per Gallon vs. Horsepower (mpg vs. hp) - often shows a similar trend, where higher
horsepower cars tend to have lower mpg.
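
One way to produce these four panels is sketched below. It assumes a 2 x 2 grid filled row by row; the axis assignment within each panel and the titles are illustrative choices.

```r
# Layout matrix: panel numbers 1..4 arranged in two rows and two columns
m <- matrix(1:4, nrow = 2, byrow = TRUE)
layout(m)

plot(mtcars$wt,   mtcars$mpg,  xlab = "wt",   ylab = "mpg",  main = "wt vs. mpg")
plot(mtcars$wt,   mtcars$disp, xlab = "wt",   ylab = "disp", main = "wt vs. disp")
plot(mtcars$disp, mtcars$mpg,  xlab = "disp", ylab = "mpg",  main = "mpg vs. disp")
plot(mtcars$hp,   mtcars$mpg,  xlab = "hp",   ylab = "mpg",  main = "mpg vs. hp")

layout(1)  # reset to a single plot per figure
```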

The pairs() function in R allows you to create a matrix of scatter plots, making it easy to see
how each variable relates to the others.
➢pairs(~wt + mpg + disp + cyl, data = mtcars, main = "Scatterplot Matrix")

The xyplot() function from the lattice package is an alternative to the base R
plotting functions. It uses a formula interface and can split a plot into panels
conditioned on a factor.
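
A minimal xyplot() sketch follows, assuming the mtcars dataset. The formula mpg ~ wt | factor(cyl) plots mpg against wt in one panel per cylinder count; the panel arrangement and labels are illustrative.

```r
library(lattice)  # lattice is a recommended package bundled with standard R installs

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),   # three panels side by side
            xlab = "Weight (1000 lbs)",
            ylab = "Miles per gallon",
            main = "mpg vs. weight by number of cylinders")
print(p)  # lattice plots are objects; print() draws them
```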

The ggplot2 package builds visualizations in layers, which allows extensive
customization. A basic plot starts with ggplot(), uses aes() to map variables to
axes and other aesthetics, and adds geom_point() to draw the points.
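
A basic layered plot might look like the sketch below, assuming the mtcars dataset. The color mapping and labels are illustrative additions, and the requireNamespace() check is there only so the snippet runs even where ggplot2 is not installed.

```r
# ggplot2 is an add-on package: install.packages("ggplot2") if it is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
    geom_point(size = 2) +                       # layer: one point per car
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
         color = "Cylinders", title = "mpg vs. weight")
  print(p)
}
```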

The facet_wrap() function in ggplot2 creates multiple panels (facets) of
plots based on the values of a categorical variable.
This is similar to conditioning in lattice. Additionally, the theme() function lets
you customize the appearance of the plot, including the orientation of axis text.
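
The two functions can be combined as in this sketch, which assumes the mtcars dataset and facets by number of cylinders. The 45-degree rotation of the x-axis text is an arbitrary example of a theme() adjustment.

```r
# ggplot2 is an add-on package: install.packages("ggplot2") if it is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p_facet <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    facet_wrap(~ cyl) +                                       # one panel per cylinder count
    theme(axis.text.x = element_text(angle = 45, hjust = 1))  # rotate x-axis labels
  print(p_facet)
}
```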
