Uploaded by saberunnisaa

HARNESSING R: STATISTICAL TECHNIQUES FOR DATA-DRIVEN ENTREPRENEURSHIP

Topics covered: advanced graphics with ggplot2, introduction to statistical analysis, types of statistical analysis, descriptive statistics, inferential statistics, and probability distributions in R.

DR. SABERUNNISA. A
ASSISTANT PROFESSOR
THE MADURA COLLEGE (AUTONOMOUS), MADURAI

WHAT IS R?
Open-source statistical software: Free to use, distributed under the GNU General Public License (GPL).
Programming language & environment: Specifically designed for statistical
computing and graphics.
Widely used in Data Science, Machine Learning, and Statistics for:
Data manipulation & cleaning
Statistical modeling
Visualization (graphs, charts, plots)
Predictive analytics
Community-driven: Thousands of user-contributed packages on CRAN.

History and Background of R
Developed by Ross Ihaka and Robert Gentleman in 1993 at the University of
Auckland, New Zealand.
Based on the S programming language, originally developed at Bell Laboratories.
First official release in 1995; Version 1.0.0 was launched in 2000.
Supported by the R Foundation for Statistical Computing (established in 2003).
Today, R is globally recognized as one of the leading tools for statistical computing,
data science, and research.

Features of R
Free and Open Source
◦ Available at no cost, licensed under the GNU General Public License (GPL).
Huge Package Ecosystem (CRAN)
◦ 19,000+ packages covering statistics, ML, data visualization, bioinformatics, etc.
Cross-Platform Compatibility
◦ Runs on Windows, macOS, and Linux systems.
Strong Visualization Libraries
◦ Base R graphics and advanced tools like ggplot2, lattice, plotly.
Highly Extensible
◦ Users can develop custom functions and packages.
Active Community Support
◦ Global network of developers and researchers.

Installing and Using R
Download from CRAN (Comprehensive R Archive Network)
◦ Official website: https://cran.r-project.org
◦ Choose installer for Windows, Mac, or Linux.
RStudio IDE (Integrated Development Environment)
◦ Provides a user-friendly interface for coding in R.
◦ Features: script editor, console, plots, package manager.

Interface Visuals

Installing Packages
tidyverse – a collection of packages for data wrangling & visualization.
install.packages("tidyverse")
(core packages include ggplot2, dplyr, readr, tidyr, tibble, stringr, forcats, purrr)
data.table – for super-fast data manipulation.
install.packages("data.table")

Installing Packages
readxl & openxlsx – to import/export Excel files.
install.packages("readxl")
install.packages("openxlsx")
haven – to read SPSS, SAS, and Stata files.
install.packages("haven")

Statistical Analysis Packages
car – regression & ANOVA tools.
install.packages("car")
MASS – classic stats and datasets.
install.packages("MASS")
psych – psychological statistics (descriptive, factor analysis).
install.packages("psych")
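
Calling install.packages() for every package on every run downloads packages that may already be present. The sketch below shows one way to list only the packages that are still missing; the helper name missing_packages is hypothetical and not part of R, and the actual install call is left commented out because it needs internet access.

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Packages named on the slides above
wanted <- c("tidyverse", "data.table", "readxl", "openxlsx",
            "haven", "car", "MASS", "psych")
to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

After installation, a package still has to be loaded in each session with library(), for example library(readxl).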

Data Types in R
Basic Data Types:
Numeric → Decimal values (e.g., 3.14)
Integer → Whole numbers (e.g., 5L)
Character → Text values (e.g., "Hello")
Logical → Boolean values (TRUE, FALSE)
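
A short sketch of how these types can be inspected with class() and typeof(), using the example values from the list above:

```r
class(3.14)      # "numeric"
typeof(3.14)     # "double": numeric values are stored as doubles
class(5L)        # "integer"
class("Hello")   # "character"
class(TRUE)      # "logical"

is.integer(5)          # FALSE: without the L suffix, 5 is stored as a double
as.integer("42") + 1L  # 43: text converted to an integer before arithmetic
```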

Data Structures
Vector → Collection of elements of the same type
Matrix → 2D array of elements of the same type (rows × columns)
Factor → Categorical data (e.g., "Male", "Female")
List → Collection of mixed data types
Data Frame → Tabular structure (rows × columns), similar to Excel
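
The following sketch builds one small object of each structure so the differences can be seen side by side. All values are illustrative.

```r
v <- c(1.5, 2.5, 3.5)                            # vector: all elements numeric
m <- matrix(1:6, nrow = 2, ncol = 3)             # matrix: 2 rows x 3 columns
f <- factor(c("Male", "Female", "Male"))         # factor: categorical data
l <- list(name = "A", score = 85, pass = TRUE)   # list: mixed types
d <- data.frame(Name = c("A", "B", "C"),
                Score = c(85, 90, 78))           # data frame: columns of equal length

dim(m)                    # 2 3
levels(f)                 # "Female" "Male" (sorted alphabetically)
l$score                   # 85
d$Score[2]                # 90
class(c(1, "a", TRUE))    # "character": mixed values in a vector are coerced
str(d)                    # compact overview of the data frame
```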

Data Import
Read data from an Excel file
library(readxl)
data <- read_excel("C:/Users/D E L L/OneDrive/Desktop/data.xlsx")
View(data)
Read a CSV file
DATA <- read.csv("C:/Users/D E L L/OneDrive/Desktop/data/data - Copy.csv")
View(DATA)
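
The paths above exist only on the presenter's computer. As a self-contained sketch, the example below writes a small illustrative data frame to a temporary CSV file and reads it back, so it runs on any machine. The file name scores.csv is an assumption made for the example.

```r
# Illustrative data frame written to a temporary CSV file
scores <- data.frame(Name = c("A", "B", "C"), Score = c(85, 90, 78))
csv_path <- file.path(tempdir(), "scores.csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read it back and inspect it in the console instead of the viewer
imported <- read.csv(csv_path)
head(imported)   # first rows
str(imported)    # column names and types
```

Using forward slashes or file.path() keeps paths portable across Windows, macOS, and Linux.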

Basic R Commands
Arithmetic Operations
x <- 10
y <- 3
x + y # Addition
x - y # Subtraction
x * y # Multiplication
x / y # Division
x ^ y # Power
Creating Vectors with c()
v <- c(2, 4, 6, 8, 10)
print(v)
Indexing and Subsetting
v[1] # First element
v[2:4] # Elements 2 to 4
v[v > 5] # Elements greater than 5
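
A few more commands in the same spirit, shown as a sketch with the results in comments. Note that R indexing starts at 1 and that arithmetic on a vector is applied element by element.

```r
x <- 10
y <- 3
x %/% y        # 3: integer division
x %% y         # 1: remainder

v <- c(2, 4, 6, 8, 10)
v * 2          # 4 8 12 16 20: applied to every element
v[-1]          # 4 6 8 10: a negative index drops that element
length(v)      # 5
sum(v)         # 30
v[v > 5]       # 6 8 10
```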

Data Visualization in R
1. Base R Plots
Simple and built-in plotting system
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type = "o", col = "blue", main = "Base R Plot")

R PLOT
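
To keep a plot for a report or presentation, it can be written to a file instead of the screen. The sketch below saves the same base R plot as a PDF in the temporary directory; the file name is an assumption made for the example.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

plot_file <- file.path(tempdir(), "base_r_plot.pdf")  # hypothetical file name
pdf(plot_file, width = 6, height = 4)                 # open a PDF graphics device
plot(x, y, type = "o", col = "blue", main = "Base R Plot",
     xlab = "x", ylab = "y")
dev.off()                                             # close the device to finish the file

file.exists(plot_file)
```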

2. Advanced Graphics with ggplot2
library(ggplot2)
data <- data.frame(Name = c("A", "B", "C"), Score = c(85, 90, 78))
ggplot(data, aes(x = Name, y = Score)) +
  geom_col(fill = "skyblue") +
  ggtitle("Bar Chart")

BAR CHART

Scatter Plot
library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg)) + geom_point(color="red") + ggtitle("Scatter Plot: mpg vs wt")

Scatter Plot
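
The downward pattern in the mpg vs wt scatter plot can also be put into numbers. The sketch below uses only base R and the built-in mtcars data to compute the correlation and a straight-line fit. It is offered as a complement to the plot, not as part of the original slides.

```r
# mtcars ships with R: wt is weight (1000 lbs), mpg is miles per gallon
r <- cor(mtcars$wt, mtcars$mpg)
fit <- lm(mpg ~ wt, data = mtcars)

round(r, 3)          # -0.868: heavier cars tend to have lower mpg
round(coef(fit), 3)  # intercept 37.285, slope -5.344
```

The slope says that, in this dataset, each additional 1000 lbs of weight goes with roughly 5.3 fewer miles per gallon.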

Introduction to Statistical Analysis
Role of Statistics in Data Analysis
◦ Helps in collecting, organizing, analyzing, and interpreting data.
◦ Converts raw data into meaningful insights.
◦ Essential for decision-making, prediction, and research validation.

Types of Statistical Analysis
Descriptive Statistics
◦ Summarizes data.
◦ Measures: mean, median, mode, variance, standard deviation.
◦ Example: Average exam score of a class.
Inferential Statistics
◦ Draws conclusions about a population from a sample.
◦ Techniques: hypothesis testing, confidence intervals, regression analysis.
◦ Example: Predicting election results from a survey.

Descriptive Statistics
Descriptive Statistics in R
1. Summary of Data
data <- c(85, 90, 78, 92, 88)
summary(data)
Output:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   78.0    85.0    88.0    86.6    90.0    92.0

Descriptive Statistics
2. Measures of Central Tendency
mean(data) # Mean
median(data) # Median
Output:
[1] 86.6
[1] 88

Descriptive Statistics
3. Measures of Spread (Dispersion)
var(data) # Variance
sd(data) # Standard Deviation
range(data) # Min & Max
Output:
[1] 29.8
[1] 5.458938
[1] 78 92

Descriptive Statistics
4. Frequency Table
table(data)
Output:
data
78 85 88 90 92
1 1 1 1 1
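
The mode is listed among the descriptive measures, but R has no built-in function for the statistical mode (the base function mode() reports an object's storage mode instead). A minimal sketch built on table() follows; the helper name stat_mode and the ratings data are made up for illustration.

```r
# Hypothetical helper: value(s) that occur most often, returned as text labels
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

ratings <- c(4, 5, 3, 4, 5, 4, 2)  # made-up customer ratings
table(ratings)                     # frequency of each rating
stat_mode(ratings)                 # "4"
```

For the exam scores above every value occurs once, so the helper would return all five values, which signals that the data have no single mode.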

Inferential statistics
A manufacturer claims that the average fuel efficiency of their new car model is 25 mpg. A
random sample of 10 cars gave the following mileages:
23, 25, 27, 24, 26, 22, 28, 25, 24, 23
Test at 5% significance level whether the claim is true using a one-sample t-test.
Step 1: State Hypotheses
Null Hypothesis (H₀): μ = 25 (average mileage = 25 mpg)
Alternative Hypothesis (H₁): μ ≠ 25 (average mileage is different from 25 mpg)
This is a two-tailed test.

Inferential statistics
# Sample data
mileage <- c(23, 25, 27, 24, 26, 22, 28, 25, 24, 23)
# Perform one-sample t-test
t.test(mileage, mu = 25)

Inferential statistics
One Sample t-test
data: mileage
t = -0.50233, df = 9, p-value = 0.6275
alternative hypothesis: true mean is not equal to 25
95 percent confidence interval:
23.349 26.051
sample estimates:
mean of x
24.7

Inferential statistics
Step 4: Inference
Test statistic t = -0.50233,
p-value = 0.6275
Since p > 0.05, we fail to reject H₀.
Conclusion: At the 5% significance level, there is insufficient evidence that the mean
mileage differs from 25 mpg. The data are consistent with the manufacturer's claim.
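
To see where the reported numbers come from, the test statistic can be recomputed from its definition, t = (sample mean − 25) / (s / √n), and compared with the values stored in the t.test() result. This is a sketch using the same mileage data.

```r
mileage <- c(23, 25, 27, 24, 26, 22, 28, 25, 24, 23)

# Test statistic and two-tailed p-value computed by hand
n <- length(mileage)
t_manual <- (mean(mileage) - 25) / (sd(mileage) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# The same quantities pulled out of the t.test() result
res <- t.test(mileage, mu = 25)
res$statistic   # t value
res$p.value     # p-value
res$conf.int    # 95 percent confidence interval

round(t_manual, 5)   # -0.50233
round(p_manual, 4)   # 0.6275
```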

Inferential statistics
Chi-Square Test
A die is suspected to be biased. To test this, it is rolled 60 times, and the observed frequencies of
outcomes are:
Face           1   2   3   4   5   6
Observed (O)   8   9  10  12  11  10
Test at the 5% significance level whether the die is fair using the Chi-square goodness of fit
test.

•To find the goodness of fit
# Observed frequencies
observed <- c(8, 9, 10, 12, 11, 10)
# Expected probabilities under a fair die
expected_prob <- rep(1/6, 6) # probability 1/6 for each of the 6 faces
# Chi-square Goodness of Fit Test
test <- chisq.test(x = observed, p = expected_prob)
print(test)

Inferential statistics
Chi-squared test for given probabilities
data: observed
X-squared = 1, df = 5, p-value = 0.9626
Here, p-value = 0.9626 > 0.05, so we fail to reject the null hypothesis.
Conclusion: At the 5% significance level, there is no significant evidence that the die is
unfair. The observed frequencies are consistent with a fair die.
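
The statistic reported by chisq.test() can be checked by hand with the formula Σ (O − E)² / E, where each expected count E is 60 × 1/6 = 10. A minimal sketch:

```r
observed <- c(8, 9, 10, 12, 11, 10)
expected <- sum(observed) * rep(1/6, 6)   # 10 rolls expected per face

# Chi-square statistic and its upper-tail p-value with 6 - 1 = 5 degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)

chi_sq             # 1
round(p_value, 4)  # 0.9626
```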

Inferential statistics
A researcher wants to test whether the mean test scores differ among three different teaching
methods. The scores of students are recorded as follows:
Group A (Method 1): 85, 90, 88
Group B (Method 2): 70, 75, 80
Group C (Method 3): 95, 92, 89
Is there a significant difference in mean scores among the three teaching methods at the 5%
significance level?

Step 1: Hypotheses
Null Hypothesis (H₀): μ₁ = μ₂ = μ₃ (all groups have equal mean scores).
Alternative Hypothesis (H₁): At least one group mean differs.
ANOVA (Analysis of Variance)
•Compare means across 3+ groups
df <- data.frame(score = c(85, 90, 88, 70, 75, 80, 95, 92, 89),
                 group = rep(c("A", "B", "C"), each = 3))
aov_res <- aov(score ~ group, data = df)
summary(aov_res)

Inferential statistics
Output (simplified):

F value = 17.41: this tells us how much larger the between-group variance is compared to the
within-group variance.
p-value = 0.00317: the probability of getting an F-value as large as 17.41 (or larger) if H₀ is true.
Since p = 0.00317 < 0.05, we reject the null hypothesis.
Conclusion: At least one group mean is significantly different.
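
The F value and p-value quoted above can be reproduced from the raw scores by splitting the variation into a between-group and a within-group part. The sketch below does this step by step; the variable names are chosen for the example.

```r
scores <- data.frame(score = c(85, 90, 88, 70, 75, 80, 95, 92, 89),
                     group = rep(c("A", "B", "C"), each = 3))

grand_mean  <- mean(scores$score)
group_means <- tapply(scores$score, scores$group, mean)
group_sizes <- tapply(scores$score, scores$group, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((scores$score - group_means[as.character(scores$group)])^2)

k <- length(group_means)   # number of groups
n <- nrow(scores)          # total number of observations
f_value <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

round(f_value, 2)   # 17.41
round(p_value, 5)   # 0.00317
```

ANOVA only says that some difference exists. A post-hoc comparison such as TukeyHSD() applied to the aov() result is one common way to see which pairs of methods differ.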

Probability Distributions in R
Binomial Distribution
A fair coin (p = 0.5 for heads) is tossed 10 times.
Find the probability of getting exactly 5 heads.
Simulate 10 random outcomes of tossing the coin 10 times.
Here,
n = 10 (number of trials)
k = 5 (number of successes)
p = 0.5 (probability of heads)

# Probability of 5 successes in 10 trials with p=0.5
dbinom(5, size=10, prob=0.5)
# Generate random binomial values
rbinom(10, size=10, prob=0.5)
Output Example:
[1] 0.2460938 # Probability of exactly 5 successes
[1] 6 5 4 7 5 6 5 4 3 7 # Random outcomes (these vary from run to run)

Conclusion for the Problem
The probability of getting exactly 5 heads in 10 coin tosses is 0.2461 (≈ 24.6%).
◦ This means if we repeat the experiment many times, about 1 in 4 trials will result in exactly 5
heads.
The random simulation using rbinom() shows how the number of heads can vary across
repeated experiments of 10 tosses each.
◦ The values fluctuate around 5, consistent with the theoretical expectation (n × p = 5).
Overall: The binomial model confirms that getting exactly 5 successes (heads) in 10 fair coin
tosses is the most likely outcome, but not guaranteed—it occurs about 25% of the time.
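
As a check on the result, the sketch below compares dbinom() with the binomial formula C(n, k) p^k (1 − p)^(n − k), adds the cumulative probability from pbinom(), and fixes the random seed so the simulation can be repeated. The seed value 123 is arbitrary.

```r
n <- 10; k <- 5; p <- 0.5

# dbinom() agrees with the formula C(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_formula                       # 0.2460938
dbinom(k, size = n, prob = p)    # 0.2460938

# Cumulative probability of at most 5 heads
pbinom(5, size = n, prob = p)    # 0.6230469

# Fixing the seed makes the simulated draws reproducible
set.seed(123)
rbinom(10, size = n, prob = p)
```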

Probability Distributions in R
Poisson Distribution
A call center receives an average of 3 calls per minute.
Find the probability that exactly 5 calls are received in a given minute.
Find the probability that at most 2 calls are received in a given minute.
Simulate the number of calls received in 10 minutes using the Poisson distribution.

# 1. Probability of exactly 5 calls (lambda = 3)
dpois(5, lambda = 3)
# 2. Probability of at most 2 calls
ppois(2, lambda = 3)
# 3. Simulate number of calls in 10 minutes
rpois(10, lambda = 3)

# 1. Probability of exactly 5 calls
dpois(5, lambda = 3)
[1] 0.1008188
# 2. Probability of at most 2 calls
ppois(2, lambda = 3)
[1] 0.4231901
# 3. Simulate number of calls in 10 minutes
rpois(10, lambda = 3)
[1] 3 3 1 4 2 2 3 2 0 8

Conclusion
The probability of getting exactly 5 calls in one minute is ≈ 10%.
The probability of getting at most 2 calls is ≈ 42%.
The simulation shows how the number of calls fluctuates around the average (λ = 3).
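
Two related calculations, given as a sketch: the "at most 2" probability is the sum of the individual probabilities for 0, 1, and 2 calls, and lower.tail = FALSE gives the upper tail. The seed value and the number of simulated minutes are arbitrary choices for the example.

```r
lambda <- 3

# P(X <= 2) is the sum of the probabilities of 0, 1 and 2 calls
sum(dpois(0:2, lambda))                 # 0.4231901

# P(X > 5): six or more calls in a minute
ppois(5, lambda, lower.tail = FALSE)    # about 0.084

# With many simulated minutes the sample mean approaches lambda
set.seed(42)
mean(rpois(10000, lambda))
```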

Probability Distributions in R
Normal Distribution
The exam scores of students in a class are normally distributed with a mean (μ) = 70 and a
standard deviation (σ) = 10.
Find the probability that a randomly selected student scores less than 80.
Find the probability that a student scores between 60 and 75.
Simulate the exam scores of 10 students using the given distribution.

# 1. Probability of scoring less than 80
pnorm(80, mean = 70, sd = 10)
# 2. Probability of scoring between 60 and 75
pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
# 3. Simulate exam scores for 10 students
rnorm(10, mean = 70, sd = 10)

P(X < 80):
pnorm(80, mean = 70, sd = 10)
# [1] 0.8413447
There is about 84.13% chance that a student scores less than 80.
P(60 < X < 75):
pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
# [1] 0.5328072
There is about 53.28% chance that a student scores between 60 and 75.

Simulated scores:
rnorm(10, mean = 70, sd = 10)
# Example output: [1] 68.5 72.3 81.0 59.4 74.2 65.8 69.1 77.6 62.8 71.4
These are 10 randomly generated exam scores based on the normal distribution.
Conclusion
Most students are likely to score below 80 (84% probability).
A little over half the students (53% probability) are expected to score between 60 and 75.
The simulated results reflect how scores cluster around the mean (70), with spread governed by the
standard deviation (10).
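
The same distribution also answers the reverse question (which score corresponds to a given probability) through qnorm(), and upper-tail questions through lower.tail = FALSE. A short sketch with the same mean and standard deviation:

```r
mu <- 70; sigma <- 10

# Score below which 90% of students fall (90th percentile)
qnorm(0.90, mean = mu, sd = sigma)                    # about 82.8

# Probability of scoring above 85
pnorm(85, mean = mu, sd = sigma, lower.tail = FALSE)  # about 0.067

# Standardising: P(X < 80) equals P(Z < 1) for the standard normal
z <- (80 - mu) / sigma
pnorm(z)                                              # 0.8413447
```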

Entrepreneurship With R software
Entrepreneurs can use R for business analytics. Statistical analysis, forecasting, and visualization
of trends in customer behavior, sales, and system performance support data-driven decision-making.
With R, entrepreneurs can load, manipulate, and visualize complex datasets, identify patterns,
build predictive models, and draw actionable insights from market and operational data to gain a
competitive edge.
Business Analytics & Data Mining:
R is an open-source statistical software environment used for high-end graphics and statistical
computations, making it a powerful tool for business analytics and data mining.

Entrepreneurship With R software
Data Visualization:
Entrepreneurs can use packages like ggplot2 to create clear, easy-to-read charts and graphs that
transform raw data into impactful visualizations, helping to identify trends and patterns in data.
Predictive Analytics:
R enables businesses to forecast trends, classify outcomes, and analyze time-dependent data using
regression analysis, classification models, and time series forecasting.
Customer Behavior Analysis:
By analyzing customer data, entrepreneurs can understand needs and preferences, leading to more
tailored products and services.

Sales & Performance Forecasting:
R can be used to forecast sales, model system performance, and predict potential losses,
supporting smarter, data-driven business decisions.
Quantitative Research:
R facilitates the analysis of public and proprietary datasets, helping entrepreneurs conduct
research, test theories, and generate strategic insights.
Market Trend Analysis:
Entrepreneurs can analyze social media trends and market data to refine strategies, improve
customer engagement, and make informed business decisions.
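
As a small illustration of sales forecasting, the sketch below fits a straight-line trend to six months of sales and projects the seventh month. The sales figures are made up for the example, and a linear trend is only one simple modeling choice among many.

```r
# Made-up monthly sales figures, for illustration only
sales <- data.frame(month = 1:6,
                    units = c(102, 108, 121, 129, 142, 148))

# Linear trend: units sold as a function of month
trend <- lm(units ~ month, data = sales)
coef(trend)   # intercept 91, slope about 9.71 units per month

# Point forecast for month 7
forecast_7 <- predict(trend, newdata = data.frame(month = 7))
unname(forecast_7)   # 159
```

With real data, seasonality and uncertainty matter, so a forecast like this is a starting point rather than a final answer.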
Entrepreneurship With R software

HARNESSING R: STATISTICAL TECHNIQUES FOR DATA-DRIVEN ENTREPRENEURSHIP

Recommended

PPTX

R programming for graphical representation

bysanjaysushil266

PDF

R programming intro with examples

PDF

Statistics for data scientists

PDF

Test Bank for Stats Data and Models 5th by De Veaux

PDF

Lecturenotesstatistics

PDF

Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)

bySherri Gunder

PDF

R - what do the numbers mean? #RStats

PPTX

CS194Lec0hbh6EDA.pptx

byPrudhvirajEluri1

PPTX

Unit 4 Statistical Data Analysis for BTech 5th SEM

byrishabh248001

PPTX

Unit IV.pptx for statistical data analysis

byrishabh248001

PDF

Unit---4.pdf how to gst du paper in this day and age

PDF

1.Introduction to Biostatistics MBChB 6 - DPH 6024.pdf

byluapulachishipula14

PDF

Basics of R programming for analytics [Autosaved] (1).pdf

PDF

Big_DM_24_MS_Topic_02_Understanding Data.pdf

PDF

Data Science_Chapter -2_Statical Data Analysis.pdf

bysangeeta borde

PPTX

R Language Introduction

byKhaled Al-Shamaa

PPTX

1. Descriptive statistics.pptx engineering

PDF

Data Analysis using R (BASIC)_R SOFTWARE

byshrikrishna kesharwani

PDF

PPT - Introduction to R.pdf

PPTX

ststs nw.pptx

PDF

Using r

DOCX

R Activity in Biostatistics

PPTX

Data in science

bySreejith Aravindakshan

PDF

ISSTA'16 Summer School: Intro to Statistics

byAndrea Arcuri

PDF

R tutorial

byRichard Vidgen

PPTX

Introduction to basic statistics

PDF

Making Sense of Data Big and Small

byBruno Gonçalves

PPT

Stats-Review-Maie-St-John-5-20-2009.ppt

byDiptoKumerSarker1

PPTX

Econometric Problem of Autocorrelation-Heteroskedasticity (1).pptx

PDF

Beacon Kit paper pdf World Game (s) redesign

More Related Content

PPTX

R programming for graphical representation

bysanjaysushil266

PDF

R programming intro with examples

PDF

Statistics for data scientists

PDF

Test Bank for Stats Data and Models 5th by De Veaux

PDF

Lecturenotesstatistics

PDF

Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)

bySherri Gunder

PDF

R - what do the numbers mean? #RStats

PPTX

CS194Lec0hbh6EDA.pptx

byPrudhvirajEluri1

R programming for graphical representation

bysanjaysushil266

R programming intro with examples

Statistics for data scientists

Test Bank for Stats Data and Models 5th by De Veaux

Lecturenotesstatistics

Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)

bySherri Gunder

R - what do the numbers mean? #RStats

CS194Lec0hbh6EDA.pptx

byPrudhvirajEluri1

Similar to HARNESSING R: STATISTICAL TECHNIQUES FOR DATA-DRIVEN ENTREPRENEURSHIP

PPTX

Unit 4 Statistical Data Analysis for BTech 5th SEM

byrishabh248001

PPTX

Unit IV.pptx for statistical data analysis

byrishabh248001

PDF

Unit---4.pdf how to gst du paper in this day and age

PDF

1.Introduction to Biostatistics MBChB 6 - DPH 6024.pdf

byluapulachishipula14

PDF

Basics of R programming for analytics [Autosaved] (1).pdf

PDF

Big_DM_24_MS_Topic_02_Understanding Data.pdf

PDF

Data Science_Chapter -2_Statical Data Analysis.pdf

bysangeeta borde

PPTX

R Language Introduction

byKhaled Al-Shamaa

PPTX

1. Descriptive statistics.pptx engineering

PDF

Data Analysis using R (BASIC)_R SOFTWARE

byshrikrishna kesharwani

PDF

PPT - Introduction to R.pdf

PPTX

ststs nw.pptx

PDF

Using r

DOCX

R Activity in Biostatistics

PPTX

Data in science

bySreejith Aravindakshan

PDF

ISSTA'16 Summer School: Intro to Statistics

byAndrea Arcuri

PDF

R tutorial

byRichard Vidgen

PPTX

Introduction to basic statistics

PDF

Making Sense of Data Big and Small

byBruno Gonçalves

PPT

Stats-Review-Maie-St-John-5-20-2009.ppt

byDiptoKumerSarker1

Unit 4 Statistical Data Analysis for BTech 5th SEM

byrishabh248001

Unit IV.pptx for statistical data analysis

byrishabh248001

Unit---4.pdf how to gst du paper in this day and age

1.Introduction to Biostatistics MBChB 6 - DPH 6024.pdf

byluapulachishipula14

Basics of R programming for analytics [Autosaved] (1).pdf

Big_DM_24_MS_Topic_02_Understanding Data.pdf

Data Science_Chapter -2_Statical Data Analysis.pdf

bysangeeta borde

R Language Introduction

byKhaled Al-Shamaa

1. Descriptive statistics.pptx engineering

Data Analysis using R (BASIC)_R SOFTWARE

byshrikrishna kesharwani

PPT - Introduction to R.pdf

ststs nw.pptx

Using r

R Activity in Biostatistics

Data in science

bySreejith Aravindakshan

ISSTA'16 Summer School: Intro to Statistics

byAndrea Arcuri

R tutorial

byRichard Vidgen

Introduction to basic statistics

Making Sense of Data Big and Small

byBruno Gonçalves

Stats-Review-Maie-St-John-5-20-2009.ppt

byDiptoKumerSarker1

Recently uploaded

PPTX

Econometric Problem of Autocorrelation-Heteroskedasticity (1).pptx

PDF

Beacon Kit paper pdf World Game (s) redesign

PDF

AWS AI In Practice #2 - Intelligent Document Processing

byphilipbasford1

PPTX

statistics importance and scope of stats

PDF

Advanced Data Automation, Orchestration & Cloud Analytics | Wisecor Transform...

bywisecortransformatio

PDF

[Redis Released]- FalkorDB - Redis + Graph Agentic Memory’s Secret Sauce

PPTX

Introduction to the Census 2031 topic consultation webinar (12 November 2025)

byOffice for National Statistics

PDF

Document 2.pdf bizimkiler yapmış yine dvfweefsdc

byAlperturkoglu

PPTX

TERMA DATIS PPT IN AVIATION SECTOR .pptx

PPT

store manager keep track of inventory (2).ppt

byaneesahamedmagi

PDF

Chapter 15 - Sine Cosine and Tangent of angles more than 90 degrees.pptx.pdf

bykumayasmiguel

PPTX

Data_Analysis_Plan-corrected-Biostatistics.pptx

byFahmida Swati

PPTX

English Presentation automation -Why technology is replacing jobs?

PPTX

Correlation-Regression analysis -16.11.25.pptx

byFahmida Swati

PPTX

Recurrent-Neural-Networks-RNNs.pptx (1).pptx

byharbingernoobsaibot

PPTX

Holiday Class Agenda Education Presentation in Pink Red and Green Simple Orna...

byAngelDebatian

PPTX

MASS INTERPRETATION OF ORGANIC COMPOUNDS

byDevipriya787793

PPT

PETROLEUM TRAPS.pptPETROLEUM TRAPS.pptmb

byalialialsatre

PPTX

Synthesis_of_Elements_with_Final_Speaker_Notesgjkh

bycinhsteachershe

PPTX

Copy of Criminology Major for College_ Criminal Statistics by Slidesgo.pptx.pptx

Econometric Problem of Autocorrelation-Heteroskedasticity (1).pptx

Beacon Kit paper pdf World Game (s) redesign

AWS AI In Practice #2 - Intelligent Document Processing

byphilipbasford1

statistics importance and scope of stats

Advanced Data Automation, Orchestration & Cloud Analytics | Wisecor Transform...

bywisecortransformatio

[Redis Released]- FalkorDB - Redis + Graph Agentic Memory’s Secret Sauce

Introduction to the Census 2031 topic consultation webinar (12 November 2025)

byOffice for National Statistics

Document 2.pdf bizimkiler yapmış yine dvfweefsdc

byAlperturkoglu

TERMA DATIS PPT IN AVIATION SECTOR .pptx

store manager keep track of inventory (2).ppt

byaneesahamedmagi

Chapter 15 - Sine Cosine and Tangent of angles more than 90 degrees.pptx.pdf

bykumayasmiguel

Data_Analysis_Plan-corrected-Biostatistics.pptx

byFahmida Swati

English Presentation automation -Why technology is replacing jobs?

Correlation-Regression analysis -16.11.25.pptx

byFahmida Swati

Recurrent-Neural-Networks-RNNs.pptx (1).pptx

byharbingernoobsaibot

Holiday Class Agenda Education Presentation in Pink Red and Green Simple Orna...

byAngelDebatian

MASS INTERPRETATION OF ORGANIC COMPOUNDS

byDevipriya787793

PETROLEUM TRAPS.pptPETROLEUM TRAPS.pptmb

byalialialsatre

Synthesis_of_Elements_with_Final_Speaker_Notesgjkh

bycinhsteachershe

Copy of Criminology Major for College_ Criminal Statistics by Slidesgo.pptx.pptx

HARNESSING R: STATISTICAL TECHNIQUES FOR DATA-DRIVEN ENTREPRENEURSHIP

1.
HARNESSING R: STATISTICAL TECHNIQUESFOR DATA- DRIVEN ENTREPRENEURSHIP DR. SABERUNNISA. A ASSISTANT PROFESSOR THE MADURA COLLEGE(AUTONOMOUS) MADURAI
2.
WHAT IS R? Open-sourcestatistical software: Free to use, developed under the GNU license. Programming language & environment: Specifically designed for statistical computing and graphics. Widely used in Data Science, Machine Learning, and Statistics for: Data manipulation & cleaning Statistical modeling Visualization (graphs, charts, plots) Predictive analytics Community-driven: Thousands of user-contributed packages on CRAN.
3.
History and Backgroundof R Developed by Ross Ihaka and Robert Gentleman in 1993 at the University of Auckland, New Zealand. Based on the S programming language, originally developed at Bell Laboratories. First official release in 1995; Version 1.0.0 was launched in 2000. Supported by the R Foundation for Statistical Computing (established in 2003). Today, R is globally recognized as one of the leading tools for statistical computing, data science, and research.
4.
Features of R Free and Open Source  Available at no cost, licensed under GNU.  Huge Package Ecosystem (CRAN)  19,000+ packages covering statistics, ML, data visualization, bioinformatics, etc.  Cross-Platform Compatibility  Runs smoothly on Windows, Mac, and Linux systems.  Strong Visualization Libraries  Base R graphics and advanced tools like ggplot2, lattice, plotly.  Highly Extensible  Users can develop custom functions and packages.  Active Community Support  Global network of developers and researchers.
5.
Installing and UsingR Download from CRAN (Comprehensive R Archive Network) ◦ Official website: https://cran.r-project.org ◦ Choose installer for Windows, Mac, or Linux. RStudio IDE (Integrated Development Environment) ◦ Provides a user-friendly interface for coding in R. ◦ Features: script editor, console, plots, package manager.
6.
Interface Visuals
7.
Installing Packages tidyverse –a collection of packages for data wrangling & visualization. install.packages("tidyverse") (includes ggplot2, dplyr, readr, tidyr, tibble, stringr, forcats) data.table – for super-fast data manipulation. install.packages("data.table")
8.
Installing Packages readxl &openxlsx – to import/export Excel files. install.packages("readxl") install.packages("openxlsx") haven – to read SPSS, SAS, and Stata files. install.packages("haven")
9.
Statistical Analysis Packages car– regression & ANOVA tools. install.packages("car") MASS – classic stats and datasets. install.packages("MASS") psych – psychological statistics (descriptive, factor analysis). install.packages("psych")
10.
Data Types inR Basic Data Types: Numeric → Decimal values (e.g., 3.14) Integer → Whole numbers (e.g., 5L) Character → Text values (e.g., "Hello") Logical → Boolean values (TRUE, FALSE)
11.
Data Structures Vector →Collection of elements of the same type Matrix → 2D array of numbers (rows × columns) Factor → Categorical data (e.g., "Male", "Female") List → Collection of mixed data types Data Frame → Tabular structure (rows × columns), similar to Excel
12.
Data Import Read thedata from Excel > library(readxl) > data <- read_excel("C:/Users/D E L L/OneDrive/Desktop/data.xlsx") > View(data) READ CSV FILE >DATA<-read.csv("C:/Users/D E L L/OneDrive/Desktop/data/data - Copy.csv") > View(DATA)
13.
Basic R Commands Arithmetic Operations x<- 10 y <- 3 x + y # Addition x - y # Subtraction x * y # Multiplication x / y # Division x ^ y # Power Creating Vectors(c()) v <- c(2, 4, 6, 8, 10) print(v) Indexing and Subsetting v[1] # First element v[2:4] # Elements 2 to 4 v[v > 5] # Elements greater than 5
14.
Data Visualization inR 1. Base R Plots Simple and built-in plotting system x<-c(1,2,3,4,5) y<-c(2,4,6,8,10) plot(x, y, type=“o", col="blue", main="Base R Plot")
15.
R PLOT
16.
2. Advanced Graphicswith ggplot2 library(ggplot2) data <- data.frame(Name=c("A","B","C"), Score=c(85,90,78)) ggplot(data, aes(x=Name, y=Score)) + geom_bar(stat="identity", fill="skyblue") + ggtitle("Bar Chart ")
17.
BAR CHART
18.
Scatter Plot library(ggplot2) ggplot(mtcars, aes(x=wt,y=mpg)) + geom_point(color="red") + ggtitle("Scatter Plot: mpg vs wt")
19.
Scatter Plot
20.
Introduction to StatisticalAnalysis Role of Statistics in Data Analysis ◦ Helps in collecting, organizing, analyzing, and interpreting data. ◦ Converts raw data into meaningful insights. ◦ Essential for decision-making, prediction, and research validation.
21.
Types of StatisticalAnalysis Descriptive Statistics ◦ Summarizes data. ◦ Measures: mean, median, mode, variance, standard deviation. ◦ Example: Average exam score of a class. Inferential Statistics ◦ Draws conclusions about a population from a sample. ◦ Techniques: hypothesis testing, confidence intervals, regression analysis. ◦ Example: Predicting election results from a survey.
22.
Descriptive Statistics Descriptive Statisticsin R 1. Summary of Data data <- c(85, 90, 78, 92, 88) summary(data) Output: Min. 1st Qu. Median Mean 3rd Qu. Max. 78 85 88 86.6 90 92
23.
Descriptive Statistics 2. Measuresof Central Tendency mean(data) # Mean median(data) # Median Output: [1] 86.6 [1] 88
24.
Descriptive Statistics 3. Measuresof Spread (Dispersion) var(data) # Variance sd(data) # Standard Deviation range(data) # Min & Max Output: [1] 34.3 [1] 5.86 [1] 78 92
25.
Descriptive Statistics 4. FrequencyTable table(data) Output: data 78 85 88 90 92 1 1 1 1 1
26.
Inferential statistics A manufacturerclaims that the average fuel efficiency of their new car model is 25 mpg. A random sample of 10 cars gave the following mileages: 23, 25, 27, 24, 26, 22, 28, 25, 24, 23 Test at 5% significance level whether the claim is true using a one-sample t-test. Step 1: State Hypotheses Null Hypothesis (H ): ₀ μ = 25 (average mileage = 25 mpg) Alternative Hypothesis (H ): ₁ μ ≠ 25 (average mileage is different from 25 mpg) This is a two-tailed test. .
27.
Inferential statistics #Sample data mileage<- c(23, 25, 27, 24, 26, 22, 28, 25, 24, 23) # Perform one-sample t-test t.test(mileage, mu = 25)
28.
Inferential statistics One Samplet-test data: mileage t = -0.50233, df = 9, p-value = 0.6275 alternative hypothesis: true mean is not equal to 25 95 percent confidence interval: 23.349 26.051 sample estimates: mean of x : 24.7
29.
Inferential statistics Step 4:Inference Test statistic t = -0.50233, p-value = 0.6275 Since p > 0.05, we fail to reject H₀. Conclusion: At 5% significance level, there is no significant evidence to say that the mean mileage is different from 25 mpg. The manufacturer’s claim is reasonable.
30.
Inferential statistics Chi-Square Test Adie is suspected to be biased. To test this, it is rolled 60 times, and the observed frequencies of outcomes are: Test at the 5% significance level whether the die is fair using the Chi-square goodness of fit test. Face 1 2 3 4 5 6 Observed (O) 8 9 10 12 11 10
31.
•To find thegoodness of fit # Observed frequencies observed <- c(8, 9, 10, 12, 11, 10) # Expected frequencies expected_prob <- rep(1/6, 6) # probabilities for 6 faces # Chi-square Goodness of Fit Test test <- chisq.test(x = observed, p = expected_prob) print(test) Inferential statistics
32.
Inferential statistics Chi-squared testfor given probabilities data: observed X-squared = 1, df = 5, p-value = 0.9626 Here , p-value = 0.9626 >0.05 . Therefore fail to reject null hypothesis. Conclusion: At 5% significance level, there is no significant evidence to say that the die is unfair. The die appears fair.
33.
Inferential statistics A researcherwants to test whether the mean test scores differ among three different teaching methods. The scores of students are recorded as follows: Group A (Method 1): 85, 90, 88 Group B (Method 2): 70, 75, 80 Group C (Method 3): 95, 92, 89 Is there a significant difference in mean scores among the three teaching methods at the 5% significance level?
34.
Step 1: Hypotheses NullHypothesis (H ): ₀ μ = μ = μ (all groups have equal mean scores). ₁ ₂ ₃ Alternative Hypothesis (H ): ₁ At least one group mean differs. ANOVA (Analysis of Variance) •Compare means across 3+ groups df <- data.frame( score = c(85,90,88,70,75,80,95,92,89), group = rep(c("A","B","C"), each=3)) aov_res <- aov(score ~ group, data=df) summary(aov_res) Inferential statistics
35.
Inferential statistics Output (simplified):
36.
This tells ushow much larger the between-group variance is compared to the within-group variance. The probability of getting an F-value as large as 17.41 (or larger) if H is true ₀ . Since p = 0.00317 < 0.05, we reject the null hypothesis. Conclusion: At least one group mean is significantly different. Inferential statistics
37.
Probability Distributions inR Binomial Distribution A fair coin (p = 0.5 for heads) is tossed 10 times. Find the probability of getting exactly 5 heads. Simulate 10 random outcomes of tossing the coin 10 times. Here, n= 10 (number of trials) K=5 (success) p=0.5
38.
# Probability of5 successes in 10 trials with p=0.5 dbinom(5, size=10, prob=0.5) # Generate random binomial values rbinom(10, size=10, prob=0.5) Output Example: [1] 0.246 # Probability of exactly 5 successes [1] 6 5 4 7 5 6 5 4 3 7 # Random outcomes Probability Distributions in R
39.
Conclusion for theProblem The probability of getting exactly 5 heads in 10 coin tosses is 0.2461 (≈ 24.6%). ◦ This means if we repeat the experiment many times, about 1 in 4 trials will result in exactly 5 heads. The random simulation using rbinom() shows how the number of heads can vary across repeated experiments of 10 tosses each. ◦ The values fluctuate around 5, consistent with the theoretical expectation . Overall: The binomial model confirms that getting exactly 5 successes (heads) in 10 fair coin tosses is the most likely outcome, but not guaranteed—it occurs about 25% of the time. Probability Distributions in R
40.
Probability Distributions in R
Poisson Distribution
A call center receives an average of 3 calls per minute.
Find the probability that exactly 5 calls are received in a given minute.
Find the probability that at most 2 calls are received in a given minute.
Simulate the number of calls received in 10 minutes using the Poisson distribution.
41.
Probability Distributions in R
    # 1. Probability of exactly 5 calls (lambda = 3)
    dpois(5, lambda = 3)
    # 2. Probability of at most 2 calls
    ppois(2, lambda = 3)
    # 3. Simulate number of calls in 10 minutes
    rpois(10, lambda = 3)
42.
Probability Distributions in R
    # 1. Probability of exactly 5 calls
    > dpois(5, lambda = 3)
    [1] 0.1008188
    # 2. Probability of at most 2 calls
    > ppois(2, lambda = 3)
    [1] 0.4231901
    # 3. Simulate number of calls in 10 minutes
    > rpois(10, lambda = 3)
    [1] 3 3 1 4 2 2 3 2 0 8
43.
Probability Distributions in R
Conclusion
The probability of getting exactly 5 calls in one minute is ≈ 10%.
The probability of getting at most 2 calls is ≈ 42%.
The simulation shows how the number of calls fluctuates around the average (λ = 3).
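The two Poisson probabilities can also be obtained from the definition P(X = k) = e^(−λ) λ^k / k!, and a longer simulation makes the "fluctuates around the average" statement concrete. The sketch below assumes a sample of 10,000 simulated minutes and an arbitrary seed, neither of which appears in the original slides.

```r
lambda <- 3  # average calls per minute

# Exactly 5 calls, from the Poisson formula
p_exactly_5 <- exp(-lambda) * lambda^5 / factorial(5)

# At most 2 calls: add up the probabilities of 0, 1 and 2 calls
p_at_most_2 <- sum(dpois(0:2, lambda = lambda))

# Simulate many minutes; the sample mean should be close to lambda
set.seed(42)  # arbitrary seed, used only to make the run reproducible
calls <- rpois(10000, lambda = lambda)

print(c(exactly_5 = p_exactly_5, at_most_2 = p_at_most_2))
print(mean(calls))
```

With only 10 simulated minutes the values can look erratic, but over many minutes the average number of calls settles near 3.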
44.
Probability Distributions in R
Normal Distribution
The exam scores of students in a class are normally distributed with a mean (μ) = 70 and a standard deviation (σ) = 10.
Find the probability that a randomly selected student scores less than 80.
Find the probability that a student scores between 60 and 75.
Simulate the exam scores of 10 students using the given distribution.
45.
Probability Distributions in R
    # 1. Probability of scoring less than 80
    pnorm(80, mean = 70, sd = 10)
    # 2. Probability of scoring between 60 and 75
    pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
    # 3. Simulate exam scores for 10 students
    rnorm(10, mean = 70, sd = 10)
46.
Probability Distributions in R
P(X < 80):
    pnorm(80, mean = 70, sd = 10)
    # [1] 0.8413447
There is about an 84.13% chance that a student scores less than 80.
P(60 < X < 75):
    pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
    # [1] 0.5328072
There is about a 53.28% chance that a student scores between 60 and 75.
47.
Probability Distributions in R
Simulated scores:
    rnorm(10, mean = 70, sd = 10)
    # Example output:
    # [1] 68.5 72.3 81.0 59.4 74.2 65.8 69.1 77.6 62.8 71.4
These are 10 randomly generated exam scores based on the normal distribution.
Conclusion
Most students are likely to score below 80 (about 84% probability).
Just over half of the students (about 53% probability) are expected to score between 60 and 75.
The simulated results reflect how scores cluster around the mean (70), with some variation governed by the standard deviation (10).
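The same normal probabilities can be found by first converting each score to a z-score, z = (x − μ) / σ, and then using the standard normal distribution. This is a brief sketch; the variable names are illustrative.

```r
mu    <- 70  # mean exam score
sigma <- 10  # standard deviation

# Convert raw scores to z-scores
z_80 <- (80 - mu) / sigma  # 1 standard deviation above the mean
z_75 <- (75 - mu) / sigma  # 0.5 standard deviations above the mean
z_60 <- (60 - mu) / sigma  # 1 standard deviation below the mean

# pnorm() with default arguments is the standard normal CDF
p_less_80 <- pnorm(z_80)
p_60_to_75 <- pnorm(z_75) - pnorm(z_60)

print(c(less_than_80 = p_less_80, between_60_and_75 = p_60_to_75))
```

Standardizing gives the same answers as passing mean and sd to pnorm() directly, which is why a score of 80 (one standard deviation above the mean) corresponds to the familiar 84% figure.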
48.
Entrepreneurship With R software
Entrepreneurs can use R for business analytics, enabling data-driven decision-making through statistical analysis, forecasting, and visualization of trends in customer behavior, sales, and system performance.
Entrepreneurs can leverage R to load, manipulate, and visualize complex datasets, identify patterns, conduct predictive modeling, and generate actionable insights from market and operational data to gain a competitive edge.
Business Analytics & Data Mining: R is an open-source statistical software environment used for high-end graphics and statistical computations, making it a powerful tool for business analytics and data mining.
49.
Entrepreneurship With R software
Data Visualization: Entrepreneurs can use packages like ggplot2 to create clear, easy-to-read charts and graphs that transform raw data into impactful visualizations, helping to identify trends and patterns in data.
Predictive Analytics: R enables businesses to forecast trends, classify outcomes, and analyze time-dependent data using regression analysis, classification models, and time series forecasting.
Customer Behavior Analysis: By analyzing customer data, entrepreneurs can understand needs and preferences, leading to more tailored products and services.
50.
Entrepreneurship With R software
Sales & Performance Forecasting: R can be used to forecast sales, model system performance, and predict potential losses, supporting smarter, data-driven business decisions.
Quantitative Research: R facilitates the analysis of public and proprietary datasets, helping entrepreneurs conduct research, test theories, and generate strategic insights.
Market Trend Analysis: Entrepreneurs can analyze social media trends and market data to refine strategies, improve customer engagement, and make informed business decisions.
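As one possible illustration of sales forecasting, a simple linear trend can be fitted with lm() and extended one period ahead with predict(). The monthly sales figures below are hypothetical values invented for this sketch, and a straight-line trend is only one of many forecasting approaches.

```r
# Hypothetical monthly sales (units) for six months; illustrative data only
sales_data <- data.frame(
  month = 1:6,
  sales = c(120, 135, 150, 160, 178, 190)
)

# Fit a straight-line trend: sales as a linear function of month
fit <- lm(sales ~ month, data = sales_data)

# Intercept and average monthly growth estimated from the data
print(coef(fit))

# Forecast sales for month 7 by extending the fitted line
forecast_month7 <- predict(fit, newdata = data.frame(month = 7))
print(forecast_month7)
```

In practice an entrepreneur would check the fit (for example with summary(fit)) and consider seasonality before relying on such a forecast.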