HARNESSING R: STATISTICAL TECHNIQUES FOR DATA-DRIVEN ENTREPRENEURSHIP
DR. SABERUNNISA. A
ASSISTANT PROFESSOR
THE MADURA COLLEGE (AUTONOMOUS), MADURAI
WHAT IS R?
Open-source statistical software: Free to use, distributed under the GNU General Public License (GPL).
Programming language & environment: Specifically designed for statistical
computing and graphics.
Widely used in Data Science, Machine Learning, and Statistics for:
Data manipulation & cleaning
Statistical modeling
Visualization (graphs, charts, plots)
Predictive analytics
Community-driven: Thousands of user-contributed packages on CRAN.
History and Background of R
Developed by Ross Ihaka and Robert Gentleman in 1993 at the University of
Auckland, New Zealand.
Based on the S programming language, originally developed at Bell Laboratories.
First official release in 1995; Version 1.0.0 was launched in 2000.
Supported by the R Foundation for Statistical Computing (established in 2003).
Today, R is globally recognized as one of the leading tools for statistical computing,
data science, and research.
Features of R
Free and Open Source
◦ Available at no cost, licensed under the GNU GPL.
Huge Package Ecosystem (CRAN)
◦ 19,000+ packages covering statistics, ML, data visualization, bioinformatics, etc.
Cross-Platform Compatibility
◦ Runs on Windows, macOS, and Linux systems.
Strong Visualization Libraries
◦ Base R graphics and advanced tools like ggplot2, lattice, and plotly.
Highly Extensible
◦ Users can develop custom functions and packages.
Active Community Support
◦ Global network of developers and researchers.
Installing and Using R
Download from CRAN (Comprehensive R Archive Network)
◦ Official website: https://cran.r-project.org
◦ Choose installer for Windows, Mac, or Linux.
RStudio IDE (Integrated Development Environment)
◦ Provides a user-friendly interface for coding in R.
◦ Features: script editor, console, plots, package manager.
Interface Visuals
Installing Packages
tidyverse – a collection of packages for data wrangling & visualization.
install.packages("tidyverse")
(includes ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats)
data.table – for super-fast data manipulation.
install.packages("data.table")
readxl & openxlsx – to import/export Excel files.
install.packages("readxl")
install.packages("openxlsx")
haven – to read SPSS, SAS, and Stata files.
install.packages("haven")
Statistical Analysis Packages
car – regression & ANOVA tools.
install.packages("car")
MASS – classic stats and datasets.
install.packages("MASS")
psych – psychological statistics (descriptive, factor analysis).
install.packages("psych")
Data Types in R
Basic Data Types:
Numeric → Decimal values (e.g., 3.14)
Integer → Whole numbers (e.g., 5L)
Character → Text values (e.g., "Hello")
Logical → Boolean values (TRUE, FALSE)
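The four basic types can be checked directly in the console. The sketch below uses illustrative variable names (num, int, chr, lgl) that are not part of the original slides; class() reports the type of each value.

```r
# Inspect the basic data types with class()
num <- 3.14      # numeric (stored as a double)
int <- 5L        # integer (the L suffix requests an integer)
chr <- "Hello"   # character
lgl <- TRUE      # logical

class(num)   # "numeric"
class(int)   # "integer"
class(chr)   # "character"
class(lgl)   # "logical"

# A whole number typed without the L suffix is still numeric, not integer
class(5)         # "numeric"
is.integer(5)    # FALSE
is.integer(5L)   # TRUE

# Explicit conversion between types
as.integer("42")   # 42
as.numeric(TRUE)   # 1
```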
Data Structures
Vector → Collection of elements of the same type
Matrix → 2D array of elements of the same type (rows × columns)
Factor → Categorical data (e.g., "Male", "Female")
List → Collection of mixed data types
Data Frame → Tabular structure (rows × columns), similar to Excel
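As a minimal sketch, each of the five structures can be created in one line. The object names (v, m, f, l, d) and the sample values are chosen only for illustration.

```r
v <- c(2, 4, 6)                                   # vector: same type throughout
m <- matrix(1:6, nrow = 2, ncol = 3)              # matrix: filled column by column
f <- factor(c("Male", "Female", "Male"))          # factor: categorical data
l <- list(name = "A", score = 85, passed = TRUE)  # list: mixed types allowed
d <- data.frame(Name  = c("A", "B", "C"),
                Score = c(85, 90, 78))            # data frame: one column per variable

dim(m)                   # 2 3
levels(f)                # "Female" "Male"
l$score                  # 85
str(d)                   # compact overview of the columns and their types
d$Score[d$Name == "B"]   # 90
```

str() is often the quickest way to confirm that an imported table has the column types you expect.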
Data Import
Read the data from Excel
library(readxl)
data <- read_excel("C:/Users/D E L L/OneDrive/Desktop/data.xlsx")
View(data)
Read the data from a CSV file
DATA <- read.csv("C:/Users/D E L L/OneDrive/Desktop/data/data - Copy.csv")
View(DATA)
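The paths above are specific to one computer. To try CSV import without any external file, the following sketch writes a small hypothetical table to a temporary file and reads it back; the data frame scores and the variable csv_path are assumptions made for the example.

```r
# Hypothetical data, used only so the example is self-contained
scores <- data.frame(Name = c("A", "B", "C"), Score = c(85, 90, 78))

csv_path <- tempfile(fileext = ".csv")          # temporary file location
write.csv(scores, csv_path, row.names = FALSE)  # export without row numbers

imported <- read.csv(csv_path)                  # import the file again
str(imported)                                   # check column names and types
head(imported)                                  # show the first rows

unlink(csv_path)                                # delete the temporary file
```

In a non-interactive script, head() and str() are safer inspection tools than View(), which needs a graphical session.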
Basic R Commands
Arithmetic Operations
x <- 10
y <- 3
x + y # Addition
x - y # Subtraction
x * y # Multiplication
x / y # Division
x ^ y # Power
Creating Vectors with c()
v <- c(2, 4, 6, 8, 10)
print(v)
Indexing and Subsetting
v[1] # First element
v[2:4] # Elements 2 to 4
v[v > 5] # Elements greater than 5
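A few more subsetting patterns follow from the same rules. This is a short sketch on the vector v defined above; all functions are base R.

```r
v <- c(2, 4, 6, 8, 10)

v[-1]         # drop the first element: 4 6 8 10
v[c(1, 5)]    # pick elements by position: 2 10
which(v > 5)  # positions (not values) that satisfy the condition: 3 4 5
v * 2         # arithmetic is applied element by element: 4 8 12 16 20
sum(v)        # 30
length(v)     # 5
```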
Data Visualization in R
1. Base R Plots
Simple and built-in plotting system
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type = "o", col = "blue", main = "Base R Plot")
[Figure: output of the base R plot]
2. Advanced Graphics with ggplot2
library(ggplot2)
data <- data.frame(Name = c("A", "B", "C"), Score = c(85, 90, 78))
ggplot(data, aes(x = Name, y = Score)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  ggtitle("Bar Chart")
[Figure: bar chart of Score by Name]
Scatter Plot
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red") +
  ggtitle("Scatter Plot: mpg vs wt")
[Figure: scatter plot of mpg vs wt]
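The same mpg-versus-weight relationship can also be drawn with base R and summarized by a fitted straight line. This is a sketch rather than part of the original slides: it uses the built-in mtcars data set and the base functions lm(), plot(), and abline(), and the object name fit is an assumption.

```r
# Fit a simple linear regression of mileage on weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope; the slope is negative (heavier cars, lower mpg)

# Scatter plot with the fitted line drawn on top
plot(mtcars$wt, mtcars$mpg,
     col = "red", pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Scatter Plot: mpg vs wt")
abline(fit, col = "blue", lwd = 2)
```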
Introduction to Statistical Analysis
Role of Statistics in Data Analysis
◦ Helps in collecting, organizing, analyzing, and interpreting data.
◦ Converts raw data into meaningful insights.
◦ Essential for decision-making, prediction, and research validation.
Types of Statistical Analysis
Descriptive Statistics
◦ Summarizes data.
◦ Measures: mean, median, mode, variance, standard deviation.
◦ Example: Average exam score of a class.
Inferential Statistics
◦ Draws conclusions about a population from a sample.
◦ Techniques: hypothesis testing, confidence intervals, regression analysis.
◦ Example: Predicting election results from a survey.
Descriptive Statistics
Descriptive Statistics in R
1. Summary of Data
data <- c(85, 90, 78, 92, 88)
summary(data)
Output:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   78.0    85.0    88.0    86.6    90.0    92.0
2. Measures of Central Tendency
mean(data) # Mean
median(data) # Median
Output:
[1] 86.6
[1] 88
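Base R has no built-in function for the statistical mode; mode() reports how an object is stored instead. A minimal helper can be sketched as follows. The function name stat_mode and the vector marks are hypothetical and used only for illustration.

```r
# Return the most frequent value(s) of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep the value(s) with the top count
}

marks <- c(85, 90, 78, 90, 88)   # hypothetical marks in which 90 appears twice
stat_mode(marks)                 # 90

# When every value occurs once, all of them tie and all are returned
stat_mode(c(85, 90, 78, 92, 88)) # 78 85 88 90 92
```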
3. Measures of Spread (Dispersion)
var(data) # Variance
sd(data) # Standard Deviation
range(data) # Min & Max
Output:
[1] 29.8
[1] 5.458938
[1] 78 92
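To see what var() and sd() compute, the sample variance can be worked out step by step: sum the squared deviations from the mean and divide by n - 1. The sketch below does this for the same five scores; the names deviations, variance_manual, and sd_manual are assumptions for the example.

```r
data <- c(85, 90, 78, 92, 88)

n          <- length(data)
deviations <- data - mean(data)       # -1.6  3.4 -8.6  5.4  1.4

# Sample variance: sum of squared deviations divided by (n - 1)
variance_manual <- sum(deviations^2) / (n - 1)
sd_manual       <- sqrt(variance_manual)

variance_manual                       # 29.8
all.equal(variance_manual, var(data)) # TRUE
all.equal(sd_manual, sd(data))        # TRUE

# Two other common measures of spread
diff(range(data))                     # range width: 92 - 78 = 14
IQR(data)                             # interquartile range: 90 - 85 = 5
```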
4. Frequency Table
table(data)
Output:
data
78 85 88 90 92
1 1 1 1 1
Inferential statistics
One-Sample t-Test
A manufacturer claims that the average fuel efficiency of their new car model is 25 mpg. A
random sample of 10 cars gave the following mileages:
23, 25, 27, 24, 26, 22, 28, 25, 24, 23
Test at the 5% significance level whether the claim holds, using a one-sample t-test.
Step 1: State Hypotheses
Null Hypothesis (H₀): μ = 25 (average mileage = 25 mpg)
Alternative Hypothesis (H₁): μ ≠ 25 (average mileage is different from 25 mpg)
This is a two-tailed test.
Step 2: Run the Test in R
# Sample data
mileage <- c(23, 25, 27, 24, 26, 22, 28, 25, 24, 23)
# Perform one-sample t-test
t.test(mileage, mu = 25)
Step 3: Output
One Sample t-test
data: mileage
t = -0.50233, df = 9, p-value = 0.6275
alternative hypothesis: true mean is not equal to 25
95 percent confidence interval:
23.349 26.051
sample estimates:
mean of x
24.7
Step 4: Inference
Test statistic t = -0.50233,
p-value = 0.6275
Since p > 0.05, we fail to reject H₀.
Conclusion: At the 5% significance level, there is no significant evidence that the mean
mileage differs from 25 mpg. The data are consistent with the manufacturer’s claim.
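The numbers reported by t.test() can be reproduced from the definition t = (x̄ - μ₀) / (s / √n). The following sketch recomputes the test statistic, the two-sided p-value, and the 95% confidence interval by hand; names such as mu0, se, and t_crit are assumptions introduced for the example.

```r
mileage <- c(23, 25, 27, 24, 26, 22, 28, 25, 24, 23)
mu0 <- 25                                      # hypothesized mean under H0

n      <- length(mileage)
se     <- sd(mileage) / sqrt(n)                # standard error of the mean
t_stat <- (mean(mileage) - mu0) / se           # about -0.50233

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)    # about 0.6275

# 95% confidence interval for the mean
t_crit <- qt(0.975, df = n - 1)
ci     <- mean(mileage) + c(-1, 1) * t_crit * se   # about 23.349 and 26.051

# Compare with the built-in test
res <- t.test(mileage, mu = mu0)
all.equal(unname(res$statistic), t_stat)       # TRUE
all.equal(res$p.value, p_value)                # TRUE
```

Because 25 lies inside the confidence interval, the interval leads to the same decision as the p-value.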
Chi-Square Test
A die is suspected to be biased. To test this, it is rolled 60 times, and the observed frequencies of
outcomes are:
Face          1   2   3   4   5   6
Observed (O)  8   9   10  12  11  10
Test at the 5% significance level whether the die is fair using the Chi-square goodness of fit
test.
• To find the goodness of fit
# Observed frequencies
observed <- c(8, 9, 10, 12, 11, 10)
# Expected probabilities under a fair die (1/6 for each of the 6 faces)
expected_prob <- rep(1/6, 6)
# Chi-square Goodness of Fit Test
test <- chisq.test(x = observed, p = expected_prob)
print(test)
Output:
Chi-squared test for given probabilities
data: observed
X-squared = 1, df = 5, p-value = 0.9626
Here, p-value = 0.9626 > 0.05. Therefore, we fail to reject the null hypothesis.
Conclusion: At the 5% significance level, there is no significant evidence that the die is
unfair. The observed frequencies are consistent with a fair die.
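The statistic reported by chisq.test() is the sum of (O - E)² / E over the six faces, where each expected count E is 60 × 1/6 = 10. A minimal sketch of the hand calculation follows; the names expected, chi_sq, df_chi, and crit are assumptions for the example.

```r
observed <- c(8, 9, 10, 12, 11, 10)
expected <- sum(observed) * rep(1/6, 6)             # 10 rolls expected per face

chi_sq <- sum((observed - expected)^2 / expected)   # (4 + 1 + 0 + 4 + 1 + 0) / 10 = 1
df_chi <- length(observed) - 1                      # 6 categories - 1 = 5

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df_chi, lower.tail = FALSE)   # about 0.9626

# Critical value at the 5% level; reject H0 only if chi_sq exceeds it
crit <- qchisq(0.95, df = df_chi)                   # about 11.07
chi_sq > crit                                       # FALSE
```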
One-Way ANOVA
A researcher wants to test whether the mean test scores differ among three different teaching
methods. The scores of students are recorded as follows:
Group A (Method 1): 85, 90, 88
Group B (Method 2): 70, 75, 80
Group C (Method 3): 95, 92, 89
Is there a significant difference in mean scores among the three teaching methods at the 5%
significance level?
Step 1: Hypotheses
Null Hypothesis (H₀): μ₁ = μ₂ = μ₃ (all groups have equal mean scores).
Alternative Hypothesis (H₁): At least one group mean differs.
ANOVA (Analysis of Variance)
• Compares means across three or more groups
df <- data.frame(score = c(85, 90, 88, 70, 75, 80, 95, 92, 89),
                 group = rep(c("A", "B", "C"), each = 3))
aov_res <- aov(score ~ group, data=df)
summary(aov_res)
Output (simplified):
F value = 17.41: This tells us how much larger the between-group variance is compared to the
within-group variance.
p-value = 0.00317: The probability of getting an F-value as large as 17.41 (or larger) if H₀ is true.
Since p = 0.00317 < 0.05, we reject the null hypothesis.
Conclusion: At least one group mean is significantly different.
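The F-value is the ratio of the between-group mean square to the within-group mean square. The sketch below rebuilds it from the raw scores and then runs Tukey's HSD as one possible follow-up to see which pairs of groups differ. The data frame name scores and the intermediate names are assumptions; group is stored as a factor explicitly so that TukeyHSD() can use it.

```r
scores <- data.frame(
  score = c(85, 90, 88, 70, 75, 80, 95, 92, 89),
  group = factor(rep(c("A", "B", "C"), each = 3))
)

grand_mean  <- mean(scores$score)
group_means <- tapply(scores$score, scores$group, mean)    # A, B, C means
group_sizes <- tapply(scores$score, scores$group, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((scores$score - ave(scores$score, scores$group))^2)

k <- nlevels(scores$group)   # number of groups
n <- nrow(scores)            # total number of observations

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_stat     <- ms_between / ms_within                          # about 17.41
p_value    <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)    # about 0.00317

# Pairwise comparisons after a significant ANOVA
aov_fit <- aov(score ~ group, data = scores)
TukeyHSD(aov_fit)
```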
Probability Distributions in R
Binomial Distribution
A fair coin (p = 0.5 for heads) is tossed 10 times.
Find the probability of getting exactly 5 heads.
Simulate 10 random outcomes of tossing the coin 10 times.
Here,
n = 10 (number of trials)
k = 5 (number of successes)
p = 0.5 (probability of success)
# Probability of 5 successes in 10 trials with p=0.5
dbinom(5, size=10, prob=0.5)
# Generate random binomial values
rbinom(10, size=10, prob=0.5)
Output Example:
[1] 0.2460938 # Probability of exactly 5 successes
[1] 6 5 4 7 5 6 5 4 3 7 # Random outcomes (these vary from run to run)
Conclusion for the Problem
The probability of getting exactly 5 heads in 10 coin tosses is 0.2461 (≈ 24.6%).
◦ This means if we repeat the 10-toss experiment many times, about 1 in 4 repetitions will result
in exactly 5 heads.
The random simulation using rbinom() shows how the number of heads can vary across
repeated experiments of 10 tosses each.
◦ The values fluctuate around 5, consistent with the theoretical expectation (n × p = 5).
Overall: The binomial model shows that getting exactly 5 heads in 10 fair coin
tosses is the most likely single outcome, but far from guaranteed: it occurs about 25% of the time.
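The value returned by dbinom() follows from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). As a sketch, the code below evaluates the formula directly, adds the cumulative probability with pbinom(), and checks the result against a larger simulation. The seed and the number of simulated experiments are arbitrary choices made for the example.

```r
n <- 10; k <- 5; p <- 0.5

# Binomial formula evaluated by hand: 252 / 1024
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
manual                                         # 0.2460938
all.equal(manual, dbinom(k, size = n, prob = p))   # TRUE

# Cumulative probability of at most 5 heads: 638 / 1024
pbinom(5, size = n, prob = p)                  # 0.6230469

# Simulation check: share of 10,000 experiments with exactly 5 heads
set.seed(123)                                  # arbitrary seed for reproducibility
sims <- rbinom(10000, size = n, prob = p)
mean(sims == 5)                                # close to the theoretical 0.246
mean(sims)                                     # close to n * p = 5
```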
Poisson Distribution
A call center receives an average of 3 calls per minute.
Find the probability that exactly 5 calls are received in a given minute.
Find the probability that at most 2 calls are received in a given minute.
Simulate the number of calls received in 10 minutes using the Poisson distribution.
# 1. Probability of exactly 5 calls (lambda = 3)
dpois(5, lambda = 3)
# 2. Probability of at most 2 calls
ppois(2, lambda = 3)
# 3. Simulate number of calls in 10 minutes
rpois(10, lambda = 3)
Output:
# 1. Probability of exactly 5 calls
dpois(5, lambda = 3)
[1] 0.1008188
# 2. Probability of at most 2 calls
ppois(2, lambda = 3)
[1] 0.4231901
# 3. Simulate number of calls in 10 minutes
rpois(10, lambda = 3)
[1] 3 3 1 4 2 2 3 2 0 8
Conclusion
The probability of getting exactly 5 calls in one minute is ≈ 10%.
The probability of getting at most 2 calls is ≈ 42%.
The simulation shows how the number of calls fluctuates around the average (λ = 3).
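These Poisson results can be checked against the formula P(X = k) = e^(-λ) λ^k / k!. The sketch below evaluates it directly, builds the "at most 2" probability as a sum, and adds the complementary question of more than 5 calls, which is an extra case not asked in the problem. The seed and simulation size are arbitrary.

```r
lambda <- 3

# Poisson formula evaluated by hand for k = 5
manual <- exp(-lambda) * lambda^5 / factorial(5)
all.equal(manual, dpois(5, lambda))              # TRUE

# "At most 2" is the sum of the probabilities for 0, 1 and 2 calls
at_most_2 <- sum(dpois(0:2, lambda))
all.equal(at_most_2, ppois(2, lambda))           # TRUE

# Upper tail: probability of more than 5 calls in a minute
more_than_5 <- ppois(5, lambda, lower.tail = FALSE)

# With many simulated minutes the sample mean settles near lambda
set.seed(42)                                     # arbitrary seed for reproducibility
calls <- rpois(1000, lambda)
mean(calls)                                      # close to 3
```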
Normal Distribution
The exam scores of students in a class are normally distributed with a mean (μ) = 70 and a
standard deviation (σ) = 10.
Find the probability that a randomly selected student scores less than 80.
Find the probability that a student scores between 60 and 75.
Simulate the exam scores of 10 students using the given distribution.
# 1. Probability of scoring less than 80
pnorm(80, mean = 70, sd = 10)
# 2. Probability of scoring between 60 and 75
pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
# 3. Simulate exam scores for 10 students
rnorm(10, mean = 70, sd = 10)
P(X < 80):
pnorm(80, mean = 70, sd = 10)
# [1] 0.8413447
There is about 84.13% chance that a student scores less than 80.
P(60 < X < 75):
pnorm(75, mean = 70, sd = 10) - pnorm(60, mean = 70, sd = 10)
# [1] 0.5328072
There is about 53.28% chance that a student scores between 60 and 75.
Simulated scores:
rnorm(10, mean = 70, sd = 10)
# Example output: [1] 68.5 72.3 81.0 59.4 74.2 65.8 69.1 77.6 62.8 71.4
These are 10 randomly generated exam scores based on the normal distribution.
Conclusion
A randomly selected student scores below 80 with a probability of about 84%.
Just over half of the students (about 53%) are expected to score between 60 and 75.
The simulated results show how scores cluster around the mean (70), with the spread governed by the
standard deviation (10).
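Two related calculations are often useful with the normal model: converting a score to a z-score, and going from a probability back to a score with qnorm(). The sketch below uses the same mean and standard deviation; the upper-tail and 90th-percentile questions are additions for illustration and are not part of the original problem.

```r
mu <- 70; sigma <- 10

# Standardize: a score of 80 is one standard deviation above the mean
z <- (80 - mu) / sigma                                   # 1
all.equal(pnorm(z), pnorm(80, mean = mu, sd = sigma))    # TRUE

# Upper tail: probability of scoring more than 80
pnorm(80, mean = mu, sd = sigma, lower.tail = FALSE)     # 0.1586553

# Inverse question: the score below which 90% of students fall
cutoff <- qnorm(0.90, mean = mu, sd = sigma)
cutoff                                                   # about 82.8
```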
Entrepreneurship With R software
Entrepreneurs can use R for business analytics, supporting data-driven decision-making
through statistical analysis, forecasting, and visualization of trends in customer behavior,
sales, and system performance. With R they can load, manipulate, and visualize
complex datasets, identify patterns, build predictive models, and draw actionable insights from
market and operational data.
Business Analytics & Data Mining:
R is an open-source statistical software environment used for high-end graphics and statistical
computations, making it a powerful tool for business analytics and data mining.
Data Visualization:
Entrepreneurs can use packages like ggplot2 to create clear, easy-to-read charts and graphs that
transform raw data into impactful visualizations, helping to identify trends and patterns in data.
Predictive Analytics:
R enables businesses to forecast trends, classify outcomes, and analyze time-dependent data using
regression analysis, classification models, and time series forecasting.
Customer Behavior Analysis:
By analyzing customer data, entrepreneurs can understand needs and preferences, leading to more
tailored products and services.
Sales & Performance Forecasting:
R can be used to forecast sales, model system performance, and predict potential losses,
supporting smarter, data-driven business decisions.
Quantitative Research:
R facilitates the analysis of public and proprietary datasets, helping entrepreneurs conduct
research, test theories, and generate strategic insights.
Market Trend Analysis:
Entrepreneurs can analyze social media trends and market data to refine strategies, improve
customer engagement, and make informed business decisions.
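As one small illustration of sales forecasting, a linear trend can be fitted to monthly sales and extended a few months ahead. Everything in this sketch is hypothetical: the sales figures are invented for the example, and a straight-line trend is only one of several possible models (a dedicated time series method may suit real data better).

```r
# Hypothetical monthly unit sales, invented purely for illustration
sales <- data.frame(
  month = 1:12,
  units = c(120, 132, 128, 141, 150, 149, 158, 163, 171, 169, 180, 186)
)

# Linear trend: units sold as a function of the month number
trend <- lm(units ~ month, data = sales)
coef(trend)   # the slope estimates the average monthly growth in units

# Forecast the next three months with 95% prediction intervals
future <- data.frame(month = 13:15)
sales_forecast <- predict(trend, newdata = future,
                          interval = "prediction", level = 0.95)
cbind(future, sales_forecast)   # columns: month, fit, lwr, upr
```

The prediction interval (lwr to upr) is as important as the point forecast, since it shows how much uncertainty surrounds each projected figure.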