This is an exercise from DataQuest.
I'm trying to iterate over a NumPy array, but it raises a TypeError. How is an array different from a list?
32561 is the sample size, and 16280.5 is the expected count for each sex under a 50% male / 50% female split.
import numpy as np
import matplotlib.pyplot as plt
chi_squared_values = []
for i in range(1000):
    random_n = np.random.random((32561,))
    for array in random_n:
        male_count = 0
        female_count = 0
        for n in array: # Error on this line
            if n < 0.5:
                male_count =+ 1
            else:
                female_count =+ 1
    male_diff = (male_count - 16280.5) ** 2 / 16280.5
    female_diff = (female_count - 16280.5) ** 2 / 16280.5
    chi_squared_value = male_diff + female_diff
    chi_squared_values.append(chi_squared_value)
plt.hist(chi_squared_values)
plt.show()
# Output: TypeError: 'numpy.float64' object is not iterable
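For what it's worth, here is a minimal sketch of where the error probably comes from. np.random.random((32561,)) returns a one-dimensional array, so looping over it yields single numpy.float64 values rather than sub-arrays, and a second nested loop then has nothing to iterate. The seeded generator and the 10-element sample are assumptions made only to keep the example small and repeatable.

```python
import numpy as np

# Assumption: a seeded generator and a tiny sample, purely for illustration.
rng = np.random.default_rng(0)
sample = rng.random(10)  # 1-D array, like np.random.random((10,))

# Iterating a 1-D array yields scalars, not arrays.
first = next(iter(sample))
print(type(first))

# One loop over the array is therefore enough.
male_count = 0
female_count = 0
for n in sample:
    if n < 0.5:
        male_count += 1    # += increments; "=+ 1" would assign +1 each time
    else:
        female_count += 1

print(male_count, female_count)
```

Note that the posted loop also uses =+ where += seems to be intended, so even with a single loop the counts would never rise above 1.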
The correct answer for reference is:
chi_squared_values = []
from numpy.random import random
import matplotlib.pyplot as plt
for i in range(1000):
    sequence = random((32561,))
    sequence[sequence < .5] = 0
    sequence[sequence >= .5] = 1
    male_count = len(sequence[sequence == 0])
    female_count = len(sequence[sequence == 1])
    male_diff = (male_count - 16280.5) ** 2 / 16280.5
    female_diff = (female_count - 16280.5) ** 2 / 16280.5
    chi_squared = male_diff + female_diff
    chi_squared_values.append(chi_squared)
plt.hist(chi_squared_values)
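The counting step in the reference answer could also be sketched without overwriting the sequence, by counting the entries of a boolean mask directly. This is only one possible variant, not the DataQuest solution: the helper name chi_squared_for_sample and the seeded generator are hypothetical, and the expected count is computed as half the sample size (32561 / 2 = 16280.5).

```python
import numpy as np

def chi_squared_for_sample(sample, expected):
    """Chi-squared statistic for a two-way split of `sample` at 0.5.

    Hypothetical helper for illustration; values below 0.5 count as male.
    """
    male_count = int(np.count_nonzero(sample < 0.5))
    female_count = sample.size - male_count
    male_diff = (male_count - expected) ** 2 / expected
    female_diff = (female_count - expected) ** 2 / expected
    return male_diff + female_diff

# Assumption: a seeded generator so repeated runs give the same values.
rng = np.random.default_rng(0)
n = 32561
values = [chi_squared_for_sample(rng.random(n), n / 2) for _ in range(1000)]
```

The resulting list can be passed to plt.hist in the same way as chi_squared_values.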
random_n would give you arrays?