From the course: Learning Data Science

Understanding probability

- Probability is another area of statistics that allows you to tell interesting stories. Probability is a measure of how likely each possible outcome is. If you flip a coin, probability tells you how likely it is to come up one side or the other. Much of probability deals with probability distributions. If you throw a fair six-sided die, there are six equally likely outcomes, so the probability of any particular number coming up is one in six. That means each time you throw a die, you have about a 17% chance of hitting a particular number.

Probability can also describe a sequence of events. How can I show the probability of hitting a particular number twice in a row? Well, because the two throws are independent, that's 17% of 17%, or roughly 3%. If you're playing a game, that's a low probability.

Your data science team will certainly want to work with probability. It's a key part of predictive analytics. It'll help you figure out the likelihood that your customer will do one thing over another. I worked with a biotech company that aimed to predict whether someone would join a clinical trial. Whether people will participate in a clinical trial is difficult to predict. It turns out that there are a few things that might decrease the probability of someone participating. If participants can't eat the night before, they might be 30% less likely to participate. They also might be 20% less likely to participate if there are blood tests and needles.

The team had to balance the probability of people participating against the accuracy of the results. In a drug trial, they could test the effectiveness with either a saliva test or a blood test. The blood test was 10% more likely to be accurate. That was easy. They should just use the blood test. But hold on. If they ran the trial with a blood test, they'd have 20% fewer participants, which would decrease the number of data points for the study. They'd lose the people who decided against the study because they were afraid of needles. If they were aiming for 1,000 participants, that would mean about 200 fewer people.

The data science team had to take that into account. Was it better to have more people in the study by avoiding needles, even though the saliva test was less accurate? If including more participants helped the trial catch more reactions, then using the less accurate saliva test might actually increase the probability of a better result.

There are a few things to keep in mind when you're working with probability. The first is that probability will lead you to some unexpected places. Who would've thought that a medical practice might get better results by administering a less accurate test? The second is that probability can also be a great vehicle for asking more interesting questions. Don't be discouraged if your questions just lead to more questions. Remember that data science is applying the scientific method to your data. Sometimes that path will lead you to an unexpected place. The important thing is not to jump off when the path takes a strange turn. Those strange turns are often the path to your greatest insights.
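
The arithmetic in this video can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: the die is fair, the two throws are independent, and the 20% drop in participation is treated as a flat fraction of the recruiting target. The helper name `expected_participants` and its parameters are made up for illustration and are not from the course.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
p_single = Fraction(1, 6)

# Assumption: the two throws are independent, so the probabilities multiply.
p_twice = p_single * p_single


def expected_participants(target: int, drop_rate: float) -> int:
    """Hypothetical helper: participants left if a fraction `drop_rate` declines."""
    return round(target * (1 - drop_rate))


if __name__ == "__main__":
    print(f"particular number on one throw: {float(p_single):.1%}")
    print(f"same particular number twice:   {float(p_twice):.1%}")
    print(f"participants with a blood test: {expected_participants(1000, 0.20)}")
```

The exact values are 1/6 and 1/36, which is where the rounded figures of about 17% and roughly 3% come from. With a target of 1,000 and a 20% drop, the helper returns 800, matching the "about 200 fewer people" in the story.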
