From the course: Artificial Intelligence Foundations: Thinking Machines
Regression
- Up until now you've seen how supervised learning can be used for classification. You can use your neural network to classify music into different categories. These networks use human-created categories to sort your music into jazz, blues, or even country. Remember that this is different from unsupervised learning. In unsupervised learning, you have the artificial neural network create its own categories. These machines create categories that may or may not be the same as their human counterparts. The machine could put country music and blues into the same category.
Classification isn't the only form of supervised machine learning. You can also have your artificial neural network use something called regression. Regression analysis is when you look at the relationship between a dependent variable and one or more independent variables. When I was younger, I used to have an apartment that overlooked Lake Michigan. In the summer, I could use regression analysis to see how many people would be sitting on the beach. On the hot days, the number of people on the beach would increase. In fact, the hotter it got, the more people you would see trying to cool off in the large blue lake. In this case, the number of people on the beach was the dependent variable. It would be higher or lower based on several independent variables. These independent variables could be the outside temperature, rain, or even the day of the week. After a few weeks of observing the behavior, I could accurately predict how many people would be on the beach that day.
In fact, that's why a lot of organizations like regression analysis. If you understand the relationship between the dependent and independent variables, you can often do a pretty good job of predicting people's behavior. I once worked for a credit card processing organization that was trying to look for warning signs that a customer would have trouble paying their bill.
They used regression in their artificial neural network to try to find relationships between different variables. The dependent variable was the likelihood of a customer not paying their bill. The independent variables were the different items or amounts they charged on their credit card before they were billed for their purchases. What they found is that many customers start to put essentials on their credit card just before they have trouble paying their bill. So a customer who only used their credit card for large purchases, such as a television or a computer, would suddenly start putting essential purchases on it, like groceries and gasoline. Their artificial neural network found that there was a relationship between these different variables. As purchases of groceries, gasoline, and utility bills increased, the likelihood of late payments increased too.
Even though the information you get from regression analysis is very different, the way you use your artificial neural network is pretty much the same. It still requires massive amounts of data to find patterns. Then the network looks for imperceptible patterns in large data sets. In one, you're using the data to classify. And in the other, you're using the data to find relationships. When you're starting out your own AI project, it's important to determine whether you want to classify or use regression analysis. Think of it as the difference between sorting and connecting. If your project is looking for patterns to sort your data into different categories, then you're going to want to stick with classification. This is when your artificial neural network can group together pictures of cats or identify different types of music. If your project is looking for patterns that connect many different variables, then you're going to want to try regression. It's here where your artificial neural network processes what happens between the dependent variable and several independent variables.
Then it identifies patterns, so you can predict what happens. One thing to keep in mind is that your artificial neural network will only show you the patterns. It doesn't necessarily provide the answers. The credit card company also found that people who had a lot of purchases under $5 were likely to have trouble paying their bill. The network didn't really give anyone any idea why this occurred. It just pointed out that it was happening. After they found the pattern, it was up to human beings to try to sort out why it was happening. Perhaps at some point, these artificial neural networks will be able to create their own theories about why these patterns exist. For now, they pretty much rely on their human counterparts to find meaning in these connections.
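To make the beach example a little more concrete, here is a minimal sketch of regression in Python. Keep in mind a few assumptions. It uses ordinary least squares, the simplest form of linear regression, and not an artificial neural network. The observations are invented for illustration, and the names fit_linear_regression and predict are hypothetical helpers, not part of any library. Temperature is assumed to be in degrees Fahrenheit, and the day of the week is simplified to a weekend flag.

```python
# Invented observations, for illustration only (not real beach data).
# Independent variables: (temperature in F, raining 1/0, weekend 1/0)
# Dependent variable: number of people on the beach
observations = [
    ((70, 0, 0), 60),
    ((75, 0, 1), 115),
    ((80, 1, 0), 70),
    ((85, 0, 0), 135),
    ((90, 0, 1), 190),
    ((95, 1, 1), 175),
    ((88, 1, 0), 110),
    ((78, 0, 1), 130),
]


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend 1.0 to every row so the first coefficient is the intercept.
    X = [[1.0, *map(float, r)] for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution to recover the coefficients.
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (A[i][n] - s) / A[i][i]
    return beta


def predict(beta, row):
    """Intercept plus the weighted sum of the independent variables."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


rows = [r for r, _ in observations]
counts = [c for _, c in observations]
beta = fit_linear_regression(rows, counts)

# Predict a hot, dry weekend day (92 F, no rain, weekend).
estimate = predict(beta, (92, 0, 1))
print(round(estimate))  # roughly 200 with this invented data
```

Each coefficient describes how the dependent variable moves with one independent variable. In this made-up data, every extra degree adds about five people and rain removes about forty. Just like the network in the credit card story, the numbers only show the pattern. They don't explain why it exists.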