1-1. Introduction What is Machine Learning
There is no single, well-accepted definition of what exactly machine learning is.
● Machine Learning definition
⊙ Arthur Samuel (1959) - informal, older definition
-Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
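Samuel was not a very good checkers player himself, but his program learned by playing a huge number of games and eventually played better than he did.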
⊙ Tom Mitchell (1998) - more recent definition
-Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
-Experience E: the program playing tens of thousands of games of checkers
-Task T: playing checkers
-Performance measure P: the probability that the program wins the next game of checkers
⊙ Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
-Classifying emails as spam or not spam. (O)
-Watching you label emails as spam or not spam.
-The number(or fraction) of emails correctly classified as spam/not spam.
-None of the above-this is not a machine learning problem.
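-Experience E: watching you label emails as spam or not spam.
-Task T: classifying emails as spam or not spam.
-Performance measure P: the number (or fraction) of emails correctly classified as spam/not spam.
As the program gains more experience E, its performance P should steadily improve.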
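A minimal sketch of the spam example in Python, mapping each part of the definition onto code. Everything here is hypothetical: the toy emails are made up, and the word-count scoring rule is just one simple choice, not how any real email program filters spam. The labeled emails play the role of experience E, classify is task T, and performance computes P as the fraction of emails labeled correctly.

```python
from collections import Counter

# Hypothetical labeled emails (text, is_spam): this stands in for experience E.
TRAINING_EMAILS = [
    ("win money now", True),
    ("meeting moved to monday", False),
    ("cheap pills win big", True),
    ("lunch on friday", False),
    ("claim your free money", True),
    ("project report attached", False),
]

# Hypothetical held-out emails used to measure performance P.
TEST_EMAILS = [
    ("free money", True),
    ("win a prize", True),
    ("monday meeting", False),
    ("report on friday", False),
]


def learn(labeled_emails):
    """Learn from experience E: count each word's occurrences in spam and non-spam."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_counts if is_spam else ham_counts).update(text.split())
    return spam_counts, ham_counts


def classify(model, text):
    """Task T: return True (spam) if the email's words were seen more often in spam."""
    spam_counts, ham_counts = model
    score = sum(spam_counts[w] - ham_counts[w] for w in text.split())
    return score > 0


def performance(model, labeled_emails):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(model, text) == is_spam for text, is_spam in labeled_emails)
    return correct / len(labeled_emails)


if __name__ == "__main__":
    # More experience (more labeled emails) should give a higher P.
    for n in (0, 2, 6):
        model = learn(TRAINING_EMAILS[:n])
        print(f"E = {n} labeled emails -> P = {performance(model, TEST_EMAILS)}")
```

On this toy data, P is 0.5 with no experience (every email is labeled "not spam") and rises to 1.0 once the program has seen the labeled examples.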
● Machine Learning Algorithms
- Supervised learning
- Unsupervised learning
-Others: Reinforcement learning, recommender systems.
(However, the first two are used more often than these.)
-Also talk about: Practical advice for applying learning algorithms.
1-2. Introduction Supervised Learning
The most common type of machine learning.
● Housing price prediction (regression)
● Breast cancer: malignant vs. benign (classification)
● Supervised Learning
-“Right answers” given: for each example, the correct output (e.g., the actual price of each house) is provided.
-Regression: Predict continuous valued output (price)
-Classification: Discrete valued output (0 or 1)
The output can take more than two values (e.g., 0: benign; 1, 2, 3: each a different type of cancer).
Because the output is still a discrete value, this is still a classification problem.
In this example there are two features (age, tumor size).
Other problems may use more than two features, e.g.
clump thickness, uniformity of cell size, uniformity of cell shape, etc.
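A minimal Python sketch contrasting the two kinds of supervised learning. The numbers are made up for illustration, and the methods are assumptions chosen for brevity: a least-squares straight line for regression, and a single tumor-size threshold for classification (ignoring age and the other features). The regression model outputs a continuous price; the classifier outputs a discrete label, 0 (benign) or 1 (malignant).

```python
# Hypothetical training data (made up for illustration).
HOUSE_SIZES = [1000, 1500, 2000, 2500]        # square feet
HOUSE_PRICES = [200, 250, 300, 350]           # price in $1000s (the "right answers")
TUMOR_SIZES = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]  # cm
TUMOR_LABELS = [0, 0, 0, 1, 1, 1]             # 0 = benign, 1 = malignant


def fit_line(xs, ys):
    """Regression: least-squares fit of y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    theta1 = cov / var
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def fit_threshold(xs, labels):
    """Classification: choose the threshold t with the fewest training errors,
    predicting 1 (malignant) when x >= t."""
    best_t, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != (label == 1) for x, label in zip(xs, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


if __name__ == "__main__":
    theta0, theta1 = fit_line(HOUSE_SIZES, HOUSE_PRICES)
    # Continuous output: any real-valued price is possible.
    print("predicted price for 1800 sq ft:", theta0 + theta1 * 1800)
    t = fit_threshold(TUMOR_SIZES, TUMOR_LABELS)
    # Discrete output: only 0 or 1.
    print("label for a 3.2 cm tumor:", 1 if 3.2 >= t else 0)
```

With this data the fitted line is price = 100 + 0.1 * size, and the chosen threshold is 3.0 cm.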
⊙ You’re running a company, and you want to develop learning algorithms to address each of two problems.
-Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
-Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
-Treat both as classification problems.
-Treat problem 1 as a classification problem, problem 2 as a regression problem.
-Treat problem 1 as a regression problem, problem 2 as a classification problem. (O)
-Treat both as regression problems.
1-3. Introduction Unsupervised Learning
● Unsupervised Learning
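-No “right answers” are given; the algorithm must find structure in the data on its own.
-Clustering algorithm: groups related items together (e.g., news articles about the same story).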
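The notes do not name a specific clustering algorithm; as an assumption, here is a minimal sketch of k-means, one widely used clustering method, in plain Python. The points are made up, and the number of clusters (2) and the starting centroids are chosen by hand. No labels are used: the algorithm groups points only by how close they are to each other.

```python
import math

# Hypothetical unlabeled 2-D data: two visible groups of points.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def k_means(points, initial_centroids, iterations=10):
    """Basic k-means: alternate assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if its cluster is empty).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Start with k = 2 centroids placed on two of the data points.
    centroids, clusters = k_means(POINTS, [POINTS[0], POINTS[3]])
    for c, members in zip(centroids, clusters):
        print("centroid", c, "members", members)
```

A fixed iteration count keeps the sketch short; a fuller version would stop once the assignments no longer change.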
● Cocktail party problem
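At a cocktail party, when several people talk at the same time, it is hard to tell who said what. The goal is to separate the individual voices from recordings of the overlapping speech.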
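A minimal sketch of the mixing model commonly used to describe the cocktail party problem, in plain Python. It assumes two speakers, two microphones, and a hypothetical, known 2x2 mixing matrix A, so each recording is a weighted sum of both voices (x = A s). Because A is known here, the voices can be recovered with its inverse; the real learning problem is harder, since an unsupervised algorithm (for example, independent component analysis) has to estimate the unmixing from the recordings alone.

```python
# Hypothetical source signals (two "voices"), sampled at a few time steps.
VOICE_1 = [1.0, 0.0, -1.0, 0.0]
VOICE_2 = [0.5, 0.5, -0.5, -0.5]

# Hypothetical mixing matrix: row i gives how loudly each voice reaches microphone i.
A = [[0.8, 0.3],
     [0.4, 0.7]]


def mix(matrix, s1, s2):
    """Apply a 2x2 matrix to two signals sample by sample (matrix-vector product)."""
    x1 = [matrix[0][0] * u + matrix[0][1] * v for u, v in zip(s1, s2)]
    x2 = [matrix[1][0] * u + matrix[1][1] * v for u, v in zip(s1, s2)]
    return x1, x2


def unmix_known(matrix, x1, x2):
    """Recover the sources with the inverse matrix (only possible because A is known)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    inverse = [[d / det, -b / det],
               [-c / det, a / det]]
    return mix(inverse, x1, x2)


if __name__ == "__main__":
    mic_1, mic_2 = mix(A, VOICE_1, VOICE_2)  # what each microphone records
    print("microphone 1:", mic_1)
    print("microphone 2:", mic_2)
    print("recovered:", unmix_known(A, mic_1, mic_2))
```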
⊙ Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
-Given email labeled as spam / not spam, learn a spam filter. (supervised learning)
-Given a set of news articles found on the web, group them into sets of articles about the same story. (unsupervised learning-clustering)
-Given a database of customer data, automatically discover market segments and group customers into different market segments. (unsupervised learning)
-Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not. (supervised learning)