Lecture 1

1-1. Introduction What is Machine Learning

machine learning이 정확히 뭔지 well accepted definition이 없다.

● Machine Learning definition

⊙ Author Samuel(1959) - Informal, order definition

-Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

체커스를 그 자신은 잘 못하지만 컴퓨터가 수많은 게임을 하면서 학습해서 더 잘하게 되었다.

⊙ Tom Mitchell(1988) - 최신 definition

-Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

-experience E: 프로그램이 tens of thousands of 체커 play하는 것

-task P: Playing 체커스

-performance measure P: 다음 게임에서 이길 확률

⊙ Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

-Classifying emails as spam or not spam. (O)

-Watching you label emails as spam or not spam.

-The number(or fraction) of emails correctly classified as spam/not spam.

-None of the above-this is not a machine learning problem.

-Experience E: label emails as spam or not spam.

-Task T: classifiying emails as spam or not spam.

-Performance measure P: correctly classified as spam/not spam.

E를 계속 하면서 P가 점점 높아질 것이다.

● Machine Learning Algorithm

- Supervised learning

- Unsupervised learning

-Others: Reinforcement learning, recommender systems.

(하지만 이것보다는 위의 두 개가 더 많이 쓰인다.)

-Also talk about: Practical advice for applying learning algorithms.

1-2. Introduction Supervised Learning

Most common type of machine learning

● Housing price prediction (regression)

● Breast cancer (malignant, benign)

● Supervised Learning

-“right answers” given

what is right, actual price

-Regression: Predict continuous valued output(price)

-Classification: Discrete valued output (0 or 1)

can be more than 2 values. (0: benign, 1,2,3:각자 다른 type의 answer)

discrete value이기 때문에 classification problem이 맞다.

여기서는 두 가지 feature가 있는 것 (age, tumor size)

하지만 나중에는 두 가지 이상의 feature가 있을 수도 있다.

clump thickness, uniformity of cell size, uniformity of cell shape 등

* You’re running a company, and you want to develop learning algorithms to address each of two problems.

-Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

-Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/comprised.

Should you treat these as classification or as regression problems?

-Treat both as classification problems.

-Treat problem 1 as a classification problem, problem 2 as a regression problem.

-Treat problem 1 as a regression problem, problem 2 as a classification problem. (O)

-Treat both as regression problems.

1-3. Introduction Unsupervised Learning

● Unsupervised Learning

-Clustering algorithm

관련 있는 것들을 묶어서 처리

● Cocktail party problem

Cocktail party problem

칵테일 파티에서 여러 사람이 동시에 이야기를 하면 누가 무엇을 이야기했는지 제대로 구별해내기 힘들다.

⊙ Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

-Given email labeled as spam / not spam, learn a spam filter. (supervised learning)

-Given a set of news articles found on the web, group them into set of articles about the same story. (unsupervised learning-clustering)

-Given a database of customer data, automatically discover market segments and group customers into different market segments. (unsupervised learning)

-Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not. (supervised learning)

Rookie's Programming

Lecture 1

티스토리툴바