1. Introduction
1.1 Welcome
1.2 What is Machine Learning
Machine Learning definition
- Tom Mitchell: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
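- Example: for a spam filter, T is classifying emails as spam or not spam, P is the fraction of emails correctly classified, and E is watching the user label emails as spam or not spam.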
1.3 Supervised Learning
(We're given the "right answer" for each example in the data set.)
regression problems and classification problems
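(e.g., predicting house prices from square footage is a regression problem, since the output is continuous-valued; predicting whether a tumor is malignant or benign is a classification problem, since the output is discrete-valued)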
1.4 Unsupervised Learning
(We're given data that doesn't have any labels, or where every example has the same label, i.e., effectively no labels. Given such a data set, an unsupervised learning algorithm might decide that the data lives in two different clusters and break it into two separate clusters. This is called a clustering algorithm.)
SVD function (Singular Value Decomposition)
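As a minimal illustration of the clustering idea (not part of the lecture), the sketch below assumes NumPy and scikit-learn are available and lets a clustering algorithm split unlabeled points into two groups on its own:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two loose groups of 2-D points (no y labels are given)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=3.0, scale=0.5, size=(50, 2))])

# The clustering algorithm decides by itself that the data lives in two clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])  # cluster assignments, e.g. [0 0 0 0 0] [1 1 1 1 1]
```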
2. Linear regression with one variable
2.1 Model representation
Linear regression with one variable (univariate linear regression)
Notation:
m = Number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x,y) = one training example
$(x^{(i)},y^{(i)})$ = the $i^{th}$ training example
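A tiny concrete example of this notation in NumPy (the numbers form a hypothetical housing data set with m = 4; note that code arrays are 0-indexed while the math above is 1-indexed):

```python
import numpy as np

# m = 4 training examples: x = house size (feet^2), y = price (in 1000s of dollars)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # "input" variable / feature
y = np.array([460.0, 232.0, 315.0, 178.0])      # "output" / "target" variable
m = len(x)                                      # number of training examples

# (x^(i), y^(i)) is the i-th training example; the 1st example (i = 1 in the math)
# lives at index 0 in the arrays.
i = 1
print(f"({x[i-1]}, {y[i-1]})")  # -> (2104.0, 460.0)
```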
2.2 Cost function
Hypothesis: $h_\theta(x)=\theta_0+\theta_1 x$
Parameters: $\theta_0,\theta_1$
Cost function (squared error cost function): $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$ (the most commonly used cost function for regression problems)
Goal: minimize $J(\theta_0,\theta_1)$
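To make the formula concrete, here is a small NumPy sketch of the squared-error cost (the function name and toy numbers are illustrative, not from the lecture):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) for univariate linear regression."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for all i at once
    squared_errors = (predictions - y) ** 2
    return squared_errors.sum() / (2 * m)

# Example: cost of the hypothesis h(x) = 0 + 0.2*x on the toy housing data
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(compute_cost(x, y, 0.0, 0.2))
```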
2.3 Cost function intuition (1)
2.4 Cost function intuition (2)
2.5 Gradient descent
Gradient descent algorithm
repeat until convergence {
$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ (for $j=0$ and $j=1$; update $\theta_0$ and $\theta_1$ simultaneously)
}
$\alpha$: the learning rate
Correct: Simultaneous update
temp0$:=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$
temp1$:=\theta_1-\alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$
$\theta_0:=$temp0
$\theta_1:=$temp1
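In code, the point of the simultaneous update is that both temporary values are computed from the old $(\theta_0,\theta_1)$ before either parameter is overwritten. A minimal sketch, where `dJ_dtheta0` and `dJ_dtheta1` stand for hypothetical functions returning the two partial derivatives:

```python
def step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One gradient-descent step with a correct simultaneous update."""
    # Both temporaries use the old theta0 and theta1.
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    # Only after both temps are computed do we overwrite the parameters.
    return temp0, temp1

# An incorrect version would assign theta0 first, so the theta1 update would
# use the *new* theta0; that is not a simultaneous update.
```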
2.6 Gradient descent intuition
$\theta_1:=\theta_1-\alpha\frac{\mathrm{d}}{\mathrm{d}\theta_1}J(\theta_1)$
Gradient descent can converge to a local minimum even with the learning rate $\alpha$ held fixed.
As we approach a local minimum, the derivative term gets smaller, so gradient descent automatically takes smaller steps; there is no need to decrease $\alpha$ over time.
2.7 Gradient descent for linear regression
Gradient descent algorithm
repeat until convergence {
$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$
(for $j=0$ and $j=1$)
}
Linear Regression Model
$h_\theta(x)=\theta_0+\theta_1 x$
$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$
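Substituting the model into $J$ and differentiating gives the concrete update rules used in each step:
$\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})$
$\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x^{(i)}$
so the updates become:
$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})$
$\theta_1:=\theta_1-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x^{(i)}$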
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.
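Putting these pieces together, here is a minimal sketch of batch gradient descent for univariate linear regression in NumPy (values such as `alpha` and `num_iters` are illustrative):

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Every iteration uses all m training examples ("batch")."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        predictions = theta0 + theta1 * x
        errors = predictions - y
        # Simultaneous update via temporaries
        temp0 = theta0 - alpha * errors.sum() / m
        temp1 = theta1 - alpha * (errors * x).sum() / m
        theta0, theta1 = temp0, temp1
    return theta0, theta1

# Example on a tiny data set where the true relationship is y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
print(batch_gradient_descent(x, y, alpha=0.1, num_iters=2000))  # ~ (1.0, 2.0)
```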
3. Linear Algebra review (optional)
3.1 Matrices and vectors
A vector is a matrix that has only one column, i.e., an $n\times 1$ matrix.
$y_i = i^{th}$ element
Math: 1-indexed vectors
Machine learning: 0-indexed vectors
(By convention, most people use upper case to refer to matrices and lower case to refer to scalars, i.e., raw numbers, or to vectors.)
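A quick NumPy illustration of these conventions (note that code, unlike the math notation above, is 0-indexed):

```python
import numpy as np

A = np.array([[1, 2],      # A is a 4x2 matrix (4 rows, 2 columns)
              [3, 4],
              [5, 6],
              [7, 8]])
y = np.array([460, 232, 315, 178])   # y is a 4-dimensional vector (a 4x1 matrix)

print(A.shape)   # (4, 2)
print(A[0, 1])   # row 1, column 2 in 1-indexed math is A[0, 1] in code -> 2
print(y[0])      # y_1 in 1-indexed math is y[0] in code -> 460
```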