1 Logistic Regression in Brief

Linear Regression models how a continuous quantity varies, while Logistic Regression deals with discrete outcomes. Simply put, the task is to infer whether a training example belongs to class 1 or class 0. The natural tool here is probability: we compute the probability that a sample belongs to class 1 and the probability that it belongs to class 0, and predict the label from those probabilities. This also turns a discrete problem into a continuous one.



Specifically, we will try to learn a function of the form:


P(y=1 \mid x) = h_\theta(x) = \frac{1}{1+\exp(-\theta^\top x)} \equiv \sigma(\theta^\top x),
P(y=0 \mid x) = 1 - P(y=1 \mid x) = 1 - h_\theta(x).

The function \sigma(z) \equiv \frac{1}{1+\exp(-z)} is often called the "sigmoid" or "logistic" function.
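In MATLAB the sigmoid is a one-liner. The exercise code below calls a sigmoid helper; if your copy of the starter code does not already provide one on the path, a minimal definition like the following works (my own helper, not an official starter file):

sigmoid.m

function h = sigmoid(z)
% Elementwise logistic function 1 ./ (1 + exp(-z)).
% Works for scalars, vectors, and matrices alike.
h = 1 ./ (1 + exp(-z));
end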

We only need to compute the probability that y = 1. The cost function is as follows:

J(\theta) = -\sum_i \left( y^{(i)} \log\big(h_\theta(x^{(i)})\big) + (1-y^{(i)}) \log\big(1-h_\theta(x^{(i)})\big) \right).

Apart from these equations, the rest of the computation is exactly the same as for Linear Regression.
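In particular, the gradient, which both implementations below compute, has the same form as the linear regression gradient, just with h_\theta(x) = \sigma(\theta^\top x) in place of \theta^\top x:

\frac{\partial J(\theta)}{\partial \theta_j} = \sum_i x_j^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right).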

OK, now let's see how to work through the exercise.


2 Solution to exercise1B

This exercise uses MNIST data to classify handwritten digits as 0 or 1.


I'll paste the code directly:

ex1b_regression.m (no changes needed)

addpath ../common
addpath ../common/minFunc_2012/minFunc
addpath ../common/minFunc_2012/minFunc/compiled

% Load the MNIST data for this exercise.
% train.X and test.X will contain the training and testing images.
% Each matrix has size [n,m] where:
% m is the number of examples.
% n is the number of pixels in each image.
% train.y and test.y will contain the corresponding labels (0 or 1).
binary_digits = true;
[train,test] = ex1_load_mnist(binary_digits);

% Add row of 1s to the dataset to act as an intercept term.
train.X = [ones(1,size(train.X,2)); train.X];
test.X = [ones(1,size(test.X,2)); test.X];

% Training set dimensions
m=size(train.X,2);
n=size(train.X,1);

% Train logistic regression classifier using minFunc
options = struct('MaxIter', 100);

% First, we initialize theta to some small random values.
theta = rand(n,1)*0.001;

% Call minFunc with the logistic_regression.m file as the objective function.
%
% TODO: Implement batch logistic regression in the logistic_regression.m file!
%
%tic;
%theta=minFunc(@logistic_regression, theta, options, train.X, train.y);
%fprintf('Optimization took %f seconds.\n', toc);

% Now, call minFunc again with logistic_regression_vec.m as objective.
%
% TODO: Implement batch logistic regression in logistic_regression_vec.m using
% MATLAB's vectorization features to speed up your code. Compare the running
% time for your logistic_regression.m and logistic_regression_vec.m implementations.
%
% Uncomment the lines below to run your vectorized code.
%theta = rand(n,1)*0.001;
tic;
theta=minFunc(@logistic_regression_vec, theta, options, train.X, train.y);
fprintf('Optimization took %f seconds.\n', toc);

% Print out training accuracy.
tic;
accuracy = binary_classifier_accuracy(theta,train.X,train.y);
fprintf('Training accuracy: %2.1f%%\n', 100*accuracy);

% Print out accuracy on the test set.
accuracy = binary_classifier_accuracy(theta,test.X,test.y);
fprintf('Test accuracy: %2.1f%%\n', 100*accuracy);
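If everything is implemented correctly, both the training and test accuracy should come out close to 100%, since handwritten 0s and 1s are easy to tell apart.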


logistic_regression.m


function [f,g] = logistic_regression(theta, X,y)
%
% Arguments:
% theta - A column vector containing the parameter values to optimize.
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The label for each example. y(j) is the j'th example's label.
%

m=size(X,2);
n=size(X,1);

% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));


%
% TODO: Compute the objective function by looping over the dataset and summing
% up the objective values for each example. Store the result in 'f'.
%
% TODO: Compute the gradient of the objective by looping over the dataset and summing
% up the gradients (df/dtheta) for each example. Store the result in 'g'.
%
%%% YOUR CODE HERE %%%

% Step 1: Cost. Accumulate the negative log-likelihood over all m examples.
for i = 1:m
  f = f - (y(i)*log(sigmoid(theta' * X(:,i))) + ...
           (1-y(i))*log(1 - sigmoid(theta' * X(:,i))));
end

% Step 2: Gradient. g(j) = sum over i of X(j,i) * (h(x^(i)) - y(i)).
for j = 1:n
  for i = 1:m
    g(j) = g(j) + X(j,i)*(sigmoid(theta' * X(:,i)) - y(i));
  end
end
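The driver above calls logistic_regression_vec.m for the timed run, but that file isn't shown here. A minimal vectorized sketch following the same argument conventions (my own version, not the original post's file) could look like this:

logistic_regression_vec.m

function [f,g] = logistic_regression_vec(theta, X, y)
% Vectorized cost and gradient for binary logistic regression.
% theta - n-by-1 parameter vector.
% X     - n-by-m matrix; X(i,j) is the i'th coordinate of the j'th example.
% y     - 1-by-m row vector of 0/1 labels.

% Hypothesis for all m examples at once: h is 1-by-m.
h = sigmoid(theta' * X);

% Negative log-likelihood summed over all examples (a scalar).
f = -(y * log(h)' + (1 - y) * log(1 - h)');

% Gradient: one matrix product replaces the double loop above.
g = X * (h - y)';
end

The matrix product X * (h - y)' computes all n partial derivatives in a single pass, which is where the speedup over the double loop comes from.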


ex1_load_mnist.m (no changes needed)


function [train, test] = ex1_load_mnist(binary_digits)

% Load the training data
X=loadMNISTImages('train-images-idx3-ubyte'); % 784x60000: 60,000 images, each 28x28 pixels
y=loadMNISTLabels('train-labels-idx1-ubyte')'; % 1x60000 row vector of labels

if (binary_digits)
% Take only the 0 and 1 digits
X = [ X(:,y==0), X(:,y==1) ]; % logical indexing with y==0 and y==1 picks out the 0 and 1 examples directly
y = [ y(y==0), y(y==1) ];
end

% Randomly shuffle the data
I = randperm(length(y));
y=y(I); % shuffle labels and images with the same permutation
X=X(:,I);

% We standardize the data so that each pixel will have roughly zero mean and unit variance.
s=std(X,[],2); % per-pixel standard deviation across all examples
m=mean(X,2); % per-pixel mean
X=bsxfun(@minus, X, m);
X=bsxfun(@rdivide, X, s+.1); % computes (x - m)/(s + 0.1); the 0.1 keeps the denominator away from zero

% Place these in the training set
train.X = X;
train.y = y;

% Load the testing data
X=loadMNISTImages('t10k-images-idx3-ubyte');
y=loadMNISTLabels('t10k-labels-idx1-ubyte')';

if (binary_digits)
% Take only the 0 and 1 digits
X = [ X(:,y==0), X(:,y==1) ];
y = [ y(y==0), y(y==1) ];
end

% Randomly shuffle the data
I = randperm(length(y));
y=y(I); % shuffle labels and images with the same permutation
X=X(:,I);

% Standardize using the same mean and scale as the training data.
X=bsxfun(@minus, X, m);
X=bsxfun(@rdivide, X, s+.1);

% Place these in the testing set
test.X=X;
test.y=y;