Mastering Machine Learning with scikit-learn, Second Edition (2017)
Resource description
Mastering Machine Learning with scikit-learn (2 ed)
(True PDF + AZW3 + code)
Table of Contents
Preface 1
Chapter 1: The Fundamentals of Machine Learning 6
Defining machine learning 6
Learning from experience 8
Machine learning tasks 9
Training data, testing data, and validation data 10
Bias and variance 13
An introduction to scikit-learn 15
Installing scikit-learn 16
Installing using pip 17
Installing on Windows 17
Installing on Ubuntu 16.04 17
Installing on Mac OS 17
Installing Anaconda 18
Verifying the installation 18
Installing pandas, Pillow, NLTK, and matplotlib 18
Summary 19
Chapter 2: Simple Linear Regression 20
Simple linear regression 20
Evaluating the fitness of the model with a cost function 25
Solving OLS for simple linear regression 27
Evaluating the model 29
Summary 31
Chapter 3: Classification and Regression with k-Nearest Neighbors 32
K-Nearest Neighbors 32
Lazy learning and non-parametric models 33
Classification with KNN 34
Regression with KNN 42
Scaling features 44
Summary 47
Chapter 4: Feature Extraction 48
Extracting features from categorical variables 48
Standardizing features 49
Extracting features from text 50
The bag-of-words model 50
Stop word filtering 53
Stemming and lemmatization 54
Extending bag-of-words with tf-idf weights 57
Space-efficient feature vectorizing with the hashing trick 59
Word embeddings 61
Extracting features from images 64
Extracting features from pixel intensities 65
Using convolutional neural network activations as features 66
Summary 68
Chapter 5: From Simple Linear Regression to Multiple Linear Regression 70
Multiple linear regression 70
Polynomial regression 74
Regularization 79
Applying linear regression 80
Exploring the data 81
Fitting and evaluating the model 84
Gradient descent 86
Summary 90
Chapter 6: From Linear Regression to Logistic Regression 91
Binary classification with logistic regression 92
Spam filtering 94
Binary classification performance metrics 95
Accuracy 97
Precision and recall 98
Calculating the F1 measure 99
ROC AUC 100
Tuning models with grid search 102
Multi-class classification 104
Multi-class classification performance metrics 107
Multi-label classification and problem transformation 108
Multi-label classification performance metrics 113
Summary 114
Chapter 7: Naive Bayes 115
Bayes' theorem 115
Generative and discriminative models 117
Naive Bayes 118
Assumptions of Naive Bayes 119
Naive Bayes with scikit-learn 120
Summary 124
Chapter 8: Nonlinear Classification and Regression with Decision Trees 125
Decision trees 125
Training decision trees 127
Selecting the questions 128
Information gain 131
Gini impurity 136
Decision trees with scikit-learn 137
Advantages and disadvantages of decision trees 139
Summary 140
Chapter 9: From Decision Trees to Random Forests and Other Ensemble Methods 141
Bagging 141
Boosting 144
Stacking 146
Summary 148
Chapter 10: The Perceptron 149
The perceptron 149
Activation functions 150
The perceptron learning algorithm 152
Binary classification with the perceptron 153
Document classification with the perceptron 161
Limitations of the perceptron 162
Summary 163
Chapter 11: From the Perceptron to Support Vector Machines 164
Kernels and the kernel trick 165
Maximum margin classification and support vectors 169
Classifying characters in scikit-learn 172
Classifying handwritten digits 172
Classifying characters in natural images 175
Summary 177
Chapter 12: From the Perceptron to Artificial Neural Networks 178
Nonlinear decision boundaries 179
Feed-forward and feedback ANNs 180
Multi-layer perceptrons 181
Training multi-layer perceptrons 183
Backpropagation 184
Training a multi-layer perceptron to approximate XOR 189
Training a multi-layer perceptron to classify handwritten digits 192
Summary 193
Chapter 13: K-means 194
Clustering 194
K-means 197
Local optima 203
Selecting K with the elbow method 204
Evaluating clusters 207
Image quantization 209
Clustering to learn features 211
Summary 214
Chapter 14: Dimensionality Reduction with Principal Component Analysis 215
Principal component analysis 215
Variance, covariance, and covariance matrices 220
Eigenvectors and eigenvalues 222
Performing PCA 224
Visualizing high-dimensional data with PCA 227
Face recognition with PCA 228
Summary 231
Index 233