Preface | p. xv |
Introduction to Probability and Statistics | p. 1 |
What Is a Probability? | p. 1 |
Calculating the Expected Value | p. 2 |
Random Variables | p. 3 |
Discrete versus Continuous Random Variables | p. 4 |
Well-Known Probability Distributions | p. 4 |
Fundamental Concepts in Statistics | p. 5 |
The Mean | p. 5 |
The Median | p. 5 |
The Mode | p. 5 |
The Variance and Standard Deviation | p. 6 |
Population, Sample, and Population Variance | p. 6 |
Chebyshev's Inequality | p. 7 |
What Is a P-Value? | p. 7 |
The Moments of a Function (Optional) | p. 7 |
What Is Skewness? | p. 8 |
What Is Kurtosis? | p. 8 |
Data and Statistics | p. 9 |
The Central Limit Theorem | p. 9 |
Correlation versus Causation | p. 9 |
Statistical Inferences | p. 10 |
Statistical Terms RSS, TSS, R^2, and F1 Score | p. 10 |
What Is an F1 Score? | p. 11 |
Gini Impurity, Entropy, and Perplexity | p. 11 |
What Is Gini Impurity? | p. 12 |
What Is Entropy? | p. 12 |
Calculating Gini Impurity and Entropy Values | p. 12 |
Multidimensional Gini Index | p. 13 |
What Is Perplexity? | p. 14 |
Cross-Entropy and KL Divergence | p. 14 |
What Is Cross-Entropy? | p. 14 |
What Is KL Divergence? | p. 15 |
What Is Their Purpose? | p. 15 |
Covariance and Correlation Matrices | p. 16 |
The Covariance Matrix | p. 16 |
Covariance Matrix: An Example | p. 17 |
The Correlation Matrix | p. 17 |
Eigenvalues and Eigenvectors | p. 18 |
Calculating Eigenvectors: A Simple Example | p. 18 |
Gauss-Jordan Elimination (Optional) | p. 19 |
PCA (Principal Component Analysis) | p. 20 |
The New Matrix of Eigenvectors | p. 21 |
Well-Known Distance Metrics | p. 22 |
Pearson Correlation Coefficient | p. 23 |
Jaccard Index (or Similarity) | p. 23 |
Locality-Sensitive Hashing (Optional) | p. 24 |
Types of Distance Metrics | p. 24 |
What Is Bayesian Inference? | p. 26 |
Bayes' Theorem | p. 26 |
Some Bayesian Terminology | p. 26 |
What Is MAP? | p. 27 |
Why Use Bayes' Theorem? | p. 27 |
Summary | p. 27 |
Working with Data | p. 29 |
Dealing With Data: What Can Go Wrong? | p. 30 |
What Is Data Drift? | p. 30 |
What Are Datasets? | p. 30 |
Data Preprocessing | p. 31 |
Data Types | p. 32 |
Preparing Datasets | p. 33 |
Discrete Data versus Continuous Data | p. 33 |
"Binning" Continuous Data | p. 34 |
Scaling Numeric Data via Normalization | p. 34 |
Scaling Numeric Data via Standardization | p. 35 |
Scaling Numeric Data via Robust Standardization | p. 36 |
What to Look for in Categorical Data | p. 36 |
Mapping Categorical Data to Numeric Values | p. 37 |
Working With Dates | p. 39 |
Working With Currency | p. 39 |
Working With Outliers and Anomalies | p. 39 |
Outlier Detection/Removal | p. 40 |
Finding Outliers With Numpy | p. 41 |
Finding Outliers With Pandas | p. 44 |
Calculating Z-Scores to Find Outliers | p. 46 |
Finding Outliers With SkLearn (Optional) | p. 48 |
Working With Missing Data | p. 49 |
Imputing Values: When Is Zero a Valid Value? | p. 50 |
Dealing With Imbalanced Datasets | p. 51 |
What Is SMOTE? | p. 52 |
SMOTE Extensions | p. 52 |
The Bias-Variance Tradeoff | p. 53 |
Types of Bias in Data | p. 54 |
Analyzing Classifiers (Optional) | p. 55 |
What Is LIME? | p. 55 |
What Is ANOVA? | p. 56 |
Summary | p. 56 |
Introduction to Pandas | p. 57 |
What Is Pandas? | p. 57 |
Pandas DataFrames | p. 58 |
Pandas Operations: In-place or Not? | p. 58 |
Data Frames and Data Cleaning Tasks | p. 58 |
A Pandas DataFrame Example | p. 59 |
Describing a Pandas Data Frame | p. 61 |
Pandas Boolean Data Frames | p. 63 |
Transposing a Pandas Data Frame | p. 63 |
Pandas Data Frames and Random Numbers | p. 64 |
Converting Categorical Data to Numeric Data | p. 66 |
Merging and Splitting Columns in Pandas | p. 69 |
Combining Pandas DataFrames | p. 71 |
Data Manipulation With Pandas DataFrames | p. 72 |
Pandas DataFrames and CSV Files | p. 73 |
Useful Options for the Pandas read_csv() Function | p. 76 |
Reading Selected Rows From CSV Files | p. 76 |
Pandas DataFrames and Excel Spreadsheets | p. 79 |
Useful Options for Reading Excel Spreadsheets | p. 80 |
Select, Add, and Delete Columns in DataFrames | p. 80 |
Handling Outliers in Pandas | p. 82 |
Pandas DataFrames and Simple Statistics | p. 84 |
Finding Duplicate Rows in Pandas | p. 85 |
Finding Missing Values in Pandas | p. 87 |
Missing Values in Iris-Based Dataset | p. 90 |
Sorting Data Frames in Pandas | p. 93 |
Working With groupby() in Pandas | p. 94 |
Aggregate Operations With the titanic.csv Dataset | p. 96 |
Working With apply() and applymap() in Pandas | p. 98 |
Useful One-Line Commands in Pandas | p. 101 |
Working With JSON-Based Data | p. 103 |
Python Dictionary and JSON | p. 103 |
Python, Pandas, and JSON | p. 104 |
Summary | p. 105 |
Introduction to RDBMS and SQL | p. 107 |
What Is an RDBMS? | p. 107 |
What Relationships Do Tables Have in an RDBMS? | p. 107 |
Features of an RDBMS | p. 108 |
What Is ACID? | p. 108 |
When Do We Need an RDBMS? | p. 109 |
The Importance of Normalization | p. 110 |
A Four-Table RDBMS | p. 111 |
Detailed Table Descriptions | p. 112 |
The customers Table | p. 112 |
The purchase_orders Table | p. 113 |
The line_items Table | p. 114 |
The item_desc Table | p. 115 |
What Is SQL? | p. 116 |
DCL, DDL, DQL, DML, and TCL | p. 117 |
SQL Privileges | p. 118 |
Properties of SQL Statements | p. 118 |
The CREATE Keyword | p. 119 |
What Is MySQL? | p. 119 |
What About MariaDB? | p. 119 |
Installing MySQL | p. 120 |
Data Types in MySQL | p. 120 |
The CHAR and VARCHAR Data Types | p. 120 |
String-Based Data Types | p. 121 |
FLOAT and DOUBLE Data Types | p. 121 |
BLOB and TEXT Data Types | p. 121 |
MySQL Database Operations | p. 122 |
Creating a Database | p. 122 |
Display a List of Databases | p. 122 |
Display a List of Database Users | p. 123 |
Dropping a Database | p. 123 |
Exporting a Database | p. 123 |
Renaming a Database | p. 124 |
The INFORMATION_SCHEMA Table | p. 125 |
The PROCESSLIST Table | p. 126 |
SQL Formatting Tools | p. 127 |
Summary | p. 127 |
Working with SQL and MySQL | p. 129 |
Create Database Tables | p. 130 |
Manually Creating Tables for mytools.com | p. 130 |
Creating Tables via an SQL Script for mytools.com | p. 131 |
Creating Tables With Japanese Text | p. 132 |
Creating Tables From the Command Line | p. 134 |
Drop Database Tables | p. 134 |
Dropping Tables via a SQL Script for mytools.com | p. 134 |
Altering Database Tables With the ALTER Keyword | p. 135 |
Add a Column to a Database Table | p. 135 |
Drop a Column From a Database Table | p. 137 |
Change the Data Type of a Column | p. 137 |
What Are Referential Constraints? | p. 139 |
Combining Data for a Table Update (Optional) | p. 139 |
Merging Data for a Table Update | p. 140 |
Appending Data to a Table From a CSV File | p. 141 |
Appending Table Data from CSV Files via SQL | p. 142 |
Inserting Data Into Tables | p. 144 |
Populating Tables From Text Files | p. 144 |
Working With Simple SELECT Statements | p. 146 |
Duplicate versus Distinct Rows | p. 148 |
Unique Rows | p. 148 |
The EXISTS Keyword | p. 148 |
The LIMIT Keyword | p. 149 |
DELETE, TRUNCATE, and DROP in SQL | p. 149 |
More Options for the DELETE Statement in SQL | p. 150 |
Creating Tables From Existing Tables in SQL | p. 150 |
Working With Temporary Tables in SQL | p. 151 |
Creating Copies of Existing Tables in SQL | p. 152 |
What Is an SQL Index? | p. 152 |
Types of Indexes | p. 152 |
Creating an Index | p. 153 |
Disabling and Enabling an Index | p. 153 |
View and Drop Indexes | p. 154 |
Overhead of Indexes | p. 155 |
Considerations for Defining Indexes | p. 155 |
Selecting Columns for an Index | p. 156 |
Finding Columns Included in Indexes | p. 157 |
Export Data From MySQL | p. 157 |
Export the Result Set of a SQL Query | p. 157 |
Export a Database or Its Contents | p. 157 |
Using LOAD DATA in MySQL | p. 158 |
Data Cleaning in SQL | p. 158 |
Replace NULL With 0 | p. 159 |
Replace NULL Values With Average Value | p. 159 |
Replace Multiple Values With a Single Value | p. 161 |
Handle Mismatched Attribute Values | p. 162 |
Convert Strings to Date Values | p. 163 |
Data Cleaning From the Command Line (Optional) | p. 165 |
Working With the sed Utility | p. 165 |
Working With the awk Utility | p. 167 |
Summary | p. 169 |
NLP and Data Cleaning | p. 171 |
NLP Tasks in ML | p. 171 |
NLP Steps for Training a Model | p. 172 |
Text Normalization and Tokenization | p. 172 |
Word Tokenization in Japanese | p. 173 |
Text Tokenization With Unix Commands | p. 175 |
Handling Stop Words | p. 175 |
What Is Stemming? | p. 176 |
Singular versus Plural Word Endings | p. 176 |
Common Stemmers | p. 176 |
Stemmers and Word Prefixes | p. 177 |
Over Stemming and Under Stemming | p. 177 |
What Is Lemmatization? | p. 178 |
Stemming/Lemmatization Caveats | p. 178 |
Limitations of Stemming and Lemmatization | p. 178 |
Working With Text: POS | p. 179 |
POS Tagging | p. 179 |
POS Tagging Techniques | p. 179 |
Cleaning Data With Regular Expressions | p. 180 |
Cleaning Data With the cleantext Library | p. 184 |
Handling Contracted Words | p. 185 |
What Is BeautifulSoup? | p. 187 |
Web Scraping With Pure Regular Expressions | p. 190 |
What Is Scrapy? | p. 192 |
Summary | p. 193 |
Data Visualization | p. 195 |
What Is Data Visualization? | p. 195 |
Types of Data Visualization | p. 196 |
What Is Matplotlib? | p. 196 |
Lines in a Grid in Matplotlib | p. 197 |
A Colored Grid in Matplotlib | p. 198 |
Randomized Data Points in Matplotlib | p. 199 |
A Histogram in Matplotlib | p. 200 |
A Set of Line Segments in Matplotlib | p. 201 |
Plotting Multiple Lines in Matplotlib | p. 202 |
Trigonometric Functions in Matplotlib | p. 203 |
Display IQ Scores in Matplotlib | p. 204 |
Plot a Best-Fitting Line in Matplotlib | p. 204 |
The Iris Dataset in Sklearn | p. 206 |
Sklearn, Pandas, and the Iris Dataset | p. 207 |
Working With Seaborn | p. 209 |
Features of Seaborn | p. 210 |
Seaborn Built-in Datasets | p. 210 |
The Iris Dataset in Seaborn | p. 211 |
The Titanic Dataset in Seaborn | p. 212 |
Extracting Data From the Titanic Dataset in Seaborn (1) | p. 212 |
Extracting Data from Titanic Dataset in Seaborn (2) | p. 215 |
Visualizing a Pandas Dataset in Seaborn | p. 217 |
Data Visualization in Pandas | p. 219 |
What Is Bokeh? | p. 220 |
Summary | p. 222 |
Index | p. 223 |