Machine Learning Guide

Author: Various
Narrator: Various
Publisher: Podcast
Duration: 42:21:33

Synopsis

This series aims to teach you the high-level fundamentals of machine learning from A to Z. I'll teach you the basic intuition, algorithms, and math. We'll discuss languages and frameworks, deep learning, and more. Audio may be an inferior medium for the task, but with all the hours we spend on exercise, commutes, and chores each day, not having a supplementary audio education would be a missed opportunity. And where your other resources will provide you the machine learning trees, I'll provide the forest. Additionally, consider me your syllabus. At the end of every episode I'll provide the best-of-the-best resources curated from around the web for you to learn each episode's details.

Episodes

MLA 013 Tech Stack for Customer-Facing Machine Learning Products

03/01/2021 Duration: 47min

Primary technology recommendations for building a customer-facing machine learning product include React and React Native for the front end, serverless platforms like AWS Amplify or GCP Firebase for authentication and basic server/database needs, and Postgres as the relational database of choice. Serverless approaches are encouraged for scalability and security, with traditional server frameworks and containerization recommended only for advanced custom backend requirements. When serverless options are inadequate, use Node.js with Express or FastAPI in Docker containers, and consider adding Redis for in-memory sessions and RabbitMQ or SQS for job queues, though many of these functions can be handled by Postgres. The machine learning server itself, including deployment strategies, will be discussed separately. Links: Notes and resources at ocdevel.com/mlg/mla-13. Try a walking desk: stay healthy & sharp while you learn & code. Client Applications: React is recommended as the primary web front-end fram

MLA 012 Docker for Machine Learning Workflows

09/11/2020 Duration: 31min

Docker enables efficient, consistent machine learning environment setup across local development and cloud deployment, avoiding many pitfalls of virtual machines and manual dependency management. It streamlines system reproduction, resource allocation, and GPU access, supporting portability and simplified collaboration for ML projects. Machine learning engineers benefit from using pre-built Docker images tailored for ML, allowing seamless project switching, host OS flexibility, and straightforward deployment to cloud platforms like AWS ECS and Batch, resulting in reproducible and maintainable workflows. Links: Notes and resources at ocdevel.com/mlg/mla-12. Try a walking desk: stay healthy & sharp while you learn & code. Traditional Environment Setup Challenges: Traditional machine learning development often requires configuring operating systems, GPU drivers (CUDA, cuDNN), and specific package versions directly on the host machine. Manual setup can lead to version conflicts, resource allocation issue

MLG 032 Cartesian Similarity Metrics

08/11/2020 Duration: 41min

Try a walking desk while studying ML or working on your projects! Show notes at ocdevel.com/mlg/32. L1/L2 norm, Manhattan, Euclidean, cosine distances, dot product. Normed distances: A norm is a function that assigns a strictly positive length to each nonzero vector in a vector space (the zero vector has length 0). Minkowski is the generalized form: (sum(abs(xi-yi)^p))^(1/p), with "p" = 1, 2, ... for the cases below. L1: Manhattan/city-block/taxicab. abs(x2-x1)+abs(y2-y1). Grid-like distance (triangle legs). Preferred for high-dim space. L2: Euclidean. sqrt((x2-x1)^2+(y2-y1)^2), i.e. the square root of the difference vector's dot product with itself. Straight-line distance; min distance (Pythagorean triangle hypotenuse). Others: Mahalanobis, Chebyshev (p=inf), etc. Dot product: A type of inner product. Outer-product: lies outside the involved planes. Inner-product: dot product lies inside the planes/axes involved. Dot product: inner product on a finite dimensional Euclidean space. Cosine (normalized dot)
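The metrics listed in these notes can be sketched in a few lines. This is a minimal illustration assuming NumPy is available; the function names `minkowski`, `chebyshev`, and `cosine_similarity` are hypothetical helpers chosen for this example, not names from the show notes.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p: (sum |x_i - y_i|^p)^(1/p)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def chebyshev(x, y):
    """Limit of the Minkowski distance as p grows: the largest coordinate gap."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.max(np.abs(x - y)))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two L2 norms."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a, b = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(a, b, 1)   # L1: sum of the triangle legs, 3 + 4
euclidean = minkowski(a, b, 2)   # L2: the hypotenuse of the 3-4-5 triangle
print(manhattan, euclidean, chebyshev(a, b))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
```

Cosine similarity ignores vector length and only compares direction, which is why the two parallel vectors in the last line score as (approximately) identical.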

MLA 011 Practical Clustering Tools

08/11/2020 Duration: 34min

Primary clustering tools for practical applications include K-means using scikit-learn or Faiss, agglomerative clustering leveraging cosine similarity with scikit-learn, and density-based methods like DBSCAN or HDBSCAN. For determining the optimal number of clusters, silhouette score is generally preferred over inertia-based visual heuristics, and it natively supports pre-computed distance matrices. Links: Notes and resources at ocdevel.com/mlg/mla-11. Try a walking desk: stay healthy & sharp while you learn & code. K-means Clustering: K-means is the most widely used clustering algorithm and is typically the first method to try for general clustering tasks. The scikit-learn KMeans implementation is suitable for small to medium-sized datasets, while Faiss's kmeans is more efficient and accurate for very large datasets. K-means requires the number of clusters to be specified in advance and relies on the Euclidean distance metric, which performs poorly in high-dimensional spaces. When document embedding
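As a rough sketch of choosing the number of clusters by silhouette score, the example below runs scikit-learn's `KMeans` over several values of k on synthetic data. The three-blob dataset, the range of k, and the variable names are assumptions made for illustration only; the last lines show the pre-computed distance matrix route mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score

# Synthetic data (an assumption for this sketch): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

scores = {}
labels_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    labels_by_k[k] = labels
    scores[k] = silhouette_score(X, labels)  # higher means tighter, better-separated clusters

best_k = max(scores, key=scores.get)
print("silhouette by k:", scores)
print("best k:", best_k)

# Silhouette also accepts a pre-computed distance matrix.
D = pairwise_distances(X, metric="euclidean")
score_precomputed = silhouette_score(D, labels_by_k[best_k], metric="precomputed")
```

The pre-computed route is useful when the distance of interest (for example cosine distance between embeddings) is expensive to compute and can be reused across several candidate clusterings.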

MLA 010 NLP packages: transformers, spaCy, Gensim, NLTK

28/10/2020 Duration: 26min

The landscape of Python natural language processing tools has evolved from broad libraries like NLTK toward more specialized packages such as Gensim for topic modeling, spaCy for linguistic analysis, and Hugging Face Transformers for advanced tasks, with Sentence Transformers extending transformer models to enable efficient semantic search and clustering. Each library occupies a distinct place in the NLP workflow, from fundamental text preprocessing to semantic document comparison and large-scale language understanding. Links: Notes and resources at ocdevel.com/mlg/mla-10. Try a walking desk: stay healthy & sharp while you learn & code. Historical Foundation: NLTK. NLTK ("Natural Language Toolkit") was one of the earliest and most popular Python libraries for natural language processing, covering tasks from tokenization and stemming to document classification and syntax parsing. NLTK remains a catch-all "Swiss Army knife" for NLP, but many of its functions have been supplemented or superseded by newe

MLA 009 Charting and Visualization Tools for Data Science

06/11/2018 Duration: 24min

This episode covers the Python charting libraries Matplotlib, Seaborn, and Bokeh, explaining their strengths from quick EDA to interactive, HTML-exported visualizations, and clarifies where D3.js fits as a JavaScript alternative for end-user applications. It also evaluates major software solutions like Tableau, Power BI, QlikView, and Excel, detailing how modern BI tools now integrate drag-and-drop analytics with embedded machine learning, potentially allowing business users to automate entire workflows without coding. Links: Notes and resources at ocdevel.com/mlg/mla-9. Try a walking desk: stay healthy & sharp while you learn & code. Core Phases in Data Science Visualization. Exploratory Data Analysis (EDA): EDA occupies an early stage in the Business Intelligence (BI) pipeline, positioned just before or sometimes merged with the data cleaning (“munging”) phase. The outputs of EDA (e.g., correlation matrices, histograms) often serve as inputs to subsequent machine learning steps. Python Visualization Libraries: 1. M

MLA 008 Exploratory Data Analysis (EDA)

26/10/2018 Duration: 25min

Exploratory data analysis (EDA) sits at the critical pre-modeling stage of the data science pipeline, focusing on uncovering missing values, detecting outliers, and understanding feature distributions through both statistical summaries and visualizations, such as Pandas' info(), describe(), histograms, and box plots. Visualization tools like Matplotlib, along with processes including imputation and feature correlation analysis, allow practitioners to decide how best to prepare, clean, or transform data before it enters a machine learning model. Links: Notes and resources at ocdevel.com/mlg/mla-8. Try a walking desk: stay healthy & sharp while you learn & code. EDA in the Data Science Pipeline. Position in Pipeline: EDA is an essential pre-processing step in the business intelligence (BI) or data science pipeline, occurring after data acquisition but before model training. Purpose: The goal of EDA is to understand the data by identifying: Missing values (nulls); Outliers; Feature distributions; Relation
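The checks described above might look like the following in pandas. This is a sketch on a made-up six-row table: the column names, the values, the 1.5 x IQR outlier rule, and median imputation are all assumptions chosen for illustration, not steps prescribed by the episode.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one missing price and one extreme square footage.
df = pd.DataFrame({
    "sqft": [850, 900, 1200, 1500, 10000, 1100],
    "price": [100.0, 110.0, np.nan, 180.0, 200.0, 130.0],
})

df.info()                  # column dtypes and non-null counts
summary = df.describe()    # count, mean, std, min, quartiles, max
missing = df.isna().sum()  # number of nulls per column

# Flag outliers with the common 1.5 * IQR rule (one of several possible rules).
q1, q3 = df["sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["sqft"] < q1 - 1.5 * iqr) | (df["sqft"] > q3 + 1.5 * iqr)]

# Impute the missing price with the column median, then inspect correlations.
df["price"] = df["price"].fillna(df["price"].median())
corr = df.corr()

print(summary)
print(missing)
print(outliers)
print(corr)
```

Histograms and box plots of the same columns (for example via `df.hist()` or `df.boxplot()`, which require Matplotlib) give the visual counterpart to these summaries.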

MLA 007 Jupyter Notebooks

16/10/2018 Duration: 16min

Jupyter Notebooks, originally conceived as IPython Notebooks, enable data scientists to combine code, documentation, and visual outputs in an interactive, browser-based environment supporting multiple languages like Python, Julia, and R. This episode details how Jupyter Notebooks structure workflows into executable cells - mixing markdown explanations and inline charts - which is essential for documenting, demonstrating, and sharing data analysis and machine learning pipelines step by step. Links: Notes and resources at ocdevel.com/mlg/mla-7. Try a walking desk: stay healthy & sharp while you learn & code. Overview of Jupyter Notebooks. Historical Context and Scope: Jupyter Notebooks began as IPython Notebooks focused solely on Python. The project was renamed Jupyter to support additional languages - namely Julia ("JU"), Python ("PY"), and R ("R") - broadening its applicability for data science and machine learning across multiple languages. Interactive, Narrative-Driven Coding: Jupyter Notebooks

MLA 006 Salaries for Data Science & Machine Learning

19/07/2018 Duration: 19min

O'Reilly's 2017 Data Science Salary Survey finds that location is the most significant salary determinant for data professionals, with median salaries ranging from $134,000 in California to under $30,000 in Eastern Europe, and highlights that negotiation skills can lead to salary differences as high as $45,000. Other key factors impacting earnings include company age and size, job title, industry, and education, while popular tools and languages, such as Python, SQL, and Spark, do not strongly influence salary despite widespread use. Links: Notes and resources at ocdevel.com/mlg/mla-6. Try a walking desk: stay healthy & sharp while you learn & code. Global and Regional Salary Differences. Median Global Salary: $90,000 USD, up from $85,000 the previous year. Regional Breakdown: United States: $112,000 median; California leads at $134,000. Western Europe: $57,000, about half the US median. Australia & New Zealand: Second after the US. Eastern Europe: Below $30,000. Asia: Wide interquartile salary ran

MLA 005 Shapes and Sizes: Tensors and NDArrays

09/06/2018 Duration: 27min

This episode explains the fundamental differences between tensor dimensions, size, and shape, clarifying frequent misconceptions (such as the distinction between the number of features, or “columns”, and true data dimensions) while also demystifying reshaping operations like expand_dims, squeeze, and transpose in NumPy. Through practical examples from images and natural language processing, listeners learn how to manipulate tensors to match model requirements, including scenarios like adding dummy dimensions for grayscale images or reordering axes for sequence data. Links: Notes and resources at ocdevel.com/mlg/mla-5. Try a walking desk: stay healthy & sharp while you learn & code. Definitions. Tensor: A general term for an array of any number of dimensions. 0D Tensor (Scalar): A single number (e.g., 5). 1D Tensor (Vector): A simple list of numbers. 2D Tensor (Matrix): A grid of numbers (rows and columns). 3D+ Tensors: Higher-dimensional arrays, such as images or batches of images. NDArray (NumPy): Stands for "N-
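The reshaping operations named above can be tried directly in NumPy. The sketch below uses assumed example shapes (a 28x28 grayscale image and a batch of 32 sequences with 10 steps and 300 features); those numbers are illustrative and not from the episode.

```python
import numpy as np

# A grayscale image: 2 dimensions (height, width), shape (28, 28), size 784.
gray = np.zeros((28, 28))
print(gray.ndim, gray.shape, gray.size)

# Add a dummy channel axis, then a dummy batch axis, as many image models expect.
with_channel = np.expand_dims(gray, axis=-1)   # shape (28, 28, 1)
batch = np.expand_dims(with_channel, axis=0)   # shape (1, 28, 28, 1)

# squeeze drops every axis of length 1, recovering the original 2-D image.
restored = np.squeeze(batch)                   # shape (28, 28)

# Sequence data: (batch, time, features) reordered to (time, batch, features).
seq = np.zeros((32, 10, 300))
time_major = np.transpose(seq, (1, 0, 2))      # shape (10, 32, 300)

print(with_channel.shape, batch.shape, restored.shape, time_major.shape)
```

Note that none of these operations change `size`: the number of elements stays the same, only the arrangement of axes differs.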

MLA 003 Storage: HDF, Pickle, Postgres

24/05/2018 Duration: 17min

This episode walks through the practical workflow of loading, cleaning, and storing large datasets for machine learning, moving from ingesting raw CSVs or JSON files with pandas to saving processed datasets and neural network weights using HDF5 for efficient numerical storage. It clearly distinguishes among storage options, explaining when to use HDF5, pickle files, or SQL databases, while highlighting how libraries like pandas, TensorFlow, and Keras interact with these formats and why these choices matter for production pipelines. Links: Notes and resources at ocdevel.com/mlg/mla-3. Try a walking desk: stay healthy & sharp while you learn & code. Data Ingestion and Preprocessing. Data Sources and Formats: Datasets commonly originate as CSV (comma-separated values), TSV (tab-separated values), fixed-width files (FWF), JSON from APIs, or directly from databases. Typical applications include structured data (e.g., real estate features) or unstructured data (e.g., natural language corpora for sentiment analysis). Pandas as the Cor
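A minimal sketch of the load, clean, and store loop described above, using pandas with an in-memory CSV and a pickle file in a temporary directory. The column names, the `dropna` cleaning step, and the file name are assumptions for illustration; only the pickle path is shown as runnable code.

```python
import io
import os
import tempfile

import pandas as pd

# Stand-in for a raw CSV file on disk (hypothetical real estate rows with gaps).
raw_csv = io.StringIO(
    "sqft,bedrooms,price\n"
    "850,2,100000\n"
    "1200,,150000\n"
    "1500,3,\n"
)

df = pd.read_csv(raw_csv)                    # ingest
clean = df.dropna().reset_index(drop=True)   # simplest possible cleaning: drop incomplete rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clean.pkl")
    clean.to_pickle(path)                    # store the processed DataFrame
    restored = pd.read_pickle(path)          # reload it later without re-cleaning

print(restored)
```

pandas also provides `DataFrame.to_hdf` and `pd.read_hdf` for HDF5, which depend on the optional PyTables package, and `DataFrame.to_sql` for SQL databases; they are left out here so the sketch runs with pandas alone. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.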

MLA 002 Numpy & Pandas

24/05/2018 Duration: 18min

NumPy enables efficient storage and vectorized computation on large numerical datasets in RAM by leveraging contiguous memory allocation and low-level C/Fortran libraries, drastically reducing memory footprint compared to native Python lists. Pandas, built on top of NumPy, introduces labelled, flexible tabular data manipulation, facilitating intuitive row and column operations, powerful indexing, and seamless handling of missing data through tools like alignment, reindexing, and imputation. Links: Notes and resources at ocdevel.com/mlg/mla-2. Try a walking desk: stay healthy & sharp while you learn & code. NumPy: Efficient Numerical Arrays and Vectorized Computation. Purpose and Design: NumPy ("Numerical Python") is the foundational library for handling large numerical datasets in RAM. It introduces the ndarray (n-dimensional array), which is synonymous with a tensor, enabling storage of vectors, matrices, or higher-dimensional data. Memory Efficiency: NumPy arrays are homogeneous: all elements sha
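To make the memory and alignment points concrete, here is a small sketch comparing a Python list with a NumPy array and showing pandas index alignment. The array length and the example Series are assumptions for illustration, and the byte counts for the list are an estimate based on `sys.getsizeof`, which varies by Python build.

```python
import sys

import numpy as np
import pandas as pd

n = 100_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects; the array stores raw 8-byte values.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = arr.nbytes
print(f"list: ~{list_bytes} bytes, ndarray: {array_bytes} bytes")

# Vectorized computation: one expression, no Python-level loop.
doubled = arr * 2

# pandas aligns on labels; labels missing from one side produce NaN.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b               # "x" has no partner in b, so it becomes NaN
filled = total.fillna(0.0)  # a simple imputation choice for the gap
print(total)
print(filled)
```

The fixed element type is what allows the compact layout: every entry of `arr` occupies exactly 8 bytes, so `arr.nbytes` is simply `n * 8`.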

MLA 001 Degrees, Certificates, and Machine Learning Careers

24/05/2018 Duration: 11min

While industry-respected credentials like Udacity Nanodegrees help build a practical portfolio for machine learning job interviews, they remain insufficient stand-alone qualifications: most roles require a Master’s degree as a near-hard requirement, especially compared to more flexible web development fields. A Master’s, such as Georgia Tech’s OMSCS, not only greatly increases employability but is strongly recommended for those aiming for entry into machine learning careers, while a PhD is more appropriate for advanced, research-focused roles with significant time investment. Links: Notes and resources at ocdevel.com/mlg/mla-1. Online Certificates: Usefulness and Limitations. Udacity Nanodegree: Provides valuable hands-on experience and a practical portfolio of machine learning projects. Demonstrates self-motivation and the ability to self-teach. Not industry-recognized as a formal qualification; does not by itself suffice for job placement in most companies. Best used as a supplement to demonstrate applied

MLG 029 Reinforcement Learning Intro

05/02/2018 Duration: 43min

Try a walking desk while studying ML or working on your projects! Introduction to reinforcement learning concepts. ocdevel.com/mlg/29 for notes and resources.

MLG 028 Hyperparameters 2

04/02/2018 Duration: 51min

Try a walking desk while studying ML or working on your projects! Hyperparameters part 2: hyper-search, regularization, SGD optimizers, scaling. ocdevel.com/mlg/28 for notes and resources

MLG 027 Hyperparameters 1

28/01/2018 Duration: 47min

Try a walking desk while studying ML or working on your projects! Hyperparameters part 1: network architecture. ocdevel.com/mlg/27 for notes and resources

MLG 026 Project Bitcoin Trader

27/01/2018 Duration: 38min

Try a walking desk while studying ML or working on your projects! Community project & intro to Bitcoin/crypto + trading. ocdevel.com/mlg/26 for notes and resources

MLG 025 Convolutional Neural Networks

30/10/2017 Duration: 44min

Try a walking desk while studying ML or working on your projects! Convnets or CNNs. Filters, feature maps, window/stride/padding, max-pooling. ocdevel.com/mlg/25 for notes and resources
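For readers who want to see how window, stride, and padding determine the size of a feature map, the following is a small NumPy sketch. The helper names `conv_output_size`, `conv2d`, and `max_pool` are hypothetical, the single-channel square-kernel setup is a simplifying assumption, and the "convolution" is the unflipped sliding-window product (cross-correlation) that deep learning libraries commonly use.

```python
import numpy as np

def conv_output_size(n, window, stride=1, padding=0):
    """Output length along one axis: floor((n + 2*padding - window) / stride) + 1."""
    return (n + 2 * padding - window) // stride + 1

def conv2d(image, kernel, stride=1, padding=0):
    """Slide one square filter over a 2-D image and return the feature map."""
    image = np.pad(image, padding)  # zero-pad every side by `padding`
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(feature_map, window=2):
    """Non-overlapping max-pooling; trailing rows/columns that do not fit are dropped."""
    h, w = feature_map.shape[0] // window, feature_map.shape[1] // window
    trimmed = feature_map[:h * window, :w * window]
    return trimmed.reshape(h, window, w, window).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3))
fm = conv2d(image, kernel)   # 6x6 image, 3x3 window, stride 1, no padding
pooled = max_pool(fm)        # 2x2 max-pooling halves each side
print(fm.shape, pooled.shape)
```

With padding of 1 and a 3x3 window at stride 1, `conv_output_size(6, 3, padding=1)` returns 6, which is the usual "same size" configuration.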

MLG 024 Tech Stack

07/10/2017 Duration: 01h01min

Try a walking desk while studying ML or working on your projects! TensorFlow, Pandas, Numpy, Scikit-Learn, Keras, TensorForce. ocdevel.com/mlg/24 for notes and resources

MLG 023 Deep NLP 2

20/08/2017 Duration: 43min

Try a walking desk while studying ML or working on your projects! RNN review, bi-directional RNNs, LSTM & GRU cells. ocdevel.com/mlg/23 for notes and resources
