Complete Machine Learning Package


Techniques, tools, best practices, and everything you need to learn machine learning!


This is a comprehensive repository containing 30+ notebooks on Python programming, data manipulation, data analysis, data visualization, data cleaning, classical machine learning, Computer Vision, and Natural Language Processing (NLP).

All notebooks were created with readers in mind. Every notebook starts with a high-level overview of the algorithm or concept being covered. Wherever possible, visuals are used to make things clear.

Viewing and Running the Notebooks

The easiest way to view all the notebooks is to use Nbviewer.

If you want to play with the code, you can use the following platforms:

Deepnote will direct you to Intro to Machine Learning. Head to the project sidebar for more notebooks.

Tools Overview

The following tools are covered in the notebooks. They are popular tools that machine learning engineers and data scientists use, in one way or another, day to day.

  • Python is a high-level programming language that has become very popular in the data community. Given the rapid growth of its libraries and frameworks, it is a suitable programming language for ML.
  • NumPy is a scientific computing tool used for array or matrix operations.
  • Pandas is a great and simple tool for analyzing and manipulating data from a variety of different sources.
  • Matplotlib is a comprehensive data visualization tool used to create static, animated, and interactive visualizations in Python.
  • Seaborn is another data visualization tool built on top of Matplotlib which is pretty simple to use.
  • Scikit-Learn: Instead of building machine learning models from scratch, Scikit-Learn makes it easy to use classical models in a few lines of code. This tool has been adopted by much of the ML community and industry, from startups to big tech companies.
  • TensorFlow and Keras for neural networks: TensorFlow is a popular deep learning framework used for building models in fields such as Computer Vision and Natural Language Processing. It includes Keras, a high-level API for building neural networks easily. TensorFlow has gained a lot of popularity in the ML community thanks to its broad ecosystem of tools, including TensorBoard, TF Datasets, TensorFlow Lite, TensorFlow Extended, TensorFlow.js, and others.


Part 1 – Intro to Python and Working with Data

0 – Intro to Python for Machine Learning

1 – Data Computation With NumPy

  • Creating a NumPy Array
  • Selecting Data: Indexing and Slicing An Array
  • Performing Mathematical and other Basic Operations
  • Performing Basic Statistics
  • Manipulating Data
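
As a rough taste of the topics listed above, here is a minimal sketch of what these NumPy operations can look like. The array and variable names are made up for illustration and are not taken from the notebooks.

```python
import numpy as np

# Creating a NumPy array: the integers 0..11 arranged as 3 rows x 4 columns
a = np.arange(12).reshape(3, 4)

# Selecting data: indexing and slicing
first_row = a[0]        # the whole first row
last_col = a[:, -1]     # the last column of every row

# Mathematical operations are applied element-wise
doubled = a * 2

# Basic statistics, per column and over the whole array
col_means = a.mean(axis=0)
overall_max = a.max()

# Manipulating data: transpose, then flatten into one dimension
flat = a.T.ravel()

print(first_row, last_col, col_means, overall_max)
```

The `axis=0` argument reduces over rows, so `col_means` has one value per column.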

2 – Data Manipulation with Pandas

  • Basics of Pandas
    • Series and DataFrames
    • Data Indexing and Selection
    • Dealing with Missing data
    • Basic operations and Functions
    • Aggregation Methods
    • Groupby
    • Merging, Joining, and Concatenating
  • Beyond DataFrames: Working with CSV and Excel
  • Real World Exploratory Data Analysis (EDA)
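
To give a feel for several of the Pandas topics above (missing data, groupby, merging), here is a small sketch on a toy table. The data, column names, and the choice to fill missing values with the column mean are assumptions made for illustration, not content from the notebooks.

```python
import numpy as np
import pandas as pd

# A tiny DataFrame with one missing value (hypothetical data)
df = pd.DataFrame({
    "city": ["Kigali", "Nairobi", "Kigali", "Nairobi"],
    "sales": [10.0, np.nan, 30.0, 20.0],
})

# Dealing with missing data: fill the gap with the column mean
# (mean() skips NaN by default, so this is the mean of 10, 30 and 20)
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Groupby and aggregation: total sales per city
totals = df.groupby("city")["sales"].sum()

# Merging: attach a country to each row through the shared "city" key
regions = pd.DataFrame({
    "city": ["Kigali", "Nairobi"],
    "country": ["Rwanda", "Kenya"],
})
merged = df.merge(regions, on="city", how="left")

print(totals)
print(merged)
```

Filling with the mean is only one of several options; dropping the rows with `dropna()` is another, and the right choice depends on the data.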

3 – Data Visualization with Matplotlib and Seaborn

4 – Real World Data – Exploratory Analysis and Data Preparation

Part 2 – Machine Learning

5 – Intro to Machine Learning

  • Intro to Machine Learning
  • Machine Learning Workflow
  • Evaluation Metrics
  • Handling Underfitting and Overfitting
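
The workflow, evaluation, and underfitting/overfitting topics above can be sketched in a few lines with Scikit-Learn. This is a minimal illustration under assumed choices (the built-in iris dataset, a decision tree, and accuracy as the metric), not the approach the notebooks necessarily take.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Get the data and hold out a test set for honest evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(max_depth):
    """Train a decision tree and return its train and test accuracy."""
    model = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    return {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }


# 2. A very shallow tree is a candidate for underfitting;
#    an unrestricted tree is a candidate for overfitting
shallow = fit_and_score(max_depth=1)
deep = fit_and_score(max_depth=None)

print("depth=1:", shallow)
print("no depth limit:", deep)
```

Comparing the train and test scores is the usual first check: low scores on both suggest underfitting, while a large gap between train and test suggests overfitting.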

6 – Classical Machine Learning with Scikit-Learn

Part 3 – Deep Learning

7 – Intro to Artificial Neural Networks and TensorFlow

8 – Deep Computer Vision with TensorFlow

9 – Natural Language Processing with TensorFlow

Used Datasets

Many of the datasets used for this repository are from the following sources:

This repository was created by Jean de Dieu Nyandwi. You can find him on:

If you find any of this helpful, shoot him a tweet or a mention 🙂