# Python Most Common Challenges

Link to Kaggle Notebook for all these exercises together

## Q: Product of two matrices

Given two matrices print the product of those two matrices

## Basics of Matrix Multiplication

Multiplication rule to remember: the flow is row first, then column.

1. The sum-product of the 1st row of A with the 1st column of B becomes the element at row 1, column 1 of the resultant matrix.
2. The sum-product of the 1st row of A with the 2nd column of B becomes the element at row 1, column 2 of the resultant matrix.

In general, the element at row i, column j of the result is the dot product of the i-th row of matrix A and the j-th column of matrix B.

Further read through this for a very nice visual flow of Matrix Multiplication.

```python
def matrix_mul(A, B):
    num_rows_A = len(A)
    num_columns_A = len(A[0])
    num_rows_B = len(B)
    num_columns_B = len(B[0])
    # To multiply an m×n matrix by an n×p matrix, the ns must be the same,
    # and the result is an m×p matrix.
    if num_columns_A != num_rows_B:
        print(
            "Matrix multiplication of two arguments not possible as number of columns in first Matrix is NOT equal to the number of rows in second Matrix")
        return
    # Create a result matrix which will have
    # dimensions of num_rows_A x num_columns_B
    # and fill this matrix with zeros
    result_matrix = [[0 for i in range(num_columns_B)] for j in range(num_rows_A)]
    # Now implementing the key principle:
    # the element at row i, column j is the dot product of the
    # i-th row of matrix A and the j-th column of matrix B.
    for i in range(num_rows_A):
        for j in range(num_columns_B):
            for k in range(num_columns_A):
                # the k-th column of A pairs with the k-th row of B
                result_matrix[i][j] += A[i][k] * B[k][j]
    return result_matrix


A = [[1, 3, 4],
     [2, 5, 7],
     [5, 9, 6]]
B = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
A1 = [[1, 2],
      [3, 4]]
B1 = [[1, 2, 3, 4, 5],
      [5, 6, 7, 8, 9]]
A2 = [[1, 3, 4], [5, 9, 6]]
B2 = [[1, 0, 0], [0, 0, 1]]

print(matrix_mul(A, B))
print(matrix_mul(A1, B1))
print(matrix_mul(A2, B2))
```
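As a quick cross-check of the row-by-column rule, the hand-rolled sum-product can be compared against NumPy's built-in `@` matrix-product operator (a minimal sketch with a small illustrative pair of matrices, separate from the kernel's code):

```python
import numpy as np

# Pure-Python product of a 2x2 and a 2x3 matrix, following the
# row-by-column rule described above
A = [[1, 2],
     [3, 4]]
B = [[1, 0, 2],
     [0, 1, 3]]

result = [[sum(A[i][k] * B[k][j] for k in range(len(B)))
           for j in range(len(B[0]))]
          for i in range(len(A))]

# NumPy computes the same product with the @ operator
expected = (np.array(A) @ np.array(B)).tolist()
print(result)              # [[1, 2, 8], [3, 4, 18]]
print(result == expected)  # True
```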

# Encoding Categorical Variables in Machine Learning Dataset

Link to Kaggle Notebook with UCI-Breast Cancer Data

## What is Categorical Data

Categorical variables are those values in a dataset that are selected from a group of categories or labels. Typically, any data attribute which is categorical in nature represents discrete values that belong to a specific finite set of categories or classes. These are also often known as classes or labels in the context of attributes or variables which are to be predicted by a model (popularly known as response variables). These discrete values can be text or numeric in nature (or even unstructured data like images!).

## There are two major classes of categorical data, nominal and ordinal.

In any nominal categorical data attribute, there is no concept of ordering amongst the values of that attribute. Consider a simple example of weather categories like sunny, cloudy, rainy, etc. These carry no notion of order (rainy doesn’t come before sunny, nor is it smaller or bigger than sunny). …
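The distinction drives the encoding choice, sketched below with pandas (the `weather` and `size` columns are illustrative toy data, not from the breast-cancer dataset): a nominal attribute is typically one-hot encoded so no false order is implied, while an ordinal attribute can be mapped to integers that preserve its order.

```python
import pandas as pd

# Hypothetical toy frame; column names are illustrative only
df = pd.DataFrame({
    "weather": ["sunny", "rainy", "cloudy", "sunny"],  # nominal: no order
    "size": ["small", "large", "medium", "small"],     # ordinal: has order
})

# Nominal -> one-hot encoding (one binary column per category, no order implied)
one_hot = pd.get_dummies(df["weather"], prefix="weather")

# Ordinal -> integer mapping that preserves the order small < medium < large
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

print(one_hot.columns.tolist())     # ['weather_cloudy', 'weather_rainy', 'weather_sunny']
print(df["size_encoded"].tolist())  # [0, 2, 1, 0]
```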

# LightGBM, XGBoost and CatBoost — Kaggle — Santander Challenge

## Achieved a score of 1.4714 with this Kernel in Kaggle

(If you like the Kaggle Notebook, please consider upvoting it in Kaggle)

Getting the data and Kaggle Challenge Link

Gradient-boosted trees have become one of the most powerful algorithms for training on tabular data. Over the recent past, we’ve been fortunate to have many implementations of boosted trees, each with its own unique characteristics. In this notebook, I will implement LightGBM, XGBoost and CatBoost to tackle this Kaggle problem.

What is Boosting

To understand the absolute basics of the need for a boosting algorithm, let's ask a basic question: if a data point is incorrectly predicted by our first model, and then by the next (and probably by all models), will combining the predictions provide better results? …
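The intuition can be sketched in a few lines: each new weak learner is fit to the residuals of the current ensemble, so later models concentrate on the points earlier models got wrong. This is a toy hand-rolled version of gradient boosting using scikit-learn decision stumps, not the LightGBM/XGBoost/CatBoost code from the kernel:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a smooth target the weak learners must approximate together
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel()

prediction = np.zeros_like(y)
learning_rate = 0.5
for _ in range(20):
    # Each new shallow tree fits the errors left by the ensemble so far
    residual = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    prediction += learning_rate * tree.predict(X)

mse = np.mean((y - prediction) ** 2)
print(mse)  # training error shrinks as trees are added
```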

# Exploratory Data Analysis and Moving Averages with Crypto-currency Data from coinmarketcap.com

The full Kaggle Jupyter Notebook is here

## Getting the Data

I have uploaded a Kaggle Dataset, a zipped file with 26,320 `.csv` files containing the top cryptocurrencies on https://coinmarketcap.com/ by market cap worldwide. Starting at 20:45:05 on August 4, data was collected every five minutes for three months.

Also, I have uploaded to my GitHub repository the 9,432 `.csv` files on which the EDA analysis below is based.

This dataset covers CoinMarketCap data from August 4 to November 4, 2017.

Filenames represent the date and time at which the data was…
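A moving average over such a time series can be sketched with pandas (the prices below are made-up illustrative values, not the scraped CoinMarketCap data):

```python
import pandas as pd

# Hypothetical daily close prices; the real notebook reads the scraped .csv files
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0],
    index=pd.date_range("2017-08-04", periods=7, freq="D"),
)

# Simple moving average over a 3-day window
# (the first two entries are NaN: the window is not yet full)
sma = prices.rolling(window=3).mean()

# Exponential moving average weights recent observations more heavily
ema = prices.ewm(span=3, adjust=False).mean()

print(sma.round(2).tolist())
print(ema.round(2).tolist())
```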

# Kaggle House Prices Prediction with Linear Regression and Gradient Boosting

## As I intended this Notebook to be published as a blog post on Linear Regression, the Gradient Descent function and some EDA, the first 50% to 60% of this notebook mainly discusses the theory around those topics, from a beginner's perspective.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
import math
```

The evaluation criterion for this Kaggle Competition is RMSLE — “Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. …
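RMSLE as described can be computed directly (a minimal sketch; `np.log1p` is used so zero values don't blow up the logarithm):

```python
import numpy as np

# RMSLE: RMSE between the logarithms of predictions and of actual values
def rmsle(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

# Illustrative house prices, not from the competition data
actual = [200000, 150000, 320000]
predicted = [210000, 140000, 300000]
print(rmsle(actual, predicted))
```

Equivalently, it is the square root of scikit-learn's `mean_squared_log_error`, which the import block above already pulls in.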

# Axis and Dimensions in Numpy and Pandas Array

As a career Data Scientist, you will deal throughout your career with data in matrix form, whether in NumPy, Pandas or TensorFlow, where axes and dimensions are the fundamental structural concepts.

Basic Attributes of the ndarray Class

# What is the Shape of an Array

Let's consider the below array

The “shape” of this array is a tuple with the number of elements per axis (dimension). In our example, the shape is equal to (6, 3), i.e. we have 6 rows and 3 columns.

Every NumPy array has a “shape” attribute which returns the shape of the array. The shape is a tuple of integers. …
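For example, with a 6×3 array like the one described above:

```python
import numpy as np

# A 6x3 array: 6 rows (axis 0) and 3 columns (axis 1)
a = np.arange(18).reshape(6, 3)

print(a.shape)              # (6, 3)
print(a.ndim)               # 2
print(a.sum(axis=0).shape)  # (3,)  collapsing axis 0 leaves one value per column
print(a.sum(axis=1).shape)  # (6,)  collapsing axis 1 leaves one value per row
```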

# Bias-Variance Trade-off in DataScience and Calculating with Python

The goal of this blog post is to discuss the concept of, and the mathematics behind, the below formulation of the Bias-Variance Tradeoff.

And in super simple terms:

Total Prediction Error = Bias² + Variance + Irreducible Error

The goal of any supervised machine learning model is to best estimate the mapping function (f) for the output/dependent variable (Y) given the input/independent variable (X). The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.

The Expected Prediction Error for any machine learning algorithm can be broken down into three parts:

1. Bias Error
2. Variance Error
3. Irreducible Error …
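The decomposition can be estimated empirically by refitting a model on many fresh training samples and measuring, at a test point, how far the average prediction sits from the truth (bias) and how much the predictions scatter around their own mean (variance). A small Monte-Carlo sketch with a deliberately underfit linear model on a quadratic target (all values illustrative):

```python
import numpy as np

# True function f(x) = x^2 with Gaussian noise; the model is a straight-line
# fit, which is too simple, so we expect noticeable bias.
rng = np.random.RandomState(42)
x = np.linspace(-1, 1, 50)
f = x ** 2
x0_idx = 25  # evaluate bias/variance at one test point

predictions = []
for _ in range(500):
    y = f + rng.normal(scale=0.1, size=x.shape)  # a fresh training sample
    slope, intercept = np.polyfit(x, y, deg=1)   # underfitting linear model
    predictions.append(slope * x[x0_idx] + intercept)

predictions = np.array(predictions)
bias_sq = (predictions.mean() - f[x0_idx]) ** 2  # squared bias at x0
variance = predictions.var()                     # variance across retrainings
print(bias_sq, variance)  # bias² dominates for this underfit model
```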

# Vectorizing Gradient Descent — Multivariate Linear Regression and Python implementation

In this article, I shall go over the topic of arriving at the vectorized Gradient-Descent formulae for the Cost function, for the matrix form of the training-data equations, along with the fundamentals of Calculus (especially partial derivatives) and matrix derivatives necessary to understand the process.
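As a preview of where the derivation lands, the vectorized update needs no per-feature loop: the gradient of the cost J(θ) = (1/2m)‖Xθ − y‖² is (1/m)Xᵀ(Xθ − y). A minimal NumPy sketch on synthetic, noise-free data:

```python
import numpy as np

# Vectorized batch gradient descent for multivariate linear regression
rng = np.random.RandomState(1)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), rng.randn(m, n)])  # bias column + features
true_theta = np.array([4.0, 2.0, -1.0, 0.5])
y = X @ true_theta                                 # noise-free targets

theta = np.zeros(n + 1)
lr = 0.1
for _ in range(2000):
    # Gradient of J(theta) in matrix form: (1/m) * X^T (X theta - y)
    gradient = X.T @ (X @ theta - y) / m
    theta -= lr * gradient

print(np.round(theta, 3))  # ≈ [ 4.   2.  -1.   0.5]
```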

# First a Refresher on basic Matrix Algebra

A matrix A over a field K or, simply, a matrix A (when K is implicit) is a rectangular array of scalars usually presented in the following…

# Fundamentals of Multivariate Calculus for DataScience and Machine Learning

Multivariate Calculus is used all around the Machine Learning and Data Science ecosystem, so having a first-principles understanding of it is incredibly useful when you are dealing with complex math equations while implementing an ML algorithm.

To start with, as soon as you need to implement multivariate Linear Regression, you hit multivariate calculus, which is what you will have to use to derive the gradient of a set of multivariate linear equations, i.e. the derivative of a matrix. …

# What Derivative of a function really is

The most fundamental definition of the derivative can be stated as: the derivative measures the steepness of the graph of a function at some particular point on the graph. Thus, the derivative of a function is its slope at a particular point. Strictly speaking, curves don’t have slope, so we use the slope of the tangent line at a particular point on the curve. That also means the derivative is the ratio of the change in the value of the function (the dependent variable) to the change in the independent variable.

Take note in the above image: when Δx approaches 0, the secant line becomes the tangent at x₀, and the derivative of the function representing the above graph gives the slope of the tangent line at x₀. …
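The limit process described above can be watched numerically: as Δx shrinks, the secant slope (f(x₀+Δx) − f(x₀))/Δx approaches the true derivative (here f(x) = x², whose derivative at x₀ = 3 is 6):

```python
# The limit definition of the derivative in action: the slope of the secant
# line approaches the slope of the tangent line as dx -> 0.
def f(x):
    return x ** 2  # d/dx x^2 = 2x, so the slope at x = 3 should approach 6

x0 = 3.0
for dx in (1.0, 0.1, 0.01, 0.001):
    secant_slope = (f(x0 + dx) - f(x0)) / dx
    print(dx, secant_slope)  # secant slope tends toward 6 as dx shrinks
```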