Photo by Wonderlane on Unsplash

Link to Kaggle Notebook

Use case of LSH

A classical application of similarity search is in recommender systems: Suppose you have shown interest in a particular item, for example a news article x. The semantic meaning of a piece of text can be represented as a high-dimensional feature vector, for example computed using latent semantic indexing. In order to recommend other news articles we might search the set P of article feature vectors for articles that are “close” to x.
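As a rough illustration of the search described above, the sketch below ranks a few made-up feature vectors by cosine similarity to a query vector x. The vectors, the article names, and the helper names `cosine_similarity` and `nearest_articles` are hypothetical. This is plain brute-force search, not LSH itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_articles(x, P, k=2):
    """Return the names of the k vectors in P most similar to x (brute force)."""
    scored = [(cosine_similarity(x, vec), name) for name, vec in P.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical 3-dimensional feature vectors; real ones would have far more dimensions.
P = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.0, 1.0, 0.2],
    "article_c": [0.8, 0.3, 0.1],
}
x = [1.0, 0.2, 0.0]
closest = nearest_articles(x, P)
print(closest)
```

Every query compares x against every vector in P, which is the cost that approximate methods such as LSH try to avoid.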

In this case, for a large textual dataset containing millions of words, the problem is there may be far too many pairs of items…


basic-multi-armed-bandit-Santa-Competition

Link to my Kaggle Kernel, which attempts to solve the Kaggle challenge Santa 2020 - The Candy Cane Contest

Multi-armed bandit problems are some of the simplest reinforcement learning (RL) problems to solve. We have an agent that we allow to choose actions, and each action has a reward that is returned according to a given, underlying probability distribution. The game is played over many episodes (single actions in this case) and the goal is to maximize your reward.

To explain further, how do you most efficiently identify the best machine to play while still exploring the many options sufficiently, in real time? This problem is not an exercise in theoretical abstraction; it is an analogy for a problem that organizations face all the time: how to identify the best message to present to customers (message is broadly defined here, e.g. webpages, advertising, images) so that it maximizes some business objective (e.g. clickthrough rate, signups).
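One simple way to balance exploring and exploiting is an epsilon-greedy agent. The sketch below is a generic illustration, not the approach used in the Kaggle kernel. The reward probabilities, `epsilon`, and the function name `run_epsilon_greedy` are all assumptions made for the example.

```python
import random

def run_epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy agent.

    true_probs: hypothetical win probability of each machine (unknown to the agent).
    Returns (pull counts, estimated values, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # how often each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the mean reward for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward

counts, values, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total_reward)
```

A larger `epsilon` explores more and exploits less; the right trade-off depends on the problem.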

The…


Photo by nik radzi on Unsplash

Kaggle Notebook link with all the running code in this blog.

In this post, I shall go over the TF-IDF model and its implementation with scikit-learn.

Traditional Feature Engineering Models

Traditional (count-based) feature engineering strategies for textual data belong to a family of models popularly known as the Bag of Words model. This includes term frequencies, TF-IDF (term frequency-inverse document frequency), N-grams, topic models, and so on. …
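To make the TF-IDF idea concrete, here is a minimal pure-Python sketch using the textbook formula tf × log(N / df). The example sentences and the function name `tf_idf` are made up for illustration. scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

# Hypothetical toy corpus.
docs = ["the sky is blue", "the sun is bright", "the sun in the sky"]
w = tf_idf(docs)
print(w[0])
```

A term that occurs in every document, such as "the" here, gets weight zero, while a term unique to one document gets the highest weight in it.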


Photo by hao wang on Unsplash

What is t-SNE?

t-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised, non-linear technique developed by Laurens van der Maaten and Geoffrey Hinton in 2008.

The algorithm has two steps:

We first construct a probability distribution over pairs of higher-dimensional objects in such a way that objects with a higher similarity have a higher probability of being picked as neighbors than objects with a lower similarity.

We then construct a similar probability distribution over the lower-dimensional map so that the Kullback–Leibler divergence between the two distributions, with respect to their location on the map, is minimized.
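For reference, the two steps can be written out as follows. This is the standard formulation from the t-SNE paper, and the symbols are not defined in the text above: x_i are the high-dimensional points, y_i their low-dimensional map points, N the number of points, and σ_i a per-point Gaussian bandwidth.

```latex
% Step 1: similarities between high-dimensional points
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Step 2: similarities in the low-dimensional map (Student t with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective minimized with respect to the map points y_i
\mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```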

Usually, the algorithm uses Euclidean distance as the base metric but it…


[99.3% score in the Kaggle Digit Recognizer Challenge]

Photo by Federica Giusti on Unsplash

Link to Kaggle Notebook

In many practical applications, although the data reside in a high-dimensional space, the true dimensionality, known as intrinsic dimensionality, can be of a much lower value.

For example, in a three-dimensional space, the data may cluster around a straight line, around the circumference of a circle, or around the graph of a parabola, arbitrarily placed in R³. In all of these cases, the intrinsic dimensionality of the data is equal to one, as any of these curves can equivalently be described in terms of a single parameter.
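A small sketch of the circle case: every point has three coordinates, yet all of them are generated from the single parameter t. The function name `circle_point` is hypothetical, and for simplicity the circle is kept parallel to the xy-plane rather than arbitrarily oriented.

```python
import math

def circle_point(t, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Map a single parameter t (an angle) to a 3-D point on a circle."""
    cx, cy, cz = center
    return (cx + radius * math.cos(t), cy + radius * math.sin(t), cz)

# Eight 3-D points, each fully determined by one number.
points = [circle_point(2 * math.pi * i / 8) for i in range(8)]
for p in points:
    print(tuple(round(c, 3) for c in p))
```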

Below figure…


Photo by Mukil Menon on Unsplash

Many people have run into this issue, and today I faced it on my Ubuntu 20.04 machine.

Error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above

This is normally caused by either an incompatibility between CUDA, cuDNN, and the NVIDIA drivers, or a memory growth issue. The solution here addresses the memory growth issue, which was the case for me today.

This solution here worked for me.

Set the TF_FORCE_GPU_ALLOW_GROWTH environment variable to true. In your terminal, run this command.

export TF_FORCE_GPU_ALLOW_GROWTH=true
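If you would rather not depend on the shell environment, the same variable can be set from inside the Python script. This is a sketch under one assumption: the assignment runs before TensorFlow initializes the GPU, and the simplest way to ensure that is to place it above the TensorFlow import.

```python
import os

# Set the variable before TensorFlow is imported, so it is already in place
# when the GPU is initialized.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable has been set
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```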

Other details about the versions on my machine

My…


Photo by Sebastian Staines on Unsplash

Link to Kaggle Notebook for this entire exercise

In general, all current machine-learning systems use tensors as their basic data structure. Tensors are fundamental to the field, so fundamental that Google’s TensorFlow was named after them. Even text and image data are converted to numerical features for processing.

So what’s a tensor?

At its core, a tensor is a container for data — almost always numerical data. So, it’s a container for numbers. …
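To illustrate the "container for numbers" idea without any library, the sketch below treats nested Python lists as tensors and reports their shape and rank (number of axes). The helper name `shape` is hypothetical, and it assumes regular nesting where every sub-list at a given level has the same length.

```python
def shape(t):
    """Return the shape of a regularly nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None  # descend into the first element
    return tuple(dims)

scalar = 5                                     # rank 0
vector = [1, 2, 3]                             # rank 1
matrix = [[1, 2, 3], [4, 5, 6]]                # rank 2
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # rank 3

for t in (scalar, vector, matrix, cube):
    print(shape(t), "rank", len(shape(t)))
```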


Photo by Mathew MacQuarrie on Unsplash

Link to Kaggle Notebook

Description of the Data

The Haberman’s survival dataset covers cases from a study conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

Label/Attribute Information:

  • Age of the patient at time of operation (numerical)
  • Year of operation, given as year minus 1900 (numerical)
  • Number of positive axillary nodes detected (numerical); see the note below
  • Survival status (class attribute): 1 means the patient survived 5 years or longer, and 2 means the patient died within 5 years

A note on axillary lymph nodes and their relation to breast cancer diagnosis

Source

The lymphatic system is one of…


Photo by Johannes Plenio on Unsplash

Link to Kaggle Notebook for all these exercises together

Q: convert-decimal-to-binary

Performing Short Division by Two with Remainder (For integer part)

This is a straightforward method that involves repeatedly dividing the number to be converted. Let the decimal number be N; divide it by 2, because the base of the binary number system is 2. Note down the remainder, which will be either 0 or 1. Divide the resulting quotient by 2 again, and repeat until the quotient becomes 0, noting the remainder at every step. Then write the remainders from bottom to top (that is, in reverse order), which will be the equivalent binary number of…
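The procedure above can be sketched in a few lines of Python. The function name `decimal_to_binary` is my own, and this version handles only the integer part of a non-negative number.

```python
def decimal_to_binary(n):
    """Convert a non-negative integer to its binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)       # quotient for the next step, remainder is 0 or 1
        remainders.append(str(r))
    # Remainders were collected first-to-last, so read them in reverse order.
    return "".join(reversed(remainders))

print(decimal_to_binary(13))  # 13 -> 6 r1, 6 -> 3 r0, 3 -> 1 r1, 1 -> 0 r1, giving 1101
```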

Rohan Paul

DataScience | ML | 2x Kaggle Expert. Ex Fullstack Engineer and Ex International Financial Analyst. https://www.linkedin.com/in/rohan-paul-b27285129/
