A classical application of similarity search is in recommender systems: Suppose you have shown interest in a particular item, for example a news article x. The semantic meaning of a piece of text can be represented as a high-dimensional feature vector, for example computed using latent semantic indexing. In order to recommend other news articles we might search the set P of article feature vectors for articles that are “close” to x.
In this case, for a large textual dataset containing millions of words, the problem is that there may be far too many pairs of items…
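As a minimal sketch of the search described above, the snippet below ranks a tiny set P of feature vectors by cosine similarity to a query vector x. The article names, the three-dimensional vectors, and the choice of cosine similarity as the notion of "close" are illustrative assumptions, not taken from the original post. It is a brute-force scan, which is exactly the approach that stops scaling when P is large.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(x, P, k=2):
    """Return the names of the k vectors in P most similar to x (brute force)."""
    ranked = sorted(P.items(),
                    key=lambda item: cosine_similarity(x, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical article feature vectors (assumed for illustration only).
P = {
    "sports_1": [0.9, 0.1, 0.0],
    "sports_2": [0.8, 0.2, 0.1],
    "politics_1": [0.1, 0.9, 0.2],
    "tech_1": [0.0, 0.2, 0.9],
}
x = [1.0, 0.1, 0.0]  # the article the reader showed interest in

top = nearest(x, P, k=2)
print(top)
```

Every query compares x against every vector in P, so the cost grows linearly with the size of P per query.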
The general principle of ensemble methods is to construct a linear combination of some model fitting method, instead of using a single fit of the method. The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner, thus increasing the accuracy of the model. When we try to predict the target variable using any machine learning technique, the main causes of difference between actual and predicted values are noise, variance, and bias. Ensembling helps to reduce these factors (except noise, which is irreducible…
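To make the "weak learners form a strong learner" idea concrete, here is a small sketch that computes the accuracy of a majority vote over n classifiers. It assumes every classifier is correct with the same probability p and that their errors are independent; both are idealizations that real ensembles only approximate. The function name and the numbers are illustrative.

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that more than half of n independent voters are correct.

    Assumes n is odd and each voter is correct with probability p.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

single = majority_vote_accuracy(1, 0.6)     # one weak learner
ensemble = majority_vote_accuracy(11, 0.6)  # eleven weak learners voting

print(single, ensemble)
```

Under these assumptions the vote is more accurate than any single voter whenever p is above 0.5. When the learners make correlated errors the gain shrinks.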
For the detailed mechanics of how XGBoost works, you can check this blog post of mine.
Here’s a small part from that blog.
What is Boosting
To understand the basic need for a boosting algorithm, let’s ask a basic question: if a data point is incorrectly predicted by our first model, and then by the next (and probably by all models), will combining the predictions provide better results? Such questions are handled by boosting algorithms.
So, Boosting is a sequential technique that works on the principle of an ensemble, where each subsequent…
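The sequential idea can be sketched with a toy regression example in which each new model is fitted to the errors left by the models before it. This is a simplified residual-fitting scheme with one-split decision stumps, assumed here for illustration. It is not XGBoost's actual algorithm, and the data, function names, and learning rate are made up.

```python
def fit_stump(xs, residuals):
    """Find the one-split stump (threshold, left mean, right mean) with least squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave points on both sides
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    preds = [sum(ys) / len(ys)] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
    return preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Toy data with three rough levels (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

mse_before = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_one_round = mse(ys, boost(xs, ys, n_rounds=1))
mse_boosted = mse(ys, boost(xs, ys, n_rounds=20))
print(mse_before, mse_one_round, mse_boosted)
```

A single stump cannot capture three levels, but later stumps correct what earlier ones missed, so the training error keeps falling with more rounds.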
The number of combinations of n different things taken r at a time is denoted by nCr, and the formula used to calculate it is given below.
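The standard formula is nCr = n! / (r! (n - r)!), valid for 0 ≤ r ≤ n. A short sketch that evaluates it directly and checks it against Python's built-in `math.comb` (available from Python 3.8) follows; the function name `n_choose_r` is just an illustrative choice.

```python
import math

def n_choose_r(n, r):
    """Number of ways to choose r items from n, using the factorial formula."""
    if not 0 <= r <= n:
        raise ValueError("requires 0 <= r <= n")
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

five_choose_two = n_choose_r(5, 2)
print(five_choose_two, math.comb(5, 2))
```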
A matrix A over a field K or, simply, a matrix A (when K is implicit) is a rectangular array of scalars usually presented in the following form:
Platt Scaling (PS) is probably the most widely used parametric calibration method. It aims to train a sigmoid function to map the original outputs from a classifier to calibrated probabilities.
So it is simply a form of probability calibration: a way of transforming classification output into a probability distribution. For example, if the dependent variable in the training data set takes the values 0 and 1, this method lets you convert the classifier's output into a probability.
Platt Scaling is a parametric method. It was originally built to calibrate the support vector machine…
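A minimal sketch of the idea, assuming we already have raw classifier scores and 0/1 labels: fit the two parameters a and b of the sigmoid p = 1 / (1 + exp(a·s + b)) by minimizing log loss. Plain gradient descent is used here for simplicity, and Platt's label-smoothing refinement is omitted; the scores, labels, learning rate, and function names are all illustrative assumptions.

```python
import math

def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit a, b in p = 1 / (1 + exp(a*s + b)) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # For this parameterization, d(log loss)/d(a*s + b) = y - p.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# Hypothetical raw scores (e.g. SVM decision values) and true labels.
scores = [-2.0, -1.5, -1.0, -0.3, 0.2, 0.4, 1.1, 1.8, 2.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
probs = [calibrate(s, a, b) for s in scores]
print(a, b, probs)
```

With this sign convention a comes out negative when higher scores indicate the positive class, so larger scores map to larger probabilities.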
Original number = x
Log-transformed number = log(x)
For zeros or negative numbers, we can’t take the log; so we add a constant to each number to make them positive and non-zero.
Each variable x is replaced with log(x), where the base of the log is left up to the analyst. It is considered common to use base 10, base 2 and the natural log ln.
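The steps above can be sketched as follows. The choice of constant is an assumption made for illustration: when the data contain zeros or negative numbers, every value is shifted so that the smallest one becomes 1. Other constants are equally valid, and the function name and default base are likewise illustrative.

```python
import math

def log_transform(values, base=math.e):
    """Replace each value with its log, shifting first if any value is <= 0."""
    shift = 0.0
    lowest = min(values)
    if lowest <= 0:
        # Assumed convention: shift so the smallest value becomes exactly 1.
        shift = 1.0 - lowest
    return [math.log(v + shift, base) for v in values]

base10 = log_transform([1, 10, 100], base=10)  # no shift needed
shifted = log_transform([-3, 0, 5])            # shifted by 4 before the log
print(base10, shifted)
```

Any model output on the transformed scale has to be mapped back with the same base and the same shift, so both should be recorded.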
The log transformation, a widely used method to address skewed data, is one of the most popular transformations used in Machine Learning.
One of the main reasons for using a log scale is that…
One of the largest publicly available data sets with malware can be found in
the Microsoft Malware Classification Challenge. It consists
of over 400 GB of data, with both binary and disassembled code from the
use of the IDA disassembler and debugger. The binary malware has been
stripped of the PE-header to be made non-executable for security reasons.
This does limit the value of the data set, but they have prioritized the
potential security implications of making hundreds of gigabytes of executable malware available to anyone. …
Scikit-Learn offers two tools for hyperparameter tuning:
GridSearchCV and RandomizedSearchCV.
GridSearchCV performs an exhaustive search over specified parameter values for an estimator (or machine learning algorithm) and returns the best-performing hyperparameter combination.
So, all we need to do is specify the hyperparameters with which we want to
experiment and their range of values, and GridSearchCV evaluates all possible
combinations of hyperparameter values using cross-validation. As such, we naturally limit our choice of hyperparameters and their range of values. …
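To see why the choice of hyperparameters and ranges has to be limited, the sketch below enumerates a grid using only the standard library. The dictionary uses the same name-to-list-of-values shape that GridSearchCV accepts as its `param_grid` argument, but the parameter names and values are hypothetical, and the enumeration only imitates the idea of an exhaustive search rather than reproducing scikit-learn's internals.

```python
from itertools import product

# Hypothetical grid: hyperparameter name -> candidate values.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[name] for name in names))]

n_folds = 5  # assumed number of cross-validation folds
total_fits = len(combinations) * n_folds

print(len(combinations), total_fits)
```

The number of combinations is the product of the list lengths, so adding one more hyperparameter with four candidate values would quadruple the work. Each combination is then trained once per cross-validation fold.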