Import Data

Data can be imported from different sources:

  1. Google Sheets
  2. CSV/ZIP files from your local disk
  3. CSV/ZIP files from public URL
  4. SQL databases (MySQL, MariaDB, PostgreSQL, SQL Server, Google Cloud SQL, Oracle, etc.)
  5. REST APIs. You can import from any public API that returns JSON, which lets you pull data from services such as Google Analytics, Twitter, Facebook, Dataslayer, etc.
  6. Coming soon: BigQuery and many more

Review and Transform Data

According to a Kaggle survey, one of the most important steps in creating a machine learning model is preprocessing your data. To do so, the system allows you to duplicate columns, apply modifications to your features, remove rows matching different conditions, remove outlier values, and apply logarithmic and other function transformations. These steps help you create the best possible model with your data.

Exploratory Analysis


The correlation matrix displays the correlation between all features. Correlation is a measurement of the relationship between two variables and is used to determine whether one variable can be described by another. It can take values from -1 to 1, with 0 indicating no correlation between the features, and 1 (or -1) indicating that the features are perfectly related, allowing us to explain one variable with the other.
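As a rough illustration of what the matrix computes, here is a minimal sketch using pandas (the column names and values are hypothetical, not taken from your data):

```python
import pandas as pd

# Toy dataset with three numeric features (hypothetical column names).
df = pd.DataFrame({
    "age":    [22, 38, 26, 35, 54, 2, 27, 14],
    "fare":   [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, 30.07],
    "pclass": [3, 1, 3, 1, 1, 3, 3, 2],
})

# Pearson correlation between every pair of features, each value in [-1, 1].
corr = df.corr()
print(corr.round(2))
```

The diagonal is always 1, since every feature is perfectly correlated with itself, and the matrix is symmetric.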


We want to know which features will be most essential for our prediction, since we want to estimate our target column using a combination of the remaining features. The Predictive Power Score (PPS) tells you which columns are important for this forecast (higher PPS score) and which are not (lower PPS score).
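The exact PPS computation is implemented in dedicated libraries, but the idea can be sketched as follows: score how much better a simple model predicts the target from a single feature than a naive baseline does. This is an illustrative approximation using scikit-learn, not the system's actual implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict

def pps_sketch(x, y):
    """Rough predictive-power score of feature x for target y:
    1 - MAE(tree) / MAE(median baseline), clipped at 0."""
    x = np.asarray(x).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    pred = cross_val_predict(DecisionTreeRegressor(random_state=0), x, y, cv=4)
    mae_model = mean_absolute_error(y, pred)
    mae_naive = mean_absolute_error(y, np.full_like(y, np.median(y)))
    return max(0.0, 1.0 - mae_model / mae_naive)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
informative = pps_sketch(x, x ** 2)           # target is a function of x
noise = pps_sketch(x, rng.normal(size=200))   # target is unrelated to x
print(round(informative, 2), round(noise, 2))
```

The informative feature scores close to 1, while the unrelated one scores close to 0.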

Data Insights

Our data insights system allows you to explore every possible combination of your features to reach your desired target.

The system also gives you the statistical probability of reaching target values with the selected combination.


To discover clusters, we look for sub-groups of points with a greater density and assign them to a cluster. Imagine a topographic map with mountains and valleys: the high-density points are like the mountains, and the rest of the points in the dataset are like the valleys. We are interested in finding the “mountains” in your dataset. The first thing you have to do is define what a “mountain” is, because you may find some “mounds” that you don’t want to consider mountains. These “mounds” are like noise that has to be cleaned. Once you have removed the noise, you fix a minimum height to define the “mountains” you want in your analysis. Below this height, “mountains” are not interesting to you, and you don’t want to consider them “clusters”. So, to cluster your dataset you have to choose two basic parameters: the number of neighbors (to remove the noise) and the cluster size (like the minimum height of a mountain). By adjusting these two parameters you control whether you find more or fewer clusters, that is, whether you make more or fewer divisions in your dataset.
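As an illustration of these two knobs, here is a sketch using scikit-learn's DBSCAN, a density-based algorithm in which min_samples plays the "number of neighbors" role. This is an analogy for the idea above, not necessarily the exact algorithm the system uses:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Two dense "mountains" plus scattered noise points.
X, _ = make_blobs(n_samples=200, centers=[(0, 0), (8, 8)],
                  cluster_std=0.6, random_state=0)
rng = np.random.default_rng(0)
noise = rng.uniform(-4, 12, size=(20, 2))
X = np.vstack([X, noise])

# min_samples acts as the neighbor count: points with too few neighbors
# within eps are labelled -1 (noise, the "mounds" we discard).
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("noise points:", int((labels == -1).sum()))
```

Raising min_samples discards more low-density points as noise; changing the density threshold changes how many clusters survive.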

Expert recommendations

To improve your predictions, we recommend cleaning and transforming your data to remove noise from the dataset.


It's like having a machine learning expert at your side.

Model evaluation


Training accuracy is the accuracy percentage measured against the training dataset, which typically contains about 80% of your data.


Testing accuracy is the accuracy percentage measured against the testing dataset, which typically contains about 20% of your data.

It tells us how well the model predicts unseen information (for example, the future).
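The training/testing evaluation described above can be sketched with scikit-learn. This is a generic example on a public dataset, not the system's internal code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# 80/20 split: the model learns from the training set only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = model.score(X_tr, y_tr)  # accuracy on data the model has seen
test_acc = model.score(X_te, y_te)   # accuracy on unseen data
print(f"train: {train_acc:.2f}  test: {test_acc:.2f}")
```

A large gap between the two numbers is the classic sign of overfitting discussed below.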


For classification problems this is very useful for knowing whether our prediction is valid in a production environment.


The system detects whether your model is overfitting, based on your testing and training accuracy.


We come to a final conclusion based on the prior results, just like a machine learning expert would.

Advanced Metrics 

CONFUSION MATRIX

A confusion matrix summarises how successful a classification model's predictions were. The matrix rows are ground-truth labels and the columns are labels predicted by the model. Scan each row to determine where misclassifications occur for a given label. For binary classification problems, the values in each cell are percentages represented from zero to one (the sum of each column must be ≈ 1). In multiclass problems, each cell shows the total number of predictions for the given case.

RECEIVER OPERATING CHARACTERISTIC

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

PRECISION AND RECALL

Precision (also called positive predictive value) is the fraction of predictions of the relevant class that were correct. Recall (also known as sensitivity) is the fraction of observations of the relevant class that were predicted correctly.
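All of these metrics can be reproduced with scikit-learn; the labels and scores below are hypothetical, for illustration only:

```python
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical binary labels and model outputs (1 = positive class).
y_true  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.95, 0.25]

cm = confusion_matrix(y_true, y_pred)   # rows: ground truth, columns: prediction
prec = precision_score(y_true, y_pred)  # correct share of positive predictions
rec = recall_score(y_true, y_pred)      # share of real positives we caught
auc = roc_auc_score(y_true, y_score)    # area under the ROC curve

print(cm)
print(f"precision={prec:.2f} recall={rec:.2f} auc={auc:.2f}")
```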

Feature importance

This calculation is carried out by the algorithm using non-linear combinations of the input features. These combinations are difficult to explain, but we can still extract useful information from the training.

The plot below shows a ranking of the features that have had the most influence on the result. In your case, the Age feature has the greatest impact on the result, whereas the Embarked feature has the least. In summary, the Embarked feature is not significant for your task and is not strictly required for training the algorithm.
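One common way to estimate this kind of ranking is permutation importance, sketched here with scikit-learn on a public dataset (an illustrative technique; the system's exact method may differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target,
                                          random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in test accuracy:
# the bigger the drop, the more the model relied on that feature.
result = permutation_importance(model, X_te, y_te, n_repeats=5,
                                random_state=0)
ranked = sorted(zip(data.feature_names, result.importances_mean),
                key=lambda t: t[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```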

Final Model Pipeline

The term “pipeline” refers to a collection of processes. A pipeline combines different algorithm runs to improve prediction accuracy. This is referred to as an ensemble of algorithms, because each one predicts only a fraction of the points in the dataset. It works like this: the first algorithm makes a prediction and can predict a specific number of target points. If this value is low (for example, it only predicts 60% of target points), the ensemble searches for another algorithm capable of predicting some of the 40% of unpredicted points. When both algorithms are combined, they can predict up to 80% of target points. To increase this quantity, we add a third algorithm and see whether it can predict the 20% of points that are still unexplained. We continue adding algorithms until we can predict more than 90% of the target points. This is not always possible, because we cannot always find algorithms that predict the remaining points. In that case, we must stop the process and accept a lower-accuracy prediction as the final result.
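The coverage idea described above can be sketched with a toy loop. The algorithm names and the sets of points each one can predict are entirely hypothetical, chosen to mirror the 60% / 80% / 90% example:

```python
# Toy sketch: keep adding "algorithms" until more than 90% of target
# points are covered. Each algorithm is modelled simply as the set of
# point indices it predicts correctly (hypothetical, for illustration).
points = set(range(100))
algorithms = {
    "algo_A": set(range(0, 60)),    # predicts 60% of the points
    "algo_B": set(range(40, 80)),   # overlaps A, but adds points 60-79
    "algo_C": set(range(75, 95)),   # adds points 80-94
}

covered = set()
pipeline = []
for name, predictable in algorithms.items():
    gain = predictable - covered
    if gain:                 # only keep algorithms that explain new points
        pipeline.append(name)
        covered |= predictable
    if len(covered) / len(points) > 0.90:
        break

print("pipeline:", pipeline)
print("coverage:", len(covered) / len(points))
```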

Dimensionality Reduction

You have a dataset with a high number of dimensions (each feature is a dimension). If your dataset has just two dimensions, you can represent it in a two-dimensional plot and look for patterns: you’ll see whether there is a clear relationship between the two features or none at all. But how can you represent a multi-dimensional dataset? We can’t, for example, make a 10-dimensional plot; our only options are 2D charts (two axes) or 3D graphs (three axes). Data scientists use dimensionality reduction to explore high-dimensional data. Specific algorithms can reduce an N-feature dataset to a 2D or 3D dataset. These algorithms work by constructing two-dimensional (or three-dimensional) “maps” from data that can contain hundreds or even thousands of features. As a result of these calculations, the plot shows groups of points: points in a group are closer to each other than to points outside the group. We then color the points with the target feature’s value. For example, in a binary classification we get points with two colors; in a multiclass problem with four classes, four colors; and in a regression, a color scale between the feature’s maximum and minimum values.
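Here is a minimal sketch of such a reduction using PCA from scikit-learn (algorithms like t-SNE or UMAP are also commonly used; the system's exact choice is not specified here):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional digit images reduced to a 2-D "map".
X, y = load_digits(return_X_y=True)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X)
print(X.shape, "->", X_2d.shape)

# Each 2-D point keeps its target label (0-9), which would be used
# to colour the scatter plot.
```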

Feature Similarity

We will apply an algorithm to calculate the degree of similarity between each pair of data points in our dataset. With these similarity values, we will divide our features into multiple groups and create a graphic representation of the features with the most comparable data points. We will also connect these feature groups to other groups using the same approach. This approach yields a binary tree, or dendrogram, with N−1 nodes. The branches of this tree are cut at a level where there is a lot of “space” between two successive groups, that is, where there is a big leap between levels.
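A sketch of the idea with SciPy's hierarchical clustering: six synthetic features, grouped by correlation distance, yield a binary tree with N−1 = 5 merges (synthetic data, for illustration only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
base_a = rng.normal(size=100)
base_b = rng.normal(size=100)
# Six features: three noisy copies of base_a, three of base_b.
X = np.column_stack(
    [base_a + rng.normal(scale=0.1, size=100) for _ in range(3)]
    + [base_b + rng.normal(scale=0.1, size=100) for _ in range(3)])

# Distance between two features = 1 - |correlation|.
corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)
condensed = dist[np.triu_indices(6, k=1)]  # condensed distance vector

Z = linkage(condensed, method="average")   # binary tree: N-1 = 5 merges
clusters = fcluster(Z, t=2, criterion="maxclust")  # cut into two groups
print(Z.shape)
print(clusters)
```

Cutting the tree into two groups recovers the two underlying families of features.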

Feature Influence

The algorithm has discovered a way to combine your features in order to obtain the desired value. As previously stated, this combination is difficult to explain. However, it is also important to understand how the features have influenced the algorithm's conclusions. Some features are not important to the algorithm. One method for determining whether a feature is relevant is to remove it and rerun the calculation, comparing the accuracy scores before and after removal. If the scores do not change, the feature was not important in solving the problem. However, if there is a big change, we can be certain that this feature is required to solve the problem.
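The remove-and-retrain check can be sketched as follows (a generic example on a public dataset; which column to drop is a hypothetical choice):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def cv_accuracy(features):
    # Mean 5-fold cross-validated accuracy of a simple model.
    return cross_val_score(LogisticRegression(max_iter=5000),
                           features, y, cv=5).mean()

baseline = cv_accuracy(X)

# Remove one feature, retrain, and compare the scores.
drop = 0  # hypothetical choice: drop the first column
reduced = np.delete(X, drop, axis=1)
print(f"baseline: {baseline:.3f}  without column {drop}: {cv_accuracy(reduced):.3f}")
```

A near-identical score after removal suggests the dropped feature was not needed.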

Partial Dependence

Friedman’s Partial Dependence Plots (PDP) of a set of features show the response of the algorithm output for each possible value of the feature. The algorithm may consider only a fraction of the feature’s values to be truly valuable for obtaining a certain target range. Identifying these ranges is key, since we can mark them as “critical” for your problem. Outside this range, features are either less important or not important at all, so focusing on the range with higher importance is a key observation. If this critical range occurs frequently in a variable, that variable will be important when making decisions. Because it can be used with a variety of supervised learning algorithms, this tool is described as model agnostic [1]. Many supervised learning models use PDP to better understand these models.

Data Outliers

Outliers are points that are not similar to the rest of the points in a dataset; they are considered ‘rare’ points. We can see that these points are “separated” from the rest of the data when we reduce the dimensionality of our dataset. In this section, we look for outliers and display them in a two-dimensional plot and then in a list. Checking them is interesting, since it gives us insight into what is normal and what cannot be considered normal in our dataset. Anomaly detection (a popular term for outlier identification) is a common requirement in data analysis. Identifying anomalies is crucial in many industries: in industrial applications for maintenance, in the automotive sector, in the financial industry to understand potential risks, and in marketing departments to capture emerging trends.
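One widely used anomaly-detection technique is Isolation Forest, sketched here on synthetic data (illustrative only; the system's exact detector is not specified):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))    # dense "normal" cloud
planted = rng.uniform(6, 9, size=(5, 2))    # five far-away points
X = np.vstack([normal, planted])

# contamination is the expected share of outliers in the data.
detector = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = detector.predict(X)                # -1 = outlier, 1 = normal

flagged = np.where(labels == -1)[0]
print("flagged indices:", flagged)
```

The planted far-away points (indices 200-204) end up among the flagged outliers.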


Once your model is created and you are happy with the accuracy, there are three ways to predict with new data:

  1. Online
  2. Google Sheets
  3. Using our API in third party apps or even using your favourite programming language

What if tool

The “What if” tool provides additional information to help understand why the algorithm arrived at a particular conclusion, and hence what can be done to change the outcome. The main difference between the Explainability page and the “What if” tool is that the “What if” tool provides practical knowledge of the algorithm’s result. The tool has two steps: first, make a forecast with the model from some input values. Once the prediction is complete, we try to reverse the algorithm’s output by modifying some feature inputs. With this basic methodology, we find which features strongly influence the algorithm, either alone or in combination with other features.

Target Optimization

We can see on the Target Prediction page how to use our model to predict target values from different input variable values. Our ML model can be defined as a method for estimating our target from our input variables; by changing the values of these input variables, we can estimate the value of our target. But what if we want to estimate our target but don’t have specific values for these input variables? Suppose we don’t have a concrete value for variable 1 and all we know is that it falls within a certain range. Then we would have as many target values as there are values in the variable 1 input range. Now let’s turn the problem around: the input variables only have value ranges, and instead we fix the target. We set the target either to a specific value or to the maximum/minimum value it can reach, and search the input ranges for the combination of values that achieves it. This is known as an optimization problem.
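A minimal sketch of the optimization idea with SciPy, using a simple quadratic as a stand-in for a trained model (a hypothetical model, for illustration only):

```python
from scipy.optimize import minimize_scalar

# Hypothetical trained "model": predicts a target from one input variable.
def model_predict(x):
    return -(x - 3.0) ** 2 + 10.0   # predicted target peaks at x = 3

# Optimization: instead of predicting from a fixed input, search the
# allowed input range [0, 10] for the value that maximises the target.
result = minimize_scalar(lambda x: -model_predict(x),
                         bounds=(0, 10), method="bounded")
print(f"best input: {result.x:.2f}  "
      f"predicted target: {model_predict(result.x):.2f}")
```

With several input variables, the same search runs over all their ranges at once (e.g. with `scipy.optimize.minimize`).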

Formula Predictions

Once your model is created in our tool, you can run new predictions by using a new Google Sheets formula: =PREDICT.

This way, any time your data changes, this formula will calculate new predictions in real time.



The global trend takes the entire time span into account. We can see whether the global trend is increasing, decreasing, or divided into segments with differing behavior.

The weekly trend depicts the time series’ weekly average trend, whereas the monthly trend does the same but for the entire month.

These patterns are useful for forecasting: if we wish to predict on a given day of the week and day of the month, we may expect the outcome to differ from a prediction made on a day representing another day of the week or month.
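Weekly and monthly averages of this kind can be sketched with pandas on a synthetic series (illustrative data, with weekdays deliberately higher than weekends):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-01", periods=365, freq="D")
# Synthetic series: upward global trend + weekly seasonality + noise.
values = (np.arange(365) * 0.05
          + np.where(dates.dayofweek < 5, 5.0, 0.0)   # weekdays higher
          + rng.normal(scale=0.5, size=365))
s = pd.Series(values, index=dates)

weekly = s.groupby(s.index.dayofweek).mean()   # 0 = Monday ... 6 = Sunday
monthly = s.groupby(s.index.month).mean()      # 1 = January ... 12 = December
print(weekly.round(1))
```

The weekly averages recover the weekday/weekend pattern, and the monthly averages reflect the global upward trend.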



Our feature’s time series evolves across time, alternately rising and falling. We are investigating whether trend changes follow any patterns, such as occurring on the same days of the week or on the same days of the month. We are also looking for a pattern in the frequency of these shifts.



Time series persistence indicates whether the feature is expected to exhibit the same behavior (increasing or decreasing) in the next step, or whether it is likely to change. Persistence is related to what is known as “time series memory”.
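A simple proxy for persistence is the lag-1 autocorrelation of the series, sketched here on synthetic data (an illustrative measure, not necessarily the system's exact computation):

```python
import numpy as np

def lag1_autocorr(x):
    """Correlation of the series with itself shifted by one step: values
    near 1 suggest persistence, values near 0 suggest little "memory"."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(0)
persistent = np.cumsum(rng.normal(size=500))   # random walk: strong memory
memoryless = rng.normal(size=500)              # white noise: no memory
print(round(lag1_autocorr(persistent), 2), round(lag1_autocorr(memoryless), 2))
```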



We use the term “chaotic” not in its metaphorical sense, but in its mathematical sense: a time series exhibits chaotic behavior if any small change in the original conditions (in its initial values) can result in a large change in future behavior.



To forecast a time-series data column we must first find a relationship between the date column and the target column. To do this, we employ probabilistic methods. The distinction between frequentist statistical approaches (the most popular) and probabilistic machine learning methods is that with frequentist methods we choose a model (an equation) and fit data into it; the model’s accuracy is determined by how well the data matches this equation. A typical example is linear regression, in which the model is a linear equation and the goal is to see whether the data points fit a line. Probabilistic methods use Bayesian inference: they take the data and look for a model that can explain it. In other words, frequentist analysis is a model-first technique, where selecting a model is the most crucial stage, while Bayesian statistics is a data-first method in which the data is the most essential component. Once we’ve found a model that accurately explains the data, we can use it to forecast our target feature values in the future. Additionally, we seek to extract more information, both quantitative and qualitative, from the time series using what our model has learnt from the data.

On-premise Solution

We know that your privacy is an important matter, so we have created an on-premise solution where you can host your data on your own servers.

This way you stay in control of your own privacy policy.

We don’t store any of this information in our system. It is securely stored in your Google Account.

Using this solution, no data will be stored on our servers.

You can select your own cloud provider, such as AWS, Azure, Google Cloud, etc., or even your own bare-metal server.