Advanced

Data can be imported from different sources:

- Google Sheets
- CSV/ZIP files from your local disk
- CSV/ZIP files from a public URL
- SQL databases (MySQL, MariaDB, PostgreSQL, SQL Server, Google Cloud SQL, Oracle, etc.)
- REST API. You can import from your public API with different JSON formats. This allows you to import data from sources such as Google Analytics, Twitter, Facebook, Dataslayer, etc.
- Coming soon: BigQuery and many more

**PEARSON CORRELATION**

This matrix displays the correlation between all pairs of features. Correlation measures the strength of the linear relationship between two variables and is used to determine whether one variable can be described by the other. Its value ranges from -1 to 1: zero indicates no linear correlation between the features, while 1 or -1 indicates a perfect linear relationship (positive or negative), which allows us to explain one variable with the other.
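As an illustration, the following is a minimal sketch of how a Pearson correlation matrix can be computed, using only the Python standard library. The function names and the toy columns are hypothetical and are not part of the platform.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy data (assumption, for illustration only).
columns = {
    "x": [-2, -1, 0, 1, 2],
    "double_x": [-4, -2, 0, 2, 4],   # rises with x: correlation close to 1
    "minus_x": [2, 1, 0, -1, -2],    # falls with x: correlation close to -1
    "x_squared": [4, 1, 0, 1, 4],    # depends on x, but not linearly
}

matrix = correlation_matrix(columns)
for other in ("double_x", "minus_x", "x_squared"):
    print("x vs", other, "=", round(matrix[("x", other)], 3))
```

The `x_squared` column is fully determined by `x`, yet its Pearson correlation with `x` is zero. A value near zero therefore rules out a linear relationship, not every kind of relationship.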

**PREDICTION POWER SCORE**

Since we want to estimate the target column using a combination of the remaining features, we want to know which features are most important for the prediction. The Prediction Power Score (PPS) tells you which columns are important for this forecast (higher PPS) and which are not (lower PPS).

Our data insights system allows you to explore any possible combination of your features to obtain your desired target.

The system also gives you the statistical probability of reaching the target values with the selected combination.

To improve your predictions, we recommend that you clean and transform your data to remove noise from the dataset.

It’s like having a machine learning expert with you.

**TRAINING ACCURACY**

The accuracy percentage measured against the training dataset, which is typically about 80% of your dataset.

**TESTING ACCURACY**

The accuracy percentage measured against the testing dataset, which is typically about 20% of your dataset.

It tells us how good the model is at predicting unseen information (for example, the future).
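The split and the accuracy percentage can be sketched as follows, assuming a random 80/20 split. The function names `train_test_split` and `accuracy` are illustrative and do not describe the platform's internal implementation.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

# Hypothetical dataset of 100 row identifiers (assumption).
rows = list(range(100))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))

# Three of four hypothetical predictions are correct.
print(accuracy(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))
```

Training accuracy is computed by calling `accuracy` on the training rows, and testing accuracy by calling it on rows the model never saw during training.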

**DATASET BALANCE**

For classification problems, it is very useful to know how balanced the classes in the dataset are, because this tells us whether our predictions will be valid in a real prediction environment.
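One simple way to inspect balance is to count how often each class appears in the target column. The sketch below assumes hypothetical labels and helper names; it is not the platform's own check.

```python
from collections import Counter

def class_balance(labels):
    """Return the share of each class in the target column."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical target column: 70 negative and 30 positive examples (assumption).
labels = ["no"] * 70 + ["yes"] * 30

print(class_balance(labels))
print(round(imbalance_ratio(labels), 2))
```

With these labels, a model that always answers `no` already reaches 70% accuracy without learning anything, which is why accuracy should be read together with the dataset balance.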

**OVERFITTING**

The system detects whether your model is overfitting, based on your testing and training accuracy.
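A common rule of thumb is to flag a model when its training accuracy is much higher than its testing accuracy. The sketch below assumes a 10-point threshold chosen purely for illustration; the threshold the system actually uses is not documented here.

```python
def looks_overfitted(train_accuracy, test_accuracy, max_gap=10.0):
    """Flag a model whose training accuracy exceeds its testing accuracy
    by more than max_gap percentage points (max_gap is an assumed value)."""
    return (train_accuracy - test_accuracy) > max_gap

# Hypothetical accuracy percentages (assumption).
print(looks_overfitted(99.0, 72.0))  # large gap between training and testing
print(looks_overfitted(85.0, 83.0))  # small gap between training and testing
```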

**CONCLUSION**

We come to a final conclusion based on the prior results, just like a machine learning expert would.

This calculation is carried out by the algorithm using non-linear combinations of the input features. These combinations are difficult to explain, but we can learn more from this training.

The plot below shows a ranking of the features that have had the most influence on the result. In your case, the feature __Age__ has the greatest impact on the result, whereas the feature __Embarked__ has the least. In summary, the feature __Embarked__ is not significant for your task and is therefore not strictly required for training an algorithm.

Once your model is created and you are happy with the accuracy, there are three ways to predict with new data:

- Online
- Google Sheets
- Using our API in third-party apps or even from your favourite programming language

Once your model is created in our tool, you can run new predictions by using a new Google Sheets formula, **=PREDICT**.

This way, any time your data changes, the formula calculates new predictions in real time.

**SEASONAL ANALYSIS**

The global trend takes the entire time span into account. We can see if the global trend is increasing, decreasing, or if it is divided into segments with differing behavior.

The weekly trend depicts the time series’ weekly average trend, whereas the monthly trend does the same but for the entire month.

These patterns are useful for forecasting: if we predict for a given day of the week and day of the month, we may infer that the outcome will differ from the one we would obtain for a different day of the week or month.
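The weekly and monthly profiles can be sketched by averaging the series per day of the week and per day of the month. The helper names and the synthetic series below are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def average_by(series, key):
    """Average the values of (date, value) pairs, grouped by key(date)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in series:
        group = key(day)
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sorted(sums)}

def weekly_profile(series):
    """Average value per day of the week (0 = Monday, 6 = Sunday)."""
    return average_by(series, lambda d: d.weekday())

def monthly_profile(series):
    """Average value per day of the month (1 to 31)."""
    return average_by(series, lambda d: d.day)

# Hypothetical series: four weeks, 10 on weekdays and 20 on weekends (assumption).
start = date(2024, 1, 1)  # a Monday
series = []
for offset in range(28):
    day = start + timedelta(days=offset)
    series.append((day, 20.0 if day.weekday() >= 5 else 10.0))

print(weekly_profile(series))
```

In this synthetic series the weekend averages are twice the weekday averages, so a forecast for a Saturday should be expected to differ from a forecast for a Tuesday.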

**CRITICAL POINTS**

Our feature time series evolves over time, alternately increasing and decreasing. We investigate whether trend changes follow any patterns, such as happening on the same days of the week or on the same days of the month. We also look for a pattern in the frequency of these shifts.
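One way to look for such patterns is to detect the points where the series switches between rising and falling, then count them per day of the week. This is a minimal sketch under that assumption; the function names and the synthetic data are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def turning_points(series):
    """Return the dates where the series switches between rising and falling."""
    points = []
    for (_, prev), (day, cur), (_, nxt) in zip(series, series[1:], series[2:]):
        before = cur - prev
        after = nxt - cur
        if before * after < 0:  # opposite signs: local maximum or minimum
            points.append(day)
    return points

def turning_points_by_weekday(series):
    """Count turning points per day of the week (0 = Monday)."""
    return Counter(day.weekday() for day in turning_points(series))

# Hypothetical series: three weeks that peak on Thursday and bottom out
# on Monday (assumption, for illustration only).
pattern = [0, 2, 4, 6, 4, 2, 1]
start = date(2024, 1, 1)  # a Monday
series = [(start + timedelta(days=i), pattern[i % 7]) for i in range(21)]

print(turning_points_by_weekday(series))
```

The first and last points of a series have only one neighbour, so they are never reported as turning points. Flat stretches are also ignored by this simple sign test.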

**PERSISTENCE**

Time series persistence indicates whether the feature is expected to exhibit the same behavior (increasing or decreasing) in the next step, or whether it is likely to change. Persistence is related to what is known as **“time series memory”**.
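As a rough illustration, persistence can be approximated by the share of steps that move in the same direction as the previous step. This is a simplified proxy chosen for the example, not the measure used by the system; the Hurst exponent is a more standard measure of time series memory.

```python
def direction_persistence(values):
    """Share of steps that move in the same direction as the previous step.

    Values near 1 suggest a persistent series, values near 0 suggest a
    series that tends to reverse. This is a simplified proxy (assumption).
    """
    changes = [b - a for a, b in zip(values, values[1:])]
    moves = [c for c in changes if c != 0]  # ignore flat steps
    if len(moves) < 2:
        raise ValueError("need at least two non-flat steps")
    same = sum((x > 0) == (y > 0) for x, y in zip(moves, moves[1:]))
    return same / (len(moves) - 1)

# Hypothetical series (assumption).
trending = [1, 2, 3, 4, 5, 6]  # keeps increasing
zigzag = [1, 2, 1, 2, 1, 2]    # reverses at every step

print(direction_persistence(trending))
print(direction_persistence(zigzag))
```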

**CHAOTIC BEHAVIOR**

We use the term **“chaotic”** not in its metaphorical sense, but in its mathematical sense: a time series exhibits chaotic behavior if any small change in the original conditions (in its initial values) can result in a large change in future behavior.
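Sensitivity to initial conditions can be illustrated with the logistic map, a textbook example of a chaotic system. The map is not taken from this document; it is used here only to show how two almost identical starting values can diverge. The parameter values are assumptions for the example.

```python
def logistic_trajectory(x0, r, steps):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1 - values[-1]))
    return values

def gaps(r, x0=0.2, delta=1e-10, steps=60):
    """Absolute difference between two trajectories whose starts differ by delta."""
    first = logistic_trajectory(x0, r, steps)
    second = logistic_trajectory(x0 + delta, r, steps)
    return [abs(a - b) for a, b in zip(first, second)]

chaotic_gaps = gaps(r=4.0)  # r = 4 is a chaotic regime of the map
stable_gaps = gaps(r=2.5)   # r = 2.5 converges to a fixed point

print("chaotic: gap after 5 steps =", chaotic_gaps[5])
print("chaotic: largest gap       =", max(chaotic_gaps))
print("stable:  largest gap       =", max(stable_gaps))
```

In the chaotic regime the initial difference of `1e-10` grows until the two trajectories are unrelated, while in the stable regime the difference stays tiny and eventually vanishes.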

**FINAL EXPLANATION**

To **forecast a time-series** data column, we must first find a relationship between the date column and the target column. To solve this problem, we employ probabilistic methods.

The distinction between frequentist statistical approaches (the most popular approach) and probabilistic machine learning methods is that, with a frequentist approach, we choose a model (an equation) and fit the data to it. The model’s accuracy is determined by how well the data matches this equation. A typical example is linear regression, in which the model is a linear equation and the goal is to see whether the data points fit a line.

**Probabilistic methods** use Bayesian inference. This technique takes the data and looks for a model that can explain it. In other words, frequentist analysis is a model-first technique, with the selection of a model being the most crucial stage, whereas Bayesian statistics is a data-first method in which the data is the most essential component.

Once we’ve found a model that accurately predicts this data, we can use it to forecast future values of the target feature. Additionally, we seek to extract more information, both quantitative and qualitative, from the time series using what the model has learnt from the data.

We know that your privacy is an important matter, so we have created an on-premise solution where you can host your data on your own servers.

This way, you can apply your own privacy policy.

We don’t store any of this information in our system. It is securely stored in your Google Account.

Using this solution, no data will be stored on our servers.

You can select your own cloud server provider, such as AWS, Azure, or Google Cloud, or even use your own bare-metal server.