
How to prepare data for predictions?

To start making predictions, a customer first uploads data to the server and trains models. Data is the most critical part of predictive analytics: it is what the algorithms are trained on and evaluated against. (Evaluation is the process of identifying the best model and measuring its precision.)

During the training phase we partition the data: 80% is used for training and the remaining 20% for evaluation.
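The 80/20 partitioning can be illustrated with a minimal sketch. This is not the platform's actual implementation; the function name `split_rows`, the fixed seed, and the shuffle-then-cut approach are assumptions made for illustration only.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into training and evaluation partitions."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: 1000 record indices standing in for CSV rows.
rows = list(range(1000))
train, evaluation = split_rows(rows)
print(len(train), len(evaluation))  # 800 200
```

Shuffling before cutting avoids a biased evaluation set when the exported file is sorted, for example by date or by customer ID.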

At the moment Predicty.AI supports the comma-separated values (CSV) format for training. It is a delimited data format in which fields/columns are separated by the comma character and records/rows are terminated by newlines.
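As a small illustration of the format, the sketch below reads a tiny CSV with Python's standard `csv` module. The column names `customer_id` and `age` and all values are made up for the example; only the general CSV layout is taken from the description above.

```python
import csv
import io

# Hypothetical sample: a header line followed by two records.
sample = (
    "customer_id,age,total_52_weeks\n"
    "1001,34,250.50\n"
    "1002,41,99.00\n"
)

reader = csv.DictReader(io.StringIO(sample))
records = list(reader)

print(reader.fieldnames)             # column names taken from the header line
print(records[0]["total_52_weeks"])  # values are read as strings: '250.50'
```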

A CSV file can easily be created by exporting from Microsoft Excel, databases, or other systems (such as CRM or analysis tools).

Figure: predictions data flow (CSV)

Requirements and Privacy

The platform supports the CSV data format for training. Data is processed on a computation cluster, so there is no hard limit on size; the limit depends only on your plan. The requirements are:

  1. One CSV data file for training;
  2. The CSV should contain a header with column names;
  3. One field should contain the known outcome: whether the client is active or churned, or the CLV amount;
  4. The system expects mostly numerical and categorical data columns;
  5. More columns generally mean more precise results;
  6. For good prediction quality, the data should contain at least a thousand records.
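Some of the requirements above can be checked locally before uploading. The sketch below is only an assumed pre-flight check, not part of the platform: the function name `check_training_csv` and its parameters are hypothetical, and it verifies just the presence of the outcome column in the header and the record count.

```python
import csv
import io

def check_training_csv(text, target_column, min_records=1000):
    """Return a list of problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Requirements 2 and 3: a header that includes the outcome column.
    if target_column not in header:
        problems.append(f"missing target column: {target_column}")
    # Requirement 6: at least a thousand records.
    count = sum(1 for _ in reader)
    if count < min_records:
        problems.append(f"only {count} records, expected at least {min_records}")
    return problems

# Hypothetical two-record file, far too small for training.
small = "customer_id,churn\n1,0\n2,1\n"
print(check_training_csv(small, "churn"))
```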

During training the platform does not require any personal data such as names, SSNs, phone numbers, or email addresses. Moreover, the training data in the CSV can be fully anonymised. For example, product names can be replaced with your internal codes or numbers, so the customer can protect the privacy of the data.
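One possible way to replace product names with internal codes before export is sketched below. The function `anonymise_column`, the `P` prefix, and the sample rows are all hypothetical; any consistent mapping would serve the same purpose.

```python
def anonymise_column(rows, column, prefix="P"):
    """Replace each distinct value in `column` with a stable code (P1, P2, ...)."""
    codes = {}
    result = []
    for row in rows:
        value = row[column]
        if value not in codes:
            # Assign the next code the first time a value is seen.
            codes[value] = f"{prefix}{len(codes) + 1}"
        result.append({**row, column: codes[value]})
    return result, codes

# Hypothetical rows with a sensitive product name.
rows = [
    {"customer_id": "1", "product": "Gold plan"},
    {"customer_id": "2", "product": "Basic plan"},
    {"customer_id": "3", "product": "Gold plan"},
]
anonymised, mapping = anonymise_column(rows, "product")
print([r["product"] for r in anonymised])  # ['P1', 'P2', 'P1']
```

Keeping the returned mapping private lets you translate predictions back to real product names without ever sharing those names.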

Data for customer lifetime value predictions

For Customer Lifetime Value we can build a training dataset with a large number of data attributes that represent customer activity, interactions, and transactions with a brand or product. This dataset also contains a known customer lifetime value (CLV, also called LTV) for each data record.

Required columns: a unique identifier of the customer/user, and the customer lifetime value, that is, the real amount of money the client spent over the period of time.

Other columns could include age, gender, purchase amount, first purchase price, count, and many other metrics. The more metrics we include in the training set, the better the quality of the predictions.

Figure: Lifetime value predictions data sample (eCommerce transactions)

In the example above, the ‘total_52_weeks’ field is the known historical lifetime value for the past year (52 weeks), that is, how much money the customer spent last year. The customer ID uniquely identifies each customer. After training the models, we can predict CLV for new and existing clients.

In other words, the new ‘total_52_weeks_predicted’ value tells us how much money a client is expected to spend during the next year.
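To make the roles of the columns concrete, the sketch below separates one CLV record into identifier, input attributes, and the known target. The column name `customer_id`, the attribute columns, and the function `split_clv_row` are assumptions for illustration; only `total_52_weeks` comes from the sample described above.

```python
def split_clv_row(row, id_column="customer_id", target_column="total_52_weeks"):
    """Split a record into (customer id, feature dict, known lifetime value)."""
    features = {
        name: value
        for name, value in row.items()
        if name not in (id_column, target_column)
    }
    # The target is the amount spent, so it is parsed as a number.
    return row[id_column], features, float(row[target_column])

# Hypothetical record as it would come out of csv.DictReader.
row = {"customer_id": "1001", "age": "34", "gender": "F", "total_52_weeks": "250.50"}
customer_id, features, target = split_clv_row(row)
print(customer_id, features, target)
```

The identifier is kept out of the features because it carries no predictive signal; it is only needed to match predictions back to customers.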

Data for customer churn (attrition) predictions

For customer churn prediction we build a training dataset from a large number of telecom customer records.

Figure: Data sample for customer churn prediction (telecom)

In the data sample above, the ‘churn’ field is the known fact about the customer. If the customer is active, ‘churn’ = 0; if the customer has stopped buying services, ‘churn’ = 1. This is the parameter we are going to predict for future (or existing) clients.

This data represents mobile operator client details with billing facts.

Our prediction column is ‘churn’, and the training data should contain both churned and active clients.
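A quick way to confirm that both classes are present is to count the values of the ‘churn’ column before uploading. The sketch below is an assumed local check, not a platform feature; the function name `churn_balance` and the sample rows are hypothetical.

```python
from collections import Counter

def churn_balance(rows, column="churn"):
    """Count active (0) and churned (1) clients; reject any other label."""
    counts = Counter(str(row[column]).strip() for row in rows)
    unexpected = set(counts) - {"0", "1"}
    if unexpected:
        raise ValueError(f"unexpected churn values: {sorted(unexpected)}")
    return {"active": counts["0"], "churned": counts["1"]}

# Hypothetical records as they would come out of csv.DictReader.
rows = [
    {"customer_id": "1", "churn": "0"},
    {"customer_id": "2", "churn": "1"},
    {"customer_id": "3", "churn": "0"},
]
balance = churn_balance(rows)
has_both_classes = all(count > 0 for count in balance.values())
print(balance, has_both_classes)  # {'active': 2, 'churned': 1} True
```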
