An AI approach for detecting abnormal churn rates by user segment in online companies.

Online companies invest enormous amounts of capital and resources in reducing user churn. But to understand whether the churn rate has changed, and which group of users caused that change, the churn rate must be broken down by groups of users: country, platform (PC, mobile), segment, and hundreds of other relevant groupings.

Source: Analytics-model dashboard. In the example above, the user groups were determined by year of registration, each shown in a different color. We can see the evolution of each user group over its lifetime at daily resolution.

Today, analysts manually review churn cohorts on a weekly or monthly basis, using at most five verticals, trying to detect abnormal changes in the churn rate. A weekly or monthly churn review that does not take all verticals into account is not sufficient: catching drops and spikes at the right moment, across every vertical, has a tremendous impact on the company.

A single analyst, or even a group of analysts, cannot go through all the verticals and find every change. Analysts are human, and humans make mistakes. A machine using today's technology can achieve in seconds or minutes what would take the average analyst weeks or even months.

This is why we believe this work should be done by AI, not by human employees.

Source: Analytics-model dashboard. Yearly cohort: shows, for each registration year, how many users survive to the next year.

Data sources: We collected data from one of the leading online companies in its field: a data set spanning 14 years of user data, with 7 million users producing about 26 billion records. This data was joined with external data such as benchmarks, weather, global events, mobility, demographics, holidays, and more.

Data preparation: The data comes from two sources. The first is the platform itself, where the raw data arrives as-is and does not go through any processing: <UserID, Registration_Date, Event_Date, Event_type>. The second comes from the server side and goes through ETL processes, user-aggregation processes, and feature engineering. The bottom line: for every user, we generate a single aggregation line that captures their attributes, for example <UserID, Registration_Date, User_type, User_segment, Country, Platform, Behavior_type, ...>.
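The per-user aggregation step can be sketched as follows. This is a minimal illustration, not the production ETL: the column names match the tuples above, but the specific aggregates (last event date, event count, purchase flag) are hypothetical examples of the kind of per-user features such a pipeline might produce.

```python
import pandas as pd

# Raw platform events: <UserID, Registration_Date, Event_Date, Event_type>
events = pd.DataFrame({
    "UserID": [1, 1, 2, 2, 2],
    "Registration_Date": ["2020-01-01"] * 2 + ["2021-06-15"] * 3,
    "Event_Date": ["2020-01-02", "2020-03-01",
                   "2021-06-16", "2021-07-01", "2021-08-01"],
    "Event_type": ["login", "purchase", "login", "login", "purchase"],
})
events["Event_Date"] = pd.to_datetime(events["Event_Date"])

# Collapse the event stream into one aggregation line per user.
# The chosen aggregates here are illustrative placeholders.
agg = events.groupby(["UserID", "Registration_Date"]).agg(
    last_event=("Event_Date", "max"),
    n_events=("Event_Date", "count"),
    made_purchase=("Event_type", lambda s: (s == "purchase").any()),
).reset_index()

print(agg)
```

In production, this kind of aggregation would run inside the ETL layer over billions of records, but the shape of the output, one row of attributes per user, is the same.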

Using uniform sampling and K-means, we created groups of users, and for each group we calculated the churn rate at daily or hourly resolution (depending on the volume of data in each group).
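The grouping step above can be sketched with scikit-learn. Everything here is synthetic: the two features, the number of clusters, and the churn labels are placeholder assumptions standing in for the real user attributes and segments.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-user features drawn uniformly (placeholders for
# real attributes such as activity frequency and tenure).
features = np.column_stack([
    rng.uniform(0, 1, 500),   # normalized activity frequency
    rng.uniform(0, 1, 500),   # normalized tenure
])
churned = rng.integers(0, 2, 500)  # 1 = user churned in the period

# Cluster users into groups, then compute the churn rate per group.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
for g in range(4):
    mask = km.labels_ == g
    print(f"group {g}: {mask.sum()} users, churn rate {churned[mask].mean():.2%}")
```

Each group's churn-rate series (daily or hourly, depending on group size) then becomes the signal that is monitored for anomalies.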

Source: Analytics-model dashboard.

The model: We use a deep neural network (DNN), an artificial neural network (ANN) with multiple hidden layers between the input and output layers, consisting of neurons, synapses, weights, biases, and activation functions. The model was trained on historical data, and at the end of the process we obtained a DNN object.

This model receives requests containing the features themselves and returns an answer in real time.

The answer tells us whether the observation is an anomaly and how strong the anomaly is. This allows us to assign it a priority and notify the relevant employee.
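One way such an anomaly answer can be produced is sketched below, under stated assumptions: a small feed-forward network (scikit-learn's MLPRegressor standing in for the production DNN) learns the expected churn rate from group features on synthetic data, and the anomaly strength is the deviation of the observed rate from the expectation, in units of the training residual spread. The feature layout, target, and scoring rule are all illustrative, not the authors' actual model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic training data: group features -> observed churn rate.
X = rng.uniform(0, 1, (400, 3))
y = 0.1 + 0.05 * X[:, 0] + rng.normal(0, 0.005, 400)

# A small feed-forward network stands in for the production DNN.
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000,
                     random_state=0).fit(X, y)

# Spread of the training residuals, used to scale anomaly strength.
resid_std = np.std(y - model.predict(X))

def anomaly_strength(features, observed_rate):
    """How far (in residual-sigmas) the observed churn rate deviates
    from the model's expectation for a group with these features."""
    expected = model.predict(features.reshape(1, -1))[0]
    return abs(observed_rate - expected) / resid_std

x_new = rng.uniform(0, 1, 3)
print(f"strength: {anomaly_strength(x_new, 0.40):.1f} sigma")
```

A strength threshold (or a ranking over all groups) would then drive the prioritization and the notification to the relevant employee.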


Idan Moradov: MSc in Financial Mathematics from Bar-Ilan University. 12 years of experience in online companies as Head of Analytics and BI. Founder of the YourVoice Music application, which reached more than 6,000,000 downloads.

Dr. Avraham Morgenstern: Ph.D. in Mathematics. Experienced data scientist who has built anomaly detection algorithms for years. Worked as a Data Scientist at Anodot.
