A Novel Outlier Detection Algorithm for Online Companies Using a Deep Neural Network

Updated: Nov 8

Anomaly detection, or outlier detection, is a well-studied and well-formalized machine learning problem. An anomaly detection algorithm flags observations that raise suspicion because they differ significantly from the majority of the data.


Applications: "Anomaly detection is applicable in a variety of domains, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, and defect detection in images using machine vision. It is often used in preprocessing to remove anomalous data from the dataset. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy" (Anomaly detection - Wikipedia, 2021).

Data collection: We collected data from one of the leading online companies in its field. The data set covers 14 years of user data: 7 million users who produced about 26 billion records. It was joined with external data such as benchmarks, weather, global events, mobility, demographics, holidays, and more.


The model: We're using a deep neural network (DNN), which is an artificial neural network (ANN) with multiple hidden layers between the input and output layers. It consists of the following components: neurons, synapses, weights, biases, and functions. The configuration of the network is determined by the manner in which the units are connected. When the number of hidden layers is two or more, the network is called a deep neural network (DNN). The connections of the network in each layer are computed as follows:

1. A weighted sum is calculated at each neuron: the output value of each neuron in the preceding layer is multiplied by the weight of its connection to that neuron, and the products are summed.

2. A transfer function is applied to this weighted sum to determine the neuron's output value.

3. The output value can therefore be expressed as a function of the input values and the network weights.
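The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the general mechanism, not the production model: the sigmoid transfer function, the layer sizes, and every weight and bias value are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Transfer function. Using a sigmoid here is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, transfer=sigmoid):
    """Compute one layer: a weighted sum per neuron, then the transfer function.

    weights[j][i] is the weight of the connection from input i to neuron j.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Step 1: weighted sum of the preceding layer's outputs, plus the bias.
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # Step 2: the transfer function gives the neuron's output value.
        outputs.append(transfer(weighted_sum))
    return outputs


def forward(inputs, layers):
    """Step 3: the final output is a function of the inputs and all weights."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical toy network: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
toy_layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),
    ([[0.3, 0.8], [-0.6, 0.2]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]
prediction = forward([1.0, 2.0], toy_layers)
```

With two hidden layers, the toy network meets the definition of a DNN given above. A forecasting network would typically use a linear output layer rather than a sigmoid; the sketch keeps one transfer function throughout for brevity.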










Results: Comparing the forecast line with the actual values shows a high correlation of over 99%. These results also show clearly that the model is well adapted to time series and accounts for seasonality with zero shift between the forecast and the actual data.
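For readers who want to run the same kind of check on their own forecasts, here is a minimal sketch of a correlation calculation. It assumes the Pearson coefficient, which the text above does not specify, and the `actual` and `forecast` values are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical daily values, invented for this example.
actual = [100.0, 120.0, 90.0, 130.0, 110.0]
forecast = [102.0, 118.0, 93.0, 127.0, 111.0]
r = pearson_correlation(actual, forecast)
```

A coefficient close to 1 means the forecast rises and falls together with the actual series. It does not by itself measure the size of the forecast error, so it is usually read alongside an error metric.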

A more comprehensive examination of the results, and of the effect of the features on the model, clearly shows the contribution of the external data. For example, we found an effect on the forecast line when there is a holiday or extreme temperature in the area.


Illustration of the analytics-model dashboard


Cases: When we review the dates on which the algorithm detects anomalies, we find one-off events that were not taken into consideration in the imported data.

On some of these dates, for example, we found that special football games had taken place in the country concerned, as well as natural disasters, election campaigns, domain issues, and so on.
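The text does not spell out the rule that turns a forecast into an anomaly flag. One common approach, assumed here purely for illustration, is to flag the dates on which the gap between actual and forecast is unusually large compared with a robust estimate of its typical spread (the median absolute deviation). The function name, the threshold of 3, and all data values below are hypothetical.

```python
import statistics


def flag_anomalies(dates, actual, forecast, threshold=3.0):
    """Return the dates whose forecast residual is unusually large.

    Assumed rule: a residual is anomalous when it lies more than
    `threshold` robust standard deviations from the median residual.
    """
    residuals = [a - f for a, f in zip(actual, forecast)]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        # No spread at all: any residual that differs from the median stands out.
        return [d for d, r in zip(dates, residuals) if r != center]
    return [
        d for d, r in zip(dates, residuals)
        if abs(r - center) / scale > threshold
    ]


# Hypothetical week of data with one point event on 2021-03-05.
dates = [f"2021-03-0{day}" for day in range(1, 8)]
forecast = [100, 105, 110, 108, 102, 95, 98]
actual = [98, 108, 109, 109, 152, 95, 95]
flagged = flag_anomalies(dates, actual, forecast)
```

A median-based scale is used so that the anomaly itself does not inflate the spread estimate, which a plain standard deviation would allow.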


India:

United States:

France:

United Kingdom:

Value proposition: Identifying the anomalies in this case, and in other similar cases, has tremendous monetary value to the company. According to the data, there is a low-importance anomaly once or twice a week and a high-importance anomaly every two weeks to a month.

Stopping campaigns: Whenever there is a problem with the "welcome flow", the percentage of users who convert decreases, so real-time alerts have a high monetary value.

For example, stopping the purchase of new traffic will save about $17K of the monthly advertising budget.

Real-time alerts on campaigns with low conversion rates, raised when a drop is detected across the entire funnel, will save an additional $16K monthly.

Customer retention: Detecting a retention drop in specific segments and receiving a real-time alert about it allows the company to understand the cause of the drop quickly and act faster to stop it. This could save $18K monthly.

Troubleshooting: Server outages, traffic loads, spikes in loading times, release of a buggy version to production, non-renewal of an SSL certificate, a drop in time spent on the site, downtime of a significant feature.

Real-time outlier detection allows a quick repair with minimal disruption to the user's original process. Fast fault detection in turn leads to an increase in user retention, usage time, and satisfaction level. An alert system that flags such issues will increase revenue by $14K per month.

API usage expenses: Servers, content, libraries, and third-party resources are charged per use, and for various reasons usage sometimes spikes, which increases the monthly payment. The faster the organization notices the change, the greater the value saved.


*Calculated based on a company with monthly marketing expenses of $500K


Authors


Idan Moradov: MSc in Financial Mathematics from Bar-Ilan University. 12 years of experience in online companies as Head of Analytics and BI. Founder of the YourVoice Music application, which reached more than 6,000,000 downloads.

Dr. Avraham Morgenstern: Ph.D. in Study Mathematics. Experienced Data Scientist. Built anomaly detection algorithms for years. Worked as a Data Scientist at Anodot.






#AnomalyDetection #MachineLearning #NeuralNetwork #Outliners #Algorithem
