Setting aside all the fancy data-related wording, one
of the most common goals for any data expert is identifying data
outliers. There are plenty of ways
to identify outliers by leveraging both internal and
external data, and today I am going to share some of the most common ways to
achieve this goal.
In statistics, a data outlier is a data point
that stands apart from the rest of the data set.
An outlier is sometimes caused by a data error but, in most circumstances,
it is something we should be aware of and pay special attention to. The following is a common two-step approach,
combining the quantitative and qualitative perspectives, to identify data
outliers:
1. Statistical approach
This is the quantitative approach: we
use common statistical measures such as, but not limited to, the mean and
standard deviation to identify outliers,
without relying on any industry knowledge.
It could also include more complex statistical models, such as those
based on probability theory, but today I will just discuss one of the most
common ideas: flagging the data points that fall outside the mean plus or minus X times the standard
deviation.
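As a rough illustration of the idea above, here is a minimal Python sketch. The function name find_outliers, the default multiplier k=2 and the sample numbers are my own assumptions for illustration only, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def find_outliers(values, k=2.0):
    """Return the values lying outside mean +/- k * standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [v for v in values if v < lower or v > upper]


# Made-up numbers, for illustration only
amounts = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
outliers = find_outliers(amounts)
print(outliers)
```

The multiplier k plays the role of X in the text: a larger k flags fewer points, a smaller k flags more.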
Example: customers’ historical spending amounts
Assume all these customers have the same spending
power and we are rolling out a new membership scheme. We could calculate the mean plus and minus 2
standard deviations of the spending amounts as threshold 1 and threshold 2
respectively, so that:
- Whoever has a total spending above threshold 1 could be considered one of our most valuable customers and should be given the top-level premium VIP card, both to retain their spending with us and to show our appreciation; and
- Whoever stands between threshold 1 and threshold 2 could be considered a normal customer and should be given a normal membership, to encourage them to spend more and reach the top-level premium VIP class; and
- For the rest, who fall below threshold 2, no membership would be granted, but we might offer something like a spending-based Points Collection Scheme to encourage this group to spend more, become members and earn the subsequent benefits.
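The three groups above could be assigned along the following lines. This is only a sketch: the function name assign_tiers, the tier labels, the customer IDs and the spending figures are all hypothetical.

```python
from statistics import mean, stdev


def assign_tiers(spending, k=2.0):
    """Map each customer to a tier using mean +/- k standard deviations."""
    amounts = list(spending.values())
    mu, sigma = mean(amounts), stdev(amounts)
    threshold_1 = mu + k * sigma  # upper threshold
    threshold_2 = mu - k * sigma  # lower threshold
    tiers = {}
    for customer, amount in spending.items():
        if amount > threshold_1:
            tiers[customer] = "premium_vip"
        elif amount >= threshold_2:
            tiers[customer] = "normal_member"
        else:
            tiers[customer] = "points_scheme"
    return tiers


# Hypothetical customers: ten typical spenders, one low and one high
spending = {f"c{i}": 100 for i in range(1, 11)}
spending["low"] = 0
spending["high"] = 200
tiers = assign_tiers(spending)
print(tiers["high"], tiers["c1"], tiers["low"])
```

Customers sitting exactly on threshold 2 are treated as normal members here, which is a choice of mine rather than something the scheme dictates.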
2. Detection scenario logic
Depending on the nature of the business, a qualitative
approach can be applied: detection scenario logic built on the available data
elements, such as age, gender, profession, etc.
Example: customer data, following on from the above
example
Assuming the target customers are the high-income
group, who can be classified by age group and profession, we could further
categorize the customers on these attributes in addition to the spending amounts from the
previous example. One point to note here is that, in order to identify the right
scenario logic, we might consider leveraging both internal and external data
sources. For example, government census
data can tell us which professions and which age groups make up the high-income
population in this case. After this, we might come up with logic
like:
- Top premium VIP might only be granted to those with high spending amounts, age above 40, professions in certain industries, etc., as they form the current customer base; and
- A 2nd-grade premium VIP class might be granted to those with normal spending amounts, age
between 20 and 40, professions in certain industries, etc., who will potentially
be the successors of the current top-level people; we should treat this
group as our high-potential customer base.
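Scenario logic like the above can be written down as simple rules. The sketch below is only an assumption of how that might look: the Customer fields, the profession list, the class labels and the boundary handling (age 40 falls into the 2nd grade) are all hypothetical, and the spending level is assumed to come from the statistical step.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    age: int
    profession: str
    spending_level: str  # "high", "normal" or "low", from the statistical step


# Hypothetical list; in practice it might be derived from census data
TARGET_PROFESSIONS = {"finance", "medicine", "law"}


def classify(customer):
    """Apply the two scenario rules and return a membership class."""
    in_target = customer.profession in TARGET_PROFESSIONS
    if customer.spending_level == "high" and customer.age > 40 and in_target:
        return "top_premium_vip"
    if customer.spending_level == "normal" and 20 <= customer.age <= 40 and in_target:
        return "second_grade_premium_vip"
    return "unclassified"


# Made-up customers, for illustration only
print(classify(Customer("A", 52, "finance", "high")))
print(classify(Customer("B", 31, "law", "normal")))
print(classify(Customer("C", 31, "retail", "normal")))
```

Keeping the rules in one small function makes it easy to adjust the age bands or profession list when new internal or external data suggests a different scenario.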