Analytics Cycle
Everything has a process, and Data Analytics is no exception. I call this process the Analytics Cycle. The Analytics Cycle is a looping process made up of five steps:
- Extract, Transform and Load (ETL);
- Analytics Rules;
- Case Management;
- Analytics Review;
- Optimization;
Today I will discuss step 1: Extract, Transform and Load (ETL). The ultimate goal here is to identify and consolidate the data into an analyzable format, namely a Common Data Model. For example, a people database contains people information that could serve customer analytics, HR analytics, relationship analytics, etc.; or a census database consolidates the government census data; and so on.
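To make the idea concrete, here is a minimal sketch of what a people-centric Common Data Model could look like in Python. The field names and records are illustrative assumptions, not a standard model:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative only: a minimal "people" Common Data Model that several
# analytics use cases (customer, HR, relationship) could share.
@dataclass
class Person:
    person_id: str               # stable surrogate key across source systems
    full_name: str
    date_of_birth: Optional[date]
    country: Optional[str]
    source_system: str           # provenance: which source the record came from

# Records from different sources are transformed into this one shape,
# so downstream analytics rules can be written once against it.
crm_record = Person("P001", "Ada Lovelace", date(1815, 12, 10), "UK", "CRM")
hr_record = Person("P002", "Alan Turing", date(1912, 6, 23), "UK", "HR")
```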
After identifying the Common Data Model needed, we then identify the usable data sources and Extract, Transform and Load them into the Common Data Model. Fortunately, data storage is normally not an issue: digital storage keeps getting physically smaller and can be purchased anywhere, anytime today, leaving aside any business budgeting concerns (if any).
Assuming the infrastructure is ready, the first conflict between the data experts and business management is usually between data structure optimization and the project timeline, because the most effective analyzable data structure takes time to build through data normalization. In reality, people rarely have the patience to wait for data cleansing and hope for quick wins with the data. Balancing the budget, the timeline and the technical work is where a data adviser's value lies.
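As a tiny illustration of what that normalization work looks like, here is a sketch using pandas; the table and column names are made up for the example:

```python
import pandas as pd

# A flat extract as it often arrives: customer details repeated on every order row.
flat = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "customer_id": ["C01", "C01", "C02"],
    "customer_name": ["Acme Ltd", "Acme Ltd", "Beta Co"],
    "amount": [250.0, 90.0, 410.0],
})

# Normalization: split the repeating customer attributes into their own table...
customers = (flat[["customer_id", "customer_name"]]
             .drop_duplicates()
             .reset_index(drop=True))
orders = flat[["order_id", "customer_id", "amount"]]

# ...which removes redundancy and update anomalies, at the cost of the extra
# modelling and cleansing time the business is often impatient about.
print(customers)
print(orders)
```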
On the other hand, most data gathering work is genuinely robotic, such as extracting data archives, crawling data from the internet, etc. Leveraging a scripting technique, such as VBScript or VBA in MS Office, to automate the work and let the computer work for you is definitely a good option to speed up the robotic process effectively. Another advantage is that an automated process can run with a proper audit trail, which helps keep track of the work and provides proof of completion, assuming there are no critical bugs in your script.
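Here is a minimal Python sketch of that idea, standing in for the VBScript/VBA automation mentioned above: it sweeps a folder of CSV extracts into one staging file while writing an audit trail of row counts and content hashes. The folder and file names are hypothetical:

```python
import csv
import hashlib
import logging
from pathlib import Path

# Audit trail: every file processed is logged with a row count and content hash,
# giving proof of completion and a way to re-verify the run later.
logging.basicConfig(filename="etl_audit.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def extract_folder(src: Path, dest: Path) -> None:
    """Robotic extraction: consolidate a drop folder of CSV files into one staging file."""
    with dest.open("w", newline="", encoding="utf-8") as out:
        writer = None
        for csv_file in sorted(src.glob("*.csv")):
            digest = hashlib.sha256(csv_file.read_bytes()).hexdigest()
            with csv_file.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)
                if writer is None:          # write the header once, from the first file
                    writer = csv.writer(out)
                    writer.writerow(header)
                rows = 0
                for row in reader:
                    writer.writerow(row)
                    rows += 1
            logging.info("extracted %s rows=%d sha256=%s", csv_file.name, rows, digest)

# extract_folder(Path("inbox"), Path("staging/people.csv"))  # hypothetical paths
```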
Finally, all we need to do is load the data into the database engine and ensure every data load is reconciled and verified, using control totals and data hashes as appropriate. More importantly, we should always pay close attention to high-risk data types, for example the Datetime format and Text formats with character length limits.
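A minimal sketch of that reconciliation step, assuming Python and made-up column names: the source and loaded data are compared on record count, a control total over a numeric column, and an order-insensitive hash, with extra checks on the high-risk types:

```python
import hashlib
from datetime import datetime
from decimal import Decimal

def control_totals(rows):
    """Record count, control total on the amount column, and an order-insensitive hash."""
    count = len(rows)
    total = sum(Decimal(str(r["amount"])) for r in rows)
    digest = hashlib.sha256(
        b"".join(sorted(repr(sorted(r.items())).encode() for r in rows))
    ).hexdigest()
    return count, total, digest

source = [{"id": 1, "amount": 10.5, "name": "Ann"},
          {"id": 2, "amount": 20.0, "name": "Bob"}]
loaded = [{"id": 1, "amount": 10.5, "name": "Ann"},
          {"id": 2, "amount": 20.0, "name": "Bob"}]

# Reconcile the load: all three figures must match between source and target.
assert control_totals(source) == control_totals(loaded), "load does not reconcile"

# High-risk types deserve explicit checks: text must fit the target column's
# length limit, and dates must parse in an unambiguous format before loading.
NAME_LIMIT = 50  # hypothetical target column width
for r in loaded:
    assert len(r["name"]) <= NAME_LIMIT, f"name too long: id {r['id']}"
datetime.strptime("2024-03-01", "%Y-%m-%d")  # fail fast on a bad date format
```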
"Garbage in, garbage out": ETL is the foundation of successful data analytics work. Without a good ETL plan, a properly analyzable data structure and a meaningful data set for your analytics idea, the effort will most likely be a waste of time.