Saturday, February 27, 2016

Insights from Alan Turing – Father of artificial intelligence

Alan Turing (1912-1954) was a pioneering mathematician, widely regarded as the father of computing and artificial intelligence, who was granted a Royal pardon by the Queen in recognition of his contribution as a code-breaker in World War Two.  One of the most famous questions associated with him is "Can machines think?"

Just a heads-up in advance: I am not going to write about who Alan Turing was, how he broke the Enigma ciphers, or how the Turing machine works, and what follows in this post is not really about Alan Turing at all.  The topic I would like to discuss here is the question mentioned in the last paragraph, whether a machine can think, and, to a certain extent, "Can an analytics platform think?"

Many people ask whether Data Analytics can be believed, a topic I mentioned in an earlier post.  My first question is: would people consider getting "something" to help with decision making, regardless of whether they believe that "something" can give them a reasonable, defensible suggestion produced by an acceptable analysing or thinking process?

A data analytics platform normally works with various statistical and mathematical models derived and programmed by data experts, and then produces advice accordingly.  The question is: should we consider this advice-generating process as "machine thinking"?

One view is that the data experts have in fact only programmed the rules into the machine's memory, and the machine then works on its own to produce the results once we ask it to "think" by pressing the start button. Should we therefore consider what the data experts did as teaching the machine rather than merely running it? Alan Turing predicted that "machine learning" would play an important role in building powerful machines, which suggests to me that everyone who works in the IT field is, in some circumstances, teaching the machines.

On the other hand, if the machine can only produce pre-programmed advisory options and nothing out of the box, can it still be considered "thinking"?  But then, are we humans not also just producing our personal thoughts based on what our peers, such as teachers and friends, or our own experiences have taught us, or "pre-programmed" into our brains?

“Sometimes it is the people no one can imagine anything of who do the things no one can imagine.”
- Alan Turing

“We can only see a short distance ahead, but we can see plenty there that needs to be done.”
- Alan Turing

I will leave all the above questions open for discussion, and I hope to see how they play out over the rest of my life.  However, I have no doubt that Data Analytics is one of the most promising ways to create a better world tomorrow; for example, in my own job, I leverage Big Data to fight money laundering activities.

Friday, February 19, 2016

Computer Forensic: - Forensic Workflow III & IV – Reporting & Testifying as Expert Witness


As I have mentioned before, computer forensics is largely about storytelling: presenting the facts to facilitate the investigation and the judgement of the case.  Reporting is therefore one of the most critical areas for demonstrating an examiner's seniority, alongside analysis skills.  A computer forensic report is usually used in litigation and is likely to be distributed to both technical and non-technical parties.  As such, accurately presenting the facts in a human-readable way with no bias is always the key to writing a good report, and the following are some notable requirements and concepts based on my experience as a computer forensic examiner.

1.      Reporting purpose

The ultimate objective of reporting is to present the facts that address the technical concerns.  The report must be understandable and human-readable.  Jargon must be carefully identified on the assumption that the readers have zero computer knowledge, especially if the report is going to be used in litigation, where the readers are likely to be non-technical individuals such as attorneys, the judge or the jury.  Besides, since the report may be the only opportunity to present the facts found in the investigation, it must encompass the whole of any testimony in detail for the trier of fact; misrepresenting any of the findings may lead to serious financial and legal consequences.

2.      Report structure and style

Ideally, every examiner report should be able to stand on its own and provide clear and accurate information that allows anyone who reads it to reach the same conclusions.  Terms such as "many", "significantly", "highly", etc., which are subjective and can be interpreted in multiple ways, must be avoided.  Industry-accepted references should be used whenever possible to substantiate the statements and the content presented.  Also, every single page should carry a unique identifier including the report title, the date of issue and the examiner's basic information or company name for reference purposes.  More importantly, the examiner's background should be clearly stated at the beginning of the report.  The following sections are typically included in an examiner report (a minimal template sketch follows the list):-

·         Cover page
·         Executive summary
·         Examiner profile
·         Introduction / Background of the case
·         Scope of work
·         List of supporting documents
·         Observations and analyses conducted
·         Examiner’s log
·         Chain-of-custody records
·         Photographs / reference materials
·         Disclaimers
·         Signature
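To make the structure concrete, here is a minimal Python sketch of the section list and the per-page identifier described above.  All names in it (ReportMeta, report_skeleton, the example case details) are hypothetical illustrations, not part of any standard or tool.

```python
from dataclasses import dataclass
from datetime import date

# Section list mirroring the bullets above (illustrative only)
REPORT_SECTIONS = [
    "Cover page", "Executive summary", "Examiner profile",
    "Introduction / Background of the case", "Scope of work",
    "List of supporting documents", "Observations and analyses conducted",
    "Examiner's log", "Chain-of-custody records",
    "Photographs / reference materials", "Disclaimers", "Signature",
]

@dataclass
class ReportMeta:
    """Details that should appear on every page as a unique identifier."""
    title: str
    issue_date: date
    examiner: str
    company: str

    def page_footer(self, page_no: int, total_pages: int) -> str:
        # e.g. "Case ABC-123 Report | 2016-02-19 | J. Doe, Example Ltd | Page 1 of 20"
        return (f"{self.title} | {self.issue_date.isoformat()} | "
                f"{self.examiner}, {self.company} | Page {page_no} of {total_pages}")

def report_skeleton(meta: ReportMeta) -> str:
    """Emit an empty report outline with one heading per expected section."""
    lines = [meta.page_footer(1, 1), ""]
    for i, section in enumerate(REPORT_SECTIONS, start=1):
        lines.append(f"{i}. {section}")
        lines.append("   [to be completed by the examiner]")
    return "\n".join(lines)

if __name__ == "__main__":
    meta = ReportMeta("Case ABC-123 Forensic Examination Report",  # hypothetical case
                      date(2016, 2, 19), "J. Doe", "Example Forensics Ltd")
    print(report_skeleton(meta))
```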

3.      Quality assurance

When the issues are complex, mistakes and errors may always be present no matter how careful the examiner is.  As such, I would suggest peer review as one of the most effective and essential ways to address them.  A peer review should be conducted by someone at the same level as, or more senior than, you in terms of experience, and I would suggest inviting at least two peer reviewers.  It is not only a general review of grammatical errors or the phrases and wording used, but also a quality assurance check on any assumptions and analysis made in the report.

The above gives only a basic idea of what a forensic examiner report looks like.  This also brings us to the end of the Computer Forensic Workflow overview.  In future computer forensic posts, I will try to share some real-life examples.  I hope you found this useful, and I am always happy to discuss if you are interested.

Previous Step

Thursday, February 11, 2016

Computer Forensic: - Forensic Workflow II – Forensic Analysis


Following on from data acquisition, the next step is to conduct the actual forensic analysis.  There are a number of analyses available, and the most common quick analyses are shared below.

1.      Deletion Analysis

This is one of the most common analyses, required in almost all kinds of cases.  We can normally achieve it easily by leveraging the forensic software's functionality.  Depending on the custodian's OS version, the type of data storage device and the forensic software, the high-level results, such as the number of files recovered, can differ.  Deletion analysis might also not be available in some situations, such as SSDs, Linux, etc.  It is likewise available in mobile forensics, but subject to the level of data access available to the examiner and the mobile device model.
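In practice this is usually done inside the forensic suite itself, but for illustration, below is a minimal sketch assuming a raw dd image and the open-source pytsk3 bindings to The Sleuth Kit.  The image path and partition offset are hypothetical, and a real case would walk the full directory tree and handle volume systems rather than only listing the root directory.

```python
import pytsk3

# Hypothetical evidence image and filesystem offset (sector 63 * 512 bytes)
IMAGE_PATH = "evidence.dd"
PARTITION_OFFSET = 63 * 512

img = pytsk3.Img_Info(IMAGE_PATH)
fs = pytsk3.FS_Info(img, offset=PARTITION_OFFSET)

# List entries in the root directory whose metadata is unallocated,
# which usually indicates a deleted file that may still be recoverable.
for entry in fs.open_dir(path="/"):
    meta = entry.info.meta
    if meta is None:
        continue
    name = entry.info.name.name.decode("utf-8", errors="replace")
    if int(meta.flags) & int(pytsk3.TSK_FS_META_FLAG_UNALLOC):
        print(f"deleted: {name} (size {meta.size} bytes)")
```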

2.      Signature Analysis

One of the most common ways to hide data files from scanning is to alter the file extension, for example disguising an Excel file as a text file by changing the extension from xlsx to txt.  This can affect file extraction (if it relies on file type) and the subsequent keyword search in e-Discovery or any other downstream forensic data review.  However, the extension is not the only way to identify the file type: every file has a header telling the system what type of file it is.  Signature analysis checks whether the file header / signature ties in with the extension and identifies the file's potential real identity.
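To make the idea concrete, here is a minimal sketch that compares a handful of well-known "magic byte" signatures against file extensions.  The signature table and the extracted_files folder are illustrative assumptions only; real forensic tools ship far larger signature databases.

```python
import pathlib

# A few well-known header signatures (illustrative, not exhaustive)
SIGNATURES = {
    b"\x50\x4B\x03\x04":  "zip-based (xlsx/docx/zip)",
    b"\xD0\xCF\x11\xE0":  "legacy Office (xls/doc)",
    b"%PDF":              "pdf",
    b"\xFF\xD8\xFF":      "jpeg (jpg)",
    b"\x89PNG\r\n\x1a\n": "png",
}

def sniff_type(path: pathlib.Path) -> str:
    """Identify a file by its header bytes instead of its extension."""
    with path.open("rb") as f:
        header = f.read(8)
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"

def extension_mismatch(path: pathlib.Path) -> bool:
    """Flag files whose sniffed type does not plausibly match their extension."""
    sniffed = sniff_type(path)
    ext = path.suffix.lower().lstrip(".")
    return sniffed != "unknown" and ext not in sniffed

if __name__ == "__main__":
    for p in pathlib.Path("extracted_files").rglob("*"):   # hypothetical export folder
        if p.is_file() and extension_mismatch(p):
            print(f"possible disguised file: {p} (header says {sniff_type(p)})")
```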

3.      Hash Analysis

In general computer usage, files might be duplicated for backup purposes, or they might be known, zero-risk system files.  Cryptographic hash functions can help to identify both.  According to Wikipedia, "a cryptographic hash function is a hash function which is considered practically impossible to invert, that is, to recreate the input data from its hash value alone."  MD5 is one of the most commonly used hash functions for data integrity verification.  If two files have the same hash value, they are accepted as identical in terms of file content.  For the zero-risk files, we can leverage the National Software Reference Library (NSRL), which provides a Reference Data Set (RDS) of files from most known and traceable software applications.  By comparing the hashes with each other and with the NSRL list, the review population can be reduced effectively.
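As a rough illustration, the sketch below de-duplicates files by MD5 and drops hashes found in a known-file list.  Here known_hashes.txt stands in for an NSRL RDS extract, and its one-hash-per-line format is an assumption for the example, not the real RDS layout.

```python
import hashlib
import pathlib
from collections import defaultdict

def md5_of(path: pathlib.Path) -> str:
    """MD5 the file in 1 MB chunks so large evidence files fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_review_set(root: str, known_file: str) -> dict[str, list[pathlib.Path]]:
    """Group unknown files by digest; identical digests mean identical content."""
    known = {line.strip().lower() for line in open(known_file) if line.strip()}
    groups: dict[str, list[pathlib.Path]] = defaultdict(list)
    for p in pathlib.Path(root).rglob("*"):
        if p.is_file():
            digest = md5_of(p)
            if digest not in known:          # skip known (zero-risk) software files
                groups[digest].append(p)
    return groups

if __name__ == "__main__":
    for digest, paths in hash_review_set("extracted_files", "known_hashes.txt").items():
        if len(paths) > 1:
            print(f"{digest}: {len(paths)} duplicates, review one copy only")
```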

4.      Keyword Search

There are a number of ways to perform analytics on the acquired data, and keyword search is probably the most common one.  The basic idea is similar to a Google search: input the keywords and review the results accordingly.  There are plenty of ways to run a keyword search, such as running it in the forensic software, or extracting the files and using Windows search.  The most effective, traceable and auditable way is to load the data in scope into an e-Discovery platform for search and review.  Note that not all data normally has to be loaded: advanced analytics and filtering, such as filtering by file type or date, or analysing user deletion activities, can be applied first to trim down the data size before loading, so that the subsequent keyword search identifies the high-risk data population for review.
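For illustration only, here is a minimal sketch that searches a set of hypothetical keywords across already-extracted plain-text files.  The keywords and the extracted_files folder are assumptions for the example; a real e-Discovery platform would add indexing, OCR, binary-format handling and a full audit trail on top of this idea.

```python
import pathlib
import re

# Hypothetical search terms for an anti-money-laundering style review
KEYWORDS = ["wire transfer", "shell company", "offshore"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

def search(root: str):
    """Yield (file, line number, line) for every hit in extracted .txt files."""
    for path in pathlib.Path(root).rglob("*.txt"):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                yield path, lineno, line.strip()

if __name__ == "__main__":
    for path, lineno, line in search("extracted_files"):
        print(f"{path}:{lineno}: {line[:80]}")
```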
 
Please note that the above is only a quick overview of the most common tasks for general investigation purposes.  In fact, there are thousands more analyses available for deep-dive investigation.  I will share more on this in the near future with some real-life examples.

Previous Step | Next Step