Week 12 (28 - 01 November)
Anomaly Detection
Session with Satnam Singh sir
All the interns had a two-hour session with Dr. Satnam Singh, Chief Data Scientist at Acalvio Technologies, Bengaluru, India. In the session, he discussed various points and issues related to cyber security and online fraud, shared some domain knowledge on the related topics, and described his team's work. It was an interactive session, and he also asked about the work the interns were doing.
He shared his experience, which benefited all of us, and we learned some new approaches and terminology.
Problem statement
Satnam sir shared a Kaggle problem and asked all of us to work on it. It was a credit card fraud detection problem, and it was to be solved as an anomaly detection problem in a statistical way, without using libraries such as scikit-learn. Sir gave us ample time to work on it before he would review our progress and code. So after the session, we started exploring different ways to approach this problem.
Solutions
We first solved it as a classification problem, but since it had only two classes, with one of them present more than 99% of the time, it was a class imbalance problem. We handled the imbalance using techniques such as SMOTE and then treated it as a classification problem. This gave reasonable precision and accuracy.
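To give an idea of what this first attempt looked like, here is a minimal sketch (not our exact code), assuming the Kaggle credit card dataset is loaded from creditcard.csv with its 'Class' label column, and using imbalanced-learn's SMOTE with a simple scikit-learn classifier:

```python
# Rough sketch of the oversampling + classification attempt (illustrative only).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

df = pd.read_csv("creditcard.csv")                 # Kaggle credit card dataset
X, y = df.drop(columns=["Class"]), df["Class"]     # 'Class' = 1 for fraud

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# SMOTE synthesizes new minority-class (fraud) samples so the classes are balanced.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```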
But when we discussed it with Sarabjot sir, we found out that it was supposed to be approached as an anomaly detection problem and not as a classification one.
So after that, we explored approaches used for anomaly detection and fraud detection problems, and different students tried different approaches, such as Isolation Forest and the multivariate Gaussian distribution (MGD).
The MGD approach yielded about 70% recall and around 80% precision on test data.
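The core of the MGD approach can be sketched in plain NumPy, in keeping with the no-scikit-learn constraint: fit a mean vector and covariance matrix on normal transactions, and flag points whose density falls below a threshold. The data and the threshold (epsilon) below are illustrative stand-ins, not our actual values:

```python
# Sketch of the multivariate Gaussian density approach using only NumPy.
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the real feature matrices (the actual data came from the
# Kaggle credit card set after normalization and manual feature selection).
X_train_normal = rng.normal(size=(1000, 5))             # non-fraud training rows
X_val = rng.normal(size=(200, 5))                       # validation rows
X_test = np.vstack([rng.normal(size=(95, 5)),
                    rng.normal(loc=4.0, size=(5, 5))])  # a few injected outliers

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from normal data."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_pdf(X, mu, sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    k = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    det = np.linalg.det(sigma)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** k) * det)
    return norm * np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff))

mu, sigma = fit_gaussian(X_train_normal)

# epsilon is a hypothetical threshold; in practice it is tuned on a labelled
# validation set to balance precision and recall (e.g. by maximizing F1).
epsilon = np.percentile(gaussian_pdf(X_val, mu, sigma), 5)
flags = gaussian_pdf(X_test, mu, sigma) < epsilon
print("flagged as anomalous:", flags.sum())
```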
Before applying any of these algorithms, we did some data preprocessing, such as data normalization.
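The normalization step might look something like the following sketch, which scales each feature to zero mean and unit variance using statistics computed from the training split only (the data here is a random stand-in):

```python
import numpy as np

def zscore_normalize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mu = X_train.mean(axis=0)
    std = X_train.std(axis=0) + 1e-8   # avoid division by zero for constant columns
    return (X_train - mu) / std, (X_test - mu) / std

# Example with random stand-in data.
rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(100, 4)), rng.normal(size=(20, 4))
X_train_n, X_test_n = zscore_normalize(X_train, X_test)
```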
We also did some exploratory data analysis to find out the relationships between the various data features. We then did manual feature selection according to the KDE plots of anomalous vs. non-anomalous data to filter out the important features.
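As an illustration, a per-feature KDE comparison of the two classes could be drawn with seaborn like this (the feature subset shown is just an example; we inspected the plots by eye and kept features whose fraud and non-fraud densities were clearly separated):

```python
# Sketch of the KDE comparison used for manual feature selection.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("creditcard.csv")                  # Kaggle credit card dataset
fraud, normal = df[df["Class"] == 1], df[df["Class"] == 0]

features = ["V1", "V2", "V3", "V4"]                 # illustrative subset of columns
fig, axes = plt.subplots(len(features), 1, figsize=(6, 3 * len(features)))
for ax, col in zip(axes, features):
    sns.kdeplot(normal[col], ax=ax, label="normal")
    sns.kdeplot(fraud[col], ax=ax, label="fraud")
    ax.set_title(col)
    ax.legend()
plt.tight_layout()
plt.show()
```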
Summary
The whole process of anomaly detection spanned the last two weeks of September, and we learned about a new problem statement and various approaches to solving it.
SVM (Support Vector Machines)
Alongside the anomaly detection problem, we learnt about support vector machines. In a session with Sarabjot sir, he explained the theoretical concepts behind the SVM algorithm, how it works, and where it should be used in practice.
It is a supervised learning algorithm that is used in classification problems. A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
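A tiny, hypothetical example of that idea, using scikit-learn's linear SVM on two made-up point clouds (the printed w and b are the coefficients of the separating hyperplane w·x + b = 0):

```python
# Toy illustration of a linear SVM separating two classes in 2-D.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two blobs of points, one per class.
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[3, 3], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)

# For a linear kernel, the separating hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane:", f"{w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("predictions for (0, 0) and (3, 3):", clf.predict([[0, 0], [3, 3]]))
```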
We had also tried SVM on the credit card problem when treating it as a classification problem earlier on. It gave very good recall, while precision was poor in that case.
Nature of Intelligence
Meanwhile, in September, we were also introduced to another mentor, Danko Nikolic from Germany, who had recorded some videos for us to watch on the topic of the nature of intelligence. This month, we watched lecture number 1 by Danko sir, which talked about the human brain vs. the machine brain and how similar and different they are.
He talked about the history of AI development, the research done, and the books written on intelligence.
He introduced Moravec's paradox in his lecture, which says that
"Things that are easy for biological intelligence are difficult for AI and vice versa."
For biological intelligence, perception is easy, while for AI, the mathematical part is the easy one.
He also talked about 'The law of requisite variety' in his lecture.