Week 12 (28 - 01 November)
Anomaly Detection
Session with Satnam Singh sir
All the interns had a two-hour session with Dr. Satnam Singh, Chief Data Scientist at Acalvio Technologies, Bengaluru, India. In the session, he discussed various points and issues related to cyber security and online fraud, shared some domain knowledge on the related topics, and described his team's work. It was an interactive session, and he also asked about the work the interns were doing.
He shared his experience, which benefited all of us, and we learned some new approaches and terminology.
Problem statement
Satnam sir shared a Kaggle problem and asked all of us to work on it. It was a credit card fraud detection problem, and it was to be solved as an anomaly detection problem in a statistical way, without using libraries such as scikit-learn. Sir gave us ample time to work on it before he would review our progress and code. So after the session, we started exploring different ways to approach this problem.
Solutions
We first solved it as a classification problem, but since it had only two classes, with one of them present more than 99% of the time, it was a class imbalance problem. We handled the imbalance using techniques such as SMOTE and then treated it as a classification problem. This gave reasonable precision and accuracy.
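To give an idea of what this first attempt looked like, here is a minimal sketch (not our exact code), assuming the Kaggle credit card dataset is loaded from creditcard.csv with its 'Class' label column, and using imbalanced-learn's SMOTE with a simple scikit-learn classifier:

```python
# Rough sketch of the oversampling + classification attempt (illustrative only).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE

df = pd.read_csv("creditcard.csv")                 # Kaggle credit card dataset
X, y = df.drop(columns=["Class"]), df["Class"]     # 'Class' = 1 for fraud

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# SMOTE synthesizes new minority-class (fraud) samples so the classes are balanced.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(classification_report(y_test, clf.predict(X_test)))
```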
But when we discussed it with Sarabjot sir, we found out that it was supposed to be approached as an anomaly detection problem and not as a classification one.
So after that, we explored approaches used for anomaly detection and fraud detection problems, and different students tried different approaches, such as Isolation Forest and the multivariate Gaussian distribution (MGD).
The MGD approach yielded about 70% recall and around 80% precision on test data.
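The core of the MGD approach can be sketched in plain NumPy, in keeping with the no-scikit-learn constraint: fit a mean vector and covariance matrix on normal transactions, and flag points whose density falls below a threshold. The data and the threshold (epsilon) below are illustrative stand-ins, not our actual values:

```python
# Sketch of the multivariate Gaussian density approach using only NumPy.
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the real feature matrices (the actual data came from the
# Kaggle credit card set after normalization and manual feature selection).
X_train_normal = rng.normal(size=(1000, 5))             # non-fraud training rows
X_val = rng.normal(size=(200, 5))                       # validation rows
X_test = np.vstack([rng.normal(size=(95, 5)),
                    rng.normal(loc=4.0, size=(5, 5))])  # a few injected outliers

def fit_gaussian(X):
    """Estimate the mean vector and covariance matrix from normal data."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_pdf(X, mu, sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    k = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(sigma)
    det = np.linalg.det(sigma)
    norm = 1.0 / np.sqrt(((2 * np.pi) ** k) * det)
    return norm * np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff))

mu, sigma = fit_gaussian(X_train_normal)

# epsilon is a hypothetical threshold; in practice it is tuned on a labelled
# validation set to balance precision and recall (e.g. by maximizing F1).
epsilon = np.percentile(gaussian_pdf(X_val, mu, sigma), 5)
flags = gaussian_pdf(X_test, mu, sigma) < epsilon
print("flagged as anomalous:", flags.sum())
```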
Before applying any of these algorithms, we did some data preprocessing, such as data normalization.
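The normalization step might look something like the following sketch, which scales each feature to zero mean and unit variance using statistics computed from the training split only (the data here is a random stand-in):

```python
import numpy as np

def zscore_normalize(X_train, X_test):
    """Scale features to zero mean and unit variance using training statistics only."""
    mu = X_train.mean(axis=0)
    std = X_train.std(axis=0) + 1e-8   # avoid division by zero for constant columns
    return (X_train - mu) / std, (X_test - mu) / std

# Example with random stand-in data.
rng = np.random.default_rng(1)
X_train, X_test = rng.normal(size=(100, 4)), rng.normal(size=(20, 4))
X_train_n, X_test_n = zscore_normalize(X_train, X_test)
```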
We also did some exploratory data analysis to find out the relationships between the various data features. We then did manual feature selection according to the KDE plots of anomalous vs. non-anomalous data to filter out the important features.
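As an illustration, a per-feature KDE comparison of the two classes could be drawn with seaborn like this (the feature subset shown is just an example; we inspected the plots by eye and kept features whose fraud and non-fraud densities were clearly separated):

```python
# Sketch of the KDE comparison used for manual feature selection.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("creditcard.csv")                  # Kaggle credit card dataset
fraud, normal = df[df["Class"] == 1], df[df["Class"] == 0]

features = ["V1", "V2", "V3", "V4"]                 # illustrative subset of columns
fig, axes = plt.subplots(len(features), 1, figsize=(6, 3 * len(features)))
for ax, col in zip(axes, features):
    sns.kdeplot(normal[col], ax=ax, label="normal")
    sns.kdeplot(fraud[col], ax=ax, label="fraud")
    ax.set_title(col)
    ax.legend()
plt.tight_layout()
plt.show()
```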
Summary
The whole process of anomaly detection spanned the last two weeks of September, and we learned about a new problem statement and various approaches to solving it.
SVM (Support Vector Machines)
Alongside the anomaly detection problem, we learnt about support vector machines. In a session with Sarabjot sir, he explained the theoretical concepts behind the SVM algorithm, how it works, and where it should be used in practice.
It is a supervised learning algorithm that is used in classification problems. A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
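A tiny, hypothetical example of that idea, using scikit-learn's linear SVM on two made-up point clouds (the printed w and b are the coefficients of the separating hyperplane w·x + b = 0):

```python
# Toy illustration of a linear SVM separating two classes in 2-D.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two blobs of points, one per class.
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)),
               rng.normal(loc=[3, 3], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)

# For a linear kernel, the separating hyperplane is w . x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane:", f"{w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print("predictions for (0, 0) and (3, 3):", clf.predict([[0, 0], [3, 3]]))
```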
We had also tried SVM on the credit card problem when treating it as a classification problem earlier on. It gave very good recall, while precision was poor in that case.
Nature of Intelligence
Meanwhile, in September, we were also introduced to another mentor, Danko Nikolic from Germany, who had recorded some videos for us to watch on the topic of the nature of intelligence. This month, we watched lecture number 1 by Danko sir, which talked about the human brain vs. the machine brain and how similar and different they are.
He talked about the history of AI development, the research done, and the books written on intelligence.
He introduced Moravec's paradox in his lecture, which says that
"Things that are easy for biological intelligence are difficult for AI and vice versa."
For biological intelligence, perception is easy, while for AI, the mathematical part is the easy one.
He also talked about 'The law of requisite variety' in his lecture.