Week 3 (22-26 July)

July 26, 2019

The third week also gave me a chance to learn about visualisation and storytelling. I was introduced to practical visualisation by Dr. Sawinder Pal Kaur, Data Science Expert, SAP Labs India, in one of her sessions, where she gave a code walkthrough on bank loan defaulter detection. The code was implemented in Python using libraries such as seaborn, matplotlib, and pandas, and ma'am explained it very lucidly. From it I learned about various functions I can use to draw conclusions from my visualisations. Visualisation and storytelling are among the most important parts of data science, and they usually come before feature engineering and model training. After that, we were given an assignment to choose our own dataset and practise visualisation on it. I really enjoyed doing it and found some cool insights.

In week 3, I was also introduced to a new unsupervised learning algorithm: K-means. I watched the lecture video from the ISB course, discussed it with my peers and teacher, and then implemented it in Python in a Jupyter notebook. Later I used it in other applications, such as clustering news articles for a recommendation system with a TF-IDF vectorizer. To understand it better, I implemented it in two ways: with for-loops and without for-loops.
Along with it, I was introduced to visualisation, which I will cover in my next blog post.
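
To give an idea of what "with for-loops and without for-loops" can look like, here is a minimal sketch of the K-means assignment step written both ways with NumPy. This is not my original notebook code; the function names `assign_with_loop` and `assign_vectorized` and the sample points are made up for illustration.

```python
import numpy as np

def assign_with_loop(points, centroids):
    """Assign each point to its nearest centroid using explicit Python loops."""
    labels = np.empty(len(points), dtype=int)
    for i, point in enumerate(points):
        best_cluster, best_dist = 0, float("inf")
        for k, centroid in enumerate(centroids):
            dist = np.sum((point - centroid) ** 2)  # squared Euclidean distance
            if dist < best_dist:
                best_cluster, best_dist = k, dist
        labels[i] = best_cluster
    return labels

def assign_vectorized(points, centroids):
    """Same assignment, computed with NumPy broadcasting instead of loops."""
    # diffs has shape (n_points, n_centroids, n_features)
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)

# Illustrative data only (assumed values, not from a real dataset)
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

loop_labels = assign_with_loop(points, centroids)
vector_labels = assign_vectorized(points, centroids)
print(loop_labels, vector_labels)
```

Both versions should give the same labels. The vectorized one is usually much faster on large datasets because the distance computation runs inside NumPy rather than in the Python interpreter.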

K-means

K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:

- The centroids of the K clusters, which can be used to label new data
- Labels for the training data (each data point is assigned to a single cluster)
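
The iterative procedure described above can be sketched in a few lines of NumPy. This is only a minimal sketch under some assumptions: the initial centroids are picked as K random data points, distance is squared Euclidean, and the loop stops when the centroids stop moving. The function name `kmeans`, its parameters, and the toy data are my own choices for illustration.

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Basic K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        sq_dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated groups (illustrative values only)
data = np.array([
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],
    [9.0, 9.0], [9.0, 10.0], [10.0, 9.0], [10.0, 10.0],
])
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

The two return values correspond to the two results listed above: `centroids` can be used to label new data, and `labels` holds the cluster of each training point. Which cluster gets index 0 or 1 depends on the random initialisation.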

Business Use

The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data can be easily assigned to the correct group.
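
As a small sketch of that last point, once the centroids are known a new data point is simply given the label of its nearest centroid. The helper name `assign_new_points` and the centroid values here are assumed for illustration, not taken from a real model.

```python
import numpy as np

def assign_new_points(new_points, centroids):
    """Return, for each new point, the index of the closest centroid."""
    sq_dists = ((new_points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

# Assumed centroids from an earlier K-means run (illustrative values)
centroids = np.array([[1.5, 1.5], [9.5, 9.5]])
new_points = np.array([[0.0, 2.0], [11.0, 8.0]])

new_labels = assign_new_points(new_points, centroids)
print(new_labels)
```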

This is a versatile algorithm that can be used for any type of grouping. Some examples of use cases are:

Behavioral segmentation:
- Segment by purchase history
- Segment by activities on application, website, or platform
- Define personas based on interests
- Create profiles based on activity monitoring
Inventory categorization:
- Group inventory by sales activity
- Group inventory by manufacturing metrics
Sorting sensor measurements:
- Detect activity types in motion sensors
- Group images
- Separate audio
- Identify groups in health monitoring
Detecting bots or anomalies:
- Separate valid activity groups from bots
- Group valid activity to clean up outlier detection
