Week 3 (22-26 July)
The third week also gave me a chance to learn about visualisation and storytelling. I was introduced to practical visualisation by Dr. Sawinder Pal Kaur, Data Science Expert, SAP Labs India, in one of her sessions, where she gave a code walkthrough on bank loan defaulter detection. The code was implemented in Python using libraries such as seaborn, matplotlib and pandas, and ma'am explained it very lucidly. From that session I learned about various functions I can use to draw results from my visualisations. Visualisation and storytelling are among the most important parts of data science, and they come before model training and feature engineering. After that we were given an assignment to choose our own dataset and practise visualisation on it. I really enjoyed doing the visualisation and found some cool insights.
In week 3, I was also introduced to a new unsupervised learning algorithm, K-means. I watched the ISB course lecture video, made sure I understood it, discussed it with my peers and teacher, and then implemented it in Python in a Jupyter notebook. Later I used it in other applications, such as clustering news articles for a recommendation system using a tf-idf vectorizer. To understand it better, I implemented it in two ways: with a for-loop and without one.
Along with it, I was introduced to visualisation, which I will be covering in my next blog post.
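To give an idea of what "with a for-loop and without a for-loop" can look like, here is a minimal sketch of just the assignment step of K-means (finding the nearest centroid for every point), written both ways. This is not my original notebook code: the function names `assign_with_loop` and `assign_vectorized` and the toy data are made up for illustration, and I am assuming NumPy is available.

```python
import numpy as np

def assign_with_loop(points, centroids):
    """Assign each point to its nearest centroid using explicit for-loops."""
    labels = np.empty(len(points), dtype=int)
    for i, point in enumerate(points):
        best_k, best_dist = 0, float("inf")
        for k, centroid in enumerate(centroids):
            dist = np.sum((point - centroid) ** 2)  # squared Euclidean distance
            if dist < best_dist:
                best_k, best_dist = k, dist
        labels[i] = best_k
    return labels

def assign_vectorized(points, centroids):
    """Same assignment, computed for all points at once with broadcasting."""
    # diffs has shape (n_points, n_centroids, n_features)
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape (n_points, n_centroids)
    return np.argmin(sq_dists, axis=1)

# Made-up toy data, purely for illustration
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

loop_labels = assign_with_loop(points, centroids)
vec_labels = assign_vectorized(points, centroids)
```

Both functions should give the same labels; the version without the loop is usually much faster on large datasets because the work happens inside NumPy.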
K-means
K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:
- The centroids of the K clusters, which can be used to label new data
- Labels for the training data (each data point is assigned to a single cluster)
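The iterative procedure and its two outputs can be sketched in plain Python as below. This is only a minimal illustration under my own assumptions: the helper names (`squared_distance`, `nearest`, `kmeans`), the random choice of starting centroids, the squared Euclidean distance and the toy points are all made up for this example, and a library implementation such as scikit-learn's would normally be used in practice.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))

def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, labels) after iterating until labels stop changing."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Made-up toy data with two visible groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
```

Here `centroids` holds the K cluster centres and `labels` holds one cluster index per training point, matching the two results listed above. Because the starting centroids are random, the cluster numbering (and sometimes the grouping itself) can differ between runs with different seeds.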
Business Use
The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data can be easily assigned to the correct group.
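Assigning new data to a group just means finding the closest centroid. Here is a small sketch of that idea; the function name `assign_to_cluster`, the centroid values and the customer features (orders per year, average basket value) are hypothetical and not taken from any real dataset.

```python
def assign_to_cluster(new_point, centroids):
    """Return the index of the centroid nearest to new_point."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(new_point, centroid))  # squared Euclidean
        for centroid in centroids
    ]
    return distances.index(min(distances))

# Hypothetical centroids from an earlier K-means run:
# (orders per year, average basket value)
centroids = [(2.0, 15.0), (12.0, 40.0), (30.0, 120.0)]

new_customer = (10.0, 35.0)
segment = assign_to_cluster(new_customer, centroids)
```

In this made-up example the new customer lands in the middle segment (index 1). When features are on very different scales, they are usually standardised before clustering so that one feature does not dominate the distance.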
This is a versatile algorithm that can be used for any type of grouping. Some examples of use cases are:
- Behavioral segmentation:
  - Segment by purchase history
  - Segment by activities on application, website, or platform
  - Define personas based on interests
  - Create profiles based on activity monitoring
- Inventory categorization:
  - Group inventory by sales activity
  - Group inventory by manufacturing metrics
- Sorting sensor measurements:
  - Detect activity types in motion sensors
  - Group images
  - Separate audio
  - Identify groups in health monitoring
- Detecting bots or anomalies:
  - Separate valid activity groups from bots
  - Group valid activity to clean up outlier detection
 


 