Week 3 (22-26 July)
The third week also gave me a chance to learn about visualisation and
storytelling. I was introduced to practical visualisation by Dr.
Sawinder Pal Kaur, Data Science Expert, SAP Labs India, in one of her
sessions, where she gave a code walkthrough on bank loan defaulter
detection. The code was implemented in Python using libraries such as
seaborn, matplotlib and pandas, and ma'am explained it very lucidly.
From it I learned various functions that I can use to draw results
from my visualisations. Visualisation and storytelling are among the most
important parts of data science, and they usually come before feature
engineering and model training. After that, we were given an assignment to
choose our own dataset and practise visualisation on it. I really enjoyed
doing the visualisation and found some cool insights.
In week 3, I was also introduced to a new
unsupervised learning algorithm: K-means. I watched the lecture
video from the ISB course, understood it, discussed it with my peers and teacher,
and then implemented it in Python in a Jupyter notebook. Later I used
it in other applications, such as clustering news articles (with a tf-idf
vectorizer) for a recommendation system. To understand it better, I
implemented it in two ways: with a for-loop and without one.
Along with it, I was introduced to visualisation, which I will cover in my next blog.
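The "with for-loop" and "without for-loop" versions of K-means differ mainly in how each point finds its nearest centroid. Here is a minimal sketch of just that assignment step, written both ways with NumPy. It is not the exact notebook code; the function names and the toy points are made up for illustration.

```python
import numpy as np

def assign_with_loop(points, centroids):
    """Assign each point to its nearest centroid using explicit Python loops."""
    labels = np.empty(len(points), dtype=int)
    for i, point in enumerate(points):
        best_cluster, best_dist = 0, float("inf")
        for k, centroid in enumerate(centroids):
            dist = np.sum((point - centroid) ** 2)  # squared Euclidean distance
            if dist < best_dist:
                best_cluster, best_dist = k, dist
        labels[i] = best_cluster
    return labels

def assign_vectorized(points, centroids):
    """Same assignment, computed for all points at once with broadcasting."""
    # diffs has shape (n_points, n_clusters, n_features)
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)

# Toy data, invented for this example
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

loop_labels = assign_with_loop(points, centroids)
vec_labels = assign_vectorized(points, centroids)
print(loop_labels, vec_labels)
```

Both functions should give the same labels. The loop version is easier to read when learning, while the broadcasting version is usually much faster on larger data.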
K-means
K-means clustering is a type of unsupervised learning, which is used when you
have unlabeled data (i.e., data without defined categories or groups).
The goal of this algorithm is to find groups in the data, with the
number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:
- The centroids of the K clusters, which can be used to label new data
- Labels for the training data (each data point is assigned to a single cluster)
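The iterative procedure and its two outputs can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not a production implementation: the function name `kmeans`, the choice to pass the starting centroids in by hand (so K is simply the number of starting centroids), and the toy data are all made up for this example. In practice the starting centroids are usually chosen at random.

```python
import numpy as np

def kmeans(X, initial_centroids, n_iters=100):
    """Plain K-means: alternate between assigning points and updating centroids."""
    centroids = np.asarray(initial_centroids, dtype=float)
    k = len(centroids)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (a centroid with no points is left where it is)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the labels are final
        centroids = new_centroids
    return centroids, labels

# Toy data with two obvious groups, invented for this example
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Assumption: start from one point in each region (K = 2)
centroids, labels = kmeans(X, initial_centroids=X[[0, 3]])
print(centroids)  # one row per cluster
print(labels)     # one cluster index per training point
```

The returned `centroids` and `labels` correspond to the two results listed above.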
Business Use
The K-means clustering algorithm is used to find groups which have not been
explicitly labeled in the data. This can be used to confirm business
assumptions about what types of groups exist or to identify unknown
groups in complex data sets. Once the algorithm has been run and the
groups are defined, any new data can be easily assigned to the correct
group.
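Assigning new data to an existing group could look like this with scikit-learn's `KMeans`. This is only a sketch on invented numbers: fit once on the training data, then call `predict` on points the model has not seen. The arrays `X_train` and `new_points` are made up for illustration, and the cluster numbers (0 or 1) are arbitrary IDs that can differ between runs or settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy training data with two obvious groups, invented for this example
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                    [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])

# Fit K-means with K = 2; random_state fixes the random initialisation
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# New, unseen points are assigned to the nearest learned centroid
new_points = np.array([[0.9, 1.2], [8.1, 8.1]])
new_labels = model.predict(new_points)

print(model.cluster_centers_)  # learned centroids
print(new_labels)              # cluster ID for each new point
```

Each new point should land in the same cluster as the nearby training points, without rerunning the clustering.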
This is a versatile algorithm that can be used for any type of grouping. Some examples of use cases are:
- Behavioral segmentation:
  - Segment by purchase history
  - Segment by activities on application, website, or platform
  - Define personas based on interests
  - Create profiles based on activity monitoring
- Inventory categorization:
  - Group inventory by sales activity
  - Group inventory by manufacturing metrics
- Sorting sensor measurements:
  - Detect activity types in motion sensors
  - Group images
  - Separate audio
  - Identify groups in health monitoring
- Detecting bots or anomalies:
  - Separate valid activity groups from bots
  - Group valid activity to clean up outlier detection