Posts from July 2019

Week 3 (22-26 July)

The third week also gave me a chance to learn about visualisation and storytelling. I was introduced to practical visualisation by Dr. Sawinder Pal Kaur, Data Science Expert, SAP Labs India, in one of her sessions, where she gave a code walkthrough on bank loan defaulter detection. The code was implemented in Python using libraries such as seaborn, matplotlib and pandas, and ma'am explained it very lucidly; from that I came to know about various functionalities I can use to draw results from my visualisations. Visualisation and storytelling is one of the most important parts of data science, and it comes before model training and feature engineering. After that we were given an assignment to choose our own dataset and practise visualisation on it. I really enjoyed doing the visualisation and found out some cool insights.

In week 3 I was also introduced to a new algorithm in unsupervised learning: K-means. I watched the lecture video of the ISB course, understood it, and discussed it with my peers and teachers.
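Since the walkthrough code itself isn't reproduced here, below is a minimal sketch of the kind of seaborn exploration we practised, assuming a hypothetical loan dataset; the file name and column names are my own illustration, not the session's actual code.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical dataset: "loan_data.csv" with a binary "default" column
# and an "income" column (illustrative names, not the real walkthrough data).
df = pd.read_csv("loan_data.csv")

sns.countplot(x="default", data=df)            # how imbalanced are the classes?
plt.show()

sns.boxplot(x="default", y="income", data=df)  # does income separate defaulters?
plt.show()
```

And a small scikit-learn sketch of K-means on synthetic data, just to show the basic API I learned about:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # toy data
km = KMeans(n_clusters=3, random_state=42).fit(X)
print(km.cluster_centers_)   # one centroid per cluster
print(km.labels_[:10])       # cluster assignments for the first ten points
```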

Week 2 (15-19 July)

Day 1

In today's Udemy course we learned about the following topics:
- Methods and functions
- *args and **kwargs
- Lambda expressions, maps and filters

Along with this we solved a few coding exercises related to these topics and decided to work on the Udemy course Milestone Project 1.

Day 2

Today we first discussed the milestone project and sorted out the problems related to it. After that we continued with our Python course and learnt about object-oriented programming in Python, which included the following topics:
- Class object attributes and methods
- Inheritance and polymorphism
- Special (Magic/Dunder) methods

We also solved some homework exercises related to it (see the small examples after Day 3).

Day 3

Today we started with the scikit-learn implementation of the linear regression model and compared its results with our previously implemented from-scratch linear regression code.
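A few tiny examples of the Python features from Day 1 and Day 2 (my own illustrations, not the course's actual exercises):

```python
# *args collects extra positional arguments into a tuple,
# **kwargs collects extra keyword arguments into a dict.
def summary(*args, **kwargs):
    print("positional:", args)
    print("keyword:", kwargs)

summary(1, 2, 3, name="milestone", week=2)

# Lambda expressions with map and filter.
nums = [1, 2, 3, 4, 5]
print(list(map(lambda n: n * n, nums)))          # [1, 4, 9, 16, 25]
print(list(filter(lambda n: n % 2 == 0, nums)))  # [2, 4]

# A special (magic/dunder) method: __str__ controls how the object prints.
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __str__(self):
        return f"Account of {self.owner}, balance {self.balance}"

print(Account("Alice", 100))  # Account of Alice, balance 100
```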

Linear Regression NumPy Code and Python

Python

In the course today we learned about the following concepts:
- Lists
- Dictionaries
- Tuples
- Sets
- Booleans
- Dealing with files in Python
- Iterating over a file

Linear Regression NumPy Code

The gradient-descent code was completed, and it was:

```python
import math

# x_data, x_inputs, y_true and step come from the data-generation code
# in the previous post.
beta_zero = np.random.rand(num_attrs, 1)  # initial guess for the coefficients
                                          # (its definition was not shown in the post)
beta = beta_zero
rmse = -1
for i in range(10000):
    old_rmse = rmse
    y_hatnew = x_data.dot(beta)                        # current predictions
    y_diff = y_true.reshape(len(x_inputs), 1) - y_hatnew
    rmse = math.sqrt(y_diff.T.dot(y_diff) / x_data.shape[0])
    print(i, ":", rmse)
    if abs(rmse - old_rmse) < 1e-12:                   # stop once RMSE stabilises
        break
    derivative = 2 * y_diff.T.dot(x_data) / x_data.shape[0]
    beta = beta + step * derivative.T                  # gradient-descent update
print(beta)
```

The next task given to us was to implement the scikit-learn version of linear regression and to compare its results with our own implementation.
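A sketch of how that comparison might look, assuming the x_inputs and y_true arrays from the data-generation post; scikit-learn fits the intercept itself, so we pass the inputs without the column of ones:

```python
from sklearn.linear_model import LinearRegression

# x_inputs and y_true come from the data-generation code in the previous post.
model = LinearRegression()            # fit_intercept=True by default
model.fit(x_inputs, y_true)
print(model.intercept_, model.coef_)  # compare against beta from gradient descent
```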

Linear Regression NumPy Code

We finished writing the data-generation code for the NumPy version of linear regression. We didn't use data from an Excel sheet or a Kaggle dataset, so we had to create our own. For this, we created random data for our X values and betas. Then we created noise, as real data always has noise, and used all of it to create the Y data. The code for this was as follows:

```python
import numpy as np

samplesize = 1000
num_attrs = 3
step = 0.1                                     # gradient-descent step size
x_inputs = np.random.rand(samplesize, num_attrs - 1)
x0 = np.ones((samplesize, 1))                  # column of ones for the intercept term
x_data = np.concatenate((x0, x_inputs), axis=1)
noise = np.random.randn(len(x_inputs), 1)      # real data always has noise
betas = np.random.rand(num_attrs, 1)           # the "true" coefficients
y_true = x_data.dot(betas) + noise
y_true = y_true.reshape(samplesize, 1)         # reshape returns a new array, so assign it back
```

Python course

We started a Udemy course on Python. The concepts we covered today were:
- Pros and cons of dynamic typing
- String indexing and slicing
- Various string methods
- String interpolation: a) format() b) float formatting
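A couple of quick examples of the string-interpolation topics (my own illustrations):

```python
price = 49.987
print("The price is {}".format(price))      # plain format(): The price is 49.987
print("The price is {:.2f}".format(price))  # float formatting: The price is 49.99
```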

Introduction to Logistic Regression

While we continued to write the NumPy code for linear regression, we were introduced to logistic regression. It is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (one in which there are only two possible outcomes). It is used to predict a binary outcome (1/0, Yes/No, True/False) given a set of independent variables. To represent the binary/categorical outcome, we use dummy variables. You can also think of logistic regression as a special case of linear regression in which the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function. This required us to understand the sigmoid function: in order to map predicted values to probabilities, we use the sigmoid function sigmoid(z) = 1 / (1 + e^(-z)), which maps any real value into a value between 0 and 1.
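A minimal NumPy sketch of the sigmoid function described above (my own illustration, not code from the session):

```python
import numpy as np

def sigmoid(z):
    """Map any real value into a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx. [0.0000454, 0.5, 0.9999546]
```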

Introduction to Linear Regression

Simple linear regression is useful for finding the relationship between two continuous variables. One is the predictor or independent variable and the other is the response or dependent variable. It looks for a statistical relationship, not a deterministic one. The relationship between two variables is deterministic if one variable can be accurately expressed by the other; for example, using a temperature in degrees Celsius it is possible to accurately calculate the temperature in Fahrenheit. A statistical relationship is not exact; for example, the relationship between height and weight. With simple linear regression we want to model our data as follows:

y = B0 + B1 * x

This is a line where y is the output variable we want to predict, x is the input variable we know, and B0 and B1 are the coefficients we need to estimate. It also required us to understand the concept of gradient descent. Gradient descent is the process of minimizing a cost function by iteratively moving the coefficients in the direction of steepest descent, that is, opposite to the gradient of the cost function.
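A toy sketch of gradient descent on the y = B0 + B1 * x model above, using a deterministic line so the recovered coefficients are easy to check (my own illustration; the learning rate and data are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x          # deterministic relationship, so we expect B0=2, B1=3
b0, b1, lr = 0.0, 0.0, 0.01

for _ in range(5000):
    y_hat = b0 + b1 * x
    error = y_hat - y
    # partial derivatives of the mean squared error w.r.t. B0 and B1
    b0 -= lr * 2 * error.mean()
    b1 -= lr * 2 * (error * x).mean()

print(b0, b1)              # should be close to (2.0, 3.0)
```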

NumPy Revision

We made a Python program for the following random-walk question: say you are standing at the bottom of a staircase with a dice. With each throw of the dice, you either move down one step (if you get a 1 or 2) or move up one step (if you get a 3, 4 or 5). If you throw a 6, you throw the dice again and move up the staircase by the number you get on that second throw. Note that if you are at the base of the staircase you cannot move down! What is the probability that you will reach more than 60 steps after 250 throws of the dice?

This question required knowledge of np.random.seed. It seeds the generator, which makes the random numbers predictable: when the seed is reset, the same numbers will appear every time. If a seed is not assigned, NumPy automatically selects a seed value based on the system's random number generator device or on the clock. The code started as follows:

```python
import numpy as np
np.random.seed(123)
```
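A minimal sketch of how the full simulation could continue under these rules (my own reconstruction, with the number of simulated walks chosen arbitrarily; not necessarily the original code):

```python
import numpy as np
np.random.seed(123)

final_steps = []
for _ in range(500):                         # simulate 500 independent walks
    step = 0
    for _ in range(250):                     # 250 throws per walk
        dice = np.random.randint(1, 7)       # uniform on 1..6
        if dice <= 2:
            step = max(0, step - 1)          # can't move below the base
        elif dice <= 5:
            step += 1
        else:
            step += np.random.randint(1, 7)  # throw again, move up that many steps
    final_steps.append(step)

# Estimated probability of ending above step 60 after 250 throws.
print(np.mean(np.array(final_steps) > 60))
```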