Wednesday, April 3, 2019

Machine Learning Notes 02

The very first thing you create with a machine learning program is a data model, which can be used in business applications for predictions, forecasting, and so on.

There are six main steps in creating a model,

1. Load Data
2. Clean up data
3. Create the model
4. Train the model
5. Test the Model
6. Deploy the Model 
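
As a rough illustration of how the six steps fit together, here is a minimal sketch. Everything in it is an assumption made for the example: the data is synthetic, the "model" is just a straight line fitted with NumPy's `polyfit`, and "deploy" simply means serialising the fitted parameters with `pickle`. A real project would typically use a dedicated machine learning library instead.

```python
import pickle

import numpy as np
import pandas as pd

# 1. Load data (synthetic and invented for illustration: y is roughly 2*x + 1)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
df = pd.DataFrame({'x': x, 'y': 2 * x + 1 + rng.normal(0, 0.1, size=50)})
df.loc[3, 'y'] = np.nan  # simulate one missing value

# 2. Clean up data: drop rows with missing values
df = df.dropna()

# 3. Create the model: a straight line y = w * x + b
# 4. Train the model: least-squares fit on the first 40 rows
train, test = df.iloc[:40], df.iloc[40:]
w, b = np.polyfit(train['x'].to_numpy(), train['y'].to_numpy(), deg=1)

# 5. Test the model on the held-out rows
predictions = w * test['x'] + b
mean_abs_error = (predictions - test['y']).abs().mean()
print('mean absolute error:', mean_abs_error)

# 6. Deploy the model: here, just serialise the parameters for later use
model_bytes = pickle.dumps({'w': w, 'b': b})
```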

Step 1 - Load the data

To load the data, we mostly use the Python library called "pandas".

Example of loading data from a CSV file,

import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline - this is just for Jupyter Notebooks
filename = 'datafile.csv'
columnnames = ['preg', 'pres', 'skin']
data = pd.read_csv(filename, names = columnnames)
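
To try the loading step without a file on disk, the sketch below reads the same three columns from an in-memory string with `pd.read_csv`. The CSV values are made up for illustration, and it assumes the file has no header row, which is why `names` is passed.

```python
import io

import pandas as pd

# Hypothetical CSV content with no header row, standing in for 'datafile.csv'
csv_text = "6,72,35\n1,66,29\n8,64,0\n"

column_names = ['preg', 'pres', 'skin']

# names= supplies the column labels because the data has no header line
data = pd.read_csv(io.StringIO(csv_text), names=column_names)

print(data.head())
```

If the real file does have a header row, drop the `names` argument and let pandas read the labels from the first line.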


Just to check the shape of the data, use,

print(data.shape)

and to see a description of the data, use,

data.describe()

You can check how the data is structured, for example how many rows share each value of a column, like this,

print(data.groupby('pres').size())
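
The three inspection calls above can be tried together on a small frame. The values below are invented for illustration; only the column names come from the example.

```python
import pandas as pd

# Hypothetical sample rows using the same column names as above
data = pd.DataFrame({
    'preg': [6, 1, 8, 1],
    'pres': [72, 66, 64, 66],
    'skin': [35, 29, 0, 23],
})

# (rows, columns)
print(data.shape)

# count, mean, std, min, quartiles and max for each numeric column
summary = data.describe()
print(summary)

# number of rows for each distinct value of 'pres'
counts = data.groupby('pres').size()
print(counts)
```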

To visualize the data, use univariate and multivariate plots.

Uni-Variate
We start with some univariate plots, that is, plots of each individual variable.
Given that the input variables are numeric, we can create box and whisker plots of each.
Code,
data.plot(kind='box', subplots = True, layout = (2,2), sharex = False, sharey = False)
This draws a box plot for each numeric field in the data set.
To view a histogram of each field, use,

data.hist()
plt.show()
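
The notes mention multivariate plots without showing one. A common choice is a scatter matrix, which plots every pair of columns against each other. This is a sketch with invented sample values; the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in Jupyter

import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical sample rows using the same column names as above
data = pd.DataFrame({
    'preg': [6, 1, 8, 1, 0],
    'pres': [72, 66, 64, 66, 40],
    'skin': [35, 29, 0, 23, 35],
})

# One scatter plot per pair of columns, histograms on the diagonal
axes = scatter_matrix(data, figsize=(6, 6))
```

In an interactive session, follow this with plt.show() (after importing matplotlib.pyplot as plt) to display the figure.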
continuing....
Step 2 - Clean the data 

Before processing the data, it has to be cleaned up. In Python we do that with two main libraries,

1. numpy
2. pandas

Data clean-up can be done in different ways,

1. Dropping unwanted columns from data frame
2. Changing the index of the data frame
3. Tidying up fields in the data
4. Combining str methods with NumPy to clean columns
5. Cleaning the entire data set with the applymap function
6. Renaming columns and skipping rows
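
A few of the techniques in the list can be sketched on a tiny frame. The data and column names here are invented for illustration. Note one assumption about versions: recent pandas releases (2.1 and later) offer the element-wise function as `DataFrame.map` and deprecate `applymap`, so the sketch picks whichever is available.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data, invented for illustration
raw = pd.DataFrame({
    'ID': [101, 102, 103],
    'Place': [' London ', 'Oxford (UK)', ' Paris'],
    'Year': ['1879', '[1868]', '1869?'],
})

# 2. Change the index of the data frame
df = raw.set_index('ID')

# 3. Tidy up a field: keep the four-digit year and convert it to a number
df['Year'] = pd.to_numeric(df['Year'].str.extract(r'(\d{4})', expand=False))

# 4. Combine str methods with NumPy to clean a column
place = df['Place'].str.strip()
df['Place'] = np.where(place.str.contains('Oxford'), 'Oxford', place)

# 5. Apply a function to every cell (DataFrame.map on newer pandas,
#    applymap on older versions)
elementwise = getattr(df, 'map', None) or df.applymap
df = elementwise(lambda v: v.upper() if isinstance(v, str) else v)

# 6. Rename a column
df = df.rename(columns={'Place': 'City'})

print(df)
```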


First, import the two main libraries,

import numpy as np
import pandas as pd

pandas comes with a function called "drop", which is used to drop columns from a data frame.

First load the data from a CSV file, repeating the earlier code,

filename = 'datafile.csv'
columnnames = ['preg', 'pres', 'skin']
data = pd.read_csv(filename, names = columnnames)

Define the columns to drop from the data frame,

drop_columns = ['preg', 'pres']

Then drop the columns like this,

data.drop(drop_columns, inplace = True, axis = 1)
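
As an alternative to `inplace = True`, `drop` can return a new data frame and leave the original untouched. The sketch below uses invented sample values and also shows `errors='ignore'`, since by default `drop` raises a `KeyError` when a label is not found in the frame.

```python
import pandas as pd

# Hypothetical sample rows using the same column names as above
data = pd.DataFrame({
    'preg': [6, 1, 8],
    'pres': [72, 66, 64],
    'skin': [35, 29, 0],
})

# Returns a new frame; 'data' itself keeps all three columns
trimmed = data.drop(columns=['preg', 'pres'])

# 'press' does not exist, so only 'preg' is removed and no error is raised
safe = data.drop(columns=['preg', 'press'], errors='ignore')

print(trimmed.columns.tolist())
print(safe.columns.tolist())
```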

