There are six main steps in creating a model:
1. Load Data
2. Clean up data
3. Create the model
4. Train the model
5. Test the Model
6. Deploy the Model
Step 1 - Load the data
To load the data, we mostly use the Python library called "pandas".
Example of loading data from a CSV file:
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline - this is only needed in Jupyter notebooks
filename = 'datafile.csv'
columnnames = ['preg', 'pres', 'skin']
data = pd.read_csv(filename, names=columnnames)
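Since datafile.csv is not included with this post, here is a minimal sketch that loads the same three columns from an in-memory string instead. The sample values are made up for illustration, and io.StringIO only stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the contents of datafile.csv
csv_text = "6,72,35\n1,66,29\n8,64,0\n1,66,23\n"

column_names = ['preg', 'pres', 'skin']

# Passing names= tells pandas the data has no header row of its own
data = pd.read_csv(io.StringIO(csv_text), names=column_names)

print(data.head())
```

If the real file does have a header row, drop the names argument and pandas will take the column names from the first line.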
To check the shape of the data (the number of rows and columns), use:
print(data.shape)
To see summary statistics for the data, use:
data.describe()
You can check how the rows are distributed across the values of a column like this:
print(data.groupby('pres').size())
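The three checks above can be tried together on a small data frame. This is only a sketch: the values below are invented, not taken from the real data file.

```python
import pandas as pd

# Hypothetical values, made up for illustration
data = pd.DataFrame({
    'preg': [6, 1, 8, 1],
    'pres': [72, 66, 64, 66],
    'skin': [35, 29, 0, 23],
})

# shape is a (rows, columns) tuple
rows, cols = data.shape
print(rows, cols)

# describe() returns count, mean, std, min, quartiles and max per numeric column
summary = data.describe()
print(summary)

# groupby(...).size() counts how many rows share each value of 'pres'
counts = data.groupby('pres').size()
print(counts)
```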
To visualize the data, use univariate and multivariate plots.
Univariate
We start with some univariate plots, that is, plots of each individual variable.
Given that the input variables are numeric, we can create box and whisker plots of each.
Code:
data.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
This draws a box plot for each numeric field in the data set.
To view histograms of the data, use:
data.hist()
plt.show()
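Here is a self-contained sketch of both plots. It assumes a headless setup, so it selects the "Agg" backend and renders each figure to PNG bytes in memory rather than opening a window. In a normal desktop session or a notebook you would skip that and call plt.show() as above. The data values are made up.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, made up for illustration
data = pd.DataFrame({
    'preg': [6, 1, 8, 1, 0, 5],
    'pres': [72, 66, 64, 66, 40, 74],
    'skin': [35, 29, 0, 23, 35, 0],
})

# One box-and-whisker plot per numeric column, laid out on a 2x2 grid
data.plot(kind='box', subplots=True, layout=(2, 2), sharex=False, sharey=False)
box_fig = plt.gcf()

# One histogram per numeric column, drawn on a new figure
data.hist()
hist_fig = plt.gcf()

# Render each figure to PNG bytes in memory
box_png = io.BytesIO()
box_fig.savefig(box_png, format="png")
hist_png = io.BytesIO()
hist_fig.savefig(hist_png, format="png")
```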
continuing....
Step 2 - Clean the data
Before processing the data, it has to be cleaned up; this is a required step in machine learning. In Python we do that with two main libraries:
1. numpy
2. pandas
Data can be cleaned up in different ways:
1. Dropping unwanted columns from the data frame
2. Changing the index of the data frame
3. Tidying up fields in the data
4. Combining str methods with NumPy to clean the columns
5. Cleaning the entire data set with the applymap function
6. Renaming columns and skipping rows
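As a rough sketch, items 2 to 6 of the list above might look like this on a tiny data frame. The column names and values are hypothetical, chosen only to show each technique. Note that DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the sketch uses whichever one is available.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical messy data, made up for illustration
raw = pd.DataFrame({
    'ID': [101, 102, 103],
    'Place': [' London ', 'new york', 'PARIS '],
    'Year': ['1999', '[2004]', '2010?'],
})

# 2. Change the index: use the unique ID column instead of 0, 1, 2
df = raw.set_index('ID')

# 3./4. Tidy a field with str methods, then convert it to a NumPy integer type
df['Year'] = df['Year'].str.extract(r'(\d{4})', expand=False).astype(np.int64)

# 5. Apply one function to every cell of the data frame
def tidy(value):
    # Only strings are changed; numbers pass through untouched
    return value.strip().title() if isinstance(value, str) else value

cell_wise = df.map if hasattr(df, 'map') else df.applymap
df = cell_wise(tidy)

# 6a. Rename a column
df = df.rename(columns={'Place': 'City'})

# 6b. Skip a junk first line while reading a CSV
csv_text = "Exported report\nid,city\n1,Oslo\n"
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=1)

print(df)
print(skipped)
```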
First, import the two main libraries:
import numpy as np
import pandas as pd
pandas provides a function called "drop" that removes columns from a data frame.
First load the data from the CSV file, repeating the earlier code:
filename = 'datafile.csv'
columnnames = ['preg', 'pres', 'skin']
data = pd.read_csv(filename, names=columnnames)
Define the columns to drop from the data frame:
drop_columns = ['preg', 'pres']
Then drop the columns like this:
data.drop(drop_columns, inplace = True, axis = 1)
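To try this without the CSV file, here is a small sketch on made-up values. It also shows an alternative that is not in the steps above: calling drop without inplace, which returns a new data frame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical values, made up for illustration
data = pd.DataFrame({
    'preg': [6, 1, 8],
    'pres': [72, 66, 64],
    'skin': [35, 29, 0],
})

drop_columns = ['preg', 'pres']

# Alternative: returns a new data frame, 'data' itself is unchanged
trimmed = data.drop(columns=drop_columns)
print(list(trimmed.columns))

# In-place version, as used above: modifies 'data' directly and returns None
data.drop(drop_columns, inplace=True, axis=1)
print(list(data.columns))
```

Every name in drop_columns must match a column exactly; a misspelled name raises a KeyError.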