Wednesday, December 2, 2020

How much can test accuracy be improved?

I am working on a curve-fitting (regression) problem. I tried several machine learning algorithms, such as random forest regression, MLP regression, and SVR, but I get the best results with DecisionTreeRegressor. My plots for training and testing are:

[Training plot]

[Testing plot]

My question is: can the testing accuracy be improved further, or is my result already as good as it gets? I tried changing the hyperparameters of each ML algorithm mentioned above, but I could not get a better result than this. The data is fixed; I cannot get more of it. Please help.

Here is my code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import RobustScaler  # the code below uses RobustScaler, not MinMaxScaler
from sklearn.metrics import mean_squared_error
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
np.random.seed(1337)

data = pd.read_csv('C:/Users/....../..../..../plot.csv',header=None)
x = data.iloc[:,0:4].values
y = data.iloc[:,4:5].values


from sklearn.model_selection import train_test_split
x11, xtt1, y11, yt1 = train_test_split(x, y, test_size=0.2,
                                       random_state=None, shuffle=False)

y11 = y11.reshape(-1, 1)
scaler2 = RobustScaler()
y1 = scaler2.fit_transform(y11)  # scale the training targets
k = x11[:, -1]                   # last input column, used as the x-axis of the plots
kk = xtt1[:, -1]


# All hyperparameters at their scikit-learn defaults
clf1 = DecisionTreeRegressor(random_state=None)

clf1.fit(x11, y1)

# Predictions on the training set (in scaled units)
test_y = clf1.predict(x11)
test_y = test_y.reshape(-1, 1)

Y = scaler2.inverse_transform(test_y)  # back to the original units

y11 = np.squeeze(y11)
rm = mean_squared_error(y1, test_y, squared=False)  # training RMSE (scaled units)
print(rm)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(k, y11, s=25, c='b', marker="s", label='real')
ax1.scatter(k,Y, s=25, c='r', marker="o", label='NN Prediction')
ax1.set_yscale('log')

plt.legend();
plt.show()

ypredict = clf1.predict(xtt1)
ypredict = ypredict.reshape(-1, 1)

Yt = scaler2.inverse_transform(ypredict)

rm = mean_squared_error(yt1, Yt, squared=False)  # test RMSE (original units)
print(rm)

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(kk, yt1, s=15, c='b', marker="s", label='real')
ax1.scatter(kk,Yt, s=15, c='r', marker="o", label='NN Prediction')
ax1.set_yscale('log')

plt.legend();
plt.show()
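Rather than changing hyperparameters by hand, a systematic grid search may be worth trying. Here is a minimal sketch using scikit-learn's GridSearchCV; the parameter ranges are illustrative assumptions, and x11 and y1 are the training arrays from the code above:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Illustrative search space; these ranges are assumptions, not tuned values.
param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 2, 5, 10],
    'min_samples_split': [2, 5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring='neg_root_mean_squared_error',  # RMSE, negated so higher is better
    cv=5,
)
search.fit(x11, y1.ravel())  # x11, y1: training data from the code above

print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best tree

If the best cross-validated RMSE is close to what the defaults already give, that would suggest the tree is near its limit on this data.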

I have four inputs: I1 takes 50 different values, I2 takes 3, I3 takes a single value, and I4 takes two. There are 250 data points in total; I used 200 for training and 50 for testing.

The plots show output vs. I1 for the different values of I2, I3, and I4.
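Since the dataset is small and fixed, the single 80/20 split above (with shuffle=False) may give a noisy test estimate. K-fold cross-validation over all 250 points could be a fairer check. A minimal sketch, assuming x and y from the code above:

from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# 5-fold CV over all 250 points; shuffling assumes the row order carries no
# meaning (the original split kept the order, so this is an assumption).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeRegressor(random_state=0), x, y.ravel(),
                         scoring='neg_root_mean_squared_error', cv=cv)
print(-scores.mean(), scores.std())  # mean RMSE across folds, and its spread

A large spread across folds would mean the test RMSE from one fixed split should not be read too precisely.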
