I have an application for regression spline forecasting. I have split the data into train and test sets, but I don't know how to make the model train on 70% of the data and keep the remaining 30% as the test set. At the moment, the regression fits all of the CO2 values in the dataframe group_by_df. I have attached the code: https://pastebin.com/18ZzMxwK
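A minimal sketch of a 70/30 split using only pandas, assuming group_by_df has a `day` column and one CO2 column (called `co2` here; in the question's code its name is held in `sensor_name`). The data below are synthetic placeholders. Option 1 is a random split; option 2 is a chronological split, which is often the more honest test for forecasting because the model never sees the days it is evaluated on. If scikit-learn is available, `sklearn.model_selection.train_test_split(group_by_df, test_size=0.3, random_state=42)` should do the same as option 1, and passing `shuffle=False` gives option 2 when the frame is already sorted by day.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for group_by_df: 100 daily CO2 readings (synthetic).
group_by_df = pd.DataFrame({
    "day": np.arange(1, 101),
    "co2": 400 + np.random.default_rng(0).normal(0, 5, 100),
})

TRAIN_FRACTION = 0.7  # 70% train, 30% test

# Option 1: random split. Sample 70% of the rows for training;
# every row not sampled goes to the test set.
train_random = group_by_df.sample(frac=TRAIN_FRACTION, random_state=42)
test_random = group_by_df.drop(train_random.index)

# Option 2: chronological split. The earliest 70% of days train the model,
# the latest 30% are held out as the test set.
ordered = group_by_df.sort_values("day")
cut = round(len(ordered) * TRAIN_FRACTION)
train_time = ordered.iloc[:cut]
test_time = ordered.iloc[cut:]

print(len(train_random), len(test_random), len(train_time), len(test_time))
```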
The part that fits the model:
```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

# group_by_df, sensor_name and max_grade are defined earlier in the full script

# calculate the 25%, 50% and 75% percentiles of 'day' (used as knots)
percentile_25 = np.percentile(group_by_df['day'], 25)
percentile_50 = np.percentile(group_by_df['day'], 50)
percentile_75 = np.percentile(group_by_df['day'], 75)

for count, degree in enumerate(range(1, max_grade + 1)):
    # Specifying 3 knots for regression spline
    transformed_x1 = dmatrix(
        "bs(day, knots=(percentile_25, percentile_50, percentile_75), degree=degree, include_intercept=False)",
        {"day": group_by_df['day']}, return_type='dataframe')
    # build a regular linear model from the splines (Gaussian GLM, i.e. OLS)
    fit_spline = sm.GLM(group_by_df[sensor_name], transformed_x1).fit()
    # make predictions (currently on the same rows the model was fitted on)
    pred_spline = fit_spline.predict(transformed_x1)
```
I followed this tutorial: http://www.science.smith.edu/~jcrouser/SDS293/labs/2016/lab13/Lab%2013%20-%20Splines%20in%20Python.pdf and applied it to my application, but the tutorial doesn't split the data into train and test sets.