I want to model the duration of a task. My predictors are the number of actions required to complete the task (every action takes the same time to execute) and the category of the task, which is a categorical variable that I have turned into dummy variables.
What troubles me is that my output variable is a duration, specifically a number of seconds, so it is positive and continuous on [0, +∞). What type of regression can I choose for this problem?
My first thought was to predict log(duration) with a method such as linear regression, a regression tree, or SVR, but then, when I exponentiate the results to make them interpretable as seconds, I will end up with negative times too.
Note: I would prefer not to mess with neural nets. I'm sure there is an easier solution.
In case it helps, my training data looks like this:
+------------+---------+------------+-----------+
| dur(sec)   | actions | task_catA  | task_catB |
+------------+---------+------------+-----------+
| 1256       | 257     | 0          | 1         |
| 857.2      | 121     | 1          | 0         |
+------------+---------+------------+-----------+
I use R.
"My first thought was to predict log(duration) with a method such as linear regression, a regression tree, or SVR, but then, when I exponentiate the results to make them interpretable as seconds, I will end up with negative times too." - Are you really sure of this? How would the exponential of a real-valued logarithm come out negative?
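The comment's point can be checked directly in R. Below is a minimal sketch, not a recommended final model: it fits an ordinary linear regression on log(duration) and back-transforms with exp(). Only the first two rows of the data frame come from the question; the other rows are made-up values added so the model can be fitted. It also assumes there are just two categories, so only task_catA enters the formula (task_catB would be collinear with the intercept).

```r
# Toy training data. Rows 1-2 are from the question; rows 3-6 are invented
# purely for illustration.
train <- data.frame(
  dur       = c(1256, 857.2, 640.5, 1480.0, 305.3, 990.1),
  actions   = c(257, 121, 98, 301, 40, 160),
  task_catA = c(0, 1, 1, 0, 1, 0),
  task_catB = c(1, 0, 0, 1, 0, 1)
)

# Fit a linear model on the log scale. task_catB is left out as the
# reference category (assumes A and B are the only two categories).
fit <- lm(log(dur) ~ actions + task_catA, data = train)

# Hypothetical new tasks, including extreme action counts.
newdata <- data.frame(actions = c(0, 1, 500), task_catA = c(1, 0, 1))

# Predictions on the log scale can be any real number ...
log_pred <- predict(fit, newdata = newdata)

# ... but exponentiating always maps them back to strictly positive seconds.
pred_sec <- exp(log_pred)
print(pred_sec)

# Even a very negative log-prediction gives a small positive duration.
print(exp(-50) > 0)
```

So the back-transformed predictions cannot be negative. One caveat worth keeping in mind: exp() of a predicted mean of log(duration) is an estimate of the median duration rather than the mean, so it tends to sit below the average on the original scale.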