1
$\begingroup$

I have a problem where I want to model the duration of a task. I have variables that represent the number of actions required (all of them require the same time to be executed) for task completion and the category of the task which is a categorical variable. Additionally, I have turned the categorical variable into dummy variable.

What troubles me is that my output variable is duration and more specifically number of seconds, thus positive and continuous in the [0,+00). What type of regression can I choose for this problem?

A first quick thought was to predict the log(duration) with some method like linear regression, regression decision tree or SVR but then again when exponentiating the results in order to make them interpretable as seconds, I will come up to negative time too.

Note: I would prefer not to mess up with neural nets. I'm sure there is an easier solution.

In case it helps my train data look like this:

+------------+---------+------------+---------+
|dur(sec)    |actions  |task_catA   |task_catB|
+------------+---------+------------+---------+
| 1256       | 257     | 0          | 1       |
| 857.2      | 121     | 1          | 0       |

I use R.

$\endgroup$
2
  • 2
    $\begingroup$ A first quick thought was to predict the log(duration) with some method like linear regression, regression decision tree or SVR but then again when exponentiating the results in order to make them interpretable as seconds, I will come up to negative time too. - Are you really sure of this? How do you get the exponential of a real logarithmic to be negative? $\endgroup$ Commented Nov 23, 2017 at 9:54
  • 1
    $\begingroup$ The exponential distribution is often used to model waiting times and may make sense here. Look at exponential regressions. $\endgroup$ Commented Nov 23, 2017 at 11:10

1 Answer 1

2
$\begingroup$

Your problem doesn't really need complex Machine Learning algorithms. I think one good option is Stephan's sugestion, a Exponential Regression.

This means you assume your dependent variable follows a exponential distribution, with p.d.f: $$f(x) = \lambda \exp(-\lambda x)\ I_{(0, \infty)}(x)$$

where $\lambda$ is the occurrence rate per unit of measurement, which can be time, distance, etc.

$\endgroup$
1
  • $\begingroup$ That is a special case of a generalized linear model with a Gamma family. $\endgroup$ Commented Aug 6, 2019 at 21:31

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.