In this paper, we study continuous-time Markov decision processes (CTMDPs) with a denumerable state space, a Borel action space, unbounded transition rates, and a nonnegative reward function. The optimality criterion considered is the first passage risk probability criterion. To ensure non-explosion of the state process, we first introduce a so-called drift condition, which is weaker than the well-known regularity condition for semi-Markov decision processes (SMDPs). Furthermore, under suitable conditions, using a value iteration recursive approximation technique, we establish the optimality equation, prove the uniqueness of the value function, and show the existence of optimal policies. Finally, two examples are given to illustrate our results.
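For orientation, one common way to formalize a first passage risk probability criterion is sketched below; the notation (target set $B$, reward rate $r$, profit goal $\lambda$, controlled state process $\xi_t$) is ours and not taken from the abstract, and the direction of the inequality and whether one minimizes or maximizes depend on the specific setup:
\[
V^{\pi}(x,\lambda) \;=\; \mathbb{P}^{\pi}_{x}\!\left(\int_{0}^{\tau_B} r(\xi_t, a_t)\,dt \;\le\; \lambda\right),
\qquad
\tau_B := \inf\{t \ge 0 : \xi_t \in B\},
\]
where $a_t$ denotes the action in force at time $t$ and the optimization is over all admissible policies $\pi$.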
It has been known for a long time that, for birth-and-death processes started at zero, the first passage time to a given level is distributed as a sum of independent, exponentially distributed random variables whose parameters are the negatives of the eigenvalues of the stopped process. Recently, Diaconis and Miclo gave a probabilistic proof of this fact by constructing a coupling between a general birth-and-death process and a process whose birth rates are the negatives of the eigenvalues, ordered from high to low, and whose death rates are zero, in such a way that the latter process is always ahead of the former and both reach the given level at the same time. In this note, we extend their methods by constructing a third process, whose birth rates are the negatives of the eigenvalues ordered from low to high and whose death rates are zero, which always lags behind the original process and nevertheless arrives at the given level at the same time.
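For concreteness, the classical distributional identity underlying this discussion can be written as follows (the notation is ours): if $T_n$ denotes the first passage time to level $n$ of the birth-and-death process started at $0$, and $-\lambda_1,\dots,-\lambda_n$ are the eigenvalues of the generator of the process stopped (absorbed) at $n$, then
\[
T_n \;\overset{d}{=}\; \sum_{i=1}^{n} E_i,
\]
where the $E_i$ are independent and $E_i$ is exponentially distributed with rate $\lambda_i$.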
This paper deals with a first passage mean-variance problem for semi-Markov decision processes in Borel spaces. The goal is to minimize the variance of the total discounted reward accumulated up to the system's first entry into a target set, where the optimization is over the class of policies with a prescribed expected first passage reward. The reward rates may be unbounded, while the discount factor may vary with the states of the system and the controls. We first develop suitable conditions for the existence of first passage mean-variance optimal policies and provide a policy improvement algorithm for computing such a policy. Two examples are then included to illustrate our results. Finally, we show how the results here reduce to the cases of discrete-time Markov decision processes and continuous-time Markov decision processes.
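In symbols, and in one standard notation that is ours rather than the paper's, the problem described here takes the constrained form
\[
\min_{\pi}\ \operatorname{Var}^{\pi}_{x}\!\left(\int_{0}^{\tau_B} e^{-\int_{0}^{t}\alpha(x_s,a_s)\,ds}\, r(x_t,a_t)\,dt\right)
\quad\text{subject to}\quad
\mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{\tau_B} e^{-\int_{0}^{t}\alpha(x_s,a_s)\,ds}\, r(x_t,a_t)\,dt\right] = g,
\]
where $\tau_B$ is the first entry time into the target set $B$, $r$ is the (possibly unbounded) reward rate, $\alpha$ is the state- and control-dependent discount rate, and $g$ is the prescribed expected first passage reward; the exact form of the accumulated reward in the semi-Markov setting may differ from this continuous-time sketch.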