We consider the problem of simulating diffusion bridges, i.e. diffusion processes that are conditioned to start and end at two given states. Diffusion bridge simulation has applications in diverse scientific fields and plays a crucial role in the statistical inference of discretely-observed diffusions. This is a challenging problem that has received much attention over the last two decades. In this work, we first show that the time-reversed diffusion bridge process can be simulated if one can time-reverse the unconditioned diffusion process. We introduce a variational formulation to learn this time-reversal that relies on score matching to circumvent intractability. We then apply our proposed methodology a second time to approximate Doob's h-transform, which defines the diffusion bridge process. Because our approach applies under mild assumptions on the underlying diffusion process, it can readily be used to improve the proposal bridge process within existing methods and frameworks. We discuss algorithmic considerations and extensions, and present some numerical results.
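To make the notion of a diffusion bridge concrete, the sketch below simulates the one case in which Doob's h-transform is available in closed form: a scaled Brownian motion dX_t = sigma dW_t conditioned on X_0 = x0 and X_T = xT, whose bridge has drift (xT - X_t) / (T - t). This is an illustration of the object being simulated, not the learning-based method proposed in this work, which targets diffusions where the transform is intractable. The function name, parameters, and the Euler-Maruyama discretization are assumptions made for the example.

```python
import math
import random


def simulate_brownian_bridge(x0, xT, T=1.0, n_steps=100, sigma=1.0, seed=0):
    """Euler-Maruyama path of dX = sigma dW conditioned on X_0 = x0 and X_T = xT.

    Hypothetical helper for illustration only; it covers the tractable
    Brownian case, where the h-transform drift is known exactly.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x0]
    for k in range(n_steps):
        t = k * dt
        # Doob's h-transform adds the drift sigma^2 * d/dx log h(t, x), where
        # h(t, x) is the Gaussian transition density from (t, x) to (T, xT).
        # For Brownian motion this simplifies to (xT - x) / (T - t).
        drift = (xT - x) / (T - t)
        x = x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    # The discretized path ends at xT only up to the noise of the final step,
    # which has standard deviation sigma * sqrt(dt).
    return path


path = simulate_brownian_bridge(x0=0.0, xT=2.0, T=1.0, n_steps=1000, seed=1)
print(path[0], path[-1])
```

With this time grid the last step pulls the state onto xT before adding the final noise increment, so the terminal error shrinks as the step size decreases. For a general diffusion the function h has no closed form, which is the difficulty the approach described above is designed to address.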