Temporal Difference Learning

Temporal Difference Learning (also known as TD Learning) is a model-free prediction method that is widely used in reinforcement learning to estimate the total (usually discounted) reward an agent can expect to collect in the future. Instead of waiting for the final outcome, TD learning estimates the long-term value of a pattern of behavior from a series of intermediate rewards. After each step, it compares its current prediction with a new prediction formed from the reward just observed plus its own estimate for the next state, then moves the old prediction a fraction of the way toward the new one. This difference between successive predictions, the temporal difference error, is what drives learning: repeated updates match expectations with reality and gradually increase the accuracy of the entire chain of predictions.
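
A minimal sketch of the simplest TD method, tabular TD(0), may help make this concrete. After every step it applies the update V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)], where r + gamma * V(s') is the new prediction and the bracketed term is the TD error. The environment is an assumption chosen for illustration: a hypothetical five-state random walk (states 1 to 5, terminal states 0 and 6, reward 1 only when exiting on the right), whose true values are 1/6, 2/6, ..., 5/6. The function name td0_random_walk and the parameter values are likewise illustrative.

```python
import random


def td0_random_walk(num_episodes=3000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a hypothetical 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Every episode
    starts in the middle (state 3) and moves left or right with equal
    probability. Reaching state 6 gives reward 1; all other moves give 0.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay fixed at 0
    for _ in range(num_episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD error: new (bootstrapped) prediction minus old prediction
            delta = r + gamma * V[s_next] - V[s]
            V[s] += alpha * delta  # update immediately, mid-episode
            s = s_next
    return V[1:6]


if __name__ == "__main__":
    estimates = td0_random_walk()
    true_values = [k / 6 for k in range(1, 6)]
    for est, true in zip(estimates, true_values):
        print(f"estimate={est:.3f}  true={true:.3f}")
```

Because the update sits inside the step loop, each prediction moves toward its target as soon as the next state is observed. With a constant step size, the estimates keep fluctuating slightly around the true values rather than settling on them exactly.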

What are the benefits of temporal difference learning?

The advantages of temporal difference learning are:

  • TD methods can learn after every step, online or offline, without waiting for the end of an episode.

  • These methods can learn from incomplete sequences, which means they can also be used in continuing (non-episodic) problems.

  • Temporal difference learning can function in non-terminating environments.
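
The advantages listed above can be illustrated with a short sketch, assuming a hypothetical two-state continuing task (the function td0_continuing and the states "A" and "B" are invented for this example). The agent alternates between A and B forever, there is no terminal state, and TD(0) still updates its estimates after every single transition. With gamma = 0.9, the true values solve V(A) = 1 + 0.9 V(B) and V(B) = 0.9 V(A), giving V(A) = 1/0.19 (about 5.263) and V(B) = 0.9/0.19 (about 4.737).

```python
def td0_continuing(num_steps=20000, alpha=0.1, gamma=0.9):
    """TD(0) on a hypothetical two-state task that never terminates.

    The agent alternates between "A" and "B" forever.
    Moving A -> B pays reward 1; moving B -> A pays reward 0.
    """
    V = {"A": 0.0, "B": 0.0}
    next_state = {"A": "B", "B": "A"}
    reward = {"A": 1.0, "B": 0.0}  # reward received when leaving each state
    s = "A"
    for _ in range(num_steps):  # a single endless stream: no episode boundary
        s_next = next_state[s]
        r = reward[s]
        # Learn from this one transition right away.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
    return V


V = td0_continuing()
print(round(V["A"], 3), round(V["B"], 3))
```

A Monte Carlo method would have no complete return to average here, because the sequence never ends; TD(0) only needs the next reward and its current estimate for the next state.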

What are the disadvantages of temporal difference learning?

Temporal difference learning has two main disadvantages:

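  • It is more sensitive to the initial value estimates, because its update targets are built from those estimates.

  • It produces biased estimates, because each target bootstraps from current value estimates that may still be wrong.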
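
Both disadvantages come from bootstrapping, and a small sketch can show them side by side. It assumes a hypothetical deterministic chain s0 -> s1 -> s2 -> terminal in which only the last transition pays reward 1 (gamma = 1, so every true value is 1), and compares TD(0) with a constant-step-size Monte Carlo update, whose target is the actual return. The chain, the function chain_errors and the step size are illustrative assumptions; because the task is deterministic, the example isolates the effect of the initial guess and says nothing about variance.

```python
def chain_errors(initial_value, num_episodes=3, alpha=0.5):
    """Return (TD error, Monte Carlo error) for the estimate of s0.

    Hypothetical chain: s0 -> s1 -> s2 -> terminal, reward 1 only on the
    final transition, gamma = 1, so the true value of every state is 1.
    """
    n = 3  # non-terminal states s0, s1, s2
    td = [initial_value] * n
    mc = [initial_value] * n
    for _ in range(num_episodes):
        for s in range(n):
            # TD(0): reward 0 plus the current estimate of the next state,
            # except on the last step, which earns 1 and ends the episode.
            target = td[s + 1] if s + 1 < n else 1.0
            td[s] += alpha * (target - td[s])
        for s in range(n):
            # Monte Carlo: the observed return from every state is 1 here,
            # independent of any other state's estimate.
            mc[s] += alpha * (1.0 - mc[s])
    return abs(td[0] - 1.0), abs(mc[0] - 1.0)


for v0 in (0.0, 5.0):
    td_err, mc_err = chain_errors(v0)
    print(f"init={v0}: TD error={td_err:.3f}, MC error={mc_err:.3f}")
```

After three episodes this prints a TD error of 0.875 against a Monte Carlo error of 0.125 when every value starts at 0, and 3.500 against 0.500 when every value starts at 5. In the first episode the TD target for s0 is simply the initial guess for s1, which is exactly the bias described above, and the lingering error grows with the distance between the initial value and the truth.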