ipsnow's blog
An additional problem is that they use A3C for trading here. A3C is known to be unsuitable for adversarial environments (e.g. board games like chess). I wrote a paper demonstrating that A3C is as exploitable as a uniform random strategy in such games (specifically, some poker variants): arxiv.org/abs/2004.09677
It's mostly an issue that A2C isn't designed for adversarial environments. It also has no notion of hidden information, which other algorithms (e.g. CFR) handle explicitly. There's a well-known phenomenon called cycling, where agent A beats agent B, which beats agent C, which beats agent A; A2C can exhibit this. Think of rock/paper/scissors: AlwaysRock beats AlwaysScissors, which beats AlwaysPaper, which in turn beats AlwaysRock. To avoid this, you typically need some sort of averaging over the strategies seen during training.
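To make the cycling point concrete, here is a minimal sketch in rock/paper/scissors. It is not A2C, CFR, or the setup from the paper. It uses fictitious play (best-responding to the opponent's empirical average) as one simple, assumed stand-in for "averaging", and every function name below is hypothetical. Chasing the latest opponent loops forever, while the time-averaged strategy drifts toward the uniform mix that nobody can exploit.

```python
import numpy as np

# Row player's payoff; rows and columns are ordered rock, paper, scissors.
PAYOFF = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])


def best_response(opponent_mix):
    """Index of the pure strategy with the highest expected payoff."""
    return int(np.argmax(PAYOFF @ opponent_mix))


def exploitability(mix):
    """Payoff a best-responding opponent earns against `mix` (0 is unexploitable)."""
    return float(np.max(PAYOFF @ mix))


def naive_chain(start=0, steps=6):
    """Always best-respond to the most recent pure strategy: this cycles."""
    chain = [start]
    for _ in range(steps):
        chain.append(best_response(np.eye(3)[chain[-1]]))
    return chain


def fictitious_play(steps=3000):
    """Best-respond to the empirical average of all past plays (self-play)."""
    counts = np.array([1.0, 0.0, 0.0])  # assumed start: AlwaysRock
    history = []
    for _ in range(steps):
        action = best_response(counts / counts.sum())
        counts[action] += 1
        history.append(action)
    return history, counts / counts.sum()


if __name__ == "__main__":
    print(naive_chain())                     # rock -> paper -> scissors -> rock ...
    history, average = fictitious_play()
    print(exploitability(np.eye(3)[history[-1]]))  # latest iterate is a pure strategy
    print(average, exploitability(average))        # average is close to uniform
```

The latest iterate is always a pure strategy, so a best response beats it every time. Only the average becomes hard to exploit, which is the sense in which averaging breaks the cycle.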