balatro-rl

Reinforcement Learning in Balatro

The goals are the following:

  • Make a mod for Balatro to read game data and perform certain actions
  • Have a reinforcement learning AI consume that data, take actions, and learn over time

How to set up

I am currently on Arch Linux, so I had to set up through Proton. I write the code in a more reliable directory and symlink it into the game's Mods folder:

ln -s ~/dev/balatro-rl/RLBridge /mnt/gamerlinuxssd/SteamLibrary/steamapps/compatdata/2379780/pfx/drive_c/users/steamuser/AppData/Roaming/Balatro/Mods/RLBridge

TODO/CHANGELOG

File-based Communication

  • JSON file communication system
  • Lua file writer in mod
  • Game state transmission (hand cards, chips, available actions)
  • Action reception and execution (see the sketch below)
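
A minimal sketch of the Python side of this loop, assuming the mod writes each game state to a game_state.json file and polls an action.json file for the next command. All file names and JSON fields here are illustrative, not the mod's actual schema:

```python
import json
import time
from pathlib import Path

# Hypothetical file locations; the real mod may use different paths.
STATE_FILE = Path("game_state.json")
ACTION_FILE = Path("action.json")

def read_state(timeout=5.0, poll=0.05):
    """Poll until the mod writes a fresh game state, then parse it."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if STATE_FILE.exists():
            try:
                return json.loads(STATE_FILE.read_text())
            except json.JSONDecodeError:
                pass  # file is mid-write by the Lua side; retry next poll
        time.sleep(poll)
    raise TimeoutError("mod never wrote a game state")

def send_action(action, card_indices):
    """Write the chosen action for the Lua file writer to pick up."""
    ACTION_FILE.write_text(json.dumps({"action": action, "cards": card_indices}))

if __name__ == "__main__":
    state = read_state()
    print(state.get("chips"), state.get("hand"))
    send_action("play_hand", [0, 1, 2, 3, 4])
```

The JSONDecodeError guard matters because plain file communication has no locking: the reader can catch the writer mid-write, so retrying on a parse failure is the simplest recovery.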

RL Training

  • Python RL custom environment setup (skeleton sketched below)
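
A skeleton of what that custom environment might look like using Gymnasium; the observation and action spaces below are placeholders, not the project's actual encoding:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class BalatroEnv(gym.Env):
    """Skeleton Balatro environment; real I/O would go through the
    JSON files the mod reads and writes."""

    def __init__(self):
        # Placeholder spaces: a flat vector for hand/chips/blind state and
        # a small discrete action set (play vs. discard plus card choice).
        self.observation_space = spaces.Box(0.0, 1.0, shape=(64,), dtype=np.float32)
        self.action_space = spaces.Discrete(32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(64, dtype=np.float32)  # would be built from game_state.json
        return obs, {}

    def step(self, action):
        # Would write action.json, wait for the mod's next state, then
        # compute the reward from the new chips / round result.
        obs = np.zeros(64, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```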

Game Features

  • Make it so that if we lose, we restart, and if we win a round and reach the "cash out" page, we also restart. Reaching the "cash out" state should give a large reward to incentivize the AI (see the termination sketch after this list)
  • Should we add things that help the AI understand it has only 4 hands and 4 discards to work with (or whatever the numbers are)? I think we should add remaining hands and discards to the game state; that would be useful
  • Bug: the AI can discard an infinite number of times
  • Should we stop giving reward for plain chip increases? If you think about it, you can play anything and increase chips. Perhaps we only want round wins; just scoring chips is not enough. I also wonder if the losing penalty is too small
  • I wonder if there's a problem with the AI getting points out of every hand played. It should learn to play more complex hands instead of just collecting points even if only one hand scores; the rewards should reflect that
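
One way the restart-and-reward idea could look inside the environment's step logic; the field names, bonus, and penalty values are assumptions to be tuned, not settled design:

```python
# Hypothetical terminal handling for BalatroEnv.step(). "phase" and
# "game_over" are assumed game-state fields, not the mod's real schema.
CASH_OUT_BONUS = 100.0   # big incentive for reaching the "cash out" page
LOSS_PENALTY = -50.0     # losing should sting more than a missed hand

def terminal_reward(state):
    """Return (reward, terminated) so the trainer restarts on either outcome."""
    if state.get("phase") == "cash_out":
        return CASH_OUT_BONUS, True
    if state.get("game_over"):
        return LOSS_PENALTY, True
    return 0.0, False
```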

RL Enhancements

  • Retry Count Penalty: Penalize high retry_count in rewards to discourage invalid actions. Currently retry_count tracks failed action attempts, but we could use this signal to teach the AI which actions are actually valid in each state. Formula: reward -= retry_count * penalty_factor. This would incentivize the AI to learn valid action spaces rather than trial-and-error (sketched after this list)
  • I just noticed the RL model scored 592. It would be amazing to have that saved somewhere.
    • Create a replay system that only saves winning games into a file (format TBD; see the sketch after this list)
    • We would probably store the raw requests and raw responses; if we win, we save them, otherwise we reset the list
    • The idea is that I'll have the seed, so I can look at the actions (the requests and responses), plug the seed into the game manually, and play it out myself
    • Only keep the top 5. I don't want a long log of a bunch of wins
  • Should I reward beating the game in the fewest hands possible? Notice that playing more hands earns more reward than playing one hand, even if it's a really good hand
  • On that note, should we give more reward for MONSTER hands? For example, rewards are based on blind size, but what if a single hand meets or far exceeds the blind? Maybe that solves the above problem (the shaping sketch below includes such a bonus)
  • Speed up training somehow. Is parallelization possible? Maybe through Docker and buying Balatro on multiple Steam accounts
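
A combined sketch of the retry-count penalty and a monster-hand bonus; penalty_factor, the bonus value, and the state fields are all assumptions:

```python
# Hypothetical reward shaping; constants and field names are guesses to tune.
PENALTY_FACTOR = 0.5
MONSTER_BONUS = 25.0

def shape_reward(base_reward, state, retry_count):
    reward = base_reward
    # Discourage invalid actions: each failed attempt costs a little.
    reward -= retry_count * PENALTY_FACTOR
    # Reward clearing the entire blind in a single hand.
    if state.get("last_hand_chips", 0) >= state.get("blind_chips", float("inf")):
        reward += MONSTER_BONUS
    return reward
```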
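
And a sketch of the top-5 replay saver: buffer the raw request/response pairs during an episode, persist only winning episodes, and keep the five highest scores. The replays.json layout here is a guess, not the current file's actual format:

```python
import json
from pathlib import Path

REPLAY_FILE = Path("replays.json")
MAX_REPLAYS = 5  # only keep the top 5 wins

def save_if_winner(score, seed, transcript, won):
    """transcript: list of raw (request, response) pairs for the episode."""
    if not won:
        return  # losing episodes are discarded; the caller resets its buffer
    replays = json.loads(REPLAY_FILE.read_text()) if REPLAY_FILE.exists() else []
    replays.append({"score": score, "seed": seed, "transcript": transcript})
    # Sort best-first and truncate so the log never grows past five entries.
    replays.sort(key=lambda r: r["score"], reverse=True)
    REPLAY_FILE.write_text(json.dumps(replays[:MAX_REPLAYS], indent=2))
```

Storing the seed alongside the transcript is what makes the manual replay possible: plug the seed into the game and step through the saved actions by hand.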

DEBUGGING

  • I think it's counting the reset as an episode? Review how it calculates episodes for reward logging; I think something MIGHT be wrong. Also check it in general, because AI wrote it and I might need to update it