KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2
Preference Datasets for KTO Collection This collection contains a list of curated preference datasets for KTO fine-tuning, aimed at intent alignment of LLMs through feedback signals. • 5 items • Updated Mar 19
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2