AI & ML interests
DeepRL, RL finetuning
Organizations
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
• Updated • 27k • 18
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offline-sandbox
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62.1k • 26
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62.1k • 34
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62.1k • 42
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 91.9k • 39
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 91.9k • 56
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 91.9k • 50
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62.5k • 57
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62.5k • 23
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62.5k • 24
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 99k • 31
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 99k • 57
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 99.1k • 42
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 62k • 9
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 62k • 34
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 62k • 30
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 100k • 26
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 100k • 13
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 100k • 18
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 61.6k • 13
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 61.6k • 84
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
• Updated • 61.6k • 35
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
• Updated • 93.8k • 42
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
• Updated • 93.8k • 65
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offline-armorm
• Updated • 96.6k • 32
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
• Updated • 26.6k • 23
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offline-sandbox