Model | Reasoning Average | Coding Average |
---|---|---|
claude-3-5-sonnet-20241022 | 58.67 | 67.13 |
claude-3-5-sonnet-20240620 | 58.67 | 60.85 |
qwen2.5-coder-32b-instruct | 47.33 | 56.85 |
dracarys2-72b-instruct | 42.67 | 56.64 |
qwen2.5-72b-instruct | 46.00 | 56.56 |
gemini-exp-1114 | 54.67 | 52.36 |
gpt-4o-2024-08-06 | 54.67 | 51.44 |

Benchmark | GPT-3.5 (PROP) | GEMINI PRO (PROP) | MISTRAL-SMALL (PROP) | MISTRAL-MEDIUM (PROP) | SMAUG-72B (PROP) |
---|---|---|---|---|---|
MMLU | 70.0 | 71.8 | 70.6 | 75.3 | 77.15 |
HellaSwag | 85.5 | 84.7 | 86.7 | 88.9 | 89.27 |
ARC | 85.2 | unknown | 85.8 | 88.9 | 76.02 |
WinoGrande | 81.6 | unknown | 81.2 | 88 | 85.05 |
GSM8K | 57.1 | unknown | 58.4 | 66.7 | 78.7 |
TruthfulQA | unknown | unknown | unknown | unknown | 76.67 |