ASTRA: HackerRank's coding benchmark for LLMs

5 points by rvivek | 1 comment on Hacker News.
We help companies hire and upskill developers. A customer recently asked: what percentage of HackerRank problems can LLMs solve? That got us thinking: how should hiring evolve when AI can translate natural language into code? Our belief is that AI will handle much of code generation, so developers will be assessed more on software development lifecycle (SDLC) skills exercised alongside AI assistants. To explore this, we're benchmarking LLMs on real-world software development scenarios, starting with 65 unseen problems across 10 domains. Beyond correctness, we evaluated consistency, an often-overlooked aspect of AI reliability. We're open-sourcing the dataset on Hugging Face and expanding it to cover more domains, ambiguous specs, and harder challenges. Would love the HN community's take on this!
