Role Responsibilities
Design and develop multi-agent benchmark tasks involving planning, scheduling, and resource allocation
Create real-world operational scenarios (logistics, project management, incident response, capacity planning)
Build constraint-rich problem statements with multiple dependencies and variables
Develop Python-based scripts to evaluate feasibility, completeness, and optimality
Break down complex problems into structured sub-tasks for multi-agent systems
Model scenarios with timelines, dependencies, and resource constraints
Collaborate with teams to improve task quality, coverage, and evaluation rigor
Requirements
5+ years of experience in operations, project management, logistics, or supply chain
Strong understanding of constraints, dependencies, and scheduling logic
Proficiency in Python for validation and verification scripting
Strong structured problem-solving and decomposition skills
Ability to model real-world operational scenarios
Clear technical communication and documentation skills
Ability to work in a fast-paced environment and meet deadlines
Preferred Qualifications
Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
Background in operations research
Experience with simulation or modeling tools
Familiarity with AI planning systems or automated reasoning
Experience with AI benchmarks (e.g., SWE-bench, Terminal-Bench)
Hands-on experience with Docker
Application Process
Apply via Easy Apply / shared link and complete the Interest Check Form (ICF)
Complete the take-home assessment (post-shortlisting)
Shortlisted candidates will be reviewed further
The team will contact you with next steps
Compensation: $15/hour
| Salary | Competitive |
| Type | Full-time |
| Location | — |
| Category | Operations |
| Posted | Apr 27, 2026 |