Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift · Barcelona, ES
Remote · Python, Docker, Git
Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What This Opportunity Involves
While each project involves unique tasks, contributors may:
- Create structured test cases that simulate complex human workflows
- Define gold-standard behavior and scoring logic to evaluate agent actions
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate your scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure that scenarios are production-ready, easy to run, and reusable
This opportunity is a good fit for software engineers open to part-time, non-permanent projects. Ideally, contributors will have:
- 3+ years of software development experience with a strong Python focus
- Experience with Git and code repositories
- Comfort with structured formats like JSON/YAML for scenario descriptions
- An understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
- Familiarity with Docker
- English proficiency - B2
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Project time expectations
Tasks for this project are estimated to take 6-10 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
Payment
- Paid contributions, with rates up to $30/hour*
- Fixed project rate or individual rates, depending on the project
- Some projects include incentive payments
*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.