Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift · Madrid

Experience level: ---
Contract type: Part-time
Posted: 7 Jan.

Python · QA · Machine Learning · Remote work

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What We Do

The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About The Role

We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:

Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate your scenarios
Iterate on prompts, instructions, and test cases to improve clarity and difficulty
Ensure that scenarios are production-ready, easy to run, and reusable
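
To make the idea of a gold path and scoring logic concrete, here is a minimal Python sketch. It is only an illustration and is not taken from any Mindrift project: the scenario fields (`id`, `task`, `gold_path`), the action names, and the `score_run` helper are all hypothetical, and the scoring rule (a greedy in-order match against the gold path) is just one of many possible choices.

```python
import json

# Hypothetical scenario description; the field names are illustrative assumptions.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "task": "Process a customer refund request",
  "gold_path": ["lookup_order", "check_refund_policy", "issue_refund", "notify_customer"]
}
"""


def score_run(gold_path, agent_actions):
    """Return the fraction of gold steps found in the agent's actions,
    matching greedily and preserving the order of the gold path."""
    matched = 0
    position = 0  # only search after the previously matched action
    for step in gold_path:
        try:
            found = agent_actions.index(step, position)
        except ValueError:
            continue  # the agent skipped this step (or did it out of order)
        matched += 1
        position = found + 1
    return matched / len(gold_path)


scenario = json.loads(SCENARIO_JSON)
# Hypothetical agent log: the policy check was skipped.
agent_actions = ["lookup_order", "issue_refund", "notify_customer"]
score = score_run(scenario["gold_path"], agent_actions)
print(f"{scenario['id']}: {score:.2f}")  # prints "refund-request-001: 0.75"
```

Keeping the scenario as plain JSON and the scorer as a small pure function is one way to make a test case easy to run and reuse across agents.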

How To Get Started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.

Requirements

Bachelor's and/or Master's Degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems or other related fields.
Background in QA, software testing, data analysis, or NLP annotation
Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
Strong written communication skills in English
Comfortable with structured formats like JSON/YAML for scenario description
Ability to define expected agent behaviors (gold paths) and scoring logic
Basic experience with Python and JS
Curious and open to working with AI-generated content, agent logs, and prompt-based behavior

Nice to Have

Experience in writing manual or automated test cases
Familiarity with LLM capabilities and typical failure modes
Understanding of scoring metrics (precision, recall, coverage, reward functions)
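
As a rough illustration of the scoring metrics mentioned above, the following Python sketch computes precision and recall over the set of actions an agent took, compared with a gold set. The action names and the `precision_recall` helper are hypothetical, and treating actions as an unordered set is a simplifying assumption.

```python
def precision_recall(gold_actions, agent_actions):
    """Compute set-based precision and recall of agent actions against gold actions."""
    gold, predicted = set(gold_actions), set(agent_actions)
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold actions and agent log.
gold = ["open_ticket", "assign_owner", "close_ticket"]
agent = ["open_ticket", "close_ticket", "send_survey", "archive_ticket"]
precision, recall = precision_recall(gold, agent)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints "precision=0.50 recall=0.67"
```

Here precision penalizes the two extra actions the agent took, while recall penalizes the one gold action it missed.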

Benefits

Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:

Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs
Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments
Participate in an advanced AI project and gain valuable experience to enhance your portfolio
Influence how future AI models understand and communicate in your field of expertise
