LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing
Under review
Benchmarking creative-writing evaluation with a diverse dataset and rigorous reliability metrics.
Hello, I'm
Researcher at Stanford University
I'm a computer science researcher at Stanford University, originally from Miami. My research focuses on how people relate to AI, particularly in preference learning and creativity.
I'm passionate about building systems that are performant while being legible to, and good for, humans.
NeurIPS ReliableML Workshop 2025
Uses influence functions to identify and trim low-value preference pairs for more efficient alignment.
NeurIPS Datasets and Benchmarks 2021
Cross-domain benchmark testing how well self-supervised methods transfer across diverse modalities.
Feel free to reach out for collaborations or just to say hello.