Research
Some research projects I’ve been lucky enough to contribute to:
Deepfake Detection
In 2024, I worked as a researcher at TrueMedia.org, first as part of my master's degree at UW and then as an intern. Our work focused on detecting fake media, and I worked specifically on detecting fake images. My master's thesis, Identifying Modern Deepfakes: Bringing Fake Image Detection into the Wild, was advised by Oren Etzioni, professor and founder of TrueMedia.org. I benchmarked, modified, and improved numerous open-source detection models to detect in-the-wild deepfakes; helped ideate novel detection methods; and assessed the failure modes and limitations of open-source and commercial deepfake detection methods. As a team, we supported a deepfake detection platform that achieved over 90% accuracy across image, video, and audio deepfakes and was used by news organizations, social media companies, investigative reporters, and the general public. Following the closure of TrueMedia.org as an organization, our products have been open-sourced.
Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024.
Nuria Alina Chandra, Ryan Murtfeldt, Lin Qiu, Arnab Karmakar, Hannah Lee, Emmanuel Tanumihardja, Kevin Farhat, Ben Caffee, Sejin Paik, Changyeon Lee, Jongwook Choi, Aerin Kim, Oren Etzioni.
Preprint, 2025.
The Tug-of-War Between Deepfake Generation and Detection.
Hannah Lee, Changyeon Lee, Kevin Farhat, Lin Qiu, Steve Geluso, Aerin Kim, Oren Etzioni.
Data-centric Machine Learning Workshop at the 41st International Conference on Machine Learning, Vienna, Austria, Jul. 2024.
Mobile Health Applications
In undergrad, I worked with the UbiComp Lab on developing mobile health applications. This culminated in my undergraduate thesis, Determining Input Image Quality for Smartphone Detection of Anemia, advised by Jason Hoffman and Prof. Shwetak Patel. Together, we focused on using the smartphone camera to detect anemia and to monitor central venous pressure.
Artificial intelligence-enabled non-invasive ubiquitous anemia screening: The HEMO-AI pilot study on pediatric population.
Daniel Gordon, Jason Hoffman, Keren Gamrasni, Yotam Barlev, Alex Levine, Tamar Landau, Ronen Shpiegel, Avishai Lahad, Ariel Koren, Carina Levin, Osnat Naor, Hannah Lee, Xin Liu, Shwetak Patel, Gilad Chayen, Michael Brandwein.
DIGITAL HEALTH, 2024.
LLMs and Multimodal Models + Data
I’ve also spent some time working with large models and datasets, including OpenFlamingo, LLaMA, and TinyLlama.
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens.
Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt.
NeurIPS 2024, Datasets and benchmarks track.