name: hasan can biyik
role: computational linguist, nlp engineer
status: seeking · nlp / ml / ai engineering
location: united states

hasan_can
biyik_

I build systems that understand language — from cross-lingual euphemism detection to production RAG pipelines. MS in Computational Linguistics from Montclair State, AI/ML specialization. Published at ACL and EACL. Interested in how meaning fails to transfer: euphemism, sarcasm, framing — the places language models tend to miss.


§ 01 · about

background // a linguist who ships.

My research sits at the intersection of NLP and cross-lingual transfer — specifically, how meaning shifts across languages in subtle, culturally-loaded ways. Euphemism, sarcasm, framing: the things language models tend to miss.

Published work covers euphemism detection in Turkish and English, sarcasm in sitcom dialogue, and cross-lingual transfer with transformer models. I've presented at ACL-affiliated venues and managed multilingual annotation pipelines end-to-end.

Outside research, I build production RAG systems, AWS-deployed models, and full-stack NLP tools, from training script to web UI. I'm currently targeting industry roles in AI, NLP, LLM fine-tuning, and applied research while continuing to publish.

core
python · pytorch · huggingface · nlp / nlu · transformers · fine-tuning
ml & data
scikit-learn · pandas · sql · rag systems · vector dbs · aws
tools & infra
fastapi · git · latex · linux / hpc

§ 02 · research

publications // peer-reviewed & presented.

2026 eacl 2026
rabat, morocco
When Semantic Overlap Is Not Enough: Cross-Lingual Euphemism Transfer Between Turkish and English
Biyik, H., Barak, L., Peng, J., Feldman, A.
published ↗
2024 sigturk @ acl 2024
bangkok
Turkish Delights: A Dataset on Turkish Euphemisms
Biyik, H., Lee, P., Feldman, A.
published ↗
2024 student research
montclair state univ.
Analysis of the Tone and Framing of the LGBTQ+ Community in Turkish Media
Biyik, H.
presentation ↗
more publications forthcoming

§ 03 · projects

selected work // code, demos, experiments.

/01
Multilingual euphemism detector built by fine-tuning XLM-RoBERTa. Classifies euphemistic vs. literal usage across seven languages (TR/EN/ES/ZH/YO/UK/PL), served through a FastAPI backend on HuggingFace Spaces.
/02
Sarcasm detection in sitcom dialogue using fine-tuned transformers and zero-shot LLMs. Reached 85% F1 with Twitter-RoBERTa, and surfaced the "Chandler Effect" — models learning character-identity shortcuts instead of linguistic sarcasm features.
/03
LLM-powered research system that retrieves and synthesizes information from the web and Reddit using LangGraph orchestration, ChromaDB semantic retrieval, and GPT-4 reasoning.
/04
BERTurk fine-tuned on 235K Turkish product reviews for binary sentiment classification. 94.2% accuracy with real-time inference and interactive Plotly visualizations.
/05
AI-powered ATS resume analyzer that scores fit against job descriptions, identifies gaps, and suggests improvements using semantic similarity and LLM reasoning.
/06
CNN / LSTM / MLP models for prosodic prominence and boundary detection in speech. 85% F1 on prominence, a 117% relative improvement over classical ML baselines.
/07
Content-moderation classifier for identifying NSFW imagery using fine-tuned transformer models. Packaged as a Chrome extension ready for deployment.

§ 04 · contact

get in touch // channels open.

Actively looking for roles in NLP, ML, and AI engineering. If you're working on something interesting with language models or multilingual data, I'd love to hear from you.