Harshit Soni
Data Scientist from NYU. I build ML systems, run causal experiments, and turn messy data into evidence that moves decisions forward.
The Story
I'm a Data Scientist with an MS from NYU (3.96 GPA) and 4+ years of industry experience. My coursework covers machine learning, deep learning, causal inference, big data, NLP, probabilistic modeling, and optimization.
At Northeastern's Gyori Lab, I lead a research project on ontology alignment using representation learning and semantic search, with a paper accepted at ACM-BCB 2026 and a Python package released under DARPA's MapNet framework. I also have a CVPR 2026 submission in which I distilled NASA's 300M-parameter foundation model down to 25M parameters, with the student outperforming the teacher on change detection.
At NYU's Urban Politics Lab, I run causal studies on urban policy. One used regression discontinuity to show that congestion pricing cut noise complaints by 70%; another is a first-of-its-kind quasi-experimental analysis of board-ups and crime across 6K+ properties.
As a Research Assistant at NYU, I quantified 45 years of sexual harassment coverage trends in the New York Times across 8,900 articles, building reproducible analyses and visualizations for a non-technical audience.
Before grad school, I spent 4 years at Sunshine Marketing Agencies, where I set up the company's first analytics function, built ETL pipelines and forecasting systems, and scaled the team to 5 analysts. That work supported 5.25x revenue growth and reduced product returns by 33% across 700 SKUs. I also built a full-stack order management system in MERN/Firebase. At TCS before that, I automated testing and migration workflows with PowerShell, cutting delivery time by 35% and earning a Star Team Award.
On the project side, I've deployed a multi-agent SEC filing analyst with LangGraph and Docker Compose, a fraud detection pipeline as a FastAPI endpoint, and a recommendation system on 27M+ ratings in PySpark. I work across Python, R, SQL, PySpark, PyTorch, Docker, and GCP.
Where I've Worked
Data Scientist
- Showed that NYC congestion pricing reduced noise complaints by 70% using a regression discontinuity design in R, contributing to a major transit policy grant.
- Executing a first-of-its-kind quasi-experimental study on 6K+ properties and 290K+ incidents to evaluate the causal impact of residential board-ups on crime.
- Discovered delayed-effect patterns in urban crime data, challenging traditional narratives around property maintenance and public safety.
Machine Learning Researcher
- Developed an automated ontology alignment system using LLMs fine-tuned on knowledge graphs via contrastive learning, achieving 98% precision.
- Optimized high-recall retrieval at scale using FAISS and representation learning to map concepts across disparate biomedical databases.
- Released a plug-and-play Python library under DARPA’s MapNet framework; research paper accepted at ACM-BCB 2026.
Grader
- Evaluated coursework for graduate-level Optimization and Computational Linear Algebra within the Center for Data Science.
Data Science Research Assistant
- Quantified 45 years of sexual harassment coverage trends across 9K NYT articles using hypothesis testing and correlation analysis across 34 categorical features.
- Built reproducible Python pipelines and visualizations in Seaborn/Matplotlib to translate complex longitudinal data for non-technical collaborators.
- Maintained documented research notebooks for a co-authored study on the social evolution of media framing and story prominence.
Business & Data Analyst
- Scaled the company’s first analytics function from 0 to 5 analysts, supporting 5.25x revenue growth ($500K to $2.5M) through data-driven inventory allocation.
- Reduced product returns by 33% (saving ~$50K/year) by analyzing SKU velocity and perishability patterns for 700+ products.
- Architected a full-stack MERN/Firebase order management system handling 100+ daily orders, cutting processing time by 50%.
- Automated client communications via Gmail/WhatsApp/Sheets integrations, reducing inbound support volume by 80% within three months.
Systems Engineer
- Accelerated project delivery by 35% by automating manual testing and SharePoint migration workflows using PowerShell scripts.
- Developed dynamic Power BI dashboards and PowerApps workflows for enterprise clients to track real-time operational KPIs.
- Received the Star Team Award for completing high-priority migrations and automation deployments within two quarters.
What I've Built
Toolkit
Thoughts
Let's Talk
Open to full-time Data Science / ML Engineering roles in New York. Also happy to chat about research, causal inference, or interesting data problems.