Open to full-time roles · New York, NY

Harshit Soni

Data Scientist from NYU. I build ML systems, run causal experiments, and turn messy data into evidence that moves decisions forward.

3.96/4.0 GPA at NYU
4+ yrs Industry Exp.
15+ Projects
2 Publications Accepted/Under Review
01 — About

The Story

I'm a Data Scientist with an MS from NYU (3.96 GPA) and 4+ years of industry experience. My coursework covers machine learning, deep learning, causal inference, big data, NLP, probabilistic modeling, and optimization.

At Northeastern's Gyori Lab, I lead a research project on ontology alignment using representation learning and semantic search, with a paper accepted at ACM-BCB 2026 and a Python package released under DARPA's MapNet framework. I also have a CVPR 2026 submission in which I distilled NASA's 300M-parameter foundation model down to a 25M-parameter student that outperformed the teacher on change detection.

At NYU's Urban Politics Lab, I run causal studies on urban policy. One used a regression discontinuity design to show that congestion pricing cut noise complaints by 70%; another is a first-of-its-kind quasi-experimental analysis of board-ups and crime across 6K+ properties.

As a Research Assistant at NYU, I quantified 45 years of sexual harassment coverage trends in the New York Times across 8,900 articles, building reproducible analyses and visualizations for a non-technical audience.

Before grad school, I spent 4 years at Sunshine Marketing Agencies, where I set up the company's first analytics function, built ETL pipelines and forecasting systems, and scaled the team to 5 analysts. That work supported 5.25x revenue growth and reduced product returns by 33% across 700+ SKUs. I also built a full-stack order management system in MERN/Firebase. At TCS before that, I automated testing and migration workflows with PowerShell, cutting delivery time by 35% and earning a Star Team Award.

On the project side, I've deployed a multi-agent SEC filing analyst with LangGraph and Docker Compose, a fraud detection pipeline as a FastAPI endpoint, and a recommendation system on 27M+ ratings in PySpark. I work across Python, R, SQL, PySpark, PyTorch, Docker, and GCP.

New York University
M.S. Data Science
3.96 / 4.0
Expected May 2026 · New York, NY
SRM Institute of Science & Technology
B.Tech Computer Science
8.80 / 10
May 2019 · India
02 — Experience

Where I've Worked

Data Scientist

The Urban Politics Lab, NYU
Feb 2026 – Present
  • Showed that NYC congestion pricing reduced noise complaints by 70% using a regression discontinuity design in R, contributing to a major transit policy grant.
  • Executing a first-of-its-kind quasi-experimental study on 6K+ properties and 290K+ incidents to evaluate the causal impact of residential board-ups on crime.
  • Discovered delayed-effect patterns in urban crime data, challenging traditional narratives around property maintenance and public safety.

Machine Learning Researcher

Gyori Lab, Northeastern University
Sep 2025 – Present
  • Developed an automated ontology alignment system using LLMs fine-tuned on knowledge graphs via contrastive learning, achieving 98% precision.
  • Optimized high-recall retrieval at scale using FAISS and representation learning to map concepts across disparate biomedical databases.
  • Released a plug-and-play Python library under DARPA’s MapNet framework; research paper accepted at ACM-BCB 2026.

Grader

New York University
Sep 2025 – Dec 2025
  • Evaluated coursework for graduate-level Optimization and Computational Linear Algebra within the Center for Data Science.

Data Science Research Assistant

New York University
Mar 2025 – Apr 2026
  • Quantified 45 years of sexual harassment coverage trends in roughly 9K NYT articles using hypothesis testing and correlation analysis over 34 categorical features.
  • Built reproducible Python pipelines and visualizations in Seaborn/Matplotlib to translate complex longitudinal data for non-technical collaborators.
  • Maintained documented research notebooks for a co-authored study on the social evolution of media framing and story prominence.

Business & Data Analyst

Sunshine Marketing Agencies
Jul 2020 – Aug 2024
  • Scaled the company’s first analytics function from 0 to 5 analysts, supporting 5.25x revenue growth ($500K to $2.5M) through data-driven inventory allocation.
  • Reduced product returns by 33% (saving ~$50K/year) by analyzing SKU velocity and perishability patterns for 700+ products.
  • Architected a full-stack MERN/Firebase order management system handling 100+ daily orders, cutting processing time by 50%.
  • Automated client communications via Gmail/WhatsApp/Sheets integrations, reducing inbound support volume by 80% within three months.

Systems Engineer

Tata Consultancy Services
Jun 2019 – Jul 2020
  • Accelerated project delivery by 35% by automating manual testing and SharePoint migration workflows using PowerShell scripts.
  • Developed dynamic Power BI dashboards and PowerApps workflows for enterprise clients to track real-time operational KPIs.
  • Received the Star Team Award for completing high-priority migrations and automation deployments within two quarters.
03 — Projects

What I've Built

04 — Skills

Toolkit

05 — Writing

Thoughts

06 — Contact

Let's Talk

Open to full-time Data Science / ML Engineering roles in New York. Also happy to chat about research, causal inference, or interesting data problems.