cv | Sovit Nayak

Basics

Name	Sovit Nayak
Label	Data Scientist
Email	sovit.nayak03@gmail.com
Phone	(716) 295-3415
Url	https://sovit-nayak.github.io/
Summary	A data scientist with research interests in knowledge distillation in LLMs.

Work

2024.08 - present
Data Engineer / Machine Learning Engineer

Vigil AI
- Architected a high-performance data warehouse on AWS Redshift with columnar storage and automatic encoding, reducing reporting time by 40% and optimizing cost-efficiency.
- Engineered 15+ scalable ETL pipelines using AWS Glue and Informatica, reducing data integration time by 50% and processing 2TB+ of data weekly for enterprise reporting.
- Designed and implemented data models using Redshift and Snowflake, boosting query performance by 40% and reducing data retrieval times from 60 seconds to under 30 seconds.
- Utilized HiveQL and SparkSQL for large-scale data querying, cutting down processing time by 35% across 1B+ records.
- Built a distributed streaming data pipeline with Apache Kafka and Spark, increasing data processing speed by 60%, handling over 10M events per day.
- Optimized Video Llama 2 using transfer learning techniques with PyTorch and CUDA across 100K annotated samples, enhancing content moderation accuracy to 92% across 1M+ videos, reducing false positives by 40%.
- Developed and fine-tuned Convolutional Neural Networks (CNNs) for real-time object detection, leveraging PyTorch and OpenCV, boosting AR tools like facial recognition and gesture interaction by 25%.
2021.09 - 2023.07
Data Analytics Engineer

Career Labs
- Conducted in-depth analysis of finance & marketing data from 200K users using SQL (CTEs, window functions, indexing), driving a 60% increase in user engagement and reducing drop-off by 30%.
- Orchestrated the development of 20+ scalable ETL pipelines using Python, SQL, and Apache Airflow, improving data integration efficiency by 50% and supporting weekly ingestion of 1.5B records.
- Designed data models for financial analytics using Star Schema, decreasing query execution times by 30%, from 45 seconds to 30 seconds, and supporting 15+ interactive dashboards.
- Developed Tableau and QuickSight dashboards for 25+ KPIs, increasing data visibility and enabling data-driven decisions, leading to a 20% increase in stakeholder engagement.
- Deployed AWS EMR and Hadoop for distributed data processing, reducing analysis time by 40% for datasets exceeding 500GB.
- Leveraged AWS (S3, Redshift, Glue) to automate data integration across 10+ data sources, reducing data retrieval time by 40% and increasing reporting efficiency, supporting 50+ daily business intelligence queries.
- Integrated NoSQL solutions (DynamoDB, MongoDB) for data storage and retrieval, improving data processing efficiency by 25% across 500 million records.

Volunteer

2018.04 - 2022.07

Chennai, India
Assistant Team Leader

National Service Scheme

Lead organizer for the New York City branch of the People's Climate March, the largest climate march in history.
- Awarded 'Climate Hero' award by Greenpeace for my efforts organizing the march.
- Named Man of the Year 2014 by Time magazine.

Education

2023.12 - 2024.01

Seoul, South Korea
Exchange Program

Yonsei University

Big Data Analytics
- Big Data
- Hadoop
- MapReduce
- Hive
2023.08 - 2024.12

Newark, NJ, USA
MS

New Jersey Institute Of Technology

Data Science and Statistics
- Machine Learning
- Applied Statistics
- Statistical Computing with R
- Advanced Calculus
- Bayesian Statistics
- Regression Analysis

Certificates

	AWS Data Engineer - Associate
	Amazon Web Services (AWS)	2024-11-24

Publications

2021.09.24

Analysis of Emotion using Machine Learning on Social Media Platform

IJCA

In today’s fast-paced world, many turn to social media to express emotions, often unaware of their mental health struggles. Using machine learning, this approach analyzes social media posts to detect emotions through polarity (sentiment strength) and subjectivity (personal views). By identifying positive or negative sentiments, it aims to diagnose mental health conditions early and enable timely interventions.

Skills

	Data Science
	PyTorch
	TensorFlow
	AWS
	Redshift
	SQL
	Docker
	Git
	Data Modelling
	ETL Pipeline
	Apache Spark
	Apache Kafka
	Apache Airflow
	Snowflake
	Tableau
	Power BI
	Looker

Languages

	English
	Native speaker

	Korean
	Fluent

References

	Dr. Zuofeng Shang
	MATH 660 Intro to Stat Computing with R
