cv
Basics
| Name | Sovit Nayak |
| Label | Data Scientist |
| Email | sovit.nayak03@gmail.com |
| Phone | (716) 295-3415 |
| Url | https://sovit-nayak.github.io/ |
| Summary | A data scientist with research interests in knowledge distillation in LLMs |
Work
-
2024.08 - present Data Engineer / Machine Learning Engineer
Vigil AI
- Architected a high-performance data warehouse on AWS Redshift with columnar storage and automatic encoding, reducing reporting time by 40% and optimizing cost-efficiency.
- Engineered 15+ scalable ETL pipelines using AWS Glue and Informatica, reducing data integration time by 50% and processing 2TB+ of data weekly for enterprise reporting.
- Designed and implemented data models using Redshift and Snowflake, boosting query performance by 40% and reducing data retrieval times from 60 seconds to under 30 seconds.
- Utilized HiveQL and SparkSQL for large-scale data querying, cutting down processing time by 35% across 1B+ records.
- Built a distributed streaming data pipeline with Apache Kafka and Spark, increasing data processing speed by 60%, handling over 10M events per day.
- Optimized VideoLLaMA 2 using transfer learning techniques with PyTorch and CUDA across 100K annotated samples, enhancing content moderation accuracy to 92% across 1M+ videos, reducing false positives by 40%.
- Developed and fine-tuned Convolutional Neural Networks (CNNs) for real-time object detection, leveraging PyTorch and OpenCV, boosting AR tools like facial recognition and gesture interaction by 25%.
-
2021.09 - 2023.07 Data Analytics Engineer
Career Labs
- Conducted in-depth analysis of finance & marketing data from 200K users using SQL (CTEs, window functions, indexing), driving a 60% increase in user engagement and reducing drop-off by 30%.
- Orchestrated the development of 20+ scalable ETL pipelines using Python, SQL, and Apache Airflow, improving data integration efficiency by 50% and supporting weekly ingestion of 1.5B records.
- Designed data models for financial analytics using Star Schema, decreasing query execution times by 30%, from 45 seconds to 30 seconds, and supporting 15+ interactive dashboards.
- Developed Tableau and QuickSight dashboards for 25+ KPIs, increasing data visibility and enabling data-driven decisions, leading to a 20% increase in stakeholder engagement.
- Deployed AWS EMR and Hadoop for distributed data processing, reducing analysis time by 40% for datasets exceeding 500GB.
- Leveraged AWS (S3, Redshift, Glue) to automate data integration across 10+ data sources, reducing data retrieval time by 40% and increasing reporting efficiency, supporting 50+ daily business intelligence queries.
- Integrated NoSQL solutions (DynamoDB, MongoDB) for data storage and retrieval, improving data processing efficiency by 25% across 500 million records.
Volunteer
-
2018.04 - 2022.07 Chennai, India
Assistant Team Leader
National Service Scheme
Education
-
2023.12 - 2024.01 Seoul, South Korea
-
2023.08 - 2024.12 Newark, NJ, USA
MS
New Jersey Institute of Technology
Data Science and Statistics
- Machine Learning
- Applied Statistics
- Statistical Computing with R
- Advanced Calculus
- Bayesian Statistics
- Regression Analysis
Certificates
| AWS Data Engineer - Associate | |
| Amazon Web Services (AWS) | 2024-11-24 |
Publications
-
2021.09.24 Analysis of Emotion using Machine Learning on Social Media Platform
IJCA
In today’s fast-paced world, many turn to social media to express emotions, often unaware of their mental health struggles. Using machine learning, this approach analyzes social media posts to detect emotions through polarity (sentiment strength) and subjectivity (personal views). By identifying positive or negative sentiments, it aims to diagnose mental health conditions early and enable timely interventions.
Skills
| Data Science | |
| PyTorch | |
| TensorFlow | |
| AWS | |
| Redshift | |
| SQL | |
| Docker | |
| Git | |
| Data Modelling | |
| ETL Pipeline | |
| Apache Spark | |
| Apache Kafka | |
| Apache Airflow | |
| Snowflake | |
| Tableau | |
| Power BI | |
| Looker |
Languages
| English | |
| Native speaker |
| Korean | |
| Fluent |
References
| Dr. Zuofeng Shang | |
| MATH 660 Intro to Stat Computing with R |