cv

Basics

Name Sovit Nayak
Label Data Scientist
Email sovit.nayak03@gmail.com
Phone (716) 295-3415
Url https://sovit-nayak.github.io/
Summary A data scientist with research interests in knowledge distillation in LLMs.

Work

  • 2024.08 - present
    Data Engineer / Machine Learning Engineer
    Vigil AI
    • Architected a high-performance data warehouse on AWS Redshift with columnar storage and automatic encoding, reducing reporting time by 40% and improving cost efficiency.
    • Engineered 15+ scalable ETL pipelines using AWS Glue and Informatica, reducing data integration time by 50% and processing 2TB+ of data weekly for enterprise reporting.
    • Designed and implemented data models using Redshift and Snowflake, boosting query performance by 40% and reducing data retrieval times from 60 seconds to under 30 seconds.
    • Used HiveQL and SparkSQL for large-scale data querying, cutting processing time by 35% across 1B+ records.
    • Built a distributed streaming data pipeline with Apache Kafka and Spark, increasing data processing speed by 60%, handling over 10M events per day.
    • Adapted Video Llama 2 via transfer learning with PyTorch and CUDA on 100K annotated samples, raising content moderation accuracy to 92% across 1M+ videos and reducing false positives by 40%.
    • Developed and fine-tuned convolutional neural networks (CNNs) for real-time object detection with PyTorch and OpenCV, improving the performance of AR features such as facial recognition and gesture interaction by 25%.
  • 2021.09 - 2023.07
    Data Analytics Engineer
    Career Labs
    • Conducted in-depth analysis of finance & marketing data from 200K users using SQL (CTEs, window functions, indexing), driving a 60% increase in user engagement and reducing drop-off by 30%.
    • Orchestrated the development of 20+ scalable ETL pipelines using Python, SQL, and Apache Airflow, improving data integration efficiency by 50% and supporting weekly ingestion of 1.5B records.
    • Designed star-schema data models for financial analytics, cutting query execution times by one-third (from 45 seconds to 30 seconds) and supporting 15+ interactive dashboards.
    • Developed Tableau and QuickSight dashboards for 25+ KPIs, improving data visibility and supporting data-driven decisions, which contributed to a 20% increase in stakeholder engagement.
    • Deployed AWS EMR and Hadoop for distributed data processing, reducing analysis time by 40% for datasets exceeding 500GB.
    • Used AWS (S3, Redshift, Glue) to automate data integration across 10+ data sources, reducing data retrieval time by 40%, improving reporting efficiency, and supporting 50+ daily business intelligence queries.
    • Integrated NoSQL solutions (DynamoDB, MongoDB) for data storage and retrieval, improving data processing efficiency by 25% across 500M records.

Volunteer

  • 2018.04 - 2022.07

    Chennai, India

    Assistant Team Leader
    National Service Scheme

Education

  • 2023.12 - 2024.01

    Seoul, South Korea

    Exchange Program
    Yonsei University
    Big Data Analytics
    • Big Data
    • Hadoop
    • MapReduce
    • Hive
  • 2023.08 - 2024.12

    Newark, NJ, USA

    MS
    New Jersey Institute of Technology
    Data Science and Statistics
    • Machine Learning
    • Applied Statistics
    • Statistical Computing with R
    • Advanced Calculus
    • Bayesian Statistics
    • Regression Analysis

Certificates

AWS Data Engineer - Associate
Amazon Web Services (AWS) 2024-11-24

Publications

  • 2021.09.24
    Analysis of Emotion using Machine Learning on Social Media Platform
    IJCA
    Many people turn to social media to express their emotions, often without recognizing their own mental health struggles. This work applies machine learning to social media posts to detect emotions through polarity (how positive or negative a post is) and subjectivity (how much it reflects personal opinion rather than fact). By identifying positive and negative sentiment, it aims to detect early signs of mental health conditions and enable timely intervention.

Skills

Data Science
PyTorch
TensorFlow
AWS
Redshift
SQL
Docker
Git
Data Modelling
ETL Pipeline
Apache Spark
Apache Kafka
Apache Airflow
Snowflake
Tableau
Power BI
Looker

Languages

English
Native speaker
Korean
Fluent

References

Dr. Zuofeng Shang
MATH 660 Intro to Stat Computing with R