Data Analyst & Engineer

Harsh Koli

Data-driven problem solver with hands-on expertise in building scalable data pipelines, performing statistical analysis, and translating complex data into actionable insights.

1+ Years Experience
10+ Projects
5+ Tech Skills
Python: Expert
SQL: Advanced
Data Focused

About Me

AI & Data Engineering Professional

I'm a Data Analyst with hands-on experience in data collection, cleaning, statistical analysis, and building scalable data solutions. I'm currently pursuing an MSc in Data Science at the University of Essex.

Passionate about leveraging Python, SQL, and data engineering practices to build robust pipelines that process large datasets efficiently and extract meaningful insights.

I excel in cross-functional collaboration, translating complex data findings into clear, actionable insights for both technical and non-technical stakeholders.

Education

Postgraduate Diploma in Data Science
University of Essex | 2025-2026
Machine Learning, Statistical Modelling, Predictive Analytics, Data Engineering
BE in Data Engineering
University of Mumbai | 2021-2024
Grade: 2:1

Work Experience

AI & Data Engineering Extern

Extern, Inc. (Outamation Project)

Dec 2025 – Present | Remote
  • Collected and processed data from multiple unstructured sources, building Python-based pipelines for structured data extraction
  • Applied rigorous data quality validation and transformation techniques ensuring accuracy, completeness, and integrity
  • Collaborated cross-functionally with engineering and analytics teams to deliver scalable automation solutions
  • Ensured PII-aware data handling and governance compliance in a production-adjacent environment

Data Analyst Intern

Trainity

Sep 2023 – Dec 2023 | Remote
  • Collected, organized, and analyzed large structured datasets using Python, SQL, and Excel
  • Performed systematic data cleaning, validation, and preprocessing for downstream reporting
  • Translated complex analytical findings into clear, actionable business strategies and insights
  • Independently managed end-to-end analysis tasks, demonstrating strong ownership and quality delivery

Featured Projects

Credit Risk & Fraud Detection

2024

Predictive Modelling for Financial Risk Analysis

Built an ML model for credit risk assessment using scikit-learn and XGBoost. Engineered features from transactional data, handled class imbalance with SMOTE, and delivered business-readable risk reports.
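To illustrate the idea behind SMOTE mentioned above, here is a minimal standard-library sketch. It is not the project's code and not the imbalanced-learn implementation: the function name `smote_like_oversample`, the parameters `k` and `seed`, and the sample points are all hypothetical. It shows only the core step, which is creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (for example, flagged transactions).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=6)
```

In practice a library implementation would normally be preferred, and oversampling should be applied to the training split only so that synthetic points do not leak into evaluation data.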

Tech Stack

Python, scikit-learn, XGBoost, SQL

Key Skills: Statistical Analysis, Feature Engineering, Class Imbalance Handling

Customer Purchase Prediction

2023

Customer Segmentation & Recommendation System

Analyzed transactional customer datasets to identify purchasing behavior patterns. Built classification models for customer segmentation and created actionable pricing/targeting recommendations.

Tech Stack

Python, pandas, Supervised Learning, Excel

Key Skills: Exploratory Data Analysis, Customer Segmentation, Behavioral Analytics

Data Extraction & Cleaning Pipelines

Ongoing

Building Scalable ETL Solutions

Developed Python-based ETL pipelines for processing structured and unstructured data. Implemented data validation frameworks, automated data quality checks, and created dashboards for monitoring data integrity.
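As an illustration of the kind of automated data quality check described above, here is a minimal sketch using only the Python standard library. The field names (`id`, `amount`), the rules, and the names `ValidationReport` and `validate_rows` are hypothetical and are not taken from the actual pipelines.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Holds rows that passed and readable messages for rows that failed."""
    valid_rows: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def validate_rows(rows, required=("id", "amount")):
    """Check each row for missing fields, non-numeric amounts, duplicate ids."""
    report = ValidationReport()
    seen_ids = set()
    for index, row in enumerate(rows):
        # Completeness: every required field must be present and non-empty.
        missing = [name for name in required if row.get(name) in (None, "")]
        if missing:
            report.errors.append(f"row {index}: missing {', '.join(missing)}")
            continue
        # Type check: amount must be convertible to a float.
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            report.errors.append(f"row {index}: amount is not numeric")
            continue
        # Uniqueness: reject ids that were already accepted.
        if row["id"] in seen_ids:
            report.errors.append(f"row {index}: duplicate id {row['id']}")
            continue
        seen_ids.add(row["id"])
        report.valid_rows.append({**row, "amount": amount})
    return report


# Hypothetical input: one clean row and three rows that each break one rule.
sample = [
    {"id": 1, "amount": "19.99"},
    {"id": 2, "amount": "abc"},
    {"id": 1, "amount": "5"},
    {"id": 3, "amount": None},
]
report = validate_rows(sample)
```

Keeping rejected rows as messages rather than raising on the first failure means the counts of valid and invalid rows can be fed into a monitoring dashboard.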

Tech Stack

Python, SQL, PostgreSQL, Data Validation

Key Skills: Data Pipelines, Data Quality, ETL, Database Management

Executive Analytics Dashboard

2024

Tata Group Data Visualization Job Simulation

Created executive-level dashboards and data stories to communicate analytical findings. Translated complex data into strategic insights for senior stakeholders, emphasizing clarity and actionability.

Tech Stack

Power BI, Excel, Data Visualization, SQL

Key Skills: Business Intelligence, Stakeholder Communication, Strategic Reporting

Technical Skills

Languages & Tools

Python: Expert
SQL: Advanced
Excel: Advanced
R: Intermediate

Data Analysis & Engineering

  • Data Cleaning & Preprocessing
  • Statistical Analysis & A/B Testing
  • Feature Engineering
  • ETL & Data Pipelines
  • API Data Collection

Libraries & Frameworks

pandas, NumPy, scikit-learn, TensorFlow, XGBoost, Matplotlib

Databases & Tools

  • PostgreSQL, PL/SQL
  • Power BI & Dashboarding
  • Git & Version Control
  • Docker Basics

Soft Skills

  • Stakeholder Communication
  • Cross-functional Collaboration
  • Problem-Solving
  • Attention to Detail

Specializations

  • Data Governance & PII Handling
  • Responsible AI Practices
  • Exploratory Data Analysis
  • Predictive Modelling

Get In Touch

Always interested in discussing data, opportunities, and collaborative projects. Let's connect!

📍 Based in Mumbai, India | 📱 +91 9518951072

Open to remote opportunities and collaborative projects worldwide