Hi.
I'm Parag Chandiwal

I’m a data scientist who is ready to move from academia into the business world.

Journey

Why Data Science?

Quite a number of people have asked me about my switch from software development to data analytics. How did I do it? When did I do it? Why did I do it? Today (May 12th, 2018) felt like a fitting day to answer these questions (I am graduating!). I hope sharing my story gives some insight into what I did to become a data analyst and encourages budding "anythings" everywhere to pursue their passion fiercely.

My first exposure to data was from a project that had nothing to do with data.

While working on a project for Hinsdale Psychiatry, I noticed how the "Patient Health Questionnaire" was used by practitioners to assess the likelihood of common mental disorders. To put it lightly, my mind was blown and I had to find out more. This discovery came at a tipping point for me, because I was in the final year of my Bachelor's degree. It also made me open to new challenges and to pivoting career-wise. Data science seemed to fit right into that.

I created my first learning curve from an answer on Quora.

I found a very helpful answer, which I still recommend to anyone starting out in data, to the question "How can I become a data scientist?" That answer shaped my first learning path in January 2016. I then enrolled in a Master's program in ITM Data Management at Illinois Tech, Chicago. My graduate study centered on object-oriented programming, advanced data analytics and warehousing, automation, and backend engineering.

How I started my blog — where the real learning started

By the end of 2017, I had slowed down on online courses because 90% of them covered the same content and assumed you’re a beginner, so they became a bit repetitive. By this time, I felt ready to start doing personal projects and writing them up on a blog. Being a painter, I knew creativity is not some talent that you either have or don’t. Creativity is born of experience and confidence in your skills, because the possibilities of what can be done expand the more you know.

Projects

What have I been up to?

Oscar Science!

Oscar Prediction Project

Predicting 2018's Win

Rather than relying on gut instinct, I've used the power of data science to help select the film that is most likely going home with the famous gold statue on March 4th.

Methodology

To understand my methodology, one must first understand a concept called “supervised learning.” Supervised learning is a machine learning approach that learns the relationship between one output and many inputs from labeled examples. In this case, it helps us understand past outcomes (who won Best Picture previously and why) so we can better predict future outcomes (who will win this year and why). I used a publicly available data set from Thinkful that covers everything from critic ratings to performance at precursor awards. This data informed the algorithm, which was built using scikit-learn, one of the most popular machine learning toolkits in the world.

After evaluating multiple models, I determined that random forest classification provided the most accurate prediction of previous Oscar winners. When applied to Oscar winners and losers over the past 38 years, this approach made correct predictions in every year but one, 2017.
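As a rough illustration of how a per-year hit rate like this could be tallied, here is a minimal sketch. It assumes the classifier produces a win score for each nominee and that the highest-scoring nominee in each year counts as the pick; the write-up does not say exactly how the yearly accuracy was computed, and the function name, film names, and scores below are all made up.

```python
from collections import defaultdict


def yearly_hits(scored_nominees):
    """Map each year to True if that year's top-scored nominee actually won.

    scored_nominees: iterable of (year, film, score, won) tuples, where
    score is a hypothetical model output and won is the real outcome.
    """
    by_year = defaultdict(list)
    for year, film, score, won in scored_nominees:
        by_year[year].append((score, film, won))

    hits = {}
    for year, entries in by_year.items():
        # The nominee with the highest score is treated as the model's pick.
        _, _, top_won = max(entries, key=lambda entry: entry[0])
        hits[year] = top_won
    return hits


# Placeholder data for illustration only (not real nominees or scores).
scored = [
    (2001, "Film A", 0.81, True),
    (2001, "Film B", 0.40, False),
    (2002, "Film C", 0.35, False),
    (2002, "Film D", 0.62, True),
    (2003, "Film E", 0.70, False),
    (2003, "Film F", 0.55, True),
]

hits = yearly_hits(scored)
missed_years = sorted(year for year, ok in hits.items() if not ok)
print(f"{sum(hits.values())} of {len(hits)} years correct; missed: {missed_years}")
```

Scoring by year rather than by individual nominee matters here because most nominees lose, so a model that predicts "loser" for everyone would look accurate row by row while never naming a winner.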

And for 2018, the model predicted...

And the Oscar goes to...

The Shape of Water


See complete code on my

Working in STEM!

Predicting likelihood of working in STEM w.r.t. Gender and Race

The American Community Survey (ACS) is the largest continuous household survey in the United States, providing a wealth of information about the economic, social, and demographic characteristics of persons, as well as housing characteristics. The primary objective of this project is to examine as many variables as possible to identify factors affecting an individual’s income and to provide a granular snapshot of the lives of many Americans.

The goal of this project is to build a model that predicts the likelihood of working in a STEM (Science, Technology, Engineering, and Math) career based on basic demographics: age, sex, race, and state of origin.

I've created two logistic regression models. The first one models the likelihood of an individual having a degree in science based on their demographics. The second one models the likelihood of an individual with a science degree getting a job in a STEM field, based on their demographics.

With these two models, I provide a high-level overview of which demographic features are most likely to influence disparities at the level of education, and which are most likely to influence disparities at the level of hiring.
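To show how a logistic regression coefficient turns into a statement like "twice as likely", here is a minimal sketch on invented data. It is not the code behind the project: the counts are made up, the single binary feature stands in for one demographic indicator, and the model is fitted with plain gradient descent so the example runs with the standard library alone.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=1.0, steps=2000):
    """Fit logistic regression weights (intercept first) by batch
    gradient descent on the mean log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Invented counts: 100 people in a reference group (feature = 0), 20 of whom
# hold a STEM degree, and 100 people in a comparison group (feature = 1),
# 40 of whom hold one.
rows = [[0]] * 100 + [[1]] * 100
labels = [1] * 20 + [0] * 80 + [1] * 40 + [0] * 60

weights = fit_logistic(rows, labels)

# exp(coefficient) is the odds ratio of the comparison group relative to
# the reference group.
odds_ratio = math.exp(weights[1])
print(round(odds_ratio, 2))
```

With one binary feature, the fitted odds ratio matches the one read directly off the 2x2 table, (40/60) / (20/80). With several demographic features in the model, each coefficient is instead an odds ratio that holds the other features fixed.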

Findings

1. Underrepresentation of certain races exists at both the level of education and the level of career placement. However, the effect of underrepresentation in education seems to be much greater. In the first chart, Asian Americans are more than twice as likely as Whites to have a degree in STEM, while Black, Puerto Rican, and Mexican Americans are less than half as likely as Whites to have one. For an underrepresented minority who wants to work in STEM, the biggest part of the hurdle is getting a degree in STEM.

2. The gender gap in STEM degrees is much wider among older Americans, while younger women seem to have closed it. In STEM careers, however, the gender gap appears to be increasing. Although women are now earning STEM degrees at a rate equal to that of men, women with STEM degrees are still far less likely than their male counterparts to find careers in STEM.


See complete code on my

Chi-Town Crime

Tableau report of Chicago's Crime

Visualization of crimes in Chicago using Tableau

Chicago has often been in the national headlines for fluctuations in violent crime. My interactive Tableau report helps to visualize and confirm the uptick in the crime rate, and shows where and when crimes are happening in Chi-Town.

The Data

I used the Chicago crime dataset from Kaggle. The data includes the date and time of each crime, the type of crime, and its location coordinates. The dashboard consists of three main components:
1) Heatmap (a map of Chicago that visualizes the density of crime in different areas by charge)
2) Time (analysis over time)
3) Type of offence/crime


See complete code on my

Experience

My education and experience

Education

ITM Data Management (3.8 GPA)
Illinois Institute of Technology
Information Technology
University of Mumbai

Work Experience

Livongo Health Inc


- Utilize medical data feeds to interpret health signals and support downstream processes.
- Automate various control systems and data operations in Python to enhance file processing and data integrity checks. Design and manage data processing and analytics tools/services such as Metabase, Redshift, AWS, Jenkins Env, and Docker containers.
- Create data visualizations (audits of proprietary algorithms, client-program statistics, data quality metrics) that expose insights to drive business decisions.

Illinois Institute of Technology, Chicago


- Participated in all in-house analytical initiatives, managed and developed 4 departments within Illinois Institute of Technology's School of Applied Technology (SAT), and worked cross-functionally with the Director of Marketing Development at SAT to provide routine support.
- Provided backend engineering support to maintain Illinois Tech's ETL.
- Developed analytical models and comprehensive reports that supported the management team's decision making.
- Enhanced usability of the graduate bulletin by reducing bounce rate with a recommender system (user-based collaborative filtering).
- Led a marketing initiative, including web development, a marketing campaign, and flyer and video creation, that resulted in 250,000+ follower engagements during senior week.

Illinois Institute of Technology - ITM Dept


- Teaching Assistant to Dr. C. Robert (Bob) Carlson (Dean, IIT School of Applied Technology; Chair, Department of Information Technology and Management) for ITMT 531 Object-Oriented System Analysis, Modeling, and Design.
- Worked from the objectives and learning goals the professor set for the students in order to communicate course materials.
- Held regular TA hours (2 hours per week), identifying critical information from lectures or readings and elaborating on it to help students understand the material.
- Graded submissions.

Skills

Questions, Questions, Questions

My graduate study centered on OOP, advanced data analytics and warehousing, automation, and backend engineering. The two years I spent at Illinois Tech instilled in me an appreciation and excitement for the process of scientific research and its underlying mechanics: hypothesis formulation, study design, statistical programming, data collection, data analysis, and result presentation.

Object Oriented Programming:


Machine Learning:

Supervised learning

Data Warehouse:

Tools - Redshift, Metabase, AWS

Reporting/ BI: