Hello, I'm Max Duong

"My life is already a masterpiece
By believing there is no constraint on what I can be."

Working as a Data Analyst / Data Scientist / Data Engineer / Software Engineer

Source: Laparadadigital


Max's Portfolio

A list of side projects I have done so far.

I hope you guys find them interesting.

I'm always open to constructive feedback, and I appreciate your time and effort.

Download Resume

Website Development

Popcorn

Open the website

During my internship at Popcorn, I helped upgrade the website's server and install plugins, cutting the page loading time by almost two-thirds from the original 9 seconds.

I was also in charge of the UI design, troubleshooting frontend problems so that the site displayed correctly across browsers such as Chrome, Firefox, Safari, and CocCoc.

The technologies included: HTML, CSS, Bootstrap, PHP.


Reading-Amazin

Open the website, Open the GitHub repository

This side project is a bookstore e-commerce website that replicates a real-world industry use case. As team leader, I was responsible for researching the technology stack, and I took part in the whole software development process, from design to implementation and delivery.

I developed features that keep the user journey uninterrupted, and collaborated with peers to perform error analysis and make improvements based on feedback.

The technologies included: HTML, CSS, JS, C#, Entity Framework Core, Microsoft SQL Server.

Data Analytics

UK E-commerce Data

Open the Kaggle notebook

This is a transnational dataset containing all the transactions that occurred between 01/12/2010 and 09/12/2011 (DD/MM/YYYY) for a UK-based, registered non-store online retailer. The company mainly sells unique all-occasion gifts.

Data cleaning and feature engineering: remove outliers, and extract new features from InvoiceDate such as date, day, month, year, hour, and day of the week.
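The cleaning and feature-engineering step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: it assumes InvoiceDate strings in ISO format such as "2010-12-01 08:26:00", and it uses Tukey's 1.5 * IQR fences for outliers, since the text does not say which outlier rule was applied.

```python
from datetime import datetime
from statistics import quantiles

# Fixed English names so the output does not depend on the system locale
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def date_features(invoice_date):
    """Split an InvoiceDate string (assumed ISO format) into calendar features."""
    ts = datetime.fromisoformat(invoice_date)
    return {
        "date": ts.date(),
        "day": ts.day,
        "month": ts.month,
        "year": ts.year,
        "hour": ts.hour,
        "day_of_week": WEEKDAYS[ts.weekday()],
    }


def drop_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(date_features("2010-12-01 08:26:00")["day_of_week"])  # Wednesday
print(drop_iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))     # [1, 2, 3, 4, 5, 6, 7, 8]
```

In the notebook itself the same operations would more likely be vectorized column operations on a DataFrame; the logic is the same.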

The report is built to answer the following business questions:

  1. Which customers (CustomerID) bring in the most revenue?
  2. Which customers (CustomerID) buy the most in terms of quantity?
  3. Which customers (CustomerID) are likely to return products?
  4. Which items are bought the most and the least?
  5. Which countries bring in the most revenue, in total and on average?
  6. In which months do we sell the most and the least?
  7. At what time of day do people tend to buy our products?
  8. On which days of the week do people tend to visit and make purchases?
  9. Is there a relationship between repeat customers and all customers over the year?
  10. What are the trends for selected items?
  11. What are the top 10 reordered items?
  12. What is the cancellation rate?
  13. Does revenue come from repeat items or from one item per month?
  14. What are the frequent itemsets, found by applying association rules for market basket analysis?
  15. How many customer types are there, based on the RFM model?
  16. How can we understand customer acquisition from marketing campaigns through cohort analysis?
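Question 15 can be illustrated with a plain-Python RFM (recency, frequency, monetary) scoring. The transaction tuple layout, the snapshot date, and the rank-based 1 to 3 scores are illustrative assumptions; the notebook may use quintiles or a different segmentation scheme.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Map CustomerID -> (recency_days, frequency, monetary).

    transactions: iterable of (customer_id, invoice_no, invoice_date, amount)
    snapshot: reference date that recency is measured from
    """
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        invoices[customer].add(invoice)  # frequency = number of distinct invoices
        spend[customer] += amount
    return {
        c: ((snapshot - last_seen[c]).days, len(invoices[c]), spend[c])
        for c in last_seen
    }


def rank_score(values, higher_is_better=True, bins=3):
    """Score each key 1..bins by rank; ties keep their input order."""
    # Order keys from worst to best so the worst ones get score 1
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {key: 1 + rank * bins // n for rank, key in enumerate(ordered)}


def rfm_segments(table, bins=3):
    """Concatenate R, F and M scores into a segment code such as '332'."""
    recency = rank_score({c: r for c, (r, _, _) in table.items()}, higher_is_better=False, bins=bins)
    frequency = rank_score({c: freq for c, (_, freq, _) in table.items()}, bins=bins)
    monetary = rank_score({c: m for c, (_, _, m) in table.items()}, bins=bins)
    return {c: f"{recency[c]}{frequency[c]}{monetary[c]}" for c in table}


# Toy transactions, not taken from the dataset
sample = [
    ("A", "i1", date(2011, 12, 1), 100.0),
    ("A", "i2", date(2011, 12, 5), 50.0),
    ("B", "i3", date(2011, 6, 1), 20.0),
    ("C", "i4", date(2011, 11, 1), 500.0),
]
table = rfm_table(sample, snapshot=date(2011, 12, 10))
print(rfm_segments(table))  # {'A': '332', 'B': '111', 'C': '223'}
```

Recency is the only dimension where a lower value is better, which is why it is scored with `higher_is_better=False`.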


Mall Customer Data

Open the Kaggle notebook

Given customer data such as Customer ID, Age, Gender, Annual Income, and Spending Score, we want to understand customer behavior, deliver insights to the marketing team, and plan the strategy accordingly.

The report is built to answer the following business questions:

  1. What is the Age distribution?
  2. What is the Annual Income distribution?
  3. What is the Spending Score distribution?
  4. What is the Gender distribution?
  5. How does the Spending Score differ between genders?
  6. What interesting insights emerge when plotting Age, Annual Income, Spending Score, and Gender against each other?
  7. How many customer types are there, based on the K-Means clustering technique?
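Question 7 relies on K-Means. Below is a from-scratch sketch of Lloyd's algorithm for illustration only; in practice a library implementation such as scikit-learn's `KMeans` is the usual choice. It returns the inertia (within-cluster sum of squared distances) so that it can be plotted against k for the elbow method; using the elbow method to pick the number of customer types is an assumption here, and the points are toy (Annual Income, Spending Score) pairs.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k sampled points as initial centroids
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy (Annual Income, Spending Score) pairs, not real rows from the dataset
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, labels, inertia = kmeans(points, k=2)

# Elbow method: look for the k where inertia stops dropping sharply
for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 2))
```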


Sales Forecasting Data

Open the Kaggle notebook

Rossmann operates over 3,000 drug stores in 7 European countries. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. Currently, Rossmann store managers are tasked with predicting their daily sales for up to six weeks in advance.

Approach to Sales Forecasting:

  1. Data Exploration and Data Imputation: check for missing values and fill them with the median value
  2. Data Visualization: plot feature distributions and ECDFs
  3. Data Manipulation: change data types, impute data, create new features
  4. Apply ARIMA, Facebook Prophet, and other traditional machine learning models to extract insights
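Steps 1 and 2 can be sketched with the standard library: median imputation for a column with gaps, and the points of an empirical cumulative distribution function (ECDF) ready for a step plot. Representing missing values as `None` is an assumption for this illustration.

```python
from statistics import median


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def ecdf(values):
    """ECDF step-plot points: the i-th smallest value is paired with i/n."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


print(fill_missing_with_median([1, None, 3, 10]))  # [1, 3, 3, 10]
print(ecdf([3, 1, 2, 4]))  # [(1, 0.25), (2, 0.5), (3, 0.75), (4, 1.0)]
```

The median is preferred over the mean here because skewed columns (such as distances or sales) would pull the mean toward extreme values.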

The report is built to answer the following business questions:

  1. What is the sales trend over time?
  2. How many customers come to the stores per week / month / year?
  3. What are the stores' sales per week / month / year?
  4. Do promotions have a positive influence on the number of customers coming to the stores and on overall revenue?
  5. How do sales on holidays compare with non-holidays?
  6. Which features have a significant influence on sales?
  7. Can we accurately predict sales for the following months?
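Question 7 needs a concrete notion of "accurately". One option, sketched here under our own assumptions, is to compare any model (ARIMA, Prophet, or otherwise) against a seasonal-naive baseline that repeats last week's values, scored with RMSPE (root mean square percentage error). RMSPE, with zero-sales days ignored, is the metric the Kaggle Rossmann Store Sales competition used; the 7-day season is an assumption about daily data.

```python
import math


def seasonal_naive(history, horizon, season=7):
    """Forecast each future day with the value from the same point one season earlier."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def rmspe(actual, predicted):
    """Root mean square percentage error, skipping days with zero actual sales."""
    ratios = [((a - p) / a) ** 2 for a, p in zip(actual, predicted) if a != 0]
    return math.sqrt(sum(ratios) / len(ratios))


print(seasonal_naive(list(range(1, 15)), horizon=3))  # [8, 9, 10]
print(round(rmspe([100, 200], [110, 180]), 6))        # 0.1
```

A model is only worth its complexity if its RMSPE is clearly lower than the baseline's on the same held-out weeks.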


A/B Test Marketing Campaign Data

Open the Kaggle notebook

A fast-food chain plans to add a new item to its menu, but it is undecided among 3 possible marketing campaigns and wants the one with the greatest effect on sales. The available data includes MarketID, Market Size, Location, Store Age, Promotion, Week, and Sales. We want to figure out which promotion/campaign is the best in this case.

Approach to A/B Test:

  1. Data Exploration and Data Imputation: check for missing values and fill them with the median value
  2. Data Visualization: plot feature distributions
  3. Perform the A/B test

The report is built to answer the following business questions:

  1. What are the sales in each MarketID?
  2. What are the sales in each MarketSize?
  3. What are the interesting points in sales for each pair of MarketID and MarketSize?
  4. What is the relationship between MarketSize and Store Age?
  5. How do the promotions affect Sales within each MarketSize?
  6. Does running the campaign longer produce more revenue?
  7. Which of the 3 campaigns is the best, according to an A/B test using ANOVA (F-test) and a Z-test?
  8. Can we apply a machine learning model to extract important features from the dataset?
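The two tests in question 7 can be sketched with the standard library. The one-way ANOVA below returns only the F statistic (in practice `scipy.stats.f_oneway` returns both F and its p-value); the z-test compares two group means with a normal approximation, which assumes reasonably large samples. Feeding each promotion's Sales values in as one group is our assumption about how the data would be split.

```python
import math
from statistics import NormalDist, fmean, variance


def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = fmean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def two_sample_z_test(a, b):
    """Two-sided z-test for a difference in means (large-sample approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(a) - fmean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
print(two_sample_z_test([1, 2, 3], [1, 2, 3]))            # (0.0, 1.0)
```

A typical flow is to run the ANOVA first (do the three promotions differ at all?) and then pairwise z-tests to see which promotion leads.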

Machine Learning & Deep Learning

House Price Regression Prediction

Open the Kaggle notebook

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. They might describe how many bedrooms it has, where it is located, and how old it is. Based on those criteria, we want to estimate approximately how much the house may cost.

Approach to Regression Prediction:

  1. Data Exploration and Data Imputation: check for missing values and fill them with the median value
  2. Data Visualization: plot feature distributions
  3. Data Manipulation: change data types, apply statistical models, create new features, turn categorical data into dummy variables
  4. Apply machine learning models to predict house prices
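Turning categorical data into dummy variables (step 3) can be sketched as below; it behaves like `pandas.get_dummies(..., drop_first=True)` for a single column. The column values and the `Zone` prefix are hypothetical.

```python
def dummy_variables(column, prefix, drop_first=True):
    """One-hot encode a categorical column into 0/1 dummy columns.

    With drop_first=True the alphabetically first category becomes the baseline,
    which avoids perfect collinearity with the intercept in linear regression.
    """
    categories = sorted(set(column))
    if drop_first:
        categories = categories[1:]
    names = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if value == c else 0 for c in categories] for value in column]
    return names, rows


names, rows = dummy_variables(["A", "B", "A", "C"], "Zone")
print(names)  # ['Zone_B', 'Zone_C']
print(rows)   # [[0, 0], [1, 0], [0, 0], [0, 1]]
```

Tree-based models do not need the dropped column, but linear models do: keeping all dummies plus an intercept makes the columns sum to a constant (the "dummy variable trap").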

The report is built to answer the following business questions:

  1. Which features affect the house price most, judging from graphs alone and from ML/DL models?
  2. Can we predict the house price accurately?
  3. What should we do to improve accuracy in the future: improve data quality or improve the ML/DL algorithms?


Bank Subscription Classification Prediction

Open the Kaggle notebook

The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The goal is to predict if the client will subscribe to a term deposit.

Approach to Classification Prediction:

  1. Data Exploration and Data Imputation: check for missing values and fill them with the median value
  2. Data Visualization: plot feature distributions
  3. Data Manipulation: change data types, apply statistical models, create new features, turn categorical data into dummy variables
  4. Apply machine learning models to predict whether a client subscribes

The report is built to answer the following business questions:

  1. Which features affect bank subscription most, judging from graphs alone and from ML/DL models?
  2. What accuracy score does the ML/DL model achieve, compared with a baseline model?
  3. What should we do to improve accuracy in the future: improve data quality or improve the ML/DL algorithms?
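Question 2 compares the model with a base model. A common baseline, assumed here, is a majority-class predictor: if most clients do not subscribe, always answering "no" already scores high accuracy, so a model's accuracy only means something relative to it. The labels below are toy values.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for each of n samples."""
    label, _ = Counter(y_train).most_common(1)[0]
    return [label] * n


y_train = ["no", "no", "no", "yes"]
y_test = ["no", "yes", "no", "no"]
baseline = majority_baseline(y_train, len(y_test))
print(accuracy(y_test, baseline))  # 0.75
```

With imbalanced classes, precision, recall, or ROC AUC on the "yes" class are often more informative than accuracy alone.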


Movie Review Sentiment Analysis

Open the Kaggle notebook

Given a corpus of movie reviews for sentiment analysis, we apply machine learning or deep learning algorithms so that a machine can distinguish negative, neutral, and positive reviews.

Approach to the NLP problem:

  1. Data Exploration: check for missing values
  2. Data Visualization: plot the label distribution
  3. Data Manipulation: change data types; remove HTML, emoji, URLs, numbers, and non-alphabetic characters; convert text into tokens
  4. Apply ML/DL models to learn the movie review sentiment
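The cleaning in step 3 can be sketched with regular expressions. This is one possible ordering under our own assumptions: HTML tags and URLs are stripped first (so URL fragments do not survive as words), then everything except ASCII letters and whitespace is replaced, which removes numbers, punctuation, and emoji in one pass.

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not an ASCII letter or whitespace: digits, punctuation, emoji.
# Note: this also drops accented letters and splits "don't" into "don" and "t".
NON_ALPHA = re.compile(r"[^A-Za-z\s]")


def clean_and_tokenize(review):
    """Strip HTML, URLs and non-alphabetic characters, then lowercase and split."""
    text = HTML_TAG.sub(" ", review)
    text = URL.sub(" ", text)
    text = NON_ALPHA.sub(" ", text)
    return text.lower().split()


print(clean_and_tokenize("<br />Great movie!!! 10/10, see http://x.com 😀"))
# ['great', 'movie', 'see']
```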

The report is built to answer the following business questions:

  1. Why do we need this model / project?
  2. What accuracy score does the ML/DL model achieve, compared with a baseline model?
  3. What should we do to improve accuracy in the future: improve data quality or improve the ML/DL algorithms?

Web Scraping

Rotten Tomatoes Scraper

Open the Kaggle notebook

Are you tired of manually copying and pasting values into a spreadsheet? Do you want to learn how to obtain interesting, real-time, and even rare information from the internet with a simple script? In data science, more and more data comes from external sources, so knowing how to extract and structure that data quickly is an essential skill that will set you apart in the job market.

Approach to Web Scraping:

  1. Choose a library: BeautifulSoup
  2. Find the element containing all the data
  3. Extract the desired data from the web page
  4. Data Manipulation: clean data, fill missing values
  5. Store the data in a structured form: Excel, a database, PowerPoint
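The five steps above can be sketched end to end. The project used BeautifulSoup; to stay dependency-free, this sketch swaps in the standard library's `html.parser` and runs on a made-up HTML snippet. The `movie`, `title`, and `score` class names are hypothetical, not Rotten Tomatoes' real markup, and CSV output stands in for the Excel/database step.

```python
import csv
import io
from html.parser import HTMLParser


class MovieParser(HTMLParser):
    """Collect title/score text from hypothetical <div class="movie"> blocks."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the current <span> feeds, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "movie" in classes:
            # The container element: one movie per <div class="movie">
            self.rows.append({"title": None, "score": None})
        elif tag == "span" and self.rows:
            for name in ("title", "score"):
                if name in classes:
                    self._field = name

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()


def scrape(html):
    parser = MovieParser()
    parser.feed(html)
    parser.close()
    # Data manipulation: "93%" -> 93; a missing score stays None (an empty CSV cell)
    for row in parser.rows:
        row["score"] = int(row["score"].rstrip("%")) if row["score"] else None
    return parser.rows


def to_csv(rows):
    """Store the rows in a structured form (CSV text here)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE = (
    '<div class="movie"><span class="title">Movie A</span><span class="score">93%</span></div>'
    '<div class="movie"><span class="title">Movie B</span></div>'
)
print(to_csv(scrape(SAMPLE)))
```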

The report is built to answer the following business questions:

  1. Why do we need web scraping?
  2. Is it legal or illegal to crawl data from an arbitrary website?
  3. If we have a thousand URLs, how do we speed up the process?
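For question 3, scraping is mostly waiting on the network, so a thread pool is one straightforward way to fetch many pages at once. In this sketch `fetch` is a hypothetical stand-in for a real request-plus-parse function, so the example runs without network access; a moderate `max_workers` also helps respect the site's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Hypothetical stand-in for downloading and parsing one page."""
    return f"parsed:{url}"


def fetch_all(urls, max_workers=16):
    """Fetch pages concurrently; results come back in the same order as urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


urls = [f"https://example.com/page/{i}" for i in range(1000)]
results = fetch_all(urls)
print(len(results), results[0])
```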


Facebook Scraper

Open the GitHub notebook

Suppose you are a marketer who is tired of manually copying and pasting values from the Facebook Insights page into a spreadsheet. Knowing how to pull that data from the web page into a spreadsheet with Python saves a tremendous amount of time.

Approach to Web Scraping:

  1. Choose a library: Selenium
  2. Find the element containing all the data
  3. Extract the desired data from the web page
  4. Data Manipulation: clean data, fill missing values
  5. Store the data in a structured form: Excel, a database, PowerPoint

The report is built to answer the following business questions:

  1. Why do we need web scraping?
  2. Is it legal or illegal to crawl data from a Facebook page?
  3. If we have a thousand URLs, how do we speed up the process?
  4. How do we handle Facebook constantly changing its HTML structure behind the scenes?
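One common way to cope with question 4 is to keep several candidate locators per field, newest first, and fall back down the list when one stops matching. The helper below is a hypothetical sketch: `find` can be any lookup function, so it is tested here with a fake page; the commented lines show how it could be wired to Selenium 4's `find_element`, with made-up selectors rather than Facebook's actual markup.

```python
def find_with_fallbacks(find, locators, not_found=(LookupError,)):
    """Try each locator in order and return the first element found.

    find: callable taking one locator and returning an element or raising
    not_found: exception types meaning "this locator did not match"
    """
    errors = []
    for locator in locators:
        try:
            return find(locator)
        except not_found as exc:
            errors.append(f"{locator}: {exc!r}")
    raise LookupError("no locator matched; tried " + "; ".join(errors))


# Wiring to Selenium 4 (not executed here; selectors are hypothetical):
#   from selenium.webdriver.common.by import By
#   from selenium.common.exceptions import NoSuchElementException
#   find = lambda css: driver.find_element(By.CSS_SELECTOR, css)
#   find_with_fallbacks(find, ["div.reach-new", "div.reach-old"],
#                       not_found=(NoSuchElementException,))

# Fake page for illustration: only the newer layout's selector exists
page = {"div.new-layout": "Reach: 1,234"}
print(find_with_fallbacks(page.__getitem__, ["div.old-layout", "div.new-layout"]))
```

Logging which locator matched makes it easy to notice when a layout change has happened and the list needs updating.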