Hello I'm Max Duong

"My life is already a masterpiece
By believing no constraint on what I can be."

Working as a Data Analyst, Data Scientist, Data Engineer, Software Engineer

Source: anyaberkut

Getting started with Data Science is not a piece of cake


Source: Medium

Jumping into the data field is tough in general, and it certainly was for me. I want to tell you something that may be good or bad news, depending on your perspective.

1. There is more than one way to have a job in data science!

The title is confusing, I know; just bear with me for a while. Honestly, it is simple but not easy to break into a field as complex and multifaceted as data science. It is simple because we can easily find online resources that guide us into the world of data. The hard part is whether you have enough academic background, experience, and/or domain knowledge, especially since we're living in a time when the definition of data science has not even been settled yet. Speaking from my own experience, there is no single way to get started on the journey to becoming a data scientist.

2. What do we want to do?

Yes, the first and foremost question we need to ask is what we want to do in our lives. We want to work as a data scientist. Great! Next, which specific area of data science intrigues us? There are plenty of options to choose from, such as marketing, retail, finance, health care, logistics, and more. I guess you may feel overwhelmed. Don't give up yet; a long road is waiting ahead of us. Then, which particular job title do we find interesting and suitable: business analyst, data analyst, data engineer, data scientist, machine learning engineer, and so on?

It's frustrating to hear about all of the above when you do not know where you might fit in. Let's break the problem into smaller chunks. First, identify which company you aspire to work for. Research that company's job postings and tailor your learning strategy to meet those requirements. As for me, I don't really have a particular dream company in mind. I mostly extract the key points from job posting websites like Indeed, LinkedIn, Glassdoor, etc. I suggest you check out a few of them before you continue reading.
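To make "extracting the key points" a bit more concrete, here is a minimal sketch that counts how many postings mention each skill. The skill list and the sample postings are made up for illustration; in practice you would paste in text copied from real postings.

```python
import re
from collections import Counter

# Hypothetical skills to look for; adjust to the roles you care about.
SKILLS = ["python", "r", "sql", "aws", "spark", "hadoop", "excel", "tableau", "qlik"]


def count_skills(postings, skills=SKILLS):
    """Count how many postings mention each skill at least once."""
    counts = Counter()
    for text in postings:
        # Lowercase and split into whole-word tokens so that "r" does not
        # match inside words such as "Spark".
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for skill in skills:
            if skill in tokens:
                counts[skill] += 1
    return counts


# Made-up snippets standing in for postings copied by hand.
sample_postings = [
    "Data Analyst: strong SQL and Excel, Tableau is a plus.",
    "Data Scientist: Python or R, SQL, experience with Spark.",
    "Data Engineer: Python, SQL, AWS, Hadoop and Spark.",
]

skill_counts = count_skills(sample_postings)
for skill, n in skill_counts.most_common():
    print(f"{skill}: {n}")
```

The skills at the top of the printed list are the ones that show up most often across the postings, which is one simple way to decide what to learn first.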

3. Job requirements

Great! I bet you'll see requirements including Python, R, SQL, AWS, Spark, Hadoop, Excel, Tableau, Qlik, and so on. Postings that mention NLP, computer vision, music pattern detection, or state-of-the-art machine learning may require a candidate to have a Master's or Ph.D. in a targeted area of study. Next, something usually not clearly stated in a posting is the power of domain knowledge, meaning how much you know about the industry. Last but not least, pay attention to soft skills. That sounds trivial, doesn't it? Yet I paid for my ego during my last internship because of my lack of communication skills. Soft skills certainly set us apart from the competition. To put it simply, if we can translate business problems into data science solutions and deliver data-driven insights back to the business to improve user experience, we earn a huge advantage.

4. Let’s break the job requirements into digestible pieces

First, let's talk about programming. If you have no coding experience, that's OK; I used to be like you. The best way to pick up this skill is to take some online programming courses. Python and R are the two leading choices for data science in general. I prefer Python over R because it is also applicable to data analytics and software development.

I recommend checking out this course:

Udemy - Python and Flask API

Second, SQL, which stands for Structured Query Language, is in high demand because companies of all sizes use it for data work. Once you get the hang of it, you'll be able to store information in a database and retrieve data for analysis. Take a look at the graph below: most data-related jobs require SQL skills. If you choose to become a data analyst, SQL plays an important part in your job; there is no way around it.

Source: Dataquest

I highly recommend the course:

Udemy - SQL for data analytics and business intelligence
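Storing and retrieving data with SQL can be tried without installing a database server. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Store a few made-up rows.
rows = [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)]
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", rows)
conn.commit()

# Retrieve the total spend per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)

conn.close()
```

The `SELECT ... GROUP BY ... ORDER BY` pattern shown here is the kind of query a data analyst writes constantly, so it is a good one to practice first.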

Third, job requirements often ask for tools like Excel, Tableau, and Qlik. Those tools are used to build visualizations and communicate the insights derived from a dataset to our seniors or colleagues. You should take those skills seriously.

I personally took this course to perform tasks during my internship at WeVenture:

Udemy - Tableau for Data Science

Fourth, learning machine learning and deep learning is a turning point for us to become Data Analysts. Yes, a Data Analyst, not a Data Scientist as some of us may expect. From my experience and from what I have learned from senior data practitioners on YouTube, I realize that most Data Analysts understand at least Linear Regression and Binary Classification models (Logistic Regression, or Logit). You may ask: if you know some advanced techniques like deep learning, XGBoost, LightGBM, or stacked models, will you be a data scientist, as the titles of some MOOCs claim? There is no right or wrong answer here, and the expectations of data science work are still unclear to many companies. I personally believe Data Scientists should be able to do more than that; for example, they build and maintain data pipelines, deploy models to a server, and then create APIs to collaborate with frontend developers.
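For readers who have not met a Logit model yet, here is a minimal sketch of binary classification with logistic regression, written in plain Python and trained with batch gradient descent. The data (hours studied versus pass/fail), the learning rate, and the number of epochs are all made up for illustration; in real work you would normally reach for a library instead.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to w and b.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Made-up data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict(w, b, 1.0), predict(w, b, 4.0))
```

The model learns a single threshold on the input: below it the prediction is 0, above it the prediction is 1. Linear Regression follows the same fitting idea but predicts a number instead of a class.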

My experience so far comes from the following courses:

Udemy - Python for Data Science and Machine Learning

Udemy - Data Science Bootcamp

Udemy - Real-Life Data Science

Udemy - Machine Learning and Data Science zero to mastery

Udemy - Customer Segmentation and Analytics

Udemy - Computer Vision (CV)

Udemy - Natural Language Processing (NLP)

Udemy - Recommender Systems

Source: OnlineCourseBay

Sorry, I forgot to mention the importance of a math foundation: we need to feel comfortable with at least basic calculus, linear algebra, and statistics. Math is important whenever we work with data.

You may need to brush up on this skill at Khan Academy, as I did:

Khan - Algebra

Fifth, AWS, Spark, and Hadoop are skills that candidates need in order to work with big data, especially for the job titles of Data Engineer, Data Scientist, and Machine Learning Engineer. I think this option should be the last to consider because you may need to pay to use AWS services. Spark and Hadoop are not really useful when you only work with small amounts of data at the beginning of the journey. Everything I'm saying here is subjective; don't take my word for it.

Sixth, some job postings require candidates to have at least a Master's or a Ph.D. degree to be considered. If you can pursue higher education, just do it, since not many people take this path. That option doesn't work for me at the time of writing this blog. A Master's or Ph.D. degree is awesome to have, without a doubt, but it doesn't guarantee that I'll get my desired job. That is why I pick online courses, coupled with side projects, to gain fantastic experience, which can be more effective down the road than a higher degree.

Source: Rasmussen College

Seventh, as mentioned above, Domain Knowledge and Soft Skills will set us apart from the competition. What I'm sharing with you comes from my professional experience, period! Currently, I have some knowledge of the retail and marketing industries, where I can tell great stories from a dataset. I familiarized myself with the retail industry through side projects I've done on Kaggle. Plus, I grasped marketing concepts as an intern at Popcorn. I can do it, and so can you. Don't be afraid of the vague and vast amount of information out there; everything gets done by taking the first step.

5. A thousand-mile journey starts with a single step!

The first step was covered earlier in section 2, where we need to determine what we want to do. Next, we should think about our learning style, which could be listening, reading, or doing. You know what works best for you, right?

I don't have a particular style. As I progress along the path, I change my style and go with the flow. When learning new skills, I prefer to watch videos on Udemy or YouTube. Then I run the code to see if it works, and work backwards to understand the whole series. During my internships, reading and executing are a must, as I need to solve problems rather than learn new skills.

No matter which style you choose, whether it is listening to a data science podcast, reading two data-related articles, or watching video tutorials every day, make sure to get your hands dirty through projects.

6. Project time!

With a solid understanding of the course materials, it's time to move on to bigger projects. Find a topic that aligns with our interests (section 2). Working on things we love gives us confidence and Domain Knowledge that we can talk about at a new level. That is what I realized, and it helps me when networking or interviewing with recruiters. You may ask where you can gain experience with real datasets. Here is a list of websites I love:

Kaggle

UCI Machine Learning Repository

Data.gov

7. Get out there and get a job!

Thank you for staying with me till this point even if you had every reason to leave.

Now we have a portfolio of projects that we've practiced on and can talk about in depth. A series of videos and podcasts has guided us through different aspects of data science life, and what we've learned has built up our confidence that we can take on the upcoming role. A strong GitHub and/or Kaggle profile paves the way for success, and our LinkedIn profile is a great place to showcase our experience, our skills, and our connections. All of this will help our resume stand out to a recruiter and/or a hiring manager.

What are we waiting for? Let's do it together. I'm doing it and seeing positive results. I hope what I've shared resonates with you and that you'll have a great ending.