By Natassha Selvaraj, Student in CS majoring in Data Science.
As a beginner looking to break into the data science industry, it’s easy to get overwhelmed with all the information presented to you. There are hundreds of data science courses out there, and it is difficult to know where to start.
When I decided to teach myself data science just a year ago, I remember feeling really lost because I didn’t know where to start. I saw advertisements for machine learning courses that promised to make me an expert in three days. I read articles insisting it wasn’t possible to become a data scientist unless I had a Master’s degree in mathematics and a Ph.D. in statistics. There was just too much information out there and so many conflicting opinions.
I eventually created my own data science roadmap, taught myself programming and machine learning, and landed a data science job.
Every day, at least one person asks me how I did this: how to learn data science from scratch and land a job in the industry. I did some research and compiled a list of online courses you can take to learn data science. The syllabi of these courses are good and will give you a strong foundation in programming, SQL, and machine learning. I use almost all the concepts taught in these courses during my day job as a data scientist.
If you want to learn data science, you first need to learn how to code. If you have no prior programming experience, I suggest starting out with Python.
There is an abundance of resources on the Internet that teach you Python programming, some of which include:
This is a 5-course specialization that will teach you Python from scratch. The first course in the specialization is called Programming for Everybody. In this course, you will learn the very basics of Python — syntax, conditional statements, iteration, functions, and variables.
This course has no prerequisites, and you don’t need to come from a technical or mathematical background to get started.
The next course in the specialization will teach you data structures. You will learn how to read data from files and manipulate data structures like lists and dictionaries.
The third course in the specialization teaches you to use Python to access web data. You will learn to use APIs and extract data from websites and then process this data with Python. You will also learn to extract data from strings and clean data using regular expressions.
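To give a feel for what extracting and cleaning string data with regular expressions looks like, here is a minimal sketch using Python's built-in `re` module. The sample text and the email pattern are made up for illustration; the pattern is deliberately simple and is not a complete email validator.

```python
import re

# Hypothetical raw text, e.g. scraped from a web page (invented for this example).
raw_text = "Contact alice@example.com or  bob@example.org   for details."

# Extract email-like substrings with a simple, non-exhaustive pattern.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", raw_text)

# Clean the text by collapsing runs of whitespace into single spaces.
cleaned = re.sub(r"\s+", " ", raw_text).strip()

print(emails)
print(cleaned)
```

`re.findall` returns every non-overlapping match as a list, while `re.sub` replaces each match, which makes the pair a common starting point for pulling fields out of messy text.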
Next, you will learn to access and manipulate databases with Python. You will learn to work with SQL databases using sqlite3, Python’s built-in module for SQLite. No prior SQL or database experience is required to take this course. You will learn everything from scratch.
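As a rough idea of what working with a SQL database from Python involves, here is a small sketch using the standard library's `sqlite3` module. The `courses` table and its rows are hypothetical, and the database lives in memory so nothing is written to disk.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, invented for this example.
cur.execute("CREATE TABLE courses (name TEXT, weeks INTEGER)")
cur.executemany(
    "INSERT INTO courses (name, weeks) VALUES (?, ?)",
    [("Course A", 4), ("Course B", 7), ("Course C", 5)],
)
conn.commit()

# Parameterized query: the ? placeholder keeps values separate from the SQL text.
cur.execute("SELECT name FROM courses WHERE weeks > ? ORDER BY name", (4,))
long_courses = [row[0] for row in cur.fetchall()]
print(long_courses)

conn.close()
```

Passing values through `?` placeholders rather than string formatting is the usual way to avoid SQL injection and quoting mistakes.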
The final course is a capstone project. You will utilize all the concepts learned in the other courses and build an end-to-end project in the capstone. If you pass your capstone project, you will get a course certificate.
The biggest upside of this course is that it teaches you a lot of data collection and storage techniques that are essential for a data scientist to know.
Many other Python and data science courses skip over these topics, and you end up with little to no knowledge of how to use APIs or access web data.
This introductory Python course is broken up into four sections: Python Basics, Python Lists, Functions, and NumPy.
This course covers all the basics of Python, including variables, mathematical operations, list manipulation, and functions.
It also teaches you the basics of NumPy, a library data scientists often use to manipulate arrays.
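For readers who have not seen array manipulation before, the following sketch shows the general idea. It assumes NumPy is installed, and the height and weight values are made up for illustration.

```python
import numpy as np

# Hypothetical measurements, invented for this example.
heights_m = np.array([1.60, 1.75, 1.80])
weights_kg = np.array([50.0, 70.0, 90.0])

# Arithmetic is applied element by element, with no explicit loop.
bmi = weights_kg / heights_m ** 2

# A boolean mask selects only the entries where the condition holds.
tall = heights_m > 1.70

print(bmi.round(1))
print(weights_kg[tall])
```

The same element-wise arithmetic and boolean masking carry over to Pandas, which builds on NumPy arrays.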
The Python Basics section of this course is free, so you can try this portion of the course first to see if you’d enjoy it.
After completing the introductory Python course, you can take this intermediate-level Python course on DataCamp.
This course will teach you to create visualizations in Python, manipulate dictionaries and lists, work with libraries like Pandas, and filter data frames using logic.
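Filtering a data frame using logic usually means building boolean conditions on its columns. Here is a minimal sketch, assuming Pandas is installed; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["Europe", "Europe", "Asia", "Asia"],
    "population_m": [5, 80, 120, 30],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
large_asian = df[(df["continent"] == "Asia") & (df["population_m"] > 50)]

print(large_asian)
```

Note that Pandas uses `&` and `|` here rather than Python's `and` and `or`, because the comparison runs on whole columns at once.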
The first section of this course, data visualization with Matplotlib, is free. You can try this out before deciding to get the entire course.
A major advantage of this course is that it teaches you Python for data science. It takes you through data analysis libraries like Pandas and NumPy, along with visualization libraries like Matplotlib.
The biggest piece of advice I’d give aspiring data scientists is to learn SQL. I never thought of SQL as an important part of data science. However, when I did my first data science internship, most of the work I did involved knowledge of data manipulation with SQL.
To learn SQL, I suggest taking the SQL for Data Science course on Coursera.
This is a 4-week course and assumes no prior database or programming knowledge. The first section of this course starts with data selection and retrieval with SQL.
Then, you will learn to use operators in SQL to filter data. As a data scientist/analyst, filtering data based on client requirements is something I do on a daily basis, so the content of this course is really important to understand.
Next, you will learn how joins work in SQL. You will learn to link multiple tables to each other. This is a very powerful technique. I deal with large databases on a daily basis and very often need to use joins to merge tables together.
By now, you should have learned the basics of programming. You should also have an understanding of data analysis using libraries like NumPy and Pandas, along with data visualization using libraries like Matplotlib.
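To make selection, filtering, and joins concrete, here is a small sketch that runs a SQL query through Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and their rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for this example.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'Lisbon'), (2, 'Ben', 'Oslo'), (3, 'Cy', 'Lisbon');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 50.0);
""")

# INNER JOIN links each order to its customer; WHERE filters the joined rows.
query = """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    WHERE c.city = 'Lisbon' AND o.amount >= 20
    ORDER BY o.amount
"""
rows = conn.execute(query).fetchall()
print(rows)

conn.close()
```

Because this is an inner join, the customer with no orders does not appear in the result; a `LEFT JOIN` would keep that customer with empty order columns.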
Now, you can step into the territory of machine learning.
This course is part of the IBM Data Science specialization. You can take it as a standalone course and get a certificate for it alone, without completing the entire specialization.
This course will provide you with a solid understanding of machine learning algorithms. You will learn to build models to solve supervised machine learning problems like regression and classification. You will also learn unsupervised machine learning algorithms like hierarchical clustering.
A huge advantage of taking this course over Andrew Ng’s machine learning specialization is that this course is taught completely in Python.
This course also has a final capstone project you need to pass before getting a certificate.
DataCamp’s machine learning track is broken up into multiple separate courses: supervised machine learning, unsupervised machine learning, linear classifiers, and deep learning.
I suggest taking the supervised machine learning course first. The first section of this course is free, so try it out and see if the content is useful. If you enjoy it, you can consider enrolling in the machine learning track.
Most machine learning courses online only cover the basics of different algorithms. A major advantage of this DataCamp track is that it covers topics like hyperparameter tuning and building pipelines.
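For a sense of what hyperparameter tuning and pipelines look like in practice, here is a minimal sketch. It assumes scikit-learn is installed and uses its bundled iris dataset; the choice of model, scaler, and candidate values is purely illustrative and is not taken from the track itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and the model into one estimator, so the
# scaler is refit on each training fold and never sees the validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate values for the regularization strength C (illustrative only).
# The "clf__" prefix routes the parameter to the pipeline step named "clf".
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# Grid search tries every candidate with 5-fold cross-validation.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Tuning the pipeline as a whole, rather than scaling the data once up front, avoids leaking information from the validation folds into preprocessing.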
When I took my first data science course on Udemy, I had a lot of gaps in my knowledge because I didn’t understand topics like parameter tuning and dimensionality reduction. It took me a long time to find the right resources to bridge the gap in my learning.
The content of this DataCamp machine learning track seems extremely comprehensive and covers a lot of ground that isn’t usually taught in other courses.
The list of courses mentioned above will provide you with a very strong foundation in data manipulation and machine learning. However, to really grow as a data scientist, you will need to go beyond these courses.
Start working on data science projects during your free time. Work on building real-life applications based on the concepts learned in these courses. Sites like Kaggle give you access to publicly available datasets, which you can use to build machine learning models and make predictions.
Taking these courses will equip you with the skill set you need to become a data scientist. You will then need to practice these skills and hone them by working on projects.
This article contains affiliate links. This means that if you click on one and choose to buy a course I linked above, a small portion of your subscription fee will go to me. As a creator, this helps me grow and continue to create content like this. However, I only recommend courses I think are good. The syllabi of the courses recommended above are very closely aligned to the work I do every day as a data scientist. These are courses I recommend to people who ask me for tips to break into the data industry, and I do believe they will be useful in your data science journey.
Thanks for your support!
Bio: Natassha Selvaraj is pursuing a degree in computer science with a major in data science. Natassha’s interests are in the field of machine learning, having worked on a variety of projects in this domain.