Data Science

Big Data for Data Scientists


The Big Data course teaches data scientists the skills needed to scale data science solutions. If you want to build portfolio projects that help you stand out, up-skill to become a senior data scientist, or gain the engineering skills needed to become a data science solution architect, this is the right course for you.

This course is suitable for learners who want to:

  • work with large-scale ML problems
  • learn how to work in a cloud-first environment
  • elevate data science portfolios via end-to-end big data analytics projects
Online Live
6 weeks
60 hours
Upcoming Start Date
Nov 12
Registration Deadline:
  November 12, 2024

About the Course

Big Data for Data Scientists is a 6-week project-based course that teaches data scientists the tools needed to work on large-scale data science problems. The entire course is built around an end-to-end real-time machine learning problem. Students will learn widely used big data frameworks and tools such as AWS, Apache Spark, Amazon SageMaker, and Databricks. Students will also learn how to train machine learning models at scale and deploy them for real-time use.


  • What you will learn
    • AWS Cloud
    • Big Data & Spark
    • Scaling ML with Spark ML
    • MLOps with SageMaker


  • Case-based learning with real-life datasets
    • AdTech – Fraud Detection
    • AdTech – CTR Prediction
    • Recommender Systems
    • Search Engine (Knowledge base)
    • Social Media Analytics

WeCloudData is the perfect place to grow your career


Wenle W., Senior Big Data Developer

After listening to and comparing big data courses in different places in Toronto, I signed up for all of the WeCloudData courses right after Shaohua’s info session without any hesitation. He is not only very knowledgeable and experienced but also teaches so clearly and methodically, which is way beyond my expectations. I have finished the Python and big data courses so far, and both courses are well-organized and project-oriented all along. You can start by applying what you have learned, exploring your own tools from there, and building up step by step as you learn more throughout the courses.

Whenever I got stuck, I could get all the help I needed from the instructor, TA, and teammates. Sometimes I also got motivated by other teams and pushed forward by my instructor to continue working on my project through in-class progress reports and presentations. The things I learned here and the final project presentation I did benefited me a lot, giving me confidence in my big data interviews and helping me land job offers. Quite often, I feel I know more than my interviewers!

Many thanks to WeCloudData, it is a great learning platform with a great instructor, TAs and classmates!!

Ranked #1 Data Training Program


Be ready for the new economy

WeCloudData programs are designed to be project-based. We not only cover essential theories, but also teach how to apply tools and platforms that are in high demand today. Our program curriculum is also highly adaptive to the latest market trends. 

Module 1
Introduction to AWS
Modern data scientists need to become familiar with cloud technologies. Most production data science solutions are implemented on a private or public cloud. This module teaches the fundamentals of cloud computing for data scientists, such as storage and compute. Students will learn how to work with the AWS Python SDKs to store and retrieve big data and how to scale analytics solutions to more powerful cloud instances.
  • Understand the concepts of cloud computing
  • Become familiar with AWS’s data science and ML solution ecosystem
  • Understand different use cases of data science in the cloud
  • Learn how to launch EC2 instances/servers on AWS
  • Learn how to retrieve and store big data in AWS S3
  • AWS (Amazon Web Services)
  • Boto
  • S3
  • Cloud Storage
  • EC2
  • Cloud Instance
Module 2
Introduction to Big Data
Data scientists working in the big tech, insurance, telecom, retail, and e-commerce industries often deal with large amounts of data. The size of the data can range from hundreds of gigabytes to hundreds of terabytes. In most of those cases, traditional servers and personal computers are not ideal tools, and data scientists will need to learn big data tools such as Snowflake and Spark. This module introduces data science learners to the world of big data. We will give an overview of distributed systems and the big data landscape.
  • Learn how to scale Pandas data processing pipelines
  • Learn when and how to scale ML workloads using Ray and Polars
  • Get familiar with the concepts of distributed systems
  • Learn the basic concepts of MapReduce
  • Ray
  • Polars
  • Pandas
  • Dask
  • Spark
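
The MapReduce concept listed in this module can be illustrated with a minimal, single-machine sketch in plain Python. This is not course material and not how Spark or Hadoop is implemented; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample text are hypothetical and chosen only to show the three stages of a word count.

```python
from collections import defaultdict
from functools import reduce


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different worker;
    # here the chunks are processed one after another for illustration.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample data, split into chunks as a distributed file would be.
    chunks = ["big data with spark", "spark scales big data", "data science"]
    print(word_count(chunks))
```

Because the map step treats each chunk independently and the reduce step treats each key independently, both steps can be spread across many machines, which is the property that distributed frameworks exploit.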
Module 3
Big Data with Apache Spark
Apache Spark is one of the most popular big data frameworks for data scientists. In this module we will introduce the Spark DataFrame API and teach you how to scale data processing using Spark SQL and Spark DataFrame. You will experience the performance of Spark on AWS EMR and Databricks.
  • Understand the Spark ecosystem
  • Learn how to read big data from S3 and Parquet
  • Learn how to process datasets from 50 GB to terabyte scale using Spark DataFrame
  • Spark SQL
  • Spark DataFrame
  • Apache Spark
  • Databricks
  • Parquet
  • Distributed Computing
  • MapReduce
Module 4
Scaling Machine Learning with Spark
This module teaches students how to train large-scale ML models using Spark. Students will learn the basics of distributed ML algorithms and how to train and tune models on data-parallel problems.
  • Know when to use Spark for machine learning
  • Learn how to train and tune machine learning models using Spark ML
  • Learn how to train and tune machine learning models using Ray Train
  • Understand the specific use cases of Spark in recommender systems
  • Understand the specific use cases of Spark in ad-tech
  • Understand the specific use cases of Spark in sentiment analysis and text classification
  • Spark ML
  • Distributed Machine Learning
  • Recommender System
  • Click-through Rate Prediction
  • CTR
  • Sentiment Analysis
  • Text Classification
Module 5
MLOps with SageMaker
This module introduces learners to the world of MLOps. Students will learn how to work with end-to-end platforms such as Amazon SageMaker to build, train, tune, and deploy machine learning models. This module will prepare students for a more advanced MLOps Engineer course.
  • Learn how to work with SageMaker Studio
  • Learn how to prepare data using SageMaker Data Wrangler
  • Learn how to train and deploy models using SageMaker
  • Learn how to monitor ML models using SageMaker
  • AWS SageMaker
  • SageMaker Data Wrangler
  • SageMaker Studio
  • SageMaker Endpoint

Learn from the best

We’ve brought together a team of highly skilled and experienced instructors to help you learn effectively. Our instructors have a passion for teaching and a wealth of real-world experiences in their respective fields, so you can be confident that you’re learning from the best.


Portfolio Experience Building

Make yourself hireable and stand out from the crowd by working on personal big data projects. Here’s what you will experience:

  • Choose a big data problem to focus on
  • Write a project proposal
  • Set up AWS infrastructure
  • Design the ML system diagram
  • Implement and deploy end-to-end ML solutions on AWS
  • Code review with your learning mentor
  • Present your portfolio project
  • Publish your work online

Upcoming Start Dates

Nov 12 - Dec 17
Online Live
$2,600.00 USD

Explore your personalized learning path

Big Data for Data Scientists
$2,600 USD
  • Case-based learning
  • Portfolio project mentoring
  • Flexible payment plan
Recommended Short Courses
$3,000 - $3,800 USD
  • Enrich your DS experience with advanced DE and AI skills
  • Get alumni discount for other DE, AI, and MLE courses
  • Short courses to consider after completing this course ⇩
Upgrade to Bootcamp
$6,000 - $12,000 USD
  • Upgrade to the DS or AI bootcamp and get alumni discount
  • Get extensive 1-1 career mentoring and job support
  • Get the flexibility to create your own bootcamp

Start Learning With WeCloudOpen

WeCloudOpen is here to help you unlock your full potential in tech with our free courses and workshops. Learn the fundamentals of coding and data, and become a proficient tech professional in no time!

WeCloudOpen Course

Our comprehensive courses on Python and SQL are the perfect way to start your journey into the world of tech. WeCloudOpen ensures you learn the basics without any hassle.

WeCloudOpen Workshop

Our free workshops cover topics such as Business Intelligence, Data Science, Data Engineering, DevOps, and Machine Learning, giving you a head start in your tech career.

Student Success

What our graduates are saying


M Chowdhury

I would highly recommend WeCloud Data to anyone who wants to learn practical/applied Data Science and Big Data (Spark, Hadoop, AWS stack, Databases, SQL, NoSQL, Python, Machine Learning), because I found the team of instructors very helpful. Shaohua is highly experienced in the field. The teaching style is very user-friendly: he breaks down difficult topics so they are easy to understand.


Grace T.

The lectures have been amazing! Both instructors are awesome – it’s obvious that they are experts in this domain. They both have the ability to explain concepts in such a way that they can be understood. The slides are also very helpful and provide a solid reference point for the topics discussed thus far. Overall, very happy that I am taking this course and would definitely recommend to colleagues and friends.

Let WeCloud Accelerate Your Career in Tech

Have questions?

Want more details about this course? Unsure about which path to take? Apply now to reserve a spot or make an appointment with our learning advisor. 

Start learning with WeCloud Open

Join WeCloud Open and start learning today! We provide open courses, career guides, and learning resources. It’s a great way to start your career in tech!


Frequently asked questions about the bootcamp
Learners joining this course will need solid Python programming skills. Some machine learning experience is also required since this course focuses on scalable machine learning. Packages learners need to know include pandas and scikit-learn. Some familiarity with Linux will be helpful but is not a must-have.
The Big Data course is very hands-on by design. Learners will start working on a capstone project in the 3rd week. There are lots of exercises that will keep learners busy. The lectures are also taught in a hands-on fashion: learners will follow instructors and TAs to complete labs.
This big data course is designed to prepare students for data scientist jobs that require big data skills. It’s taught at an intermediate to advanced level, and students will complete an end-to-end project. It will help you build strong portfolio projects and stand out.
Learners will build an end-to-end big data pipeline to add to their data science portfolio. You will be required to come up with your own original ideas, create the services in AWS, load big data into the data lake, and build a machine learning pipeline using Spark ML to solve an ML big data challenge. At the end of the project, you will present your work to experienced instructors and the entire class to get feedback.
While both are big data related courses, this course is for data scientists who want to scale data science and machine learning solutions. It is not a course for data engineers who don’t work with machine learning.
Yes. We have labs on a weekly basis and students work with lab instructors and project mentors when they work on the capstone projects.
No, most of the “big data” processing in this course is done in the cloud. From day one, students learn how to work with AWS, and as long as you have a good internet connection and an AWS account, you’ll be able to do the course.
Yes, you will need to buy AWS credits to take this course. We will mostly work with the free Databricks community account and use AWS free credits. If you go over the free credits limit, you’ll need to pay for the service. Learners shouldn’t expect to spend more than $50 USD on AWS in this course.
Yes, learners need to have Python and machine learning knowledge. If you don’t have ML skills yet, we recommend you take WeCloudData’s machine learning short course first.
Yes, payment plans are available for this course. You can inquire about the payment plan by filling out the course inquiry form. Details are on the course package page, which you are redirected to after filling out the form.
Scholarships are available for bootcamp students. We also offer different kinds of discounts, including an alumni discount.
Some students have their employers cover the tuition. You can always ask your employer about it. We’re happy to provide the curriculum and an enrolment letter.
View our Big Data for Data Scientists course package