Applied Data Science with Python

Applied Data Science with Python | Coursera

About This Specialization

The 5 courses in this University of Michigan specialization introduce learners to data science through the Python programming language. This skills-based specialization is intended for learners who have a basic Python or programming background and want to apply statistical, machine learning, information visualization, text analysis, and social network analysis techniques through popular Python toolkits such as pandas, matplotlib, scikit-learn, nltk, and networkx to gain insight into their data.

Introduction to Data Science in Python (course 1), Applied Plotting, Charting & Data Representation in Python (course 2), and Applied Machine Learning in Python (course 3) should be taken in order and prior to any other course in the specialization. After completing those, courses 4 and 5 can be taken in any order. All 5 are required to earn a certificate.

5 courses

Follow the suggested order or choose your own.

Projects

Designed to help you practice and apply the skills you learn.

Certificates

Highlight your new skills on your resume or LinkedIn.

COURSE 1

Introduction to Data Science in Python

Current session: Jun 5 — Jul 10.
Subtitles
English

About the Course

This course will introduce the learner to the basics of the Python programming environment, including how to download and install Python, the fundamental Python programming techniques expected of learners, and how to find help with Python programming questions. The course will also introduce data manipulation and cleaning techniques using the popular pandas data science library and introduce the abstraction of the DataFrame as the central data structure for data analysis. The course will end with a statistics primer, showing how various statistical measures can be applied to pandas DataFrames. By the end of the course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses. This course should be taken before any of the other Applied Data Science with Python courses: Applied Plotting, Charting & Data Representation in Python, Applied Machine Learning in Python, Applied Text Mining in Python, and Applied Social Network Analysis in Python.

WEEK 1
In this week you’ll get an introduction to the field of data science, review common Python functionality and features that data scientists use, and be introduced to the Coursera Jupyter Notebook for the lectures. All of the course information on grading, prerequisites, and expectations is on the course syllabus, and you can find more information about the Jupyter Notebooks on our Course Resources page.

Video · Introduction to Specialization
Reading · Syllabus
Video · About the Professor: Christopher Brooks
Video · Data Science
Reading · 50 years of Data Science, David Donoho (optional)
Video · The Coursera Jupyter Notebook System
Other · Week 1 Lectures Jupyter Notebook
Video · Python Functions
Video · Python Types and Sequences
Video · Python More on Strings
Video · Python Demonstration: Reading and Writing CSV files
Video · Python Dates and Times
Video · Advanced Python Objects, map()
Video · Advanced Python Lambda and List Comprehensions
Video · Advanced Python Demonstration: The Numerical Python Library (NumPy)
Reading · Week 1 Slides
Quiz · Week One Quiz

WEEK 2
In this week of the course you’ll learn the fundamentals of one of the most important toolkits Python has for data cleaning and processing: pandas. You’ll learn how to read data into DataFrame structures, how to query these structures, and the details of how such structures are indexed. The module ends with a programming assignment and a discussion question.

Video · Introduction
Other · Week 2 Lectures Jupyter Notebook
Video · The Series Data Structure
Video · Querying a Series
Video · The DataFrame Data Structure
Video · DataFrame Indexing and Loading
Video · Querying a DataFrame
Video · Indexing Dataframes
Video · Missing Values
Other · The Ethics of Using Hacked Data
Reading · Week 2 Slides
Other · Assignment 2
Programming Assignment · Assignment 2 Submission

WEEK 3
In this week you’ll deepen your understanding of the Python pandas library by learning how to merge DataFrames, generate summary tables, group data into logical pieces, and manipulate dates. We’ll also refresh your understanding of scales of data, and discuss issues with creating metrics for analysis. The week ends with a more significant programming assignment.

Other · Week 3 Lectures Jupyter Notebook
Video · Merging Dataframes
Video · Pandas Idioms
Video · Group by
Video · Scales
Video · Pivot Tables
Video · Date Functionality
Other · Goodhart’s Law
Reading · Week 3 Slides
Other · Assignment 3
Programming Assignment · Assignment 3 Submission

WEEK 4
In this week of the course you’ll be introduced to a variety of statistical techniques such as distributions, sampling, and t-tests. The majority of the week will be dedicated to your course project, where you’ll engage in a real-world data cleaning activity and provide evidence for (or against!) a given hypothesis. This project is suitable for a data science portfolio, and will test your knowledge of cleaning, merging, manipulating, and testing for significance in data. The week ends with two discussions of science and the rise of the fourth paradigm: data-driven discovery.

Other · Week 4 Lectures Jupyter Notebook
Video · Introduction
Video · Distributions
Video · More Distributions
Video · Hypothesis Testing in Python
Other · The End of Theory
Other · Science Isn’t Broken: p-hacking activity
Reading · Week 4 Slides
Other · Assignment 4 – Project
Programming Assignment · Assignment 4 Submission

COURSE 2

Applied Plotting, Charting & Data Representation in Python

Subtitles
English

About the Course

This course will introduce the learner to information visualization basics, with a focus on reporting and charting using the matplotlib library. The course will start with a design and information literacy perspective, touching on what makes a good and bad visualization, and how statistical measures translate into visualizations. The second week will focus on matplotlib, the technology used to make visualizations in Python, and introduce users to best practices when creating basic charts and how to realize design decisions in the framework. The third week will describe the gamut of functionality available in matplotlib, and demonstrate a variety of basic statistical charts, helping learners to identify when a particular method is good for a particular problem. The course will end with a discussion of other forms of structuring and visualizing data. This course should be taken after Introduction to Data Science in Python and before the remainder of the Applied Data Science with Python courses: Applied Machine Learning in Python, Applied Text Mining in Python, and Applied Social Network Analysis in Python.

WEEK 1
Module 1: Principles of Information Visualization
In this module, you will get an introduction to principles of information visualization. You will be introduced to tools for thinking about design and to graphical heuristics for creating effective visualizations. All of the course information on grading, prerequisites, and expectations is on the course syllabus, which is included in this module.

Video · Introduction
Reading · Syllabus
Video · About the Professor: Christopher Brooks
Video · Tools for Thinking about Design (Alberto Cairo)
Other · Hands-on Visualization Wheel
Video · Graphical heuristics: Data-ink ratio (Edward Tufte)
Reading · Dark Horse Analytics (Optional)
Video · Graphical heuristics: Chart junk (Edward Tufte)
Reading · Useful Junk?: The Effects of Visual Embellishment on Comprehension and Memorability of Charts
Video · Graphical heuristics: Lie Factor and Spark Lines (Edward Tufte)
Video · The Truthful Art (Alberto Cairo)
Other · Must a visual be enlightening?
Reading · Graphics Lies, Misleading Visuals
Peer Review · Graphics Lies, Misleading Visuals

WEEK 2
Module 2: Basic Charting
In this module, you will delve into basic charting. For this week’s assignment, you will work with real-world CSV weather data. You will manipulate the data to display the minimum and maximum temperature for a range of dates and demonstrate that you know how to create a line graph using matplotlib. Additionally, you will demonstrate how to build composite charts by overlaying a scatter plot of record-breaking data for a given year.

Other · Module 2 Jupyter Notebook
Video · Introduction
Video · Matplotlib Architecture
Reading · Matplotlib
Reading · Ten Simple Rules for Better Figures
Video · Basic Plotting with Matplotlib
Video · Scatterplots
Video · Line Plots
Video · Bar Charts
Video · Dejunkifying a Plot
Other · Plotting Weather Patterns
Peer Review · Plotting Weather Patterns

WEEK 3
Module 3: Charting Fundamentals
In this module you will explore charting fundamentals. For this week’s assignment you will work to implement a new visualization technique based on academic research. This assignment is flexible and you can address it at a variety of difficulty levels, from an easy static image to an interactive chart where users can set ranges of values to be used.

Other · Module 3 Jupyter Notebook
Video · Subplots
Video · Histograms
Reading · Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach (Optional)
Video · Box Plots
Video · Heatmaps
Video · Animation
Video · Interactivity
Other · Practice Assignment: Understanding Distributions Through Sampling
Practice Peer Review · Practice Assignment: Understanding Distributions Through Sampling
Other · Building a Custom Visualization
Reading · Assignment Reading
Peer Review · Building a Custom Visualization

WEEK 4
Module 4: Applied Visualizations
In this module, everything starts to come together. Your final assignment is entitled “Becoming a Data Scientist.” This assignment requires that you identify at least two publicly accessible datasets from the same region that are consistent across a meaningful dimension. You will state a research question that can be answered using these datasets and then create a visual using matplotlib that addresses your stated research question. You will then be asked to justify how your visual addresses your research question.

Other · Module 4 Jupyter Notebook
Video · Plotting with Pandas
Video · Seaborn
Reading · Spurious Correlations
Video · Becoming an Independent Data Scientist
Other · Project Description
Peer Review · Becoming an Independent Data Scientist

COURSE 3

Applied Machine Learning in Python

Subtitles
English

About the Course

This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different from descriptive statistics, and introduce the scikit-learn toolkit. The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit-learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write Python code to carry out an analysis. This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Network Analysis in Python.

WEEK 1
Module 1: Fundamentals of Machine Learning – Intro to SciKit Learn
This module introduces basic machine learning concepts, tasks, and workflow using an example classification problem based on the K-nearest neighbors method, and implemented using the scikit-learn library.
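
To make the K-nearest neighbors idea concrete, here is a minimal sketch in plain Python. It is not course material and deliberately avoids scikit-learn, which the module itself relies on; the function name `knn_predict` and the toy fruit measurements are assumptions made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Pair each training label with its Euclidean distance to the query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): (width_cm, height_cm) for two kinds of fruit.
train_points = [(7.0, 7.5), (7.2, 7.8), (6.9, 7.1), (6.0, 9.5), (6.2, 9.8), (5.9, 9.1)]
train_labels = ["apple", "apple", "apple", "lemon", "lemon", "lemon"]

prediction = knn_predict(train_points, train_labels, query=(6.1, 9.4), k=3)
print(prediction)
```

In scikit-learn, the same fit-then-predict workflow is provided by `KNeighborsClassifier`.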

 

Reading · Course Syllabus
Video · Introduction
Video · Key Concepts in Machine Learning
Video · Python Tools for Machine Learning
Other · Module 1 Notebook
Video · An Example Machine Learning Problem
Video · Examining the Data
Video · K-Nearest Neighbors Classification
Reading · Zachary Lipton: The Foundations of Algorithmic Bias (optional)
Quiz · Module 1 Quiz
Other · Assignment 1
Programming Assignment · Assignment 1 Submission

WEEK 2
Module 2: Supervised Machine Learning – Part 1
This module delves into a wider variety of supervised learning methods for both classification and regression. You will learn about the connection between model complexity and generalization performance, the importance of proper feature scaling, and how to control model complexity by applying techniques like regularization to avoid overfitting. In addition to k-nearest neighbors, this week covers linear regression (least-squares, ridge, lasso, and polynomial regression), logistic regression, support vector machines, the use of cross-validation for model evaluation, and decision trees.

Other · Module 2 Notebook
Video · Introduction to Supervised Machine Learning
Video · Overfitting and Underfitting
Video · Supervised Learning: Datasets
Video · K-Nearest Neighbors: Classification and Regression
Video · Linear Regression: Least-Squares
Video · Linear Regression: Ridge, Lasso, and Polynomial Regression
Video · Logistic Regression
Video · Linear Classifiers: Support Vector Machines
Video · Multi-Class Classification
Video · Kernelized Support Vector Machines
Video · Cross-Validation
Video · Decision Trees
Reading · A Few Useful Things to Know about Machine Learning
Reading · Ed Yong: Genetic Test for Autism Refuted (optional)
Quiz · Module 2 Quiz
Other · Classifier Visualization Playspace
Other · Assignment 2
Programming Assignment · Assignment 2 Submission

WEEK 3
Module 3: Evaluation
This module covers evaluation and model selection methods that you can use to help understand and optimize the performance of your machine learning models.
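
As a small illustration of the kind of evaluation metrics this module covers, the sketch below computes binary confusion-matrix counts, precision, recall, and F1 in plain Python. The function name `binary_metrics` and the label lists are assumptions for illustration only; the course itself works with scikit-learn.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus precision, recall, and F1.

    Hypothetical helper for illustration; labels equal to `positive`
    are treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy here would be 8 out of 10, while precision and recall are both 3 out of 4, which is why the module looks beyond accuracy when classes are imbalanced.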

 

Other · Module 3 Notebook
Video · Model Evaluation & Selection
Video · Confusion Matrices & Basic Evaluation Metrics
Video · Classifier Decision Functions
Video · Precision-recall and ROC curves
Video · Multi-Class Evaluation
Video · Regression Evaluation
Reading · Practical Guide to Controlled Experiments on the Web (optional)
Video · Model Selection: Optimizing Classifiers for Different Evaluation Metrics
Quiz · Module 3 Quiz
Other · Assignment 3
Programming Assignment · Assignment 3 Submission

WEEK 4
Module 4: Supervised Machine Learning – Part 2
This module covers more advanced supervised learning methods that include ensembles of trees (random forests, gradient boosted trees), and neural networks (with an optional summary on deep learning). You will also learn about the critical problem of data leakage in machine learning and how to detect and avoid it.

 

Other · Module 4 Notebook
Video · Introduction
Video · Naive Bayes Classifiers
Video · Random Forests
Video · Gradient Boosted Decision Trees
Video · Neural Networks
Reading · Neural Networks Made Easy (optional)
Reading · Play with Neural Networks: TensorFlow Playground (optional)
Video · Deep Learning (Optional)
Reading · Deep Learning in a Nutshell: Core Concepts (optional)
Reading · Assisting Pathologists in Detecting Cancer with Deep Learning (optional)
Video · Data Leakage
Reading · The Treachery of Leakage (optional)
Reading · Leakage in Data Mining: Formulation, Detection, and Avoidance (optional)
Reading · Data Leakage Example: The ICML 2013 Whale Challenge (optional)
Reading · Rules of Machine Learning: Best Practices for ML Engineering (optional)
Quiz · Module 4 Quiz
Other · Assignment 4
Programming Assignment · Assignment 4 Submission
Other · Unsupervised Learning Notebook
Video · Introduction
Video · Dimensionality Reduction and Manifold Learning
Video · Clustering
Reading · How to Use t-SNE Effectively
Reading · How Machines Make Sense of Big Data: an Introduction to Clustering Algorithms
Video · Conclusion

COURSE 4

Applied Text Mining in Python

Subtitles
English

About the Course

This course will introduce the learner to text mining and text manipulation basics. The course begins with an understanding of how text is handled by Python, the structure of text both to the machine and to humans, and an overview of the nltk framework for manipulating text. The second week focuses on common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use by machine learning processes. The third week will apply basic natural language processing methods to text, and demonstrate how text classification is accomplished. The final week will explore more advanced methods for detecting the topics in documents and grouping them by similarity (topic modelling). This course should be taken after: Introduction to Data Science in Python, Applied Plotting, Charting & Data Representation in Python, and Applied Machine Learning in Python.
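
The second-week topics (regular expressions, cleaning text, and preparing text for machine learning) can be sketched with Python's standard `re` module. This is an illustrative assumption rather than course code: the function name `clean_and_tokenize`, the simplified patterns, and the sample sentence are made up here, and the course itself also draws on nltk.

```python
import re
from collections import Counter


def clean_and_tokenize(text):
    """Lowercase text, strip URLs, and split into word tokens.

    Hypothetical helper for illustration; the patterns are deliberately simple.
    """
    text = text.lower()
    # Remove anything that looks like a URL (simplified pattern).
    text = re.sub(r"https?://\S+", " ", text)
    # Keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text)


sample = "Pandas is great! See https://example.com for more. Pandas, pandas, PANDAS..."
tokens = clean_and_tokenize(sample)

# Token counts are one simple numeric representation a classifier could consume.
counts = Counter(tokens)
print(counts.most_common(3))
```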

COURSE 5

Applied Social Network Analysis in Python

Subtitles
English

About the Course

This course will introduce the learner to network modelling through the networkx toolset. Although network modelling is also used for knowledge graphs and physical and virtual networks, the lens here will be social network analysis. The course begins with an understanding of what network modelling is (graph theory) and motivations for why we might model phenomena as networks. The second week introduces the networkx library and discusses how to build and visualize networks. The third week will describe metrics as they relate to the networks and demonstrate how these metrics can be applied to graph structures. The final week will explore the social network analysis workflow, from problem identification through to generation of insight. This course should be taken after: Introduction to Data Science in Python, Applied Plotting, Charting & Data Representation in Python, and Applied Machine Learning in Python.
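
As a rough illustration of what a network metric looks like, the sketch below builds a small undirected graph from an edge list and computes degree centrality using only plain dictionaries. It is an assumption-laden example, not course code: the helper names `build_graph` and `degree_centrality` and the friendship data are made up, and the course itself uses networkx.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Made-up friendship network for illustration.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```

networkx exposes the same metric as `networkx.degree_centrality`, which applies the same normalization by the number of other nodes.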

Creators

  • University of Michigan

    Michigan’s academic vigor offers excellence across disciplines and around the globe. The University is recognized as a leader in higher education due to the outstanding quality of its 19 schools and colleges, internationally recognized faculty, and departments with 250 degree programs.

    The mission of the University of Michigan is to serve the people of Michigan and the world through preeminence in creating, communicating, preserving and applying knowledge, art, and academic values, and in developing leaders and citizens who will challenge the present and enrich the future.

  • Christopher Brooks

  • Kevyn Collins-Thompson

    Associate Professor

  • Daniel Romero

    Assistant Professor

  • V. G. Vinod Vydiswaran

    Assistant Professor
