391B Orchard Road #23-01 Ngee Ann City Tower B, Singapore 238874
+ 65 66381203

Certificate Associate in Data Science - Data Visualisation


CADS Data Visualization

The human mind is often good at seeing patterns, trends, and outliers in visual representations. The large amounts of data present in many data science problems can be analyzed using visualization techniques. Visualization suits a wide range of audiences, from analysts to upper-level management to clients. In this course, we present various visualization techniques and demonstrate how they are supported in Python.

Visualization is an important step in data analysis because it allows us to conceive of large datasets in practical and meaningful ways. For small datasets, we can inspect the values directly and perhaps draw conclusions from the patterns we see, but for large datasets this becomes an overwhelming and unreliable process. Visualization tools help us identify potential problems or unexpected results, as well as construct meaningful interpretations of good data.

One example of the usefulness of data visualization is the detection of outliers. Visualizing data lets us quickly spot results that lie far outside our expectations, and we can then decide how to handle them to build a clean and usable dataset. This lets us catch errors early and deal with them before they cause problems later on. Visualization also makes it easier to classify information and helps analysts organize their inquiries in the manner best suited to their particular dataset.
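Box plots (topic 2.4.4 in the outline below) commonly draw their whiskers using a rule like the one sketched here: Tukey's fences, which flag any value lying more than 1.5 interquartile ranges below the first quartile or above the third. This is a minimal plain-Python sketch, not course material; the function name `iqr_outliers` and the sample readings are hypothetical, and other outlier rules (z-scores, domain-specific thresholds) may suit a given dataset better.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns three cut points: Q1, the median, and Q3.
    # "inclusive" is one of several quartile conventions, so fences may differ
    # slightly from those computed by other tools.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious entry (perhaps 9.8 mistyped as 98.0).
readings = [9.6, 9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 98.0, 10.3, 9.9]
print(iqr_outliers(readings))
```

Here only the 98.0 reading falls outside the fences; on a box plot it would appear as an isolated point beyond the whisker, prompting a check for a data-entry error.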

Each type of visual expression lends itself to different types of data and analysis purposes. One common purpose of data analysis is classification: determining which subset of a dataset a particular data value belongs to. Classification often occurs early in an analysis, because breaking data into manageable, related pieces simplifies the work that follows. It is usually not the end goal but rather an important intermediate step before further analysis can be undertaken.
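To make the classification step concrete, the sketch below uses a nearest-centroid classifier, one of the simplest classification techniques, chosen here purely for illustration: each class is summarized by the mean of its known members, and a new value is assigned to the class whose mean is closest. The function names and the cat measurements are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def train_centroids(labeled):
    """Map each class label to the centroid of its training points."""
    return {label: centroid(points) for label, points in labeled.items()}


def classify(point, centroids):
    """Assign the point to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical training data: (body length in cm, weight in kg) for two groups of cats.
training = {
    "large": [(190.0, 200.0), (170.0, 160.0), (210.0, 220.0)],
    "small": [(46.0, 4.0), (50.0, 5.0), (80.0, 12.0)],
}
centroids = train_centroids(training)
print(classify((180.0, 150.0), centroids))
print(classify((55.0, 6.0), centroids))
```

The first call assigns the new animal to "large" and the second to "small". Real classifiers (decision trees, logistic regression, naive Bayes, all listed in the course outline) use more elaborate decision rules, but the input and output are the same: a data value in, a class label out.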

Regression Charts

Regression analysis is a complex and important form of data analysis. It involves studying the relationships between dependent and independent variables, including cases with multiple independent variables. This type of statistical analysis allows the analyst to identify ranges of acceptable or expected values and to determine how individual values fit into a larger dataset. Regression analysis is a significant part of machine learning.
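As a minimal illustration of the dependent/independent relationship described above, the sketch below fits a straight line by ordinary least squares for a single independent variable (the multiple-variable case is covered under Multiple Linear Regression in the outline). Residuals, the gaps between observed and predicted values, show how well each observation fits the overall pattern. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus predicted values; unusually large residuals flag unexpected observations."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: advertising spend (x) versus sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
b0, b1 = fit_line(xs, ys)
print(f"sales ~ {b0:.2f} + {b1:.2f} * spend")
print([round(r, 2) for r in residuals(xs, ys, b0, b1)])
```

For this data the fitted line is about `sales ~ 0.15 + 1.95 * spend`, and all residuals are small. In practice, a library such as NumPy or statsmodels would normally replace the hand-written formulas.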

Clustering Charts

Clustering allows us to identify groups of similar data points within a particular set or class. While classification sorts data into distinct classes, clustering is concerned with the structure of the data within a set. For example, we may have a large dataset containing all the world's feline species, in the family Felidae. We could classify these cats into two groups: Pantherinae (containing most of the larger cats) and Felinae (all other cats). Clustering would then involve grouping subsets of similar cats within one of these classes; for example, all tigers could form a cluster within the Pantherinae group.

Sometimes our analysis requires that we extract specific types of information from a dataset. The process of selecting which data to extract is known as attribute selection or feature selection. It helps analysts simplify their data models and overcome issues with redundant or irrelevant information in the dataset.
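k-means clustering appears in the course outline (topic 3.4.2). The sketch below is a minimal plain-Python version of Lloyd's algorithm, the standard k-means procedure: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The seeding step uses farthest-first traversal, a simple deterministic choice made for this example only; libraries commonly use random or k-means++ initialization instead, and k-means results can depend on the starting centroids. The function names and data are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption for this sketch): start from the first point,
    then repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and update steps until centroids stop moving."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: another pass would reproduce the same clusters
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D measurements forming two visually obvious groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centers, groups = kmeans(data, k=2)
for center, members in zip(centers, groups):
    print(center, members)
```

With the two groups this far apart, the algorithm converges after two passes to one cluster of the three points near (1, 1) and another centered at about (8.5, 8.4). Plotting the points colored by cluster is the usual way to check such a result visually.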

Learning Objectives

The objective of this course is to outline a diverse range of commonly used approaches to making and communicating decisions from data, using data visualization, clustering, and predictive analytics. Upon completion of the course, participants should be able to:

  • Understand the steps involved in the data science lifecycle and where data visualization fits in
  • Review principles and methods for understanding and communicating data through data visualizations
  • Demonstrate ways of visualizing single variables and the relationships between two or more variables
  • Demonstrate ways of visualizing groupings in the data, along with dynamic approaches to interacting with the data through graphical user interfaces
  • Demonstrate common visualization approaches to clustering data sets
  • Apply methods for determining the distance between observations and techniques for clustering observations

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science

This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

A mix of instructor-led, case-study-driven, and hands-on sessions for selected phases

Hardware and Software Requirements

Python, pandas, and NumPy; a system with at least 2 GB of RAM running Windows, Ubuntu, or Mac OS X

Duration

24 hours (2 days instructor-led + 8 hours online learning)

Enroll Now
  • Course Name: Certificate Associate in Data Science – Data Visualisation
  • Location: Singapore
  • Duration: 2 days classroom + 8 hours online
  • Exam Time: 60 minutes
  • Course Price: Call for price
  • Minimum requirements: Foundational Certificate in Programming


Course contents

Day 1

1 – Introduction to Data Visualization

1.1 Overview

1.2 Definition

1.3 Preparation

1.3.1 Overview

1.3.2 Accessing Tabular Data

1.3.3 Accessing Unstructured Data

1.3.4 Understanding the Variables and Observations

1.3.5 Data Cleaning

1.3.6 Transformation

1.3.7 Variable Reduction

1.3.8 Segmentation

1.3.9 Preparing Data to Apply

1.4 Analysis

1.4.1 Data Mining Tasks

1.4.2 Optimization

1.4.3 Evaluation

1.4.4 Model Forensics

Method of delivery: Instructor led

2 – Data Visualization

2.1 Overview

2.2 Visualization Design Principles

2.2.1 General Principles

2.2.2 Graphics Design

2.2.3 Anatomy of a Graph

2.3 Tables

2.3.1 Simple Tables

2.3.2 Summary Tables

2.3.3 Two-Way Contingency Tables

2.3.4 Supertables

2.4 Univariate Data Visualization

2.4.1 Bar Chart

2.4.2 Histograms

2.4.3 Frequency Polygon

2.4.4 Box Plots

2.4.5 Dot Plot

2.4.6 Stem-and-Leaf Plot

2.4.7 Quantile Plot

2.4.8 Quantile-Quantile Plot

2.5 Bivariate Data Visualization

2.5.1 Scatterplot

2.6 Multivariate Data Visualization

2.6.1 Histogram Matrix

2.6.2 Scatterplot Matrix

2.6.3 Multiple Box Plot

2.6.4 Trellis Plot

2.7 Visualizing Groups

2.7.1 Dendrograms

2.7.2 Decision Trees

2.7.3 Cluster Image Maps

2.8 Dynamic Techniques

2.8.1 Overview

2.8.2 Data Brushing

2.8.3 Nearness Selection

2.8.4 Sorting and Rearranging

2.8.5 Searching and Filtering

Method of delivery: Instructor led

3 – Clustering

3.1 Overview

3.2 Distance Measures

3.2.1 Overview

3.2.2 Numeric Distance Measures

3.2.3 Binary Distance Measures

3.2.4 Mixed Variables

3.2.5 Other Measures

3.3 Agglomerative Hierarchical Clustering

3.3.1 Overview

3.3.2 Single Linkage

3.3.3 Complete Linkage

3.3.4 Average Linkage

3.3.5 Other Methods

3.3.6 Selecting Groups

3.4 Partition-Based Clustering

3.4.1 Overview

3.4.2 k-Means

3.4.3 Worked Example

3.4.4 Miscellaneous Partition-Based Clustering

Method of delivery: Instructor led, case study, and hands-on session

Day 2

4 – Predictive Analytics

4.1 Overview

4.1.1 Predictive Modeling

4.1.2 Testing Model Accuracy

4.1.3 Evaluating Regression Models’ Predictive Accuracy

4.1.4 Evaluating Classification Models’ Predictive Accuracy

4.1.5 Evaluating Binary Models’ Predictive Accuracy

4.1.6 ROC Charts

4.1.7 Lift Chart

4.2 Principal Component Analysis

4.2.1 Overview

4.2.2 Principal Components

4.2.3 Generating Principal Components

4.2.4 Interpretation of Principal Components

4.3 Multiple Linear Regression

4.3.1 Overview

4.3.2 Generating Models

4.3.3 Prediction

4.3.4 Analysis of Residuals

4.3.5 Standard Error

4.3.6 Coefficient of Multiple Determination

4.3.7 Testing the Model Significance

4.3.8 Selecting and Transforming Variables

4.4 Discriminant Analysis

4.4.1 Overview

4.4.2 Discriminant Function

4.4.3 Discriminant Analysis Example

4.5 Logistic Regression

4.5.1 Overview

4.5.2 Logistic Regression Formula

4.5.3 Estimating Coefficients

4.5.4 Assessing and Optimizing Results

4.6 Naive Bayes Classifiers

4.6.1 Overview

4.6.2 Bayes Theorem and the Independence Assumption

4.6.3 Independence Assumption

4.6.4 Classification Process

Method of delivery: Instructor led, assignment, and online self-paced

Certification

  • Certificate Title: Certificate Associate in Data Science – Data Visualization
  • Certificate Awarding Body: ITPACS

About ITPACS

Information Technology Professional Accreditations and Certifications Society (ITPACS) is a non-profit organization focused on improving technology skills for the future. ITPACS offers associate-level, professional-level, and leader certifications across six domains: data science, web development, mobile development, cybersecurity, IoT, and blockchain. Applicants must go through an exam eligibility process demonstrating their experience.


Eligibility

The Associate certification is aimed at individuals with less than one year of working experience in the field. It is ideal for newcomers starting out in the profession or those seeking to enter it. Applicants must complete the application process before taking the exam.


Exam

  • Exam Format: Closed-book
  • Questions: 30 multiple-choice questions, coding exercises
  • Passing Score: 65%
  • Exam Duration: 60 minutes
  • Proctored
  • The exam must be taken within 12 months of the exam voucher issue date


Data Science

Data science is not a single science so much as a collection of scientific disciplines integrated for the purpose of analyzing data. Alongside various statistical and mathematical techniques, these disciplines include:

  • Computer science
  • Data engineering
  • Visualization
  • Domain-specific knowledge and approaches

With the advent of cheaper storage technology, more and more data has been collected and stored, permitting processing and analysis that was previously infeasible. This in turn created a need for techniques to make sense of the data. These large datasets, when used to identify trends and patterns, became known as big data.

Analyzing big data is not simple, and it led to the emergence of a specialized group of developers known as data scientists. Drawing upon a myriad of technologies and expertise, they are able to analyze data to solve problems that previously were either not envisioned or too difficult to solve.

The data science techniques that we will illustrate have been used to solve a variety of problems. Many applications are motivated by economic gain, but these techniques have also been used to address pressing social and environmental problems. Problem domains include finance, business process optimization, understanding customer needs, DNA analysis, foiling terrorist plots, and detecting fraud by finding relationships between transactions, among many other data-intensive problems.

Data mining is a popular application area for data science. In data mining, large quantities of data are processed and analyzed to glean information about the dataset, provide meaningful insights, and develop conclusions and predictions. It has been used to analyze customer behavior, to detect relationships between seemingly unrelated events, and to make predictions about future behavior.

Machine learning is an important aspect of data science. It allows a computer to solve problems without being explicitly programmed to do so. It has been used in self-driving cars, speech recognition, and web search. Whereas data mining extracts and processes data, machine learning uses that data to have the computer take some sort of action.