391B Orchard Road #23-01 Ngee Ann City Tower B, Singapore 238874
+ 65 66381203

Certificate Associate in Data Science – Non-linear Supervised Learning Algorithms


CADS Non-Linear Supervised Learning Algorithms

Models that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning models. By not making assumptions, they are free to learn any functional form from the training data.

“Nonparametric methods are good when you have a lot of data and no prior knowledge, and when you don’t want to worry too much about choosing just the right features.” [Artificial Intelligence: A Modern Approach, page 757]

Nonparametric methods seek to best fit the training data in constructing the mapping function, whilst maintaining some ability to generalise to unseen data. As such, they are able to fit a large number of functional forms. An easy-to-understand nonparametric model is the k-nearest neighbors algorithm, which makes predictions for a new data instance based on the k most similar training patterns. The method does not assume anything about the form of the mapping function, other than that patterns that are close together are likely to have a similar output variable.
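The k-nearest neighbors idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the function names `euclidean` and `knn_predict` and the toy data are assumptions made up for this example, and Euclidean distance with a simple majority vote is just one common choice.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the training examples by distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters in 2-D.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # query near the first cluster
print(knn_predict(train_X, train_y, (6.0, 7.0), k=3))  # query near the second cluster
```

Note that there is no training step beyond storing the data: all the work happens at prediction time, which is why k-NN is described as instance-based or lazy learning.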

Benefits of Non-parametric Machine Learning Algorithms

  • More Flexibility: Capable of fitting a large number of functional forms.
  • High Power: No assumptions (or weak assumptions) about the underlying function.
  • High Performance: Can result in higher performance models for prediction.

Limitations of Non-parametric Machine Learning Algorithms

  • More Training Data: Require a lot more training data to estimate the mapping function.
  • Low Efficiency: A lot slower to train as they often have far more parameters to train.
  • Overfitting: More of a risk to overfit the training data and it is harder to explain why specific predictions are made.

    The term “non-parametric” might sound a bit confusing at first: non-parametric does not mean that they have NO parameters! On the contrary, non-parametric models (can) become more and more complex with an increasing amount of data.
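    One way to see this is to compare what each kind of model has to keep after training. The sketch below is illustrative only: `fit_line` and `fit_knn` are hypothetical helper names, and the data is synthetic. A least-squares line is always summarised by two numbers, while a nearest-neighbour model keeps every training example.

```python
def fit_line(xs, ys):
    """Parametric: a least-squares line is always described by (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def fit_knn(xs, ys):
    """Non-parametric: 'training' a nearest-neighbour model just stores the examples."""
    return list(zip(xs, ys))


for n in (10, 100, 1000):
    # Synthetic data on the line y = 2x + 1 (an assumption for illustration).
    xs = [i / n for i in range(n)]
    ys = [2 * x + 1 for x in xs]
    line_model = fit_line(xs, ys)
    knn_model = fit_knn(xs, ys)
    # The line keeps 2 numbers regardless of n; the k-NN model grows with n.
    print(n, len(line_model), len(knn_model))
```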

Learning Objectives

After completing this course, you should be familiar with the following topics and able to:

  • Explain the difference between parametric and non-parametric models
  • Use Support Vector Machines to efficiently perform a nonlinear classification using the kernel trick, implicitly mapping inputs into high-dimensional feature spaces.
  • Explain instance-based learning, or lazy learning
  • Apply the K-NN classification algorithm to determine class membership
  • Apply decision trees to split a data set into branch-like segments, and derive one or more decision rules that describe the relationships between inputs and targets
  • Apply Random Forest
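
The kernel trick mentioned in the Support Vector Machines objective can be illustrated with a small numeric check. This is a hedged sketch rather than an SVM implementation: the names `poly_kernel`, `feature_map` and `dot` are hypothetical, and the degree-2 polynomial kernel is chosen only because its explicit feature map is short enough to write out.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose inner product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                     # never leaves 2-D
explicit = dot(feature_map(x), feature_map(y))   # maps to 3-D first

# The two values agree up to floating-point rounding.
print(implicit, explicit, math.isclose(implicit, explicit))
```

The kernel gives the same inner product as the higher-dimensional mapping without ever constructing the mapped vectors, which is what lets an SVM work in high-dimensional feature spaces efficiently.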

 

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science

This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

Mix of Instructor-led, case study driven and hands-on for select phases

Hardware and Software Requirements

Python, Pandas, NumPy, and a system with at least 2GB RAM running Windows, Ubuntu or Mac OS X

Duration

24 Hours (2 days Instructor led + 8 hours online learning)

Enroll Now
  • Course Name: Certificate Associate in Data Science – Non-linear Supervised Learning Algorithms
  • Location: Singapore
  • Duration: 2 days classroom + 8 hours online
  • Exam Time: 60 minutes
  • Course Price: Call for price
  • Minimum requirements: Foundational Certificate in Programming


Course contents

Topics are listed by day, with the method of delivery in parentheses.

Day 1

1. Decision Trees (Instructor Led)
  • What a decision tree is and how to represent data in a decision tree
  • Information theory concepts of information entropy and information gain
  • ID3 algorithm for constructing a decision tree from the training data and its implementation in Python
  • How to classify new data items using the constructed decision tree
  • Accuracy of decision trees
  • How to deal with data inconsistencies during decision tree construction

2. Random Forests (Instructor Led)
  • Tree bagging (or bootstrap aggregation) technique as part of random forest construction
  • Reduce the bias and variance and improve the accuracy
  • Implement an algorithm in Python that would construct a random forest
  • Decreasing the variance of a classifier to yield more accurate results

3. K Nearest Neighbors (Instructor Led)
  • Value of k in KNN
  • Distance measures in KNN
  • Euclidean distance
  • Hamming distance
  • Minkowski distance
  • Case-based reasoning (CBR)
  • Implementing KNN

4. Case Study (Hands-on session)

Day 2

5. Support Vector Machines (Instructor Led)
  • Hyperplane
  • Separating hyperplane
  • Optimal hyperplane
  • Linear Algebra
  • Handling outliers
  • Dealing with more than two classes
  • Kernels of SVM
  • Linear and RBF kernel

6. Case Study (Hands-on session)

7. Case Project (Hands-on session)

8. Assignment (Online, self-paced)
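
The information entropy and information gain concepts listed under Decision Trees in the outline can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's ID3 code: the helper names `entropy` and `information_gain`, the attribute names and the four-row data set are all made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting `rows` (dicts) on `attribute`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "yes", "no", "no"]

# An ID3-style learner would split on the attribute with the highest gain.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In ID3, this selection step is repeated recursively on each partition until the labels in a branch are pure or no attributes remain.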

Certification

  • Certificate Title: Certificate Associate in Data Science – Non-linear Supervised Learning Algorithms
  • Certificate Awarding Body: ITPACS

About ITPACS

Information Technology Professional Accreditations and Certifications Society (ITPACS) is a non-profit organization focused on improving technology skills for the future. ITPACS offers associate-level, professional-level and leader certifications across 6 domains: data science, web development, mobile development, cyber security, IoT and blockchain. Applicants have to go through an exam eligibility process demonstrating their experience.

Certification Roadmap


Eligibility

The Associate certification is intended for individuals with less than one year of working experience in the field. It is ideal for newcomers starting out in the profession or those seeking to enter it. Applicants are required to have completed the application process prior to taking the exam.


Exam

  • Exam Format: Closed-book format
  • Questions: 30 multiple choice questions, coding exercises
  • Passing Score: 65%
  • Exam Duration: 60 minutes
  • Proctored
  • Exam needs to be taken within 12 months from the exam voucher issue date

ITPACS Certification Training Road Map

Data Science

Data science is not a single science so much as a collection of scientific disciplines integrated for the purpose of analyzing data. Alongside various statistical and mathematical techniques, these disciplines include:

  • Computer science
  • Data engineering
  • Visualization
  • Domain-specific knowledge and approaches

With the advent of cheaper storage technology, more and more data has been collected and stored, permitting processing and analysis that was previously infeasible. With this analysis came the need for various techniques to make sense of the data. These large data sets, when analyzed to identify trends and patterns, became known as big data.

The process of analyzing big data is not simple, and it gave rise to a specialization of developers known as data scientists. Drawing upon a myriad of technologies and expertise, they are able to analyze data to solve problems that previously were either not envisioned or were too difficult to solve.

The various data science techniques that we will illustrate have been used to solve a variety of problems. Many of these techniques are motivated by economic gain, but they have also been used to solve many pressing social and environmental problems. Problem domains where these techniques have been used include finance, optimizing business processes, understanding customer needs, performing DNA analysis, foiling terrorist plots, and finding relationships between transactions to detect fraud, among many other data-intensive problems.

Data mining is a popular application area for data science. In this activity, large quantities of data are processed and analyzed to glean information about the dataset, to provide meaningful insights, and to develop meaningful conclusions and predictions. It has been used to analyze customer behavior, to detect relationships between what may appear to be unrelated events, and to make predictions about future behavior.

Machine learning is an important aspect of data science. This technique allows the computer to solve various problems without needing to be explicitly programmed. It has been used in self-driving cars, speech recognition, and in web searches. In data mining, the data is extracted and processed. With machine learning, computers use the data to take some sort of action.