#14-01, Raffles City Tower, 250 North Bridge Road, Singapore 179101
+65 66381203  

Data Science/ Machine Learning/ Big Data Courses Singapore


Data Science/Big Data/Machine Learning Courses in Singapore

We offer multiple courses on data science. The foundation course has no programming prerequisite: participants with no Python background start with the 3-day Big Data/Data Science/ML Foundation course, which teaches Python programming for analytics. All other data science courses require Python programming experience.

All our data science courses are taught by working practitioners, not academics. The goal is to get you well versed in applying techniques to solve real-world problems in the most efficient manner.



1. Big Data/Data Science/ML Foundation
2. Data Acquisition Course
3. Data Processing and Cleaning Course
4. Data Visualization Course
5. Data Analytics Course
6. Supervised Machine Learning
7. Unsupervised Machine Learning
8. Deep Learning Course – Artificial Neural Networks

Download AI and Machine learning use cases

CITREP Funding

Enhanced Funding Support for Professionals aged 40 and above and SMEs

Professionals aged 40 and above (i.e. self-sponsored individuals) and SMEs who are sponsoring their employees for training (i.e. organisation-sponsored trainees) are entitled to CITREP enhanced funding support of up to 90% of the nett payable course and certification fees. This is applicable to Singapore Citizens and Permanent Residents (PRs).

 

Please find FY17 CITREP+ funding support details below. All tiers apply to Singapore Citizens and Permanent Residents (PRs).

| Category | Course + Exam | Exam Only |
|----------|---------------|-----------|
| Organisation-sponsored non-SMEs | Up to 70% of the nett payable course and certification fees, capped at $3000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee |
| Organisation-sponsored SMEs | Up to 90% of the nett payable course and certification fees, capped at $3000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee |
| Self-sponsored professionals (Citizens and PRs) | Up to 70% of the nett payable course and certification fees, capped at $3000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee |
| Professionals (Citizens aged 40 and above as of 1 Jan of the current year) | Up to 90% of the nett payable course and certification fees, capped at $3000 per trainee | Up to 70% of the nett payable certification fees, capped at $500 per trainee |
| Students (Citizens) and Full-Time National Servicemen (NSF) | Up to 100% of the nett payable course and certification fees, capped at $2500 per trainee | Up to 100% of the nett payable certification fees, capped at $500 per trainee |

Big Data/Data Science/ML Foundation Course

The Big Data/Data Science/ML Foundation course in Singapore teaches you the basic skills and expertise needed to dissect large volumes of data, detect patterns and enable intelligent decision-making. The foundation course is non-technical and is open to managers, professionals and decision makers.

On day 1 we cover the basics of Big Data; on day 2, data science; and on day 3, machine learning basics.

Participants will get practical knowledge of Data Acquisition, Data Cleaning, Data Analysis, Data Visualization and Machine Learning.  The course covers business, computer science and math and provides insights into successful data science projects. 

This course has been designed from the ground up to cater to people with no prior coding experience. Participants are introduced to Python programming through easy-to-grasp exercises. Our instructors work one-on-one with individuals throughout the class to ensure each participant grasps the fundamentals.

After completion of the 3-day classroom training, participants can take up our online tutorials, which cover advanced topics at no additional charge. Each online tutorial involves watching a video and completing an exercise; instructors then provide feedback on the completed exercises. Instructor support is available for 6 months after classroom training.

Big Data is a process to deliver decision-making insights. The process uses people and technology to quickly analyze large amounts of data of different types (traditional table structured data and unstructured data, such as pictures, video, email, transaction data, and social media interactions) from a variety of sources to produce a stream of actionable knowledge. Organizations increasingly need to analyze information to make decisions for achieving greater efficiency, profits, and productivity.

As relational databases have grown in size to satisfy these requirements, organizations have also looked at other technologies for storing vast amounts of information. These new systems are often referred to under the umbrella term “Big Data.” Gartner has identified three key characteristics for big data: Volume, Velocity, and Variety. Traditional structured systems are efficient at dealing with high volumes and velocity of data; however, they are not the most efficient solution for handling a variety of unstructured or semi-structured data sources.

Big Data solutions can enable the processing of many different types of formats beyond traditional transactional systems. Definitions for Volume, Velocity, and Variety vary, but most big data definitions are concerned with amounts of information that are too difficult for traditional systems to handle—either the volume is too much, the velocity is too fast, or the variety is too complex.

Enroll Now

Big Data/Data Science/ML Foundation

  • Course Name: Big Data/Data Science/ML Foundation
  • Location: Singapore
  • Duration: 3 days classroom + 6 months online
  • Refreshments: Lunch and snacks
  • Delivery Mode: Instructor-led
  • Prior Coding Experience: Not required
  • Minimum Requirements: None
  • Pass Guarantee: Yes; check for conditions
  • Pass Rate: 100% over the last 5 months; 98% over the past 2 years

iKompass Big Data/Data Science/ML Course Sample Content

 

 

 
 

Big Data/Data Science/ML Foundation

 3 days Classroom Training

Our Big Data/Data Science/ML Foundation course is a good place to start if you do not have any experience with Big Data or data science. It provides information on best practices for devising a Big Data/data science/ML solution for your organization, and teaches you the basic skills and expertise needed to dissect large volumes of data leading to intelligent decision-making.

 

Course features:

• 3 days classroom training

• Business and manager focused

• 6 months of online learning with weekly assignments and feedback

• Post-course video tutorials with support

 


Classroom Training Outline

 


Big Data/Data Science/ML Foundation course outline

DAY 1

| Time | Topic | Delivery | Description | Tools |
|------|-------|----------|-------------|-------|
| 9:30 – 10:00 | Machine Learning Lifecycle | Theory | Training and testing data. The machine learning life cycle is the cyclical process that data science projects follow; it defines each step an organization needs to take to derive practical business value from machine learning and artificial intelligence (AI). | Case studies |
| 10:00 – 10:30 | BI versus Data Science | Theory | Business intelligence is the use of data to help make business decisions. If business intelligence is the decision-making phase, then data analytics is the process of asking questions. | Discussion |
| 10:30 – 10:45 | Tea break | | | |
| 10:45 – 12:00 | Big Data Characteristics | Theory | Volume, Velocity, Variety. This section discusses the characteristics that make for Big Data. | Discussion |
| 12:00 – 13:00 | Lunch | | | |
| 13:00 – 14:00 | Python Functional Programming | Practical | Lists, dictionaries, strings, tuples, functions. Python is a general-purpose programming language that is increasingly popular for data science; companies worldwide use it to harvest insights from their data and gain a competitive edge. | Python |
| 14:00 – 14:45 | Data Science Tools | Practical | With over 6 million users, the open-source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows, and macOS. It is the industry standard for developing, testing, and training on a single machine. | Anaconda, NumPy |
| 14:45 – 15:00 | Tea break | | | |
| 15:00 – 17:30 | Python Data Structures | Practical | The Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations and narrative text. | Jupyter Notebook |
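The day 1 practical sessions revolve around core Python data structures and functions. A minimal illustrative sketch (not the actual course material) of the kind of exercise involved:

```python
# Illustrative exercise: the core Python data structures used throughout the course.
prices = [4.20, 3.80, 5.10]           # list: ordered, mutable
point = (1.0, 2.5)                    # tuple: ordered, immutable
stock = {"apples": 12, "pears": 7}    # dictionary: key -> value

def total_value(counts, unit_price):
    """Sum of count * price over all items (a simple function exercise)."""
    return sum(n * unit_price for n in counts.values())

print(total_value(stock, 0.5))  # (12 + 7) * 0.5 = 9.5
```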

 

DAY 2

| Time | Topic | Delivery | Description | Tools |
|------|-------|----------|-------------|-------|
| 9:30 – 10:00 | Big Data Engineering | Theory | Clusters | Hadoop |
| 10:00 – 10:45 | Distributed Databases | Theory | NoSQL encompasses a wide variety of database technologies developed in response to the demands of building modern applications. | MongoDB |
| 10:30 – 10:45 | Tea break | | | |
| 10:45 – 11:15 | Distributed Processing | Theory | Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. | Spark |
| 11:15 – 12:00 | Data Lakes | Theory | A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture. | Hadoop or S3 |
| 12:00 – 13:00 | Lunch | | | |
| 13:00 – 14:00 | NumPy | Practical | Mathematical operations on matrices | |
| 14:00 – 14:45 | Data Acquisition | Practical | Data collection. API stands for Application Programming Interface: a software intermediary that allows two applications to talk to each other. An API delivers your request to the provider you are requesting it from, then delivers the response back to you. | Beautiful Soup, APIs |
| 14:00 – 14:45 | Data Cleaning | Practical | Wrangling. Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. | Pandas |
| 14:45 – 15:00 | Tea break | | | |
| 15:00 – 17:30 | Data Visualization | Practical | Charts. Data visualization describes any effort to help people understand the significance of data by placing it in a visual context; patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized more easily. | Seaborn, Bokeh |
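The day 2 afternoon covers acquiring data via APIs, which typically return JSON. A minimal sketch using only Python's standard library, with a made-up sample payload standing in for a real API response:

```python
import json

# A hypothetical API response (real APIs return JSON text like this over HTTP).
payload = '{"city": "Singapore", "readings": [{"t": 31.2}, {"t": 29.8}]}'

data = json.loads(payload)                    # parse JSON text into Python objects
temps = [r["t"] for r in data["readings"]]    # extract a column of values
print(data["city"], sum(temps) / len(temps))  # city name and average temperature
```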

 

DAY 3

| Time | Topic | Delivery | Description | Tools |
|------|-------|----------|-------------|-------|
| 9:30 – 10:45 | Machine Learning Algorithms | Practical | Supervised. Scikit-learn (formerly scikits.learn) is a free machine learning library for Python. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting and k-means. | scikit-learn |
| 10:30 – 10:45 | Tea break | | | |
| 10:45 – 11:00 | Linear Regression | Practical | In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (dependent variable) and one or more explanatory (independent) variables. | Regressor |
| 11:00 – 11:30 | K Nearest Neighbors | Practical | In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space; the output depends on whether k-NN is used for classification or regression. | Classifier |
| 11:30 – 12:00 | Naïve Bayes | Practical | In machine learning, naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naive) independence assumptions between the features. | Classifier |
| 12:00 – 13:00 | Lunch | | | |
| 13:00 – 18:30 | Data Science Project | Practical | Using the simple Titanic passenger survival dataset, start and finish a data science project with Python and Pandas: from exploratory data analysis, to feature selection and feature engineering, to model building and evaluation. | Pandas |
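The k-nearest neighbors session can be illustrated with a from-scratch sketch. The course itself uses scikit-learn; this toy version, with made-up points, just shows the idea of voting among the k closest training examples:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.5, 0.5)))  # "A": its 3 nearest neighbours are all class A
```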

Data Acquisition Course

Data acquisition is the process of bringing data created by a source outside the organization into the organization for production use.

Course Description:

With the advent of data science and predictive analytics, many organizations have come to the realization that enterprise data must be fused with external data to enable and scale a digital business transformation.

This means that processes for identifying, sourcing, understanding, assessing and ingesting such data must be developed.

Organizations embarking on a data journey to leverage the business value of data across the information supply chain will need to navigate the unique challenges of self-service analytics.

Covers advanced topics in:

Data ingestion from data lakes and NoSQL sources

  • Data Lake and Data Warehouse
  • SQL and NoSQL Databases
  • CRUD in Amazon S3
  • Web scraping
  • API using JSON
  • Common file formats such as CSV, Excel, HTML
  • ETL (Extract, Transform, Load)
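The ETL idea in the topics above can be sketched as a tiny extract-transform-aggregate pass over a made-up CSV extract, using only Python's standard library (real pipelines would read from files, APIs or a data lake):

```python
import csv
import io

# Hypothetical raw extract, inlined for illustration.
raw = """name,revenue
Acme,1200
Globex,950
"""

rows = list(csv.DictReader(io.StringIO(raw)))  # Extract: parse CSV into records
for row in rows:                               # Transform: cast strings to numbers
    row["revenue"] = int(row["revenue"])
total = sum(row["revenue"] for row in rows)    # Load step stubbed as an aggregate
print(total)  # 2150
```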

Sample tasks that constitute a data acquisition process:

There are several things that stand out about the list below. The first is that it consists of a relatively large number of tasks. The second is that many different groups will be involved, e.g., Analytics or Data Science will likely come up with the need and use case.

  • A need for data is identified, perhaps with use cases.
  • Prospecting for the required data is carried out.
  • Data sources are disqualified, leaving a set of qualified sources.
  • Semantic analysis of the data sets is undertaken, so they are adequately understood.
  • The data sets are evaluated against originally established use cases.
  • Implementation specifications are drawn up, usually involving Data Operations who will be responsible for production processes.
  • Source onboarding occurs, such that ingestion is technically accomplished.
  • Production ingest is undertaken.

Enroll Now

Data Acquisition

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Data Acquisition
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

Data Processing and Cleaning Course

Data processing refers to cleaning, restructuring and enriching raw data into a more usable format.

Course Description:

Organizing and cleaning data before analysis has been shown to be extremely useful and helps firms quickly analyse larger amounts of data.

If implemented well, data cleaning can be one of the most critical practices at your disposal.

Data wrangling, like most data analytics processes, is iterative: the practitioner will need to carry out these steps repeatedly to produce the desired results. There are six broad steps to data wrangling.

Covers advanced topics in:

Data cleaning and wrangling

  • Discovering
  • Structuring
  • Cleaning
  • Enriching
  • Validating
  • Publishing

Sample tasks that constitute a data wrangling process:

With the wide variety of verticals, use cases, types of users, and systems utilizing enterprise data today, the specifics of cleaning can take on myriad forms.

  • Restructure the data in a manner that better suits the analytical method used.
  • Clean outliers.
  • Change null values and standardize the formatting.
  • Augment the data with additional data to improve it.
  • Ascertain whether the fields in the data set are accurate via a check across the data.
  • Document the steps taken, or the logic used, to wrangle the data.
  • Transform to new formats appropriate for downstream processing.
  • Reshape and aggregate time series data to the dimensions and spans of interest.
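Several of the tasks above (standardizing null values, cleaning outliers, fixing formatting) can be sketched in a few lines of plain Python; the records and thresholds below are made up for illustration:

```python
# Toy records exhibiting the problems listed above: inconsistent formatting,
# null-like values, and an implausible outlier.
records = [
    {"name": " Alice ", "age": "34"},
    {"name": "BOB",     "age": "N/A"},   # null value to standardize
    {"name": "carol",   "age": "212"},   # outlier to clean
]

NULLS = {"", "N/A", "null", None}

def clean(rec, max_age=120):
    age = rec["age"]
    age = None if age in NULLS else int(age)
    if age is not None and age > max_age:   # treat outliers as missing
        age = None
    return {"name": rec["name"].strip().title(), "age": age}

cleaned = [clean(r) for r in records]
print(cleaned[0])  # {'name': 'Alice', 'age': 34}
```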

Enroll Now

Data Processing and Cleaning Course

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Data Cleaning
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

 

Data Visualization Course

More than just making fancy charts, visualization is a way of communicating a dataset’s information in a way that is easy for people to understand. With a good visualization, one can clearly see the patterns and information that often lie hidden within the data.

Course Description:

In the early stages of a project, you’ll often be doing an Exploratory Data Analysis (EDA) to gain some insights into your data. Creating visualizations will help you speed up your analysis along the way.

Towards the end of your project, it’s important to be able to present your final results in a clear, concise, and compelling manner that your audience, who are often non-technical stakeholders, can understand.

There’s no doubt: taking your visualizations to the next level will turbocharge your analysis and help you knock your next presentation out of the park.

Covers advanced topics in:

Data Visualization

  • Comprehend information quickly
  • Identify relationships and patterns
  • Pinpoint emerging trends
  • Communicate the story to others
  • Decide which visual is best
  • Build dashboards with interactive charts

Sample tasks that constitute a data visualization process:

Data visualization is a quick, easy way to convey concepts in a universal manner – and you can experiment with different scenarios by making slight adjustments.

  • Understand the data you’re trying to visualize, including its size and cardinality (the uniqueness of data values in a column).
  • Determine what you’re trying to visualize and what kind of information you want to communicate.
  • Know your audience and understand how it processes visual information.
  • Use a visual that conveys the information in the best and simplest form for your audience.
  • Prepare for the amount of data you will be working with.
  • Determine the cardinality of the columns you’re trying to visualize.
  • Identify areas that need attention or improvement.
  • Clarify which factors influence customer behavior.
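Cardinality, mentioned twice above, is simply the number of distinct values in a column; low-cardinality columns suit bar charts, while high-cardinality ones suit histograms or scatter plots. A minimal check, on made-up data:

```python
# Made-up rows standing in for a real dataset.
rows = [
    {"country": "SG", "amount": 120},
    {"country": "SG", "amount": 80},
    {"country": "MY", "amount": 95},
]

def cardinality(rows, column):
    """Count of distinct values in a column (set comprehension removes duplicates)."""
    return len({row[column] for row in rows})

print(cardinality(rows, "country"), cardinality(rows, "amount"))  # 2 3
```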

Enroll Now

Data Visualization Course

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Data Visualization
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

 

Data Analytics Course

The purpose of this course is to teach a practical approach for making sense out of data. A step-by-step process is introduced, which is designed to walk you through the steps and issues that you will face in data analysis or data mining projects.

Course Description:

You will learn how to identify non-trivial facts, patterns, and relationships in the data.

You will learn how to create models from the data to better understand the data and make predictions.

The choice of methods used to analyze the data depends on many factors, including the problem definition and the type of the data that has been collected.

Covers advanced topics in:

Data Analytics

  • Describing data, exploratory data analysis
  • Working with data tables, grouping, pivots, etc.
  • Understanding relationships
  • Clustering and association
  • Learning decision trees from data
  • Building models from data, regression, classification

Sample tasks that constitute a data analysis process:

Although many methods might solve your problem, you may not know which one works best until you have experimented with the alternatives.

  • Identification of important facts, relationships, anomalies, or trends in the data.
  • Development of mathematical models that encode relationships in the data.
  • Determine if the findings or generated models are being used to achieve the business objectives.
  • Assess the quality and usefulness of the models using data not used to create the model.
  • Apply descriptive statistical approaches and inferential statistical methods.
  • Use hypothesis testing and confidence intervals.
  • Learn about probabilities and standard errors.
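Confidence intervals, listed above, can be sketched with the standard library alone. This illustrative example uses the normal approximation (z = 1.96 for a 95% interval) on a made-up sample:

```python
import math
import statistics

# Made-up measurements; a real analysis would load these from a dataset.
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
low, high = mean - 1.96 * sem, mean + 1.96 * sem         # 95% CI, normal approximation
print(f"mean = {mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```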

Enroll Now

Data Analytics Course

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Data Analysis
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

 

Supervised Machine Learning Algorithms Course

Supervised Machine Learning Algorithms Course

In a supervised scenario, the task of the model is to find the correct label of a sample, assuming the training set is correctly labeled and that the estimated value can be compared with the correct one.

Course Description:

The term supervised is derived from the idea of an external teaching agent that provides precise and immediate feedback after each prediction.

The model can use such feedback as a measure of the error and, consequently, perform the corrections needed to reduce it.

All samples must be independent and identically distributed (IID) values uniformly sampled from the data generating process.

Covers advanced topics in:

Supervised Machine Learning Algorithms

  • Generalization, Overfitting, and Underfitting
  • k-Nearest Neighbors
  • Linear Models
  • Naive Bayes Classifiers
  • Decision Trees
  • Support Vector Machines

Sample tasks that constitute Supervised Machine Learning Algorithms:

There are two major types of supervised machine learning problems, called classification and regression.

  • Classification problems
  • Regression problems
  • Machine learning architecture
  • Loss functions: mean squared error, mean absolute error; precision, recall, and accuracy
  • Time series analysis and ensemble modeling
  • Ensemble learning methods: bagging, boosting, stacking
  • SVMs and transformation functions
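The evaluation metrics listed above (precision, recall, accuracy) can be sketched directly from their definitions in plain Python; the labels below are made up:

```python
def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)  # of everything predicted positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    return accuracy, precision, recall

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(metrics(y_true, y_pred))
```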

Enroll Now

Supervised Machine Learning Course

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Non Linear Supervised Learning Algorithms
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

 

Unsupervised Machine Learning Algorithms Course

This course guides you through the best practices for using unsupervised learning techniques in tandem with Python libraries to extract meaningful information from unstructured data.

Course Description:

Unsupervised learning encompasses the problem set of having a tremendous amount of data that is unlabeled.

Unsupervised learning is about making use of raw, untagged data and applying learning algorithms to it to help a machine predict its outcome.

You will be introduced to the best-used libraries and frameworks from the Python ecosystem and address unsupervised learning.

Covers advanced topics in:

Unsupervised Machine Learning Algorithms

  • Clustering, K-Means
  • Hierarchical Clustering
  • Neighborhood Approaches and DBSCAN
  • Dimension Reduction and PCA
  • Autoencoders
  • Topic Modeling

Sample tasks that constitute Unsupervised Machine Learning Algorithms:

You will explore various algorithms and techniques used to implement unsupervised learning in real-world use cases.

  • Use clustering algorithms to identify and optimize natural groups of data.
  • Explore advanced non-linear and hierarchical clustering in action.
  • Apply soft label assignments for fuzzy c-means and Gaussian mixture models.
  • Detect anomalies through density estimation.
  • Perform principal component analysis using neural network models.
  • Create unsupervised models using GANs.
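K-means, the first clustering topic above, can be sketched from scratch on one-dimensional data. The course itself would use scikit-learn; the points and starting centers here are made up:

```python
import statistics

def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Keep a center in place if its cluster is empty.
        centers = [statistics.mean(v) if v else c for c, v in clusters.items()]
    return sorted(centers)

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans_1d(points, centers=[0.0, 5.0]))  # converges near [1.0, 9.0]
```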

Enroll Now

Unsupervised Machine Learning Course

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Unsupervised Learning Algorithms
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

Deep Learning Course – Artificial Neural Networks

Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.

Course Description:

In deep learning, layered representations are (almost always) learned via models called neural networks, structured in literal layers stacked on top of each other.

Deep neural networks do input-to-target mapping via a deep sequence of simple data transformations (layers) and these data transformations are learned by exposure to examples.

The specification of what a layer does to its input data is stored in the layer’s weights, which in essence are a bunch of numbers.

Covers advanced topics in:

Deep Learning Course

  • Data representations for neural networks
  • Tensor operations
  • Gradient-based optimization
  • Loss functions and optimizers
  • Recurrent neural networks
  • Convolutional neural networks

Sample tasks that constitute Deep Learning:

You will get started with using neural networks to solve real problems.

  • Learn to use TensorFlow and Keras.
  • Deep learning for computer vision.
  • Deep learning for text and sequences.
  • Generative deep learning.
  • Text generation with LSTM.
  • Generative adversarial networks (GANs)
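The core ideas above (a layer's weights as numbers, learning by exposure to examples) can be sketched with a single "neuron" trained by gradient descent; the data and learning rate below are made up for illustration, and a real course exercise would use Keras:

```python
# Train one weight to fit y = 2x: the weight is this "layer's" only parameter,
# and the loss gradient tells us how to adjust it after each example.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05

for epoch in range(200):
    for x, y in data:
        pred = w * x               # forward pass through one linear layer
        grad = 2 * (pred - y) * x  # d(squared error)/dw
        w -= lr * grad             # gradient descent update

print(round(w, 3))  # approaches 2.0
```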

Enroll Now

Deep Learning Course – Artificial Neural Networks

  • Course Type: Full Time, 9:30 am to 5:30 pm
  • Location: Singapore
  • Classroom Duration: 3 days classroom
  • E-Learning: 1 year
  • Prerequisite: Python experience, or attendance of the Big Data/Data Science/ML Foundation course
  • Certifications (Optional): Certified Associate in Data Science – Deep Learning
  • Logistics: Bring your own laptop or use ours (Mac or Windows, min 4 GB RAM). Lunch, snacks and parking included

 

Sample concepts covered as part of the Machine Learning course in Singapore

The course will cover in detail both the mathematical aspects and the business application aspects of the algorithms.

Training data and test data

The observations in the training set comprise the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed response variable and one or more observed explanatory variables. The test set is a similar collection of observations that is used to evaluate the performance of the model using some performance metric. It is important that no observations from the training set are included in the test set.
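The separation described above can be sketched in a few lines of plain Python; the 25% test fraction and the seed are illustrative choices:

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle, then partition so no observation appears in both sets."""
    rows = rows[:]                      # copy; don't mutate the caller's list
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

rows = list(range(20))                  # stand-in for 20 observations
train, test = train_test_split(rows)
print(len(train), len(test), set(train) & set(test))  # 15 5 set()
```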


Memorizing the training set is called over-fitting. A program that memorizes its observations may not perform its task well, as it could memorize relations and structures that are noise or coincidence. Balancing memorization and generalization, or over-fitting and under-fitting, is a problem common to many machine learning algorithms. In this course we will discuss regularization, which can be applied to many models to reduce over-fitting.

Random Forests – Ensemble Voting

Ensembling by voting can be used efficiently for classification problems. We now have a set of classifiers, and we need to use them to predict the class of an unknown case. The combining of the predictions of the classifiers can proceed in multiple ways. The two options that we will consider are majority voting, and weighted voting. Ideas related to voting will be illustrated through an ensemble based on the homogeneous base learners of decision trees, as used in the development of bagging and random forests.
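Both combination schemes described above can be sketched directly; the class labels and weights below are made up:

```python
from collections import Counter

def majority_vote(predictions):
    """Each classifier gets one vote; the most common class wins."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Votes are weighted, e.g. by each classifier's validation accuracy."""
    totals = Counter()
    for label, w in zip(predictions, weights):
        totals[label] += w
    return totals.most_common(1)[0][0]

preds = ["spam", "ham", "spam"]
print(majority_vote(preds))                      # spam
print(weighted_vote(preds, [0.55, 0.90, 0.60]))  # spam (0.55 + 0.60 > 0.90)
```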


Bias Variance Trade-off

Many metrics can be used to measure whether or not a program is learning to perform its task more effectively. For supervised learning problems, many performance metrics measure the amount of prediction error. There are two fundamental causes of prediction error: a model’s bias, and its variance. Assume that you have many training sets that are all unique, but equally representative of the population.

A model with high bias will produce similar errors for an input regardless of the training set it used to learn; the model biases its own assumptions about the real relationship over the relationship demonstrated in the training data. A model with high variance, conversely, will produce different errors for an input depending on the training set that it used to learn. A model with high bias is inflexible, but a model with high variance may be so flexible that it models the noise in the training set. That is, a model with high variance over-fits the training data, while a model with high bias under-fits the training data. It can be helpful to visualize bias and variance as darts thrown at a dartboard.


Decision Trees

Decision trees are one of the simplest techniques for classification. They can be compared with a game of 20 questions, where each node in the tree is either a leaf node or a question node. Decision tree learning is a predictive machine learning technique that uses decision analysis to predict the value of a target variable: decisions are made from the output value predicted by the conditional variables. Decision trees are simple implementations of classification and are popular in operations research.


Entropy

In statistics, entropy is the measure of the unpredictability of the information contained within a distribution. The entropy technique takes cues from information theory. The premise is that more homogeneous or pure nodes require less information to be represented.
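Shannon entropy can be computed directly from its definition; a minimal sketch on made-up class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary split."""
    counts = Counter(labels)
    n = len(labels)
    return 0.0 - sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["yes"] * 8))               # 0.0  (pure node: no information needed)
print(entropy(["yes"] * 4 + ["no"] * 4))  # 1.0  (maximally unpredictable)
```

Decision tree learners pick the split that most reduces this quantity, preferring the more homogeneous child nodes described above.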


Support Vector Machines

Support vector machines (SVMs) are supervised learning methods that analyze data and recognize patterns. SVMs are primarily used for classification, regression analysis, and novelty detection. Given a set of training data in a two-class learning task, an SVM training algorithm constructs a model or classification function that assigns new observations to one of the two classes on either side of a hyperplane, making it a nonprobabilistic binary linear classifier.

[Figure: support vector machine]

Hyperplane

A support vector machine (SVM) works by identifying a hyperplane that separates the data, which may be represented in a multidimensional space. Among all possible separating hyperplanes, an SVM selects the one that best separates the different classes, which is why SVMs are widely used in classification models.
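The decision rule itself is simple to sketch in Python. In two dimensions a hyperplane is just a line, w·x + b = 0, and a point's class is the sign of its score. The weights below are assumed for illustration; an actual SVM solver would learn them from the training data.

```python
# An illustrative hyperplane in 2-D: the line x1 - x2 = 0.
w = [1.0, -1.0]   # assumed weights (not learned by an SVM solver)
b = 0.0           # assumed intercept

def classify(x):
    # Score is the signed distance proxy w . x + b; its sign picks the class.
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

print(classify([3.0, 1.0]))  # +1: falls on one side of the hyperplane
print(classify([1.0, 3.0]))  # -1: falls on the other side
```

Training an SVM amounts to choosing w and b so that this rule separates the two classes with the widest possible margin.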

[Figure: separating hyperplane]

Need for Applied Machine Learning

[Figure: need for applied machine learning]

Source of Data for Machine Learning

 

Where does big data come from?

There is the obvious, visible information that one is conscious of, and there is information that you give off. For example, from your phone one can determine which websites you visited, whom you called, who your friends are and what apps you use. Data science takes this further to reveal how close you are to someone, whether you are an introvert or an extrovert, when during the day you are most productive, how often you crave ice cream, what genres of movies you like, which aspects of social issues interest you the most, and so on.

Sensors everywhere

With the possibility of adding sensors to everything, there is now deeper insight into what is going on inside your body. Spending 10 minutes with a doctor who gives you a diagnosis based on stated or observed symptoms is less useful than a system that has data about everything going on inside your body. Your health diagnosis is likely to be more accurate with analysis of data collected through devices such as Fitbits and implantables.

The amount of data available with wearables and other devices provides for rich insight about how you live, work with others and have fun.

Digital Breadcrumbs

Big Data and analytics are made possible by the digital breadcrumbs we leave. Digital breadcrumbs include things like location data, browsing habits, information from health apps, credit card transactions and so on.

The data lets us create mathematical models of how people interact, what motivates us, what influences our decision making process and how we learn from each other.

Big Data versus Information

One can think of Big Data as raw data available in sufficient volume, variety and velocity. Volume here refers to terabytes of data, variety to the different dimensions of the data, and velocity to its rate of change.

A bank can use credit card information to develop models that are more predictive of future credit behavior, which provides better financial access. What you purchased, how frequently you purchase, how often you pay back and where you spend money are better predictors of creditworthiness than a simple one-dimensional credit score.

Machine Learning Process

[Figure: machine learning process]

Frequently Asked Questions

Foundation Course:
Data Science is a combination of the business, technical and statistical worlds. We will be covering the foundational aspects of all three in class. As such, we don't require participants to have a background in all three; a background in any one of the three will be sufficient. We will teach Python functional programming in class, along with how to use the data science libraries for data acquisition, visualisation and machine learning.

No. The optional technical modules don’t have additional costs. However, to work through the optional technical modules, you need to have a background in either statistics or programming.

For CITREP+ funding, you must be a Singapore Citizen or Permanent Resident (PR). CITREP+ funding is based on a claim that you will make after passing the exam. This means you will pay us the full course fees and IMDA will reimburse 70% or 90% of the course and exam fees after you make a claim. We will assist you with the claim process.

Foundation:
ITPACS
Data Cleaning:
ITPACS Certified Associate in Data Science – Data Cleaning
Machine Learning:
ITPACS Certified Associate in Data Science – Machine Learning

Yes, the funding applies to all Singapore Citizens and Permanent Residents (PRs) irrespective of industry.

The course does not have an academic minimum requirement. However, you need to be familiar with basic data analysis and have an understanding of school/college statistics.

The difficulty level of the concepts depends on your background. If your job involves analyzing trends from data, you are likely to find the course easy.

Technology is one part of the data science world. The course covers the business, statistical and technological aspects. For example, the business side of the course covers figuring out the factors that influence sales. The statistical aspect involves uncovering the correlation between the various factors that affect sales. The technology aspect involves writing code to produce predictions. For those interested in the programming aspects, we spend about 2 hours at the end of each day writing code in Python.

Foundation:
No. This is a 3 day introductory course. Data science is an extensive field and can take years to be an expert. Many data scientists specialize in one particular domain. This course provides you with an overview of what is involved in data science.

Foundation:
The course covers the theoretical aspects of a Big Data solution. The technical aspects of building a big data solution are not covered because there are so many different architectures and technologies.
Data Cleaning:
Yes, we will cover Spark, EC2, Kafka and MongoDB
Machine Learning:
Yes, we will cover Spark, EC2, Kafka and MongoDB

Most of the participants are managers in companies across different industries who are evaluating opportunities for using analytics to make decisions. These managers are either exploring the application of data science within their own domain or are already working with data scientists and analysts. Upon completion of the course, these managers are in a better position to drive data science projects in their context. Most of these managers represent the business side of data science.

Gartner predicted a shortage of 100,000 data scientists in the US by 2020. McKinsey put the US national gap in data scientists and others with deep analytical expertise at 140,000 to 190,000 people by 2017, implying demand 60 percent greater than supply.

Accenture found that more than 90 percent of its clients planned to hire people with data science expertise, but more than 40 percent cited a lack of talent as the number one problem.

Big Data/Data Science Foundation Course: We offer a pass guarantee for this exam. In case a participant fails the exam, they have two more attempts to clear the exam at no additional cost. The objective of the foundation course is to facilitate entry into the data science field for people with no analytics background. As such, the exam itself is not difficult. The exam does not have any coding. In the unlikely scenario wherein the participant fails the third time, we will refund the full course fees.

The funding process is done online. After course completion, you will upload documents such as the invoice and receipt to IMDA's system. The funding is a reimbursement made to you by IMDA after course completion, and the reimbursement takes 2-4 weeks. This means you have to pay the full amount first and then get the reimbursement. We will support you through the administrative process for submitting your claim.

Yes. If you are currently in between jobs, we provide an additional discount on the course fees. During registration, let us know about your situation and we will apply the discount.

Recent studies in neuroscience demonstrate that we can change our brain just by thinking. Our concept of “self” is etched in the living latticework of our 100 billion brain cells and their connections. Picking up new skills is about making new connections in the mind. By the time you complete the course, you have changed your brain permanently. If you learned even one bit of information, tiny brain cells have made new connections between them, and who you are is altered. The act of mental stimulation through learning is a powerful way you can grow and mold new circuits in your brain. Growing new circuits is vital to growth and state of being.

There is a small chance that you may be in what a growing body of knowledge points to as "survival mode". When we live in survival mode, we limit our growth, because the chemicals of stress will always drive our big-thinking brain to act equal to its chemical substrates. Chronic long-term stress weakens our bodies. We choose to remain in the same circumstances because we have become addicted to the emotional state they produce and the chemicals that arouse that state of being. Far too many of us remain in situations that make us unhappy, feeling as if we have no choice but to be in stress. We choose to live stuck in a particular mindset and attitude, partly because of genetics and partly because a portion of the brain (a portion that has become hardwired by our repeated thoughts and reactions) limits our vision of what's possible.

We can change (and thus, evolve) our brain, so that we no longer fall into those repetitive, habitual, and unhealthy reactions that are produced as a result of our genetic inheritance and our past experiences. Scientists call this neuroplasticity—the ability to rewire and create new neural circuits at any age—to make substantial changes in the quality of your life.

Learning a new skill allows new experiences, which in turn fire new circuits related to curiosity, creativity and so on.

The brain is structured, both macroscopically and microscopically, to absorb and engage novel information, and then store it as routine. When we no longer learn new things or we stop changing old habits, we are left only with living in routine. When we stop upgrading the brain with new information, it becomes hardwired, riddled with automatic programs of behavior that no longer support a healthy state of being. If you are not learning anything new, your brain is constantly firing the same old neurons related to negative states such as anxiety, stress and worry. We are marvels of flexibility, adaptability, and a neuroplasticity that allows us to reformulate and repattern our neural connections and produce the kinds of behaviors that we want.

Research is beginning to verify that the brain is not as hardwired as we once thought. We now know that any of us, at any age, can gain new knowledge, process it in the brain, and formulate new thoughts, and that this process will leave new footprints in the brain—that is, new synaptic connections develop. That's what learning is. In addition to knowledge, the brain also records every new experience. When we experience something, our sensory pathways transmit enormous amounts of information to the brain regarding what we are seeing, smelling, tasting, hearing, and feeling. In response, neurons in the brain organize themselves into networks of connections that reflect the experience. Every new occurrence produces a feeling, and our feelings help us remember an experience. The process of forming memories is what sustains those new neural connections on a more long-term basis. Memory, then, is simply a process of maintaining new synaptic connections that we form via learning, irrespective of age.

The reality is that if you are not making new neural connections, your brain cells are decaying or firing the same old routine patterns. This leads to faster physical aging and other health problems.

Contrary to the myth of the hardwired brain, we now realize that the brain changes in response to every experience, every new thought, and every new thing we learn. This is called plasticity. Researchers are compiling evidence that the brain has the potential to be moldable and pliable at any age.

AI has two sides: research and application. Research is about in-depth knowledge of how something works. You could spend years in research figuring out how electricity and microwaves work and finally create a microwave oven. Consumers then use these ovens to cook various foods. A consumer doesn't need extensive knowledge of the inner workings of a microwave; they can get creative about the end result of using it. This is the application side of things. Currently, as a result of extensive research, there is a plethora of microwaves on the market. Attending a university course is like creating another microwave, reinventing the wheel. You would rather focus your effort on the application side of AI: take the already-built algorithms and use them for your use cases. The way we teach our course is to apply these algorithms to solve business problems rather than go in depth into the calculus, matrices and trigonometry that make up an algorithm.