391B Orchard Road #23-01 Ngee Ann City Tower B, Singapore 238874
+ 65 66381203

Data Science/Big Data/Machine Learning Courses in Singapore


We offer four courses on Data Science: the 3-day Big Data Foundation course, the 3-day Data Cleaning course, the 3-day Machine Learning course, and the 3-day Artificial Intelligence Neural Networks course. The programming component of the foundation course is optional. Participants with no Python programming background should start with the 3-day Big Data Foundation course, which teaches Python programming for analytics. All other data science courses require Python programming experience.

All our data science related courses are taught by working practitioners, not academicians. The goal is to get you well versed in applying techniques to solve real world problems in the most efficient manner.



3-day Foundation Course

3-day Data Cleaning

3-day Machine Learning

3-day AI-Neural Networks

Do I have the aptitude for data science?

 


CITREP Funding

Enhanced Funding Support for Professionals aged 40 and above and SMEs

Professionals aged 40 and above (i.e. self-sponsored individuals) and SMEs sponsoring their employees for training (i.e. organisation-sponsored trainees) are entitled to CITREP enhanced funding support of up to 90% of the nett payable course and certification fees. This applies to Singapore Citizens and Permanent Residents (PRs).

 

Please find the FY17 CITREP+ funding support details below. All categories apply to Singapore Citizens and Permanent Residents (PRs).

Organisation-sponsored, Non-SMEs
  • Course + exam: up to 70% of the nett payable course and certification fees, capped at $3000 per trainee
  • Exam only: up to 70% of the nett payable certification fees, capped at $500 per trainee

Organisation-sponsored, SMEs
  • Course + exam: up to 90% of the nett payable course and certification fees, capped at $3000 per trainee
  • Exam only: up to 70% of the nett payable certification fees, capped at $500 per trainee

Self-Sponsored Professionals (Citizens and PRs)
  • Course + exam: up to 70% of the nett payable course and certification fees, capped at $3000 per trainee
  • Exam only: up to 70% of the nett payable certification fees, capped at $500 per trainee

Self-Sponsored Professionals aged 40 and above* (Citizens, as of 1 Jan of the current year)
  • Course + exam: up to 90% of the nett payable course and certification fees, capped at $3000 per trainee
  • Exam only: up to 70% of the nett payable certification fees, capped at $500 per trainee

Students (Citizens) and Full-Time National Servicemen (NSF)
  • Course + exam: up to 100% of the nett payable course and certification fees, capped at $2500 per trainee
  • Exam only: up to 100% of the nett payable certification fees, capped at $500 per trainee

Big Data Foundation

The Big Data/Data Science Foundation course in Singapore offers participants the option of getting certified in CCC Big Data/Data Science Foundation by the Cloud Credential Council. The foundation course is non-technical and is open to managers, professionals and decision makers.

Big Data is a process to deliver decision-making insights. The process uses people and technology to quickly analyze large amounts of data of different types (traditional table structured data and unstructured data, such as pictures, video, email, transaction data, and social media interactions) from a variety of sources to produce a stream of actionable knowledge. Organizations increasingly need to analyze information to make decisions for achieving greater efficiency, profits, and productivity.

As relational databases have grown in size to satisfy these requirements, organizations have also looked at other technologies for storing vast amounts of information. These new systems are often referred to under the umbrella term “Big Data.” Gartner has identified three key characteristics for big data: Volume, Velocity, and Variety. Traditional structured systems are efficient at dealing with high volumes and velocity of data; however, traditional systems are not the most efficient solution for handling a variety of unstructured or semi-structured data sources.

Big Data solutions can enable the processing of many different types of formats beyond traditional transactional systems. Definitions for Volume, Velocity, and Variety vary, but most big data definitions are concerned with amounts of information that are too difficult for traditional systems to handle—either the volume is too much, the velocity is too fast, or the variety is too complex.

Enroll Now

Big Data Foundation

  • Course Name: Big Data Foundation
  • Location: Singapore
  • Duration: 3 days classroom + 6 months online
  • Exam Time: 60 minutes
  • Refreshments: Snacks
  • Delivery Mode: Instructor Led
  • Course Price: S$ 3093 (including tax and exam fees)
  • Approx fees after funding (above 40 yrs): S$ 478
  • Approx fees after funding (below 40 yrs): S$ 1042
  • Minimum requirements: None
  • Pass Guarantee: Yes. Check for conditions
  • Pass Rate: 100% last 5 months; 98% past 2 years
  • Funding: 70% – 90% funding for course and exam fees (Singapore citizens and PRs)

iKompass Data Science Course Sample Content

 

 

 
 

Big Data Foundation

 3 days

This course leads to the Big Data Foundation certification by the Cloud Credential Council (CCC). The CCC Big Data Foundation certification is the certification awarded to individuals who have successfully passed the CCC Big Data Foundation exam.

Our CCC Big Data/Data Science Foundation course is a good place to start if you do not have any experience with Big Data. It provides information on best practices in devising a Big Data solution for your organization.

 

Course features:

• 3 days classroom training

• Cloud Credential Certification

• 6 months of online learning with weekly assignments and feedback

• Post course Video tutorials with support

 


Classroom Training Outline

 

Big Data Foundation

Course Outline


Day 1

1. Introduction to Big Data

  • What is Big Data?
  • Usage of Big Data in real world situations

2. Data Processing Lifecycle

  • Collection
  • Pre-processing
  • Hygiene
  • Analysis
  • Interpretation
  • Intervention
  • Visualisation
  • Sources of Data

Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to Python

  • Jupyter
  • Interactive computing
  • Functions, arguments in Python

Introduction to Pandas
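As a taste of the optional technical module, here is a minimal sketch of the kind of Python and Pandas exercise covered; the sales figures are invented for illustration:

```python
import pandas as pd

# A simple function with a keyword argument, as introduced in the Python module
def summarise(df, column="sales"):
    """Return basic statistics for one column of a DataFrame."""
    return {"mean": df[column].mean(), "max": df[column].max()}

# A small DataFrame -- the core Pandas data structure
df = pd.DataFrame({"sales": [120, 95, 143], "region": ["North", "South", "East"]})
stats = summarise(df)
```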

Day 2

3. Sources of Data

Data collection is expensive and time consuming. In some cases you will be lucky enough to have existing datasets available to support your analysis: datasets from previous analyses, access to providers, or curated datasets from your organization. In many cases, however, you will not have access to the data you require, and you will have to find alternate mechanisms. Twitter data is a good example: depending on the options selected by the Twitter user, every tweet contains not just the message or content that most users are aware of, but also a view of the person's network, home location, the location from which the message was sent, and a number of other features that can be very useful when studying networks around a topic of interest.

  • Network Data
  • Social Context Data
  • Sensor Data
  • Systems Data
  • Machine log data
  • Structured vs. Unstructured Data

4. First-Order Analysis and Exploration

  • Basic Statistics
  • Analyse your dataset and determine features
  • Data validation
  • Noise and bias
  • Random errors
  • Systematic errors

5. Graph Theory

Technical Components (Optional). The modules below will be covered at the end of the day.

Introduction to NetworkX

  • Adjacency Matrix
  • Clustering
  • Create a Graph
  • Measure centrality
  • Degree distribution
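All of the graph measures listed above can be tried in a few lines with NetworkX; the friendship graph below is invented for illustration:

```python
import networkx as nx

# Build a small undirected friendship graph
G = nx.Graph()
G.add_edges_from([("Ann", "Bob"), ("Ann", "Cai"), ("Bob", "Cai"), ("Cai", "Dev")])

# Degree centrality: fraction of other nodes each node touches
centrality = nx.degree_centrality(G)

# Clustering coefficient: how interconnected a node's neighbours are
clustering = nx.clustering(G)

# Degree distribution and adjacency matrix
degrees = [d for _, d in G.degree()]
A = nx.to_numpy_array(G)
```

Cai touches every other node, so its degree centrality is 1.0; Dev has a single neighbour, so its clustering coefficient is 0.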

6. Second-Order Analysis

According to the SAS Institute, machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. There are two main classes of machine learning algorithms: (i) supervised and (ii) unsupervised learning. What exactly does learning entail? At its most basic, learning involves specifying a model structure f that can hopefully extract regularities in the data or problem at hand, along with an appropriate objective function to optimize using a specified loss function. Learning (or fitting) the model essentially means finding the optimal parameters of the model structure using the provided input/target data. This is also called training the model. It is common (and best practice) to split the provided data into at least two sets: training and test data sets.

  • Machine Learning
  • Meta Data
  • Training data and test data
  • Identifying Features

Technical Components (Optional). The modules below will be covered at the end of the day.

  • Introduction to Scikit-learn
  • Introduction to Mlxtend
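The training/test split described above can be sketched with scikit-learn, using its bundled iris dataset for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the provided data into training and test sets, as described above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fitting (training) the model finds its optimal parameters on the training set only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Performance is measured on held-out test data the model has never seen
accuracy = model.score(X_test, y_test)
```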

Day 3

7. Rolling out Big Data projects

Hypothetical Big Data project use case: cybersecurity measures within a company in relation to insider threats. The company hosts thousands of applications for various business functions. The context is User Behavior Analytics. Signals include login metadata for each application, location data, network data, employee data, performance appraisal data, travel data, and desktop activity data. The analytics is focused on determining a risk score for each user.

Technological component or trend:

The technology component in the insider threat context requires collection and processing of the following data:

  • User Data
  • Application logs
  • Access data
  • Business data
  • Assets, CMDB
  • User activity
  • Network data

A layered approach to data processing is ideal, starting with the implementation of an ETL (Extract, Transform, Load) layer. Processing of data is done through tools.

  • Extract, Transform, Load
  • Data processing
  • Normalization
  • Correlations
  • Risk profiling
  • Data lake

The last layer is the data lake, which stores all structured and unstructured data. It can be accessed through tools and libraries such as Pandas, Hadoop, and graph databases.

The data lake enables building algorithms that detect risky behavior and send alerts. The objective is to prioritize the alerts based on a risk score. For example, a user who accesses a certain application from a specific IP address, has a recent low rating on his performance appraisal, and has booked a long holiday will be flagged as high risk.
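A toy sketch of that prioritization idea; the field names and weights are invented for illustration and are not a real product's schema:

```python
# Hypothetical risk scoring over user-behaviour signals (illustrative only)
def risk_score(event):
    score = 0
    if event.get("unusual_ip"):                # login from an unseen IP address
        score += 40
    if event.get("appraisal_rating", 5) <= 2:  # recent low performance rating
        score += 30
    if event.get("long_leave_booked"):         # extended holiday booked
        score += 30
    return score

alert = {"unusual_ip": True, "appraisal_rating": 2, "long_leave_booked": True}
label = "high risk" if risk_score(alert) >= 70 else "low risk"
```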

  • Project Management
  • Different Phases
  • Technology components
  • Privacy
  • System architecture

Technical Components (Optional). The modules below will be covered at the end of the day.

  • K-Anonymity
  • Data Coarsening
  • Data suppression
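A minimal sketch of coarsening and suppression, the two anonymisation moves listed above; the records are invented:

```python
# Toy records with quasi-identifiers (age, postcode) and a sensitive value
records = [
    {"age": 34, "postcode": "238874", "diagnosis": "flu"},
    {"age": 36, "postcode": "238811", "diagnosis": "asthma"},
    {"age": 38, "postcode": "238823", "diagnosis": "flu"},
]

def anonymise(rec):
    decade = (rec["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",      # coarsening: exact age -> band
        "postcode": rec["postcode"][:2] + "****",  # suppression: drop the detail
        "diagnosis": rec["diagnosis"],
    }

anonymised = [anonymise(r) for r in records]
# All three records now share the same quasi-identifiers, giving k = 3
```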

Final Exam

40 Questions

Pass mark: 65%

Format of the Examination


Data Cleaning

Data Cleaning

Our 3-day data cleaning course teaches you techniques to scrub or process big data with the goal of making it ready for building models. Most algorithms require data that is cleaned and normalized. Data scientists typically end up spending more than 70% of their effort on data cleaning/wrangling. Knowledge of techniques to work with unstructured data is essential in data science.

Real-world data is frequently dirty and unstructured, and must be reworked before it is usable. Data may contain errors, have duplicate entries, exist in the wrong format, or be inconsistent. The process of addressing these types of issues is called data cleaning. Data cleaning is also referred to as data wrangling, massaging, reshaping, or munging. Data merging, where data from multiple sources is combined, is often considered a data cleaning activity.

We need to clean data because any analysis based on inaccurate data can produce misleading results. We want to ensure that the data we work with is quality data. Data quality involves:

 

  • Validity: Ensuring that the data possesses the correct form or structure
  • Accuracy: The values within the data are truly representative of the dataset
  • Completeness: There are no missing elements
  • Consistency: Changes to data are in sync
  • Uniformity: The same units of measurement are used

 

There are several techniques and tools used to clean data. We will examine the following approaches:

  • Handling different types of data
  • Cleaning and manipulating text data
  • Filling in missing data
  • Validating data

 

We will be using Python libraries, which are often more expressive and efficient. However, there are times when a simple string function is more than adequate to address the problem. Showing complementary techniques will improve the student’s skill set.

 

The basic text based tasks include:

 

  • Data transformation
  • Data imputation (handling missing data)
  • Subsetting data
  • Sorting data
  • Validating data
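The five tasks above can be sketched in a few lines of Pandas; the toy table is invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["  Alice", "Bob ", "Carol", "Dan"],
    "score": [88.0, np.nan, 95.0, 72.0],
})

# Transformation: strip stray whitespace from the text column
df["name"] = df["name"].str.strip()

# Imputation: fill the missing score with the column mean
df["score"] = df["score"].fillna(df["score"].mean())

# Subsetting: keep only rows scoring above 80
high = df[df["score"] > 80]

# Sorting: order by score, highest first
df = df.sort_values("score", ascending=False)

# Validation: confirm no missing values remain
assert df["score"].notna().all()
```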

Learning Objectives

After completing this course, you should have the skills and be familiar with the following topics:

  • Handling various data import scenarios: importing different kinds of datasets (.csv, .txt), different delimiters (comma, tab, pipe), and different methods (read_csv, read_table)
  • Getting basic information, such as dimensions, column names, and statistics summary
  • Performing basic data cleaning: removing NAs and blank spaces, imputing values to missing data points, changing a variable type, and so on
  • Creating dummy variables in various scenarios to aid modelling
  • Generating plots such as scatter plots, bar charts, histograms, box plots, and so on
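For instance, the import and dummy-variable objectives look like this in Pandas; inline data stands in for a file:

```python
import io
import pandas as pd

# read_csv handles different delimiters via the sep parameter (here: pipe)
pipe_data = io.StringIO("city|temp\nSingapore|31\nTokyo|18\n")
df = pd.read_csv(pipe_data, sep="|")

# Basic information: dimensions, column names, summary statistics
shape, columns = df.shape, list(df.columns)
summary = df.describe()

# Dummy variables for the categorical column, to aid modelling
dummies = pd.get_dummies(df["city"], prefix="city")
```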

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science. This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

Mix of Instructor-led, case study driven and hands-on for select phases

H/w, S/w Reqd

Python, Pandas, NumPy, Spark, Elasticsearch, MongoDB. System with at least 8GB RAM and a Windows/Ubuntu/Mac OS X operating system.

Tools covered

  • Pandas
  • NumPy
  • MongoDB
  • Apache Spark
  • Elasticsearch
  • Kafka
  • Jupyter notebook
  • IPython
  • EC2
  • S3

Enroll Now

Data Science – Data Cleaning

  • Course Name: Data Cleaning
  • Location: Singapore
  • Duration: 3 days
  • Refreshments: Lunch and Snacks
  • Delivery Mode: Instructor Led
  • Price: S$ 2264 (including course, exam and GST)
  • Minimum Requirements: Big Data Foundation
  • Programming language: Python
  • Funding: 70-90% funding for course
  • Approx fees after funding (above 40 yrs): S$ 360
  • Approx fees after funding (below 40 yrs): S$ 782
  • Certification: ITPACS Certified Associate in Data Science – Data Cleaning


Data Cleaning – Working with Data Lakes


Data Cleaning – Process


Data Cleaning Training Roadmap



Machine Learning

Machine Learning course in Singapore

Our 3-day machine learning course in Singapore teaches you various algorithms for building models. The course predominantly covers supervised algorithms.

Machine Learning is a name that is gaining popularity as an umbrella for methods that have been studied and developed for many decades in different scientific communities and under different names, such as Statistical Learning, Statistical Signal Processing, Pattern Recognition, Adaptive Signal Processing, Image Processing and Analysis, System Identification and Control, Data Mining and Information Retrieval, Computer Vision, and Computational Learning. The name “Machine Learning” indicates what all these disciplines have in common, that is, to learn from data, and then make predictions. What one tries to learn from data is their underlying structure and regularities, via the development of a model, which can then be used to provide predictions.

The goal of this course is to approach the machine learning discipline in a unifying context, by presenting the major paths and approaches that have been followed over the years, without giving preference to a specific one.

This course is an introduction to the world of machine learning, a topic that is becoming more and more important, not only for IT professionals and analysts but also for all those scientists and engineers who want to exploit the enormous power of techniques such as predictive analysis, classification, clustering and natural language processing.

Learning Objectives

After completing this course, you should have the skills and be familiar with the following topics

  • Apply mathematical concepts regarding the most common machine learning problems, including the concept of learnability and some elements of information theory.
  • Explain the process of Machine Learning
  • Describe the most important techniques used to preprocess a dataset, select the most informative features, and reduce the original dimensionality.
  • Describe the structure of a continuous linear model, focusing on the linear regression algorithm. Explain Ridge, Lasso, and ElasticNet optimizations, and other advanced techniques.
  • Describe the concept of linear classification, focusing on logistic regression and stochastic gradient descent algorithms.
  • Describe the concept of classification algorithms including Decision Trees, Support Vector Machines, Random Forests, Naive Bayes and K Nearest Neighbors
  • Demonstrate knowledge of evaluation metrics

Who should attend

Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers

Prerequisite

Foundational certificate in Big Data/Data Science. This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into the enterprise environment without changing your existing Python stack, this course is for you.

Delivery Method

Mix of Instructor-led, case study driven and hands-on for select phases

H/w, S/w Reqd

Python, Pandas, NumPy. System with at least 8GB RAM and a Windows/Ubuntu/Mac OS X operating system.

Duration

3 days

Enroll Now

Data Science – Machine Learning

  • Course Name: Machine Learning
  • Location: Singapore
  • Duration: 3 days
  • Refreshments: Lunch and Snacks
  • Delivery Mode: Instructor Led
  • Price: S$ 2264 (including course, exam and GST)
  • Minimum Requirements: Big Data Foundation
  • Programming language: Python
  • Funding: 70-90% funding for course
  • Approx fees after funding (above 40 yrs): S$ 360
  • Approx fees after funding (below 40 yrs): S$ 782
  • Certification: ITPACS Certified Associate in Data Science – Machine Learning


Sample concepts covered as part of the Machine Learning course in Singapore

The course will cover in detail both the mathematical aspects as well as the business application aspect of algorithms

Training data and test data

The observations in the training set comprise the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of an observed response variable and one or more observed explanatory variables. The test set is a similar collection of observations that is used to evaluate the performance of the model using some performance metric. It is important that no observations from the training set are included in the test set.


Memorizing the training set is called over-fitting. A program that memorizes its observations may not perform its task well, as it could memorize relations and structures that are noise or coincidence. Balancing memorization and generalization, or over-fitting and under-fitting, is a problem common to many machine learning algorithms. In this course we will discuss regularization, which can be applied to many models to reduce over-fitting.
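Regularization can be sketched with ridge regression in scikit-learn; the noisy sine data below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Twenty noisy samples of a sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)

# A degree-12 polynomial fit can memorize (over-fit) the training points
overfit = make_pipeline(PolynomialFeatures(12), LinearRegression()).fit(X, y)

# Ridge regularization shrinks the coefficients, reducing over-fitting
ridge = make_pipeline(PolynomialFeatures(12), Ridge(alpha=0.1)).fit(X, y)
```

The regularized model's coefficients are far smaller in magnitude, which is exactly the flexibility-limiting effect described above.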

Random Forests – Ensemble Voting

Ensembling by voting can be used efficiently for classification problems. We now have a set of classifiers, and we need to use them to predict the class of an unknown case. The combining of the predictions of the classifiers can proceed in multiple ways. The two options that we will consider are majority voting, and weighted voting. Ideas related to voting will be illustrated through an ensemble based on the homogeneous base learners of decision trees, as used in the development of bagging and random forests.
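A minimal scikit-learn sketch of hard (majority) voting and its random-forest cousin, using the bundled iris data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A random forest is itself a majority vote over many decision trees
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Explicit hard (majority) voting over heterogeneous base learners
vote = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",
).fit(X_tr, y_tr)
```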


Bias Variance Trade-off

Many metrics can be used to measure whether or not a program is learning to perform its task more effectively. For supervised learning problems, many performance metrics measure the amount of prediction error. There are two fundamental causes of prediction error: a model’s bias, and its variance. Assume that you have many training sets that are all unique, but equally representative of the population.

A model with high bias will produce similar errors for an input regardless of the training set it used to learn; the model biases its own assumptions about the real relationship over the relationship demonstrated in the training data. A model with high variance, conversely, will produce different errors for an input depending on the training set that it used to learn. A model with high bias is inflexible, but a model with high variance may be so flexible that it models the noise in the training set. That is, a model with high variance over-fits the training data, while a model with high bias under-fits the training data. It can be helpful to visualize bias and variance as darts thrown at a dartboard.


Decision Trees

Decision trees are one of the simplest techniques for classification. They can be compared with a game of 20 questions, where each node in the tree is either a leaf node or a question node. Decision tree learning is a predictive machine learning technique that uses decision trees. Decision trees make use of decision analysis and predict the value of the target. They are simple implementations of classification problems and popular in operations research. A prediction is the output value reached by following the conditions down the tree.
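A toy decision-tree classifier in scikit-learn; the pass/fail data below is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [hours_studied, classes_attended]; target: 1 = pass, 0 = fail
X = [[1, 2], [2, 1], [8, 9], [9, 8], [7, 7], [1, 1]]
y = [0, 0, 1, 1, 1, 0]

# Each internal node asks a question about one feature; leaves hold the answer
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
predictions = tree.predict([[8, 8], [1, 2]])
```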


Entropy

In statistics, entropy is the measure of the unpredictability of the information contained within a distribution. The entropy technique takes cues from information theory. The premise is that more homogeneous or pure nodes require less information to be represented.
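Shannon entropy is a one-liner in Python; a pure node scores zero while a 50/50 split is maximally unpredictable:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a class-probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

pure = entropy([1.0])           # a homogeneous node carries no surprise
balanced = entropy([0.5, 0.5])  # an even split is maximally unpredictable
```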


Support Vector Machines

Support vector machines (SVMs) are supervised learning methods that analyze data and recognize patterns. SVMs are primarily used for classification, regression analysis, and novelty detection. Given a set of training data in a two-class learning task, an SVM training algorithm constructs a model or classification function that assigns new observations to one of the two classes on either side of a hyperplane, making it a nonprobabilistic binary linear classifier.


Hyperplane

A support vector machine (SVM) is a supervised machine learning model that works by identifying a hyperplane separating classes of data represented in a multidimensional space. SVMs are therefore widely used in classification models. In an SVM, the hyperplane that best separates the different classes is used.
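A minimal linear-SVM sketch; the two 2-D clusters below are invented so that the separating hyperplane is obvious:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in a two-dimensional space
X = np.array([[0, 0], [1, 1], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM finds the hyperplane that best separates the two classes
clf = SVC(kernel="linear").fit(X, y)
predictions = clf.predict([[0.5, 0.5], [5.5, 5.5]])
```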


Need for Applied Machine Learning


Source of Data for Machine Learning

 

Where does big data come from?

There is obvious visible information, which one is conscious of, and there is information that comes off you. For example, from your phone one can determine which websites you visited, whom you called, who your friends are, and what apps you use. Data science takes it further to reveal how close you are to someone, whether you are an introvert or an extrovert, when during the day you are most productive, how often you crave ice cream, what genre of movies you like, what aspects of social issues interest you the most, and so on.

Sensors everywhere

With the possibility of adding sensors to everything, there is now deeper insight into what is going on inside your body. Spending 10 minutes with a doctor who gives you a diagnosis based on stated or observed symptoms is less useful than a system that has data about everything going on inside your body. Your health diagnosis is likely to be more accurate with analysis of data collected through devices such as Fitbits and implantables.

The amount of data available with wearables and other devices provides for rich insight about how you live, work with others and have fun.

Digital Breadcrumbs

Big Data and analytics are made possible by the digital breadcrumbs we leave. Digital breadcrumbs include things like location data, browsing habits, information from health apps, and credit card transactions.

The data lets us create mathematical models of how people interact, what motivates us, what influences our decision making process and how we learn from each other.

Big Data versus Information

One can think of Big Data as raw data available in sufficient volume, variety and velocity. Volume here refers to terabytes of data. Variety refers to the different dimensions of data. Velocity refers to the rate of change.

A bank can use credit card information to develop models that are more predictive of future credit behavior, providing better financial access. What you purchased, how frequently you purchase, how often you pay back, and where you spend money are better predictors of payment credibility than a simple one-dimensional credit score.

Machine Learning Process


Frequently Asked Questions

Foundation Course:
Data Science is a combination of business, technical and statistical worlds. We will be covering the theoretical aspects of all three in class. As such, we don’t require participants to have a background in all three. Background in any one of the three will be sufficient. Those with a programming or statistical background can explore the practical technical aspects with the instructors from 4 – 7 pm.

http://www.ikompass.edu.sg/trainings/data_science_ccc-big-data-foundation-2/


Data Cleaning:
Python programming (functional programming) knowledge is required. Completing the foundation course is sufficient preparation for the data cleaning course.


Machine Learning:
Python programming (functional programming) knowledge is required. Completing the foundation course is sufficient preparation for the Machine Learning course.

No. The optional technical modules don’t have additional costs. However, to work through the optional technical modules, you need to have a background in either statistics or programming.

For CITREP+ funding, you must be a Singapore citizen or Permanent Resident (PR) and pass an exam at the end of the course. The exam will be on the last day of the class. CITREP+ funding is based on a claim that you make after passing the exam. This means you pay us the full course fees, and IMDA will reimburse 70% or 90% of the course and exam fees after you make a claim. We will assist you with the claim process.

Foundation:
CCC Big Data Foundation
Data Cleaning:
ITPACS Certified Associate in Data Science – Data Cleaning
Machine Learning:
ITPACS Certified Associate in Data Science – Machine Learning

You can take the exam twice at no additional cost. Beyond the second attempt, you will need to pay the exam fees.

Yes, the funding applies to all Singapore citizens and Permanent Residents (PRs), irrespective of industry.

The course does not have an academic minimum requirement. However, you need to be familiar with basic data analysis and have an understanding of school/college statistics. You should already have knowledge of the mean, standard deviation, median, and variance, and you should be able to make inferences from charts and graphs. Before joining the class, we will send you some data, and you will need to send us some insights about it. Your insights will determine whether you will get the most value from attending the class. Below is the link to the data analysis you need to perform before attending the class.

http://www.ikompass.edu.sg/trainings/data_science_ccc-big-data-foundation-2/data-science-form/

The difficulty level of the concepts depends on your background. If your job involves analyzing trends from data, you are likely to find the course easy. Before joining the class, we will send you some data and you need to send us some insights about the data. Your insights about the data will determine if you will be able to get the most value from attending the class.

Technology is one part of the data science world. The course covers business, statistics, and technology. For example, the business side of the course covers figuring out the factors that influence sales. The statistical aspect involves uncovering the correlations between the various factors that affect sales. The technology aspect involves writing code to elicit predictions. We spend about 2 hours at the end of the day writing code in Python for those interested in the programming aspects.

Foundation:
No. This is a 3-day introductory course. Data science is an extensive field, and it can take years to become an expert. Many data scientists specialize in one particular domain. This course provides you with an overview of what is involved in data science.

Foundation:
The course covers the theoretical aspects of a Big Data solution. The technical aspects of building a big data solution are not covered because there are many different architectures and technologies.
Data Cleaning:
Yes, we will cover Spark, EC2, Kafka and MongoDB
Machine Learning:
Yes, we will cover Spark, EC2, Kafka and MongoDB

Most of the participants are managers in companies across different industries who are evaluating opportunities for using analytics to make decisions. These managers are either exploring the application of data science within their own domain or are already working with data scientists and analysts. Upon completion of the course, these managers are in a better position to drive data science projects in their context. Most of these managers represent the business side of data science.

Gartner said there would be a shortage of 100,000 data scientists (US) by 2020. McKinsey put the national gap (US) in data scientists and others with deep analytical expertise at 140,000 to 190,000 people by 2017, resulting in demand that’s 60 percent greater than supply.

Accenture found that more than 90 percent of its clients planned to hire people with data science expertise, but more than 40 percent cited a lack of talent as the number one problem.