Big Data/ Data Science courses in Singapore
We offer two main courses on Data Science: the 3-day Big Data Foundation course and the 3-week Data Science Bootcamp. The programming element in the foundation course is optional. The intensive bootcamp covers in-depth material, and 75% of it is mandatory hands-on programming.
CCC Big Data/ Data Science Foundation
The CCC Big Data/Data Science Foundation course in Singapore offers participants the option of getting certified in CCC Big Data/Data Science Foundation by the Cloud Credential Council.
The foundation course is non-technical and is open to managers, professionals and decision makers.
Big Data is a process to deliver decision-making insights. The process uses people and technology to quickly analyze large amounts of data of different types (traditional table structured data and unstructured data, such as pictures, video, email, transaction data, and social media interactions) from a variety of sources to produce a stream of actionable knowledge.
Organizations increasingly need to analyze information to make decisions for achieving greater efficiency, profits, and productivity. As relational databases have grown in size to satisfy these requirements, organizations have also looked at other technologies for storing vast amounts of information. These new systems are often referred to under the umbrella term “Big Data.”
Gartner has identified three key characteristics of big data: Volume, Velocity, and Variety. Traditional structured systems are efficient at dealing with high volumes and velocity of data; however, they are not the most efficient solution for handling a variety of unstructured or semi-structured data sources. Big Data solutions can enable the processing of many different formats beyond traditional transactional systems. Definitions of Volume, Velocity, and Variety vary, but most big data definitions are concerned with amounts of information that are too difficult for traditional systems to handle—either the volume is too much, the velocity is too fast, or the variety is too complex.
Big Data Foundation
- Course Name: Big Data Foundation
- Duration: 3 days
- Exam Time: 50 minutes
- Refreshments: Snacks
- Delivery Mode: Instructor Led
- Course Price: S$ 3500 (Including tax and exam fees)
- Minimum requirements: None
- Pass Guarantee: Yes. Check for conditions
- Pass Rate: 100% last 5 months. 98% past 2 years
- Funding: 70% funding for course and exam fees
This course leads to the Big Data Foundation certification by the Cloud Credential Council (CCC). The CCC Big Data Foundation certification is the certification awarded to individuals who have successfully passed the CCC Big Data Foundation exam.
Our CCC Big Data/Data Science Foundation course is a good place to start if you do not have any experience with Big Data. It provides information on best practices in devising a Big Data solution for your organization.
The technical components of the modules are optional. These components will be covered as the last topic each day.
1. Introduction to Big Data
- What is Big Data?
- Usage of Big Data in real world situations
2. Data Processing Lifecycle
- Sources of Data
Technical Components (Optional). The modules below will be covered at the end of the day.
Introduction to Python
- Interactive computing
- Functions, arguments in Python
Introduction to Pandas
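As a taste of what the optional Python and Pandas session covers, here is a minimal sketch combining a function with arguments and a Pandas DataFrame; the table and function are illustrative examples, not course material:

```python
import pandas as pd

# A hypothetical table of daily tweet counts
df = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu"],
    "tweets": [120, 95, 143, 110],
})

def busy_days(frame, threshold=100):
    """Return the days whose tweet count exceeds `threshold`
    (a function with a positional and a keyword argument)."""
    return frame.loc[frame["tweets"] > threshold, "day"].tolist()

days = busy_days(df)                      # days above the default threshold
quiet = busy_days(df, threshold=130)      # same function, different argument
```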
3. Sources of Data
Data collection is expensive and time-consuming. In some cases you will be lucky enough to have existing datasets available to support your analysis. You may have datasets from previous analyses, access to providers, or curated datasets from your organization. In many cases, however, you will not have access to the data you require, and you will have to find alternative mechanisms. Twitter data is a good example: depending on the options selected by the Twitter user, every tweet contains not just the message or content that most users are aware of, but also a view of the person's network, their home location, the location from which the message was sent, and a number of other features that can be very useful when studying networks around a topic of interest.
- Network Data
- Social Context Data
- Sensor Data
- Systems Data
- Machine log data
- Structured vs. Unstructured Data
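The tweet example above can be sketched in Python. The payload below is a simplified stand-in with illustrative field names, not the full Twitter API schema:

```python
# A single tweet represented as a plain dict; real API payloads are richer
tweet = {
    "text": "Enjoying the data science course!",
    "user": {"screen_name": "alice", "location": "Singapore",
             "followers_count": 250},
    "coordinates": {"type": "Point", "coordinates": [103.85, 1.29]},
}

def extract_features(t):
    """Keep only the fields useful when studying networks around a topic."""
    return {
        "author": t["user"]["screen_name"],
        "home_location": t["user"]["location"],
        "network_size": t["user"]["followers_count"],
        "sent_from": t.get("coordinates"),  # may be absent if the user opts out
    }

features = extract_features(tweet)
```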
4. First-Order Analysis and Exploration
- Basic Statistics
- Analyse your dataset and determine features
- Data validation
- Noise and bias
- Random errors
- Systematic errors
5. Graph Theory
Technical Components (Optional). The modules below will be covered at the end of the day.
Introduction to NetworkX
- Adjacency Matrix
- Create a Graph
- Measure centrality
- Degree distribution
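The graph-theory topics above can be sketched with NetworkX; the friendship graph here is a made-up toy example:

```python
import networkx as nx

# Build a tiny friendship graph
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

adjacency = nx.to_numpy_array(G)       # the adjacency matrix of the graph
centrality = nx.degree_centrality(G)   # degree centrality, normalized by n - 1
degrees = dict(G.degree())             # raw degrees, i.e. the degree distribution
```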
6. Second-Order Analysis
According to the SAS Institute, machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. There are two main classes of machine learning algorithms: (i) supervised and (ii) unsupervised learning. What exactly does learning entail? At its most basic, learning involves specifying a model structure f that can hopefully extract regularities from the data or problem at hand, as well as an appropriate objective function to optimize using a specified loss function. Learning (or fitting) the model essentially means finding the optimal parameters of the model structure using the provided input/target data. This is also called training the model. It is common (and best practice) to split the provided data into at least two sets: training and test data sets.
- Machine Learning
- Meta Data
- Training data and test data
- Identifying Features
Technical Components (Optional). The modules below will be covered at the end of the day.
- Introduction to Scikit-learn
- Introduction to Mlxtend
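The train/test split described above can be sketched with scikit-learn; the data and the choice of model are illustrative:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Toy supervised-learning data: one feature, binary labels
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Hold out 25% of the data as a test set -- the best-practice split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)  # training (fitting) the model
accuracy = model.score(X_test, y_test)              # evaluated on held-out data
```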
7. Rolling out Big Data projects
Hypothetical Big Data project use case: cybersecurity measures within a company in relation to insider threats. The company hosts thousands of applications for various business functions. The context will be User Behavior Analytics. Signals include login metadata for each application, location data, network data, employee data, performance appraisal data, travel data, and desktop activity data. The analytics is focused on determining a risk score for each user.
Technological component or trend:
The technology component in the insider threat context requires collection and processing of the following data:
- User Data
- Application logs
- Access data
- Business data
- Assets, CMDB
- User activity
- Network data
A layered approach to data processing is ideal, starting with the implementation of an ETL (Extract, Transform, Load) pipeline. Data processing is then done through tools at each layer.
- Extract, Transform, Load
- Data processing
- Risk profiling
- Data lake
The last layer is the data lake, which stores all structured and unstructured data. It can be accessed through tools such as pandas, Hadoop, and graph databases.
The data lake will enable building algorithms to determine risky behavior and send alerts. The objective is to prioritize the alerts based on a risk score. For example, a user who accesses a certain application from a specific IP address, has a recent low rating on his performance appraisal, and has booked a long holiday will be flagged as high risk.
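A minimal sketch of such a rule-based risk score; the weights, thresholds, and field names are purely illustrative and not from any real product:

```python
# Hypothetical rule-based scoring over the signals named above
def risk_score(user):
    """Combine illustrative insider-threat signals into a single score."""
    score = 0
    if user["unusual_ip"]:                 # login from an unusual IP address
        score += 40
    if user["appraisal_rating"] <= 2:      # recent low performance rating
        score += 30
    if user["holiday_days_booked"] >= 14:  # long holiday booked
        score += 30
    return score

user = {"unusual_ip": True, "appraisal_rating": 2, "holiday_days_booked": 21}
alert = "high risk" if risk_score(user) >= 70 else "normal"
```

In practice such hand-tuned rules would be a baseline; the machine-learning modules above are about learning the weights from labeled data instead.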
- Project Management
- Different Phases
- Technology components
- System architecture
Technical Components (Optional). The modules below will be covered at the end of the day.
- Data Coarsening
- Data suppression
Pass mark: 65%
3-Week Data Science Bootcamp
Our 3-week intensive bootcamp covers in-depth concepts in data science. You will spend around 10-12 hours each day with our instructors, working on various assignments geared towards acquiring the skills necessary to become a data scientist. The bootcamp is an ideal platform for those considering a long-term career in big data and data science. Algorithms are the way forward and are becoming ubiquitous in many fields. The accuracy of algorithms in predicting outcomes is significantly higher than decisions made on gut feeling.
The 3-week intensive classroom program requires participants to complete around 80 hours of pre-course work. The pre-course work covers the basics of Python programming and brushes up on statistics. We will provide you with the resources you need for the pre-course preparations. The pre-course program includes a test, which you need to pass before attending the intensive classroom program.
Week 1 will focus on the technology aspects of data science. We will use only open-source systems, and you will learn the technology infrastructure required for big data analytics. Some of the concepts and libraries you will be exposed to include Anaconda, Pandas, NumPy, NetworkX, MLxtend, machine learning, and EC2. Python will be the main programming language used across the different libraries and packages.
Week 2 will focus on statistics, including Bayesian statistics, regression, correlations, graph theory, measures of centrality, measures of variance, and normalization. After the basics, we will cover advanced statistics and algorithms, with a focus on Big O notation.
Week 3 will be focused on applying the technology and statistics to derive business insights. You will be working on large data sets to identify patterns and insights. You will be exposed to techniques such as lateral thinking and hypothesis testing. You will learn techniques to train your models to identify features.
- Course Name: Data Science Bootcamp
- Duration: 3 weeks
- Refreshments: Snacks
- Delivery Mode: Instructor Led
- Price: S$ 5339 (including course fees and GST)
- Minimum Requirements: Mandatory pre-course preparation
- Programming language: Python
- Funding: 70% funding for course
There is obvious, visible information that one is conscious of, and there is information that comes off you. For example, from your phone one can determine which websites you visited, whom you called, who your friends are, and what apps you use. Data science takes it further to reveal how close you are to someone, whether you are an introvert or an extrovert, when during the day you are most productive, how often you crave ice cream, what genre of movies you like, and which social issues interest you the most.
With the possibility of adding sensors to everything, there is now deeper insight into what is going on inside your body. Spending 10 minutes with a doctor who gives you a diagnosis based on stated or observed symptoms is less useful than a system that has data about everything going on inside your body. Your health diagnosis is likely to be more accurate with analysis of data collected through devices such as Fitbits and implantables.
The amount of data available with wearables and other devices provides for rich insight about how you live, work with others and have fun.
Big Data and analytics are made possible by the digital breadcrumbs we leave. Digital breadcrumbs include things like location data, browsing habits, information from health apps, and credit card transactions.
The data lets us create mathematical models of how people interact, what motivates us, what influences our decision making process and how we learn from each other.
Big Data versus Information
One can think of Big Data as raw data available in sufficient volume, variety, and velocity. Volume here refers to terabytes of data. Variety refers to the different dimensions of the data. Velocity refers to the rate of change.
A bank can use credit card information to develop models that are more predictive of future credit behavior. This provides better financial access. What you purchased, how frequently you purchase, how often you pay back, and where you spend money are better predictors of payment credibility than a simple one-dimensional credit score.
Collection refers to getting your data together. One would look at multiple sources of data and ensure there is sufficient volume to justify useful analysis. For example, server logs could provide data about login times, resources accessed, request frequency, and so on.
Pre-processing refers to normalizing data onto a common scale. Data needs to be normalized for it to be made useful. For example, if we are comparing the number of friends in your contacts with location data containing GPS coordinates, we would need to have both features normalized. We can then determine whether the number of friends has any link to mobility.
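A minimal sketch of min-max normalization, one common way to bring features onto a common scale; the feature values are made up:

```python
def min_max_normalize(values):
    """Scale a feature to the [0, 1] range (assumes values are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical features on very different scales
friend_counts = [50, 200, 500, 1000]       # contacts per person
daily_km_travelled = [2.0, 15.0, 8.0, 30.0]  # derived from GPS traces

norm_friends = min_max_normalize(friend_counts)
norm_mobility = min_max_normalize(daily_km_travelled)
# Both features now lie in [0, 1] and can be compared or correlated directly.
```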
Hygiene involves separating the noise from the signal. This ensures that the data reflects reality and that no unusual patterns are lost.
Analysis involves a first-order look at our data to determine patterns.
Visualization involves graphical representation of data to detect patterns. The data can show you things like increased spending at the end of a quarter reflecting a pattern.
Interpretation involves second-order analysis to determine deeper insights. This is where machine learning comes into play. Are people having fewer kids over the years, and if so, what factors seem to play a role?
Frequently Asked Questions
Data Science is a combination of business, technical and statistical worlds. We will be covering the theoretical aspects of all three in class. As such, we don’t require participants to have a background in all three. Background in any one of the three will be sufficient. Those with a programming or statistical background can explore the practical technical aspects with the instructors from 4 – 7 pm.
No. The optional technical modules don’t have additional costs. However, to work through the optional technical modules, you need to have a background in either statistics or programming.
There are two funding programs: SkillsFuture Credit and CITREP+. CITREP+ funding is applicable only to Singapore citizens and applies to both self-sponsored and company-sponsored participants. SkillsFuture applies to Singapore citizens who are self-sponsored.
For CITREP+ funding, you must be a Singapore citizen and pass an exam at the end of the course. The exam will be on the last day of class. CITREP+ funding is based on a claim that you make after passing the exam. This means you will pay us the full course fees, and IMDA will reimburse 70% of the course and exam fees after you make a claim. We will assist you with the claim process.
Upon passing the exam, you will receive a certificate from Cloud Credential Council as Certified in Big Data Foundation.
You can take the exam twice at no additional cost. Beyond the second attempt, you will need to pay the exam fees.
Yes, the funding applies to all Singapore citizens irrespective of the industry.
The course does not have an academic minimum requirement. However, you need to be familiar with basic data analysis and have an understanding of school/college statistics. You should already know the mean, standard deviation, median, and variance. You should be able to make inferences from charts and graphs. Before joining the class, we will send you some data and you need to send us some insights about it. Your insights will determine whether you will get the most value from attending the class. Below is the link to the data analysis you need to perform before attending the class.
The difficulty level of the concepts depends on your background. If your job involves analyzing trends from data, you are likely to find the course easy. Before joining the class, we will send you some data and you need to send us some insights about the data. Your insights about the data will determine if you will be able to get the most value from attending the class.
Technology is one part of the data science world. The course covers business, statistics, and technology. For example, the business side of the course covers figuring out the factors that influence sales. The statistical aspect involves uncovering the correlations between the various factors that affect sales. The technology aspect involves writing code to elicit predictions. We spend about 2 hours at the end of each day writing Python code for those interested in the programming aspects.
No. This is a 3 day introductory course. Data science is an extensive field and can take years to be an expert. Many data scientists specialize in one particular domain. This course provides you with an overview of what is involved in data science.
The course covers the theoretical aspects of a Big Data solution. The technical aspects of building a big data solution are not covered because there are so many different architectures and technologies.
Most of the participants are managers in companies across different industries who are evaluating opportunities for using analytics to make decisions. These managers are either exploring the application of data science within their own domain or are already working with data scientists and analysts. Upon completion of the course, these managers are in a better position to drive data science projects in their context. Most of these managers represent the business side of data science.
Gartner said there would be a shortage of 100,000 data scientists (US) by 2020. McKinsey put the national gap (US) in data scientists and others with deep analytical expertise at 140,000 to 190,000 people by 2017, resulting in demand that’s 60 percent greater than supply.
Accenture found that more than 90 percent of its clients planned to hire people with data science expertise, but more than 40 percent cited a lack of talent as the number one problem.