Certificate Associate in Data Science – Data Acquisition
It is never much fun to work with code that is not formatted properly or uses variable names that do not convey their intended purpose. The same can be said of data, except that bad data can also produce inaccurate results. Thus, data acquisition is an important step in the analysis of data. Data is available from a variety of sources, but it must be retrieved and ultimately processed before it can be useful. We can find it in numerous public data sources as simple files, or it may be found in more complex forms across the Internet. In this course, we will demonstrate how to acquire data from several of these sources, including various Internet sites and several social media sites.
We can access data from the Internet by downloading specific files or through a process known as web scraping, which involves extracting the contents of a web page. We also explore a related topic known as web crawling, in which an application examines a web site to determine whether it is of interest and then follows embedded links to identify other potentially relevant pages.
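As a rough illustration of the difference between scraping and crawling, the sketch below uses only the Python standard library. It extracts the visible text of a page (scraping) and collects the embedded links that a crawler would visit next. The class name `LinkAndTextParser`, the sample HTML, and the `example.org` URLs are all hypothetical and chosen for illustration; a real scraper would first download the page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect visible text and link targets from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL, as a crawler would.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Open Data Portal</h1>
  <p>Download the <a href="/files/rainfall.csv">rainfall dataset</a>.</p>
  <p>See also <a href="https://example.org/about">about us</a>.</p>
</body></html>
"""

parser = LinkAndTextParser("https://example.org/data/index.html")
parser.feed(SAMPLE_HTML)

page_text = " ".join(parser.text_parts)  # scraped content
links = parser.links                     # candidate pages for a crawler
print(page_text)
print(links)
```

A crawler would repeat this step for each collected link, typically keeping a set of visited URLs so that no page is fetched twice.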
When extracting data from a site, many different data formats may be encountered. We will examine three basic types: text, audio, and video. However, even within text, audio, and video data, many formats exist. For audio data alone, there are 45 audio coding formats.
When we discuss data formats, we are referring to content format, as opposed to the underlying file format, which may not even be visible to most developers. There are far too many formats to examine them all. Instead, we will tackle several of the more common formats, providing adequate examples to address the most common data retrieval needs. Specifically, we will demonstrate how to retrieve data stored in the following formats:
Some of these formats are well supported and documented elsewhere. For example, XML has been in use for years and there are several well-established techniques for accessing XML data in Python. For these types of data, we will outline the major techniques available and show a few examples to illustrate how they work. This will give participants who are not familiar with a technology some insight into its nature.
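One such technique for XML is the standard library's `xml.etree.ElementTree` module. The sketch below is a minimal example, assuming a small hypothetical document of weather stations; the element names, attribute names, and the `parse_stations` helper are invented for illustration and are not part of the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document used only for illustration.
SAMPLE_XML = """
<stations>
  <station id="S01"><name>Changi</name><rainfall_mm>12.5</rainfall_mm></station>
  <station id="S02"><name>Jurong</name><rainfall_mm>8.0</rainfall_mm></station>
</stations>
"""


def parse_stations(xml_text):
    """Turn each <station> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for station in root.findall("station"):
        rows.append({
            "id": station.get("id"),                 # attribute value
            "name": station.findtext("name"),        # child element text
            "rainfall_mm": float(station.findtext("rainfall_mm")),
        })
    return rows


records = parse_stations(SAMPLE_XML)
for record in records:
    print(record)
```

A list of dictionaries like `records` is a convenient intermediate form, since it can be passed directly to a tabular library for further processing.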
Binary files are among the most common data formats. For example, Word, Excel, and PDF documents are all stored in binary form, and special software is required to extract information from them. Text data is also very common.
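Before choosing an extraction tool, it can help to check what kind of file has actually been retrieved. The sketch below is one possible approach, assuming we only inspect the first few bytes of a file for well-known signatures. The `sniff_format` helper and its labels are hypothetical, and the fallback check is deliberately simple (a header cut off in the middle of a multi-byte character would be misreported as binary).

```python
# Well-known leading byte signatures for common binary document formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip container (e.g. .docx, .xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc, .xls)"),
]


def sniff_format(header: bytes) -> str:
    """Guess a file's broad format from its first bytes."""
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    # No known signature: treat the bytes as text if they decode as UTF-8.
    try:
        header.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown binary"
    return "text"


# Hypothetical headers standing in for the first bytes read from real files.
print(sniff_format(b"%PDF-1.7\n"))
print(sniff_format(b"PK\x03\x04\x14\x00"))
print(sniff_format(b"name,value\n"))
```

In practice the header would come from something like `open(path, "rb").read(8)`, and the result would decide which extraction library to hand the file to.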
Upon completion of the course, participants should be able to:
- Apply techniques to acquire structured and unstructured data
- Transform data into a dataframe
- Make API calls and fetch data
- Apply web scraping techniques to get data
Who should attend
Data Analysts, Data Engineers, Data Science Enthusiasts, Business Analysts, Project Managers
Foundational certificate in Big Data/Data Science
This course is meant for anyone who is comfortable developing applications in Python and now wants to enter the world of data science or build intelligent applications. Aspiring data scientists with some understanding of the Python programming language will also find this course very helpful. If you want to build efficient data science applications and bring them into an enterprise environment without changing your existing Python stack, this course is for you.
Mix of Instructor-led, case study driven and hands-on for select phases
H/w, S/w Reqd
Python, Pandas, NumPy, and a system with at least 2 GB of RAM running Windows, Ubuntu, or Mac OS X
24 Hours (2 days Instructor led + 8 hours online learning)
- Course Name: Certificate Associate in Data Science – Data Acquisition
- Location: Singapore
- Duration: 2 days classroom + 8 hours online
- Exam Time: 60 minutes
- Course Price: Call for price
- Minimum requirements: Foundational Certificate in Programming
|#||Topic||Method of Delivery|
Chapter 1: Getting and Saving Data
Getting Data in Different Formats
Connecting to a Database
Adding Information to a Database
Query a Database
Chapter 2 : Web Scraping with BeautifulSoup
Exploring Web Scraping
Scraping the Web Safely
Downloading and Reading a Webpage
Extracting Useful Information
Chapter 3: Getting data from APIs
Chapter 4: Social Data Sources
|Online Self paced|
- Certificate Title: Certificate Associate in Data Science – Data Acquisition
- Certificate Awarding Body: ITPACS
Information Technology Professional Accreditations and Certifications Society (ITPACS) is a non-profit organization focused on improving technology skills for the future. ITPACS offers associate-level, professional-level, and leader certifications across 6 domains, including data science, web development, mobile development, cyber security, IoT, and blockchain. Applicants have to go through an exam eligibility process demonstrating their experience.
The Associate certification is intended for individuals with less than 1 year of working experience in the field. It is ideal for newcomers starting out in the profession or those seeking to enter it. Applicants are required to have completed the application process prior to taking the exam.
- Exam Format: Closed-book format.
Questions: 30 multiple choice questions, coding exercises
Passing Score: 65%
Exam Duration: 60 minutes
- Exam needs to be taken within 12 months from the exam voucher issue date
Data science is not a single science as much as it is a collection of various scientific disciplines integrated for the purpose of analyzing data. These disciplines combine various statistical and mathematical techniques with:
- Computer science
- Data engineering
- Domain-specific knowledge and approaches
With the advent of cheaper storage technology, more and more data has been collected and stored, permitting processing and analysis that were previously infeasible. With this analysis came the need for various techniques to make sense of the data. These large sets of data, when analyzed to identify trends and patterns, have become known as big data.
The process of analyzing big data is not simple, and it has led to the specialization of developers known as data scientists. Drawing upon a myriad of technologies and expertise, they are able to analyze data to solve problems that previously were either not envisioned or were too difficult to solve.
The various data science techniques that we will illustrate have been used to solve a variety of problems. Many of these techniques are motivated by economic gain, but they have also been used to solve many pressing social and environmental problems. Problem domains where these techniques have been used include finance, optimizing business processes, understanding customer needs, performing DNA analysis, foiling terrorist plots, and finding relationships between transactions to detect fraud, among many other data-intensive problems.
Data mining is a popular application area for data science. In this activity, large quantities of data are processed and analyzed to glean information about the dataset, to provide meaningful insights, and to develop meaningful conclusions and predictions. It has been used to analyze customer behavior, to detect relationships between what may appear to be unrelated events, and to make predictions about future behavior.
Machine learning is an important aspect of data science. This technique allows the computer to solve various problems without needing to be explicitly programmed. It has been used in self-driving cars, speech recognition, and in web searches. In data mining, the data is extracted and processed. With machine learning, computers use the data to take some sort of action.