Data Analysis in Python

Data Analysis in Python Course Overview

This course aims to equip delegates with a substantial knowledge of Python libraries (NumPy, Pandas, Matplotlib and others) and data analysis techniques, enabling them to engineer enterprise-level solutions in a data-driven environment.

Content in General:

The Pandas library, with its data preparation and analysis features, will be our ultimate focus. After familiarizing ourselves with its two data structures, the Series and the DataFrame, we will use the latter to read, manipulate and generally process tabular data sourced from Excel, CSV and other file formats. Before that, however, we will thoroughly familiarize ourselves with the NumPy library, not only because it is the foundation of Pandas but also because it offers powerful tools for numerical calculations and forms the basis of practically all of Python's data science libraries. We will explore its vectorized functions and basic linear algebra features, and use its random module to demonstrate sampling from different distributions.
----------------------------------------------------------------------------------------------------------------------------------------------

Who will the Course Benefit?

This course will benefit anyone who requires a solid practical foundation in Data Analysis, including descriptive statistics and visualisation in Python.

-------------------------------------------------------------------------------------------------------------------------------------------------

Course Objectives

This course aims to provide the delegate with the knowledge to be able to:

Determine the type of data at hand and decide on the most appropriate analysis and visualisation
Perform numerical calculations using the Python NumPy library
Use Pandas to read, explore, manipulate and process tabular data from various sources, including Excel, CSV and JSON files and relational databases
Visualise and generally explore data using Matplotlib and Seaborn
Carry out descriptive statistical summaries on data in Python
Interpret graphs and statistical results correctly

------------------------------------------------------------------------------------------------------------------------------------------------

This is a Data Analysis in Python course by Uplatz.

We believe in learning-by-doing, so we have taken an integrated, problem-solving approach to delivering our training. The course is broken into sessions, each centred on a few related core concepts and skills. The relevant background is discussed at the beginning of the session, in a just-in-time approach. This is followed by illustrative examples, which include the introduction of library features, syntax and semantics. For the remainder, which makes up most of the session, the delegates are expected to solve relevant problems of graduated difficulty. Example solutions will be available for the delegates to take away at the end of the course.

This approach is effective as it integrates the learning of statistical theory, library features and Python language syntax, increasing retention by providing meaningful context for each. Immediate practice also helps delegates cement their understanding of concepts on which we build gradually.

The delegate will learn and acquire skills as follows:

Data Analysis in Python

NumPy

Create and manipulate NumPy arrays and matrices
Generate random numbers from various distributions
Use NumPy vectorized functions
Read array data from various common file formats
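The NumPy skills listed above can be illustrated with a short sketch. It is not course material: the array values, the seed and the in-memory CSV text are made up for illustration, and the text buffer stands in for a real file on disk.

```python
import io

import numpy as np

# Create a 2-D array and apply vectorized operations (no explicit loops).
a = np.array([[1.0, 2.0], [3.0, 4.0]])
squared = a ** 2                  # element-wise square
product = a @ np.linalg.inv(a)    # matrix product; approximately the identity

# Sample from two distributions using a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)
uniform_sample = rng.uniform(low=0.0, high=1.0, size=1000)

# Read array data from CSV-formatted text (an in-memory stand-in for a file).
csv_text = io.StringIO("1,2,3\n4,5,6\n")
from_file = np.loadtxt(csv_text, delimiter=",")
```

Passing a file path to `np.loadtxt` instead of the `StringIO` buffer would read the same data from disk.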

Pandas

Understand the composition, relation and main features of Pandas Series and DataFrame structures
Read data from CSV, JSON, the web and relational databases into DataFrames and Series
Data cleaning and preparation
Data wrangling: join, combine and reshape
Data aggregation and group operations
Read CSV, Excel and other format data into Pandas DataFrame objects

Clean, group, manipulate and summarise tabular data using Pandas data processing features
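As a rough sketch of the Pandas workflow outlined above (read, clean, join, reshape, aggregate), the following example uses a small made-up sales table. The column names, the prices and the choice to treat missing unit counts as zero are assumptions for illustration only.

```python
import io

import pandas as pd

# Read tabular data from CSV text (an in-memory stand-in for a file).
sales_csv = io.StringIO(
    "region,product,units\n"
    "North,A,10\n"
    "North,B,\n"
    "South,A,7\n"
    "South,B,3\n"
)
sales = pd.read_csv(sales_csv)

# Cleaning: treat the missing unit count as zero (an illustrative choice).
sales["units"] = sales["units"].fillna(0).astype(int)

# Joining: attach a price to each row from a second, hypothetical table.
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.5, 4.0]})
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Reshaping: one row per region, one column per product.
units_wide = merged.pivot(index="region", columns="product", values="units")

# Aggregation: total revenue per region.
revenue_by_region = merged.groupby("region")["revenue"].sum()
```

Whether to fill, drop or flag missing values depends on the data set; `fillna(0)` is used here only to keep the sketch short.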

Visualisation with Matplotlib (and Seaborn)

Plot

Bar, column and pie charts
Box plots
Histograms
Scatter plots and line plots
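A minimal Matplotlib sketch of several of the plot types listed above is shown here. The data are randomly generated for illustration, and the non-interactive "Agg" backend is an assumption made so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import matplotlib.pyplot as plt
import numpy as np

# Made-up data: one numerical sample and one roughly linear x/y relationship.
rng = np.random.default_rng(seed=1)
values = rng.normal(loc=50, scale=10, size=200)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(scale=1.0, size=50)

# Four plot types on one figure.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].bar(["A", "B", "C"], [5, 3, 8])
axes[0, 0].set_title("Bar chart")
axes[0, 1].boxplot(values)
axes[0, 1].set_title("Box plot")
axes[1, 0].hist(values, bins=20)
axes[1, 0].set_title("Histogram")
axes[1, 1].scatter(x, y)
axes[1, 1].set_title("Scatter plot")
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed inline; in a script, `fig.savefig(...)` with a file name of your choice writes it to disk.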

Other

Use Jupyter Notebook and JupyterLab with the Anaconda distribution

Statistics

Distinguish between different data types
Summarise categorical and numerical data
Calculate basic descriptive statistical measures such as:

Measures of Central Tendency:

Mean
Median
Mode

Measures of Dispersion:

Variance
Standard deviation
Quantiles
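The measures of central tendency and dispersion named above can be computed with Pandas as sketched below. The six data values are invented for illustration. Note that Pandas computes the sample variance and standard deviation (`ddof=1`) by default.

```python
import pandas as pd

data = pd.Series([2, 3, 3, 5, 7, 10])

# Measures of central tendency.
mean = data.mean()
median = data.median()
mode = data.mode().tolist()   # a list, because a data set can have several modes

# Measures of dispersion.
sample_variance = data.var()            # divides by n - 1 (ddof=1)
population_variance = data.var(ddof=0)  # divides by n
sample_std = data.std()
quartiles = data.quantile([0.25, 0.5, 0.75])
```

`data.describe()` returns several of these summaries in a single call.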

Understand the advantages and disadvantages of the various summary statistics
Decide on the best visual representation of any presented data
Understand bivariate data and perform correlation and basic linear regression
Produce various visual representations (or plots) of data
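For bivariate data, correlation and a basic linear regression can be sketched with NumPy as follows. The five (x, y) pairs are made up, and fitting a straight line by least squares with `np.polyfit` is one of several possible approaches, not necessarily the one used in the course.

```python
import numpy as np

# Made-up bivariate data with a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
predicted = slope * x + intercept
residuals = y - predicted
```

A correlation close to 1 indicates a strong positive linear association here; plotting the residuals is a common check on whether a straight line is a reasonable model.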

------------------------------------------------------------------------------------------------------------------------------------------------------------------------