Data Analysis in Python
Data Analysis in Python Course Overview
This course aims to equip delegates with substantial knowledge of Python libraries (NumPy, Pandas, Matplotlib and others) and data analysis techniques, enabling them to engineer enterprise-level solutions in a data-driven environment.
Content in General:
The Pandas library, with its data preparation and analysis features, will be our ultimate focus. After familiarizing ourselves with its two data structures, the Series and the DataFrame, we will use the latter to read, manipulate and generally process tabular data sourced from Excel, CSV and other file formats. Before that, however, we will thoroughly familiarize ourselves with the NumPy library, not only because it is the foundation of Pandas but also because it offers powerful tools for numerical calculations and forms the basis of practically all of Python's data science libraries. We will explore its vectorized functions and basic linear algebra features, and use its random module to demonstrate sampling from different distributions.
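As a rough illustration of the three NumPy topics named above (vectorized functions, basic linear algebra and random sampling), here is a minimal sketch. The temperature values, the small linear system and the distribution parameters are made up for the example and are not taken from the course material.

```python
import numpy as np

# Vectorized arithmetic: the expression is applied to every element at once,
# so no explicit Python loop is needed.
celsius = np.array([0.0, 25.0, 100.0])      # illustrative values
fahrenheit = celsius * 9 / 5 + 32

# Basic linear algebra: solve the system A @ x = b for x.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random sampling: a seeded generator gives reproducible draws
# from several distributions (parameters chosen arbitrarily).
rng = np.random.default_rng(seed=42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=1000)
uniform_sample = rng.uniform(low=0.0, high=1.0, size=1000)
poisson_sample = rng.poisson(lam=3.0, size=1000)

print(fahrenheit)
print(x)
print(normal_sample.mean(), uniform_sample.mean(), poisson_sample.mean())
```

The sample means printed at the end should land close to the distribution means (0, 0.5 and 3 respectively), which is a quick sanity check on the draws.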
----------------------------------------------------------------------------------------------------------------------------------------------
Who will the Course Benefit?
This course will benefit anyone who requires a solid practical foundation in Data Analysis, including descriptive statistics and visualisation in Python.
-------------------------------------------------------------------------------------------------------------------------------------------------
Course Objectives
This course aims to provide the delegate with the knowledge to be able to:
- Determine the type of data at hand and decide on the most appropriate analysis and visualisation
- Perform numerical calculations using the Python NumPy library
- Use Pandas to read, explore, manipulate and process tabular data from various sources, including Excel, CSV and JSON files and relational databases
- Visualise and generally explore data using Matplotlib and Seaborn
- Carry out descriptive statistical summaries on data in Python
- Interpret graphs and statistical results correctly
This is a Data Analysis in Python course by Uplatz.
We believe in learning by doing, so we have taken an integrated, problem-solving approach to delivering our training. The course is broken into sessions, each centred on a few related core concepts and skills. The relevant background is discussed at the beginning of the session, in a just-in-time approach. This is followed by illustrative examples, which include the introduction of library features, syntax and semantics. In the second half, which takes up most of the session, the delegates are expected to solve relevant problems of graduated difficulty. Example solutions will be available for the delegates to take away at the end of the course.
This approach is effective as it integrates the learning of statistical theory, library features and Python language syntax, increasing retention by providing meaningful context for each. Immediate practice also helps delegates cement their understanding of concepts on which we build gradually.
The delegate will learn and acquire skills as follows:
Data Analysis in Python
- NumPy
- Create and manipulate NumPy arrays and matrices
- Generate random numbers from various distributions
- Use NumPy vectorized functions
- Read array data from various common file formats
- Pandas
- Understand the composition, relation and main features of Pandas Series and DataFrame structures
- Read data from CSV, JSON, the web and relational databases into DataFrames and Series
- Data Cleaning and Preparation
- Data Wrangling: Join, Combine and Reshape
- Data Aggregation and Group operations
- Read CSV, Excel and other format data into Pandas DataFrame objects
- Clean, group, manipulate and summarise tabular data using Pandas data processing features
- Visualisation with Matplotlib (and Seaborn)
- Plot
- Bar, Column and Pie charts
- Box plots
- Histograms
- Scatter plots and line plots
- Other
- Use Jupyter Notebook and JupyterLab with the Anaconda distribution
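The Pandas skills listed above (reading, cleaning, aggregating and joining tabular data) can be sketched roughly as follows. The sales table, the column names and the `managers` lookup are hypothetical, and an in-memory string stands in for a CSV file so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical sales data; in practice this would be a file path
# passed to pd.read_csv. Note the missing "units" value on the second row.
csv_text = """region,product,units,price
North,apple,10,0.5
North,pear,,0.8
South,apple,7,0.5
South,pear,3,0.8
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Cleaning: treat the missing unit count as 0 (an assumption made for this
# example) and restore an integer dtype.
sales["units"] = sales["units"].fillna(0).astype(int)

# Preparation: derive a new column from existing ones.
sales["revenue"] = sales["units"] * sales["price"]

# Aggregation: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Wrangling: join a second (hypothetical) table on the shared key.
managers = pd.DataFrame({"region": ["North", "South"],
                         "manager": ["Ana", "Ben"]})
sales_with_manager = sales.merge(managers, on="region", how="left")

print(revenue_by_region)
print(sales_with_manager)
```

Whether a missing value should be filled, dropped or flagged depends on the data set; filling with 0 here is only one of several reasonable choices.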
Statistics
- Distinguish between different data types
- Summarize Categorical and Numerical Data
- Calculate basic descriptive statistical measures such as
- Measures of Central Tendency:
- Mean
- Median
- Mode
- Measures of Dispersion:
- Variance
- Standard deviation
- Quantiles
- Understand the advantages and disadvantages of the various summary statistics
- Decide on the best visual representation of any presented data
- Understand Bivariate data and perform Correlation and basic Linear Regression
- Produce various visual representations (or plots) of data
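A minimal sketch of the descriptive measures, correlation, simple linear regression and plots listed above, using Pandas, NumPy and Matplotlib. The `hours` and `score` columns are invented for illustration, and `np.polyfit` is just one of several ways to fit a straight line.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical bivariate data: study hours and test scores.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 70],
})

# Measures of central tendency.
mean_score = df["score"].mean()
median_score = df["score"].median()
mode_score = df["score"].mode().iloc[0]   # mode() returns a Series

# Measures of dispersion (Pandas uses the sample formulas, ddof=1, by default).
variance = df["score"].var()
std_dev = df["score"].std()
quartiles = df["score"].quantile([0.25, 0.5, 0.75])

# Bivariate analysis: Pearson correlation and a least-squares line.
r = df["hours"].corr(df["score"])
slope, intercept = np.polyfit(df["hours"].to_numpy(),
                              df["score"].to_numpy(), deg=1)

# Visual summaries: a histogram of one variable and a scatter plot
# with the fitted line drawn over it.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))
ax_hist.hist(df["score"], bins=4)
ax_hist.set_xlabel("score")
ax_hist.set_ylabel("count")
ax_scatter.scatter(df["hours"], df["score"])
ax_scatter.plot(df["hours"], slope * df["hours"] + intercept)
ax_scatter.set_xlabel("hours")
ax_scatter.set_ylabel("score")
fig.tight_layout()

print(mean_score, median_score, mode_score)
print(variance, std_dev)
print(r, slope, intercept)
```

In a Jupyter notebook the figure renders inline; in a plain script, calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves it.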