DATA SCIENCE USING PYTHON
ONLINE TRAINING COURSE
The purpose of this course is to cover both basic and advanced predictive modeling techniques, which are a must in today's competitive job market.
This course is known for its projects and is purely job-oriented training. You will work on projects in domains such as high technology, retail, banking, marketing, clinical research, and manufacturing.
Course Content
- Welcome/General Discussion about the expectation from course
- Definition of Data
- Difference between data management and data analytics
- Data Science components
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Python Overview
- Python Data Types
- Python operations on numbers and strings, including logical and arithmetic operators
- Python Strings
- Python Lists
- Python Tuple
- Python Dictionary
- FOR and WHILE loops
- IF/ELIF/ELSE in Python
- Data Manipulation Using NumPy and pandas
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Levels of Measurement and Variable types
- Descriptive Statistics and Picturing Distributions
- Confidence Interval for the Mean
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- One-Sample T-Test of Comparing Means
- Two-Sample T-Test of Comparing Means
- One Way ANOVA
- Assumptions of ANOVA Modeling
- N-Way ANOVA
- ANOVA Post Hoc Studies
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Data Exploration by using Scatter Plots
- Pearson and Spearman Correlations
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Fit Simple Linear Regression Model
- Assumptions of Linear Regression Model
- Analyze the output of the Linear Regression
- Producing Predicted Values
- Difference between Simple Linear Regression and Multiple Linear Regression Models
- Fit Multiple Linear Regression Model
- Stepwise Regression/Model Selection Techniques
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Residual Analysis
- Influential Observation
- Difference between Influential Observation and Outliers
- Collinearity Diagnostics
- Model Building Process using Python
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Examining Distributions
- Test of Associations by using the chi-square test
- Fisher's Exact p-values for Pearson Chi-square test
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Odds and Odds Ratio
- Simple Logistic Regression
- Multiple Logistic Regression with categorical predictors
- Analyze the output of Logistic Regression
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Apply the principles of honest assessment to model performance measurement
- Rare event adjustments
- Assess classifier performance using the confusion matrix
- Model selection and validation using training and validation data
- Create and interpret graphs (ROC, lift, and gains charts) for model comparison and selection
- Establish effective decision cut-off values for scoring
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Introduction to Decision Tree Modeling
- Model essentials for Decision Tree Models
- Decision Tree Model Development by using CHAID, Entropy/Information Gain, and Gini
- Decision Tree Model Tuning
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Introduction to Boosting
- Example of Boosting
- Regression Decision Tree
- Gradient Boosted Trees Regression
Q&A/PROJECT DISCUSSION BASED ON PREVIOUS DAYS
- Introduction to Time Series Forecasting
- Component Factors affecting Time Series
- Moving Average (MA)
- Exponential Smoothing
- Trend Fitting Models (Linear trend, Quadratic trend, and Exponential trend)
- Autoregressive Integrated Moving Average (ARIMA) Model
- Vector Autoregression (VAR) Model
- Autoregressive Conditional Heteroskedasticity (ARCH) Model
- Generalized Autoregressive Conditional Heteroskedasticity (GARCH) Model
- Long Short-Term Memory (LSTM) Model
Project Content
Domain Risk Management
Problem Statement As an analyst, you need to advise your client on which mutual fund risk category to invest in.
Topic Descriptive Analytics, Distributions, and Visualization
Domain Manufacturing/Inventory Management
Problem Statement As a manager/supervisor of a company, you need to measure the effectiveness of the production of cereal boxes. The aim is to analyze whether or not the cereal boxes' weight is as per company specifications.
Topic Hypothesis Testing (One-Sample tests)
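A minimal sketch of the one-sample t statistic behind this kind of check, using only the Python standard library. The box weights and the 368 g specification are hypothetical values chosen for illustration; they are not course data.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical box weights in grams; the 368 g target is also an assumption.
weights = [365.0, 370.0, 368.0, 372.0, 366.0, 371.0, 369.0, 367.0]
t_stat, df = one_sample_t(weights, mu0=368.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The t statistic is then compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.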
Domain Marketing/Retail
Problem Statement As a regional sales manager of a company, you need to compare mean sales between two types of product display in a retail store. The aim is to decide whether or not the Promotional display of the product is more effective than the Normal display. This helps management choose the display location in a store that will maximize sales.
Topic Hypothesis Testing (Two-Sample tests)
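As a rough illustration of a two-sample comparison, the sketch below computes Welch's t statistic (one common variant that does not assume equal variances) with the standard library. The sales figures are made up for the example, and the course may use a different variant of the test.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / na + vb / nb)  # standard error of the mean difference
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t_stat, df

# Hypothetical weekly sales per store for each display type.
promotional = [50.0, 54.0, 58.0, 62.0, 66.0]
normal = [44.0, 48.0, 52.0, 56.0, 60.0]
t_stat, df = welch_t(promotional, normal)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A positive t statistic here means the Promotional sample has the higher mean; whether the difference is significant depends on the p-value for the computed degrees of freedom.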
Domain Clinical
Problem Statement Before you launch a new drug in the market, you need to analyze the effect of the drug and its different doses on human blood pressure.
Topic Analysis of Variance (ANOVA Models)
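The one-way ANOVA F statistic for a dose comparison can be sketched as follows. The three dose groups and their blood pressure reductions (in mmHg) are hypothetical, and the function name is illustrative only.

```python
import statistics

def one_way_anova_f(groups):
    """Return the F statistic with its between- and within-group degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: group size times squared distance to the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each value to its group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical reductions in blood pressure (mmHg) for three dose levels.
placebo = [10.0, 12.0, 14.0]
low_dose = [14.0, 16.0, 18.0]
high_dose = [18.0, 20.0, 22.0]
f_stat, df_b, df_w = one_way_anova_f([placebo, low_dose, high_dose])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F value suggests that at least one group mean differs; post hoc tests are then needed to find which doses differ.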
Domain Physiology
Problem Statement In exercise physiology, an objective measure of aerobic fitness is how effectively the body can absorb and use oxygen during a 1.5-mile run. Factors affecting oxygen consumption include run time, age, gender, run pulse, rest pulse, and so on. The aim is to identify the key factors affecting oxygen consumption during a run.
Topic EDA and Linear Regression Models
Domain Event Analysis
Problem Statement On the 14th of April 1912, the Titanic hit an iceberg and sank. There were 1517 fatalities across different age groups, passenger classes (1, 2, and 3), and genders. The objective is to measure how these factors are associated with the survival status of passengers.
Topic Odds, Odds Ratio, Chi-Square tests, Ordinal associations, and Logistic Regression Model
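To make the odds and odds ratio concrete, here is a small sketch over a 2x2 table. The counts are invented round numbers for illustration; they are not the actual Titanic figures.

```python
def odds(events, non_events):
    """Odds of an event: events divided by non-events."""
    return events / non_events

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    Group 1 has a events and b non-events; group 2 has c events and d non-events.
    """
    return odds(a, b) / odds(c, d)

# Hypothetical counts (NOT real Titanic data): survivors and deaths by gender.
female_survived, female_died = 300, 100
male_survived, male_died = 150, 600

female_odds = odds(female_survived, female_died)
male_odds = odds(male_survived, male_died)
ratio = odds_ratio(female_survived, female_died, male_survived, male_died)
print(f"female odds = {female_odds}, male odds = {male_odds}, odds ratio = {ratio}")
```

An odds ratio above 1 means the event (here, survival) has higher odds in the first group than in the second.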
Domain Marketing
Problem Statement A bank ran a target marketing campaign to identify a segment of customers who are likely to respond to an insurance product. The target variable is whether or not the customer bought the insurance product. It depends on factors such as product usage over three months, demographics, transaction patterns (such as deposit amount and checking account activity), the bank branch, residential information (urban or rural), and so on.
Topic Classification, Categorical Data Analysis, Logistic Regression, Decision Tree, and Gradient Boosting (XGBoost)
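Whichever classifier is used, its predictions on held-out customers are usually summarized with a confusion matrix. The sketch below assumes binary labels (1 = bought, 0 = did not buy) and uses made-up actual and predicted values.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical validation labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # share of predicted buyers who actually bought
recall = tp / (tp + fn)     # share of actual buyers the model found
print(tp, fp, fn, tn, accuracy, precision, recall)
```

For rare events such as campaign responses, precision and recall are usually more informative than accuracy alone.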
Domain Financial Analyst
Problem Statement Forecast the revenues of three companies (Eastman Kodak, Cabot Corporation, and Wal-Mart) in order to better evaluate investment opportunities for your client.
Topic Moving Average, Exponential Smoothing, Trend Fitting Models and ARIMA
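Two of the simpler techniques in this topic, the moving average and simple exponential smoothing, can be sketched in a few lines. The revenue series, the window of 3, and the smoothing factor of 0.5 are assumptions for illustration, not company data.

```python
def moving_average(series, window):
    """Trailing moving average; the result starts at the first full window."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialized with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical annual revenues (in billions).
revenue = [10.0, 12.0, 14.0, 16.0, 18.0]
ma3 = moving_average(revenue, window=3)
ses = exponential_smoothing(revenue, alpha=0.5)
print(ma3)
print(ses)
```

A larger `alpha` makes the smoothed series react faster to recent values, while a smaller one smooths more heavily.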
Domain Economist
Problem Statement Is it the money supply that “causes” interest rates, or interest rates that “cause” the money supply?
Topic Vector Autoregression (VAR) Model
Domain Economist
Problem Statement Volatility Forecasting for U.S./U.K. exchange rates
Topic ARCH and GARCH Models
Domain Financial Analyst
Problem Statement Long-term forecasting of Bitcoin prices
Topic Long Short-Term Memory (LSTM) Model