SQL Server — Data Modeling and Data Mining

Description

The Library of Congress of the United States, the largest library on Earth, holds around 30 million books. If all of these books were digitized, assuming an average file size of 1 megabyte, they would take up 30 terabytes. For comparison, the database of a single delivery company stores over 20 terabytes of data on completed deliveries, and the database of mammal brains available at http://brainmaps.org/ amounts to over 50 terabytes. In other words, single companies and organizations now hold amounts of data comparable to the Library of Congress collection, which was acquired over a few hundred years.

Extracting significant business information from such a huge amount of data calls for specialized IT systems. Such a data mining system can be built on SQL Server 2005 or later (Standard or Enterprise edition), supplemented with Excel 2007 or later. For organizations that already use these products, this means no additional expenses for new licenses or for training users in new programs.

The goal of the course is to prepare analysts, IT specialists, and business users alike to create data mining models using the CRISP-DM methodology (Cross Industry Standard Process for Data Mining). Ours is the only course that is not limited to presenting individual data mining algorithms; it trains you in practice to apply those algorithms successfully to real-life business problems.

Options

  • Course level: 300
  • Start date: as required
  • Duration: 4 days

Price: 4100 zł

  • Open training
  • On-site (customer's premises)
  • Customer’s computer

Class

The course has been designed for analysts, advanced business users, programmers, and database analysts. Participants will learn a complete methodology for data mining projects, from defining the problem, through preparing the data and creating mining models, to evaluating and deploying those models. They will also be trained in the Microsoft technologies used to create and assess data mining models and to apply them in business analysis.

Plan

The training consists of 14 modules. Like all our courses, it has been designed entirely by our trainers, so it can be adjusted freely to meet the individual needs of the participants. We encourage you not only to choose the relevant modules, but also to send in your own suggestions. Do not hesitate to submit in advance any specific questions you would like answered during the course. Rather than running a lengthy lab after each module, we devote most of that time to practical exercises and demonstrations carried out with the trainer's assistance. This allows us to pass on substantially more practical information and tips and to focus on the issues of direct interest to you. So that you still have the opportunity to practice the material covered, each day ends with roughly an hour of lab time, during which you can work on tasks and exercises of your choice.

Modules

Each module is followed by its duration and level.
Module 1

The Role of Data Mining in Business Analysis
  • Data Mining Process
  • Phenomenal Data Mining
  • Putting Forward Hypotheses
  • Correct Problem Posing
  • The Aims of Data Modeling and Data Mining
  • The Range of a Data Mining Project
  • Specifying Expected Results
  • Assessment of Project Failure Risk

Duration: 120 minutes, Level: 300
Module 2

Evaluation and Preparation of Source Data
  • Measurement Errors
  • Data Profiling with SQL Server Integration Services
  • Attributes and their Values
  • Data Integrity
  • Sampling and Data Representativeness
  • Modeling Missing Data
  • Relationships between Attributes
  • State Space
  • Preparation of Discrete Attributes
  • Preparation of Continuous Attributes
  • Preparation of Data Series
  • Data Supplementing and Data Enrichment
  • Preparation of Data for Descriptive Models
  • Preparation of Data for Classification Models
  • Setting aside Test Data

Duration: 180 minutes, Level: 400
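
As an illustration of the last topic of Module 2, setting aside test data, the following is a minimal Python sketch of a random holdout split. It is not taken from the course materials: the function name `split_holdout`, the 30% test fraction, and the fixed seed are assumptions chosen only for the example.

```python
import random


def split_holdout(cases, test_fraction=0.3, seed=42):
    """Shuffle a copy of the cases and set aside a fraction as test data.

    Hypothetical helper: the name, the default fraction, and the seed
    are illustrative assumptions, not part of any SQL Server API.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(cases)                 # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    test_size = round(len(shuffled) * test_fraction)
    # First part becomes the test set, the remainder the training set.
    return shuffled[test_size:], shuffled[:test_size]


cases = list(range(1, 101))                # 100 made-up case identifiers
training, test = split_holdout(cases)
print(len(training), len(test))
```

The test cases are kept out of model training so that they can later be used to check how well the model predicts cases it has not seen.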
Module 3

Data Mining Techniques
  • Business Scenarios
  • Data Mining Add-In for Office
  • Classic Data Mining Techniques (Classification, Estimation, Association, Clustering, Sequential Analysis, Variant Analysis, Forecasting)

Duration: 60 minutes, Level: 200
Module 4

SQL Server as a Data Mining Platform
  • Excel as a Client of SQL Server Analysis Services (External Data Mining Tools, Working with Data Mining Models, Excel Formulas)
  • Data Mining Projects (Business Intelligence Development Studio, Data Sources, Data Source Views, Data Mining Structures, Data Mining Models, Predictive Queries)
  • Nesting Cases
  • Managing SSAS Server and Data Mining Models Using SQL Server Management Studio
  • Data Mining Services of SQL Server (Architecture, Security, Integration with Other BI Services)

Duration: 180 minutes, Level: 400
Module 5

DMX Language
  • Terminology
  • Syntax
  • Creating Data Mining Structures
  • Creating Models
  • Reading Metadata of Data Mining Models and Structures
  • Model Training
  • Predictive Queries
  • Predictive Functions

Duration: 90 minutes, Level: 400
Module 6

Microsoft Naive Bayes Algorithm
  • Overview, Limitations, and Parameters of the Algorithm
  • Uses of the Naive Bayes Classifier (Examining Relationships between Attributes, Document Classification)

Duration: 90 minutes, Level: 300
Module 7

Microsoft Decision Trees and Microsoft Linear Regression Algorithms
  • Overview, Limitations, and Parameters of the Algorithms
  • Uses of Decision Trees (Classification of Customers, Estimation of Potential Profits, Association of Customers and Purchased Goods)

Duration: 90 minutes, Level: 300
Module 8

Microsoft Time Series Algorithm
  • Overview, Limitations, and Parameters of the Algorithm
  • Uses of the Time Series Algorithm (Forecasting Sales, Forecasting Sales Using Cross-Series Data, Forecasting Sales Using Data Read from a Multidimensional Cube, Forecasting Sales Using Short Data Series, Variant Analysis)

Duration: 90 minutes, Level: 300
Module 9

Microsoft Clustering Algorithm
  • Overview, Limitations, and Parameters of the Algorithm
  • Uses of the Clustering Algorithm (Clustering, Case Classification, Preparation of Data for Further Exploration, Identifying Anomalies)

Duration: 90 minutes, Level: 300
Module 10

Microsoft Sequence Clustering Algorithm
  • Overview, Limitations, and Parameters of the Algorithm
  • Uses of the Sequence Clustering Algorithm (Analysis of Sequences of Visited Web Pages, Classification of Customers Based on the Sequence of their Purchases, Identifying Atypical Event Sequences)

Duration: 90 minutes, Level: 300
Module 11

Microsoft Association Rules Algorithm
  • Overview, Limitations, and Parameters of the Algorithm
  • Uses of Association Rules (Examining Relationships between Attribute Values, Market Basket Analysis, Cross-Selling Analysis)

Duration: 90 minutes, Level: 300
Module 12

Microsoft Neural Network and Microsoft Logistic Regression Algorithms
  • Overview, Limitations, and Parameters of the Algorithms
  • Uses of Neural Networks and Logistic Regression (Estimation of Potential Profits, Classification of Documents)

Duration: 90 minutes, Level: 300
Module 13

Evaluation and Improvement of Data Mining Models
  • Regression to the Mean
  • Measuring the Model’s Effectiveness (Interpretability, Prediction Accuracy, Prediction Stability, Effectiveness and Scalability, Usability)
  • Methods of Evaluating Data Mining Models (Lift Chart, Profit Chart, Classification Matrix, Evaluation of the Accuracy of Microsoft Time Series Algorithm Models, Cross-Validation, Intra- and Intercluster Correlation Coefficient)
  • Common Mistakes (Ill-Posed Tasks, Incorrect Source Data, Unprepared Source Data, Incorrect or Poorly Chosen Data Mining Algorithm Parameters)

Duration: 120 minutes, Level: 300
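
One of the evaluation methods listed in Module 13, the classification matrix, can be sketched in a few lines of Python. This is a minimal illustration and not the SQL Server implementation: the function names, the labels, and the sample predictions are invented for the example.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs.

    Hypothetical helper for illustration only.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases where the predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Made-up test cases: what customers actually did vs. what a model predicted.
actual = ["buyer", "buyer", "non-buyer", "non-buyer", "buyer", "non-buyer"]
predicted = ["buyer", "non-buyer", "non-buyer", "non-buyer", "buyer", "buyer"]

matrix = classification_matrix(actual, predicted)
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:10} predicted={p:10} count={n}")
print(f"accuracy: {accuracy(matrix):.2f}")
```

The off-diagonal counts (actual and predicted labels differ) show which kinds of mistakes the model makes. A single accuracy figure hides that detail.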
Module 14

Predictive Programming
  • Programming Tools
  • Visualizers
  • SSRS Reports
  • Intelligent Applications (Data Correctness Control, Completing Missing Data, Adaptive Interface)

Duration: 60 minutes, Level: 400
