Phone: +44 7459 302492 | Email: info@uplatz.com
Job Meter = High

Introduction to SAS and Hadoop

30 Hours
Online Instructor-led Training
GBP 999 (USD 2800)
Save 50% (offer ends on 30-Jun-2024)
318 Learners

About this Course

This course teaches you how to use SAS programming methods to read, write, and manipulate Hadoop data. The Base SAS methods covered include reading and writing raw data with the DATA step, managing the Hadoop Distributed File System (HDFS), and executing MapReduce and Pig code from SAS with the HADOOP procedure. The course also covers the SAS/ACCESS Interface to Hadoop, whose LIBNAME access and SQL pass-through techniques let you read and write Hive table structures. Finally, it gives a brief overview, without going into detail, of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Analytic Server, and SAS In-Memory Statistics, along with the computing infrastructure and data access methods that support them.

You will learn how to:
  • read and write Hadoop files with the FILENAME statement
  • execute Hadoop commands with PROC HADOOP
  • run MapReduce and Pig programs in Hadoop from within a SAS program
  • access Hadoop distributions using the LIBNAME statement and the SQL pass-through facility
  • create and use SQL procedure pass-through queries
  • use options and efficiency techniques to optimize data access performance
  • join data using the SQL procedure and the DATA step
  • use Base SAS procedures with Hadoop
  • write programs that create source data for SAS High-Performance Analytics and run SAS High-Performance Analytics programs to analyze that data in parallel
  • write SAS programs that start a SAS LASR Analytic Server grid, load data into memory in parallel, and process that data in parallel with the IMSTAT procedure
------------------------------------------------------------------------
Target Audience

SAS programmers who need to access Hadoop data from within SAS
------------------------------------------------------------------------

Course Details & Curriculum
Introduction
  • what is Hadoop?
  • how SAS interfaces with Hadoop
Accessing HDFS and Invoking Hadoop Applications from SAS
  • overview of methods available in Base SAS for interacting with Hadoop
  • reading and writing Hadoop files using Base SAS methods
  • executing MapReduce code
  • executing Pig code using PROC HADOOP
Using the SQL Pass-Through Facility
  • understanding the SQL procedure pass-through facility
  • connecting to a Hadoop Hive database
  • learning methods to query Hive tables
  • investigating Hadoop Hive metadata
  • creating SQL procedure pass-through queries
  • creating and loading Hive tables with SQL pass-through EXECUTE statements
  • handling Hive STRING data types
Using the SAS/ACCESS LIBNAME Engine
  • using the LIBNAME statement for Hadoop
  • using data set options
  • creating views
  • combining tables
  • benefits of the LIBNAME method
  • using PROC HDMD to access delimited data, XML data, and other non-Hive formats
  • performance considerations for the SAS/ACCESS LIBNAME statement
  • copying data from a SAS library to a Hive library
Partitioning and Clustering Hive Tables
  • identifying partitioning, clustering, and indexing methods in Hive
  • understanding how partitioning and clustering can improve query performance
  • creating and loading partitioned and clustered Hive tables
Overview of SAS In-Memory Analytics and the Code Accelerator for Hadoop
  • using high-performance procedures and the SASHDAT library engine
  • creating a SAS LASR Analytic Server session
  • using the SASIOLA engine
  • executing DS2 threads in the Hadoop cluster to summarize data
  • using PROC HDMD to access HDFS files
------------------------------------------------------------------------