Application Development
SAS-Implementations Supporting Satellite, Aircraft, and Drone-based Remote Sensing Endeavors and Their Influences in the Classroom
Hallum, Cecil R. – Sam Houston State University
SAS has been a critically significant “partner” over this researcher's 40-year career. This presentation summarizes key SAS applications in satellite, aircraft, and drone-based remote sensing endeavors (beginning in the early 1970s, when SAS was first implemented at NASA/Johnson Space Center). The coverage includes recent multivariate strategies implemented in SAS geared toward finding missing bodies in digital imagery collected from drone flights, as well as current research oriented toward improving the accuracy and speed of such capabilities. Discussion of the impact of this research in the classroom for educational purposes at the university and high school levels is emphasized as well.
Using a Free Microsoft tool to help manage your EBI SAS server
Strickland, Dan W – Texas Parks and Wildlife
It has been said that you get what you pay for. In this instance the software is free but the benefit is great. This paper describes how Texas Parks and Wildlife uses the free utility Microsoft Process Explorer to see what processes are actually running on their EBI SAS server. Is your server slow? Look to see what processes are currently using your processors. Track workspace server jobs and see their current status. You can even pause or kill their processing. You will also be able to link workspace jobs to the files in your work directory to keep the work directory free from unusable files. This paper will take you from download to configuration to usage of this free Microsoft software.
Exploring trends in topics via Text Mining SUGI/Global Forum proceedings abstracts
Shaik, Zubair* and Dr. Goutam Chakraborty – Oklahoma State University
Many organizations across the world have already realized the benefits of text mining to derive valuable insights from unstructured data. While text mining has been mainly used for information retrieval and text categorization, in recent years it is also being used to discover trends in textual data. Given a set of documents with a time stamp, text mining can identify the trends of different topics that exist in the text and how they change over time. We applied text mining with SAS® Text Miner 4.3 to discover trends in the usage of SAS tools in various industries by analyzing all 8,429 abstracts published in SUGI/SAS Global Forum from 1976 to 2011. Results of our analysis clearly show a varying trend in the representation of various industries in the conference proceedings from decade to decade. We also observed a significant difference in the association of key concepts related to statistics or modeling during the four decades.
We show how the %TMFILTER macro combined with Perl regular expressions can be used to extract the required sections (such as the abstract) of text from a large corpus of similar documents. Our approach can be followed to analyze papers published in any conference, provided the papers are accessible in common formats such as .doc, .pdf, .txt, etc.
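As a rough sketch of that extraction step (the %TMFILTER parameter names follow the SAS Text Miner documentation; the directories, dataset names, and the ABSTRACT/INTRODUCTION section markers are placeholders):

    /* Convert a directory of papers into a SAS dataset of text */
    %tmfilter(dataset=work.papers, dir=C:\sgf\papers, destdir=C:\sgf\text, ext=pdf);

    /* Keep the text between the ABSTRACT and INTRODUCTION headings */
    data work.abstracts;
      set work.papers;                /* %TMFILTER supplies a TEXT variable */
      length abstract $1000;
      retain re;
      if _n_ = 1 then re = prxparse('/ABSTRACT(.*?)INTRODUCTION/is');
      if prxmatch(re, text) then abstract = strip(prxposn(re, 1, text));
    run;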
SAS® Information Studio – Map Your Way Through the Data
Farias, Alejandro – Texas Parks and Wildlife
This reference document can serve as a summary instructional tool for SAS® Information Studio and is written to assist those responsible for providing access to data, such as an information architect, for data consumers. Topics covered include:
• Selecting Tables
• Table Relationships
• Selecting Data Items
• Organizing Data Items
• Creating a Custom Category or Calculated Data Item
• Single and Combination Filters
• Prompts
• Test Queries
• Resource Replacement/Moving/Saving Information Maps
In the simplest terms, SAS® Information Maps enable data consumers to access data. Information maps can be utilized by several SAS® products, including but not limited to Enterprise Guide, Add-In for Microsoft Office, Web Report Studio and Information Delivery Portal.
Data consumers are not required to know or even understand SQL or the structure of the underlying data source. An information architect can utilize predefined business logic or calculations, filters, and prompts to aid the data consumer in querying data. By simplifying the process of data accessibility, data consumers can focus on analyzing data output rather than spending time learning how to access, modify or select data for analysis.
Incorporating DataFlux dfPower Studio 8.2 into a Graduate-Level Information Quality Tools Class
Zhou, Yinle * and Talburt, John R. – University of Arkansas at Little Rock
The University of Arkansas at Little Rock (UALR) currently offers the only graduate degree program in information quality in the United States. SAS DataFlux is a founding sponsor of the program and continues to support it through an academic license for its DataFlux dfPower Studio product. The presentation describes how DataFlux dfPower Studio 8.2 has been incorporated into the Information Quality Tools course in a way that not only helps students understand and practice data quality techniques, but also gives them an introduction to data governance and master data management. The presentation also includes a description of how the laboratory exercises given in the course follow the DataFlux “Five Steps to More Valuable Enterprise Data” methodology. Each step is introduced first in class by a lecture from the course instructor, followed by an in-class software demonstration by the laboratory instructor. Students are given assignments to further develop their knowledge and to practice the techniques with the software. During the laboratory sessions, students become familiar with the basic operations and are able to build workflows to solve their assignments. An example is given where some students were even able to code a q-Gram Tetrahedral Ratio for approximate string matching and add it to their workflow as a Java plug-in. The presentation also discusses the “Data Challenge” team project that supplements the regular laboratory exercises, and how DataFlux dfPower Studio is used by the teams to solve the data challenge in an iterative fashion.
Tracking and Reporting Account Referral Activity using Hash tables and SAS BI
Beaver, James L* and Scroggins, Tobin – Farm Bureau Bank
One of the tasks of the analysis division at the bank is to keep track of new account referrals made by bank representatives based upon location, sales territory and manager. This is made more difficult because, over time, representatives may change their sales territory, become inactive, report to different managers or to more than one manager, or have their reporting entities change. The reporting requirements include being able to report on all activity based upon current sales territory and manager, as well as on activity based upon the sales territory and manager at the time the referral was made. Reporting by sales territory needed to cover all referrals in that territory over a period of time by all representatives, as well as only those made by currently active representatives. To handle these requirements, a slowly changing dimension table was created as part of a data warehouse. This table is maintained using the SAS hash object to reduce processing time. Reports are produced using SAS BI tools, including OLAP cubes and Web Report Studio. This paper demonstrates the use of the SAS hash object to maintain the table and provides examples of reporting techniques.
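A minimal sketch of the hash lookup pattern at the heart of this approach (the library, dataset, and variable names are hypothetical):

    data work.referral_report;
      length territory $20 manager $40;          /* hash data variables */
      if _n_ = 1 then do;
        declare hash dim(dataset: 'dw.rep_dim'); /* dimension table */
        dim.defineKey('rep_id');
        dim.defineData('territory', 'manager');
        dim.defineDone();
        call missing(territory, manager);
      end;
      set dw.referrals;
      /* attach the representative's territory and manager to each referral */
      if dim.find() = 0 then output;
    run;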
Beyond the Basics
The FILENAME Statement: Interacting with the World Outside of SAS®
Chris Schacherer – Clinical Data Management Systems, LLC
The FILENAME statement has a very simple purpose: to specify the fileref (or file reference) that serves as the link to an external file or device. The statement itself does not process any data, specify the format or shape of a dataset, or directly produce output of any type, yet this simple statement is an invaluable SAS® construct that allows SAS programs to interact with the world outside of SAS. Through specification of the appropriate device type, the FILENAME statement allows SAS to symbolically refer to external disk files for the purpose of reading and writing data, interact with FTP servers to read and write remote files, send e-mail messages, and gather data from external programs, including the local operating system and remote web services. The current work explores these uses of the FILENAME statement and provides examples of how you can use the different device types to perform a variety of data management tasks.
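For flavor, a few of the device types the paper covers, in hedged form (hosts, addresses, and paths are placeholders, and the EMAIL example assumes e-mail has been configured via the EMAILSYS/EMAILHOST options):

    /* URL: read a file straight off the web */
    filename webdat url 'http://www.example.com/data.csv';
    data work.web;
      infile webdat dsd firstobs=2 truncover;
      input id name :$20. value;
    run;

    /* EMAIL: send a message from a DATA step */
    filename msg email to='manager@example.com' subject='Nightly load status';
    data _null_;
      file msg;
      put 'The nightly load completed normally.';
    run;

    /* PIPE: capture output from an operating system command */
    filename oscmd pipe 'dir /b C:\data';   /* use ls on UNIX */
    data work.files;
      infile oscmd truncover;
      input fname $256.;
    run;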
Protecting Macros and Macro Variables: It Is All About Control
Eric Sun and Art Carpenter – CALOXY
In a regulated environment it is crucially important that we are able to control which version of a macro, and which version of a macro variable, is being used at any given time. As crucial as this is, in the SAS® macro language it is not something that is easily accomplished. We can write an application that calls a macro that we have validated, but the user of the application can write, and force the use of, their own unvalidated version of that same macro. Values of macro variables that we have populated can be “accidentally” replaced by user-written assignments. How can you guarantee that the end results are accurate if you cannot guarantee that the proper programs have been executed?
Although our tools are limited, there are a few options available that can be used to help us control our macro execution environment. These, along with management programs, can give the application developer better control, and greater protection, during the execution of the application.
For a successful macro language application, it is all about CONTROL!!
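Two of the available controls, sketched under stated assumptions (the catalog path is a placeholder and the macro is invented for illustration): a compiled stored macro catalog, so the validated version is the one that is found, and %LOCAL, so a macro cannot overwrite its caller's macro variables.

    /* Compile once with write access; deploy the library read-only */
    libname prodmac 'C:\validated\macros';
    options mstored sasmstore=prodmac;

    %macro tally(ds) / store des='validated v1.0';
      %local n;                      /* keeps N private to this macro */
      proc sql noprint;
        select count(*) into :n from &ds;
      quit;
      %put NOTE: &ds has &n rows.;
    %mend tally;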
You Can’t Get There From Here If You Don’t Know Where Here Is. Improving the SAS Enterprise Guide Data Characterization Task
Patricia Hettinger – Independent Consultant
SAS Enterprise Guide has many useful built-in tasks. The data characterization task gives some useful information but has several drawbacks. One is that it will run frequencies on all character data regardless of length or number of distinct values. This can result in some variables being dropped from the output due to too many distinct values. It also results in frequencies not being run for numeric values at all. Another is that minimum, maximum, number of missing values and number of non-missing values will be calculated only for numeric variables, when this information would be useful for any variable. A third issue is the likelihood of your system hanging when attempting to analyze large data sets with many variables. This paper details how you can overcome these obstacles, as well as how to incorporate your profile results into a useful mapping document.
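One building block for a home-grown profile is the NLEVELS option of PROC FREQ, which reports distinct, missing, and non-missing counts for every variable, character or numeric, without printing the frequency tables themselves:

    proc freq data=sashelp.heart nlevels;
      tables _all_ / noprint;           /* suppress the frequency tables */
      ods output nlevels=work.profile;  /* one row per variable */
    run;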
At Random – Sampling with Proc Surveyselect
Patricia Hettinger – Independent Consultant
Are you still sampling in this very common way? Read your source data, assign a random number, sort the data, and then take every nth record? PROC SURVEYSELECT offers a newer method for sampling data. This paper covers the basic features of this procedure as well as a comparison with the aforementioned random-number approach.
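The contrast is easy to sketch (the input dataset and seed are arbitrary):

    /* The familiar manual method */
    data work.shuffled;
      set work.claims;
      r = ranuni(20120927);
    run;
    proc sort data=work.shuffled; by r; run;
    data work.sample_old;
      set work.shuffled(obs=500);
    run;

    /* The same simple random sample in one step */
    proc surveyselect data=work.claims out=work.sample_new
                      method=srs n=500 seed=20120927;
    run;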
Can’t Decide Whether to Use a DATA Step or PROC SQL? You Can Have It Both Ways with the SQL Function!
Jeremy W. Poling – B&W Y-12 L.L.C.
Have you ever thought that it would be nice if you could execute a PROC SQL SELECT statement from within a DATA step? Well, now you can! This paper describes how to create an SQL function using the FCMP procedure and the RUN_MACRO function. The SQL function accepts a SELECT statement as its only argument. By using the SQL function, you now have the ability to integrate the DATA step and the SQL procedure.
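A minimal sketch of the pairing described here (the wrapper macro and function names are arbitrary):

    /* Macro that actually runs the query; QUERY arrives as a macro variable */
    %macro dosql;
      %let query = %superq(query);
      proc sql;
        &query;
      quit;
    %mend dosql;

    /* FCMP function that hands its argument to the macro */
    proc fcmp outlib=work.funcs.sql;
      function sql(query $);
        rc = run_macro('dosql', query);
        return(rc);
      endsub;
    run;

    options cmplib=work.funcs;
    data _null_;
      rc = sql('create table work.teens as select * from sashelp.class where age > 12');
    run;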
Intro to Longitudinal Data: A Grad Student "How-To" Paper
Priest, EL- University of North Texas School of Public Health and Baylor Health Care System
Collinsworth, AW – Tulane University and Baylor Health Care System
Grad students learn the basics of SAS programming in class or on their own. Although students may deal with longitudinal data in class, the lessons focus on statistical procedures and the datasets are usually ready for analysis. However, longitudinal data may be organized in many complex structures, especially if collected in a relational database. Once students begin working with “real” longitudinal data, they quickly realize that manipulating the data so it can be analyzed is its own time-consuming challenge. In the real world of messy data, we often spend more time preparing the data than performing the analysis.
Students need tools that can help them survive the challenges of working with longitudinal data. These challenges include identifying records, counting repeat observations, performing calculations across records, and restructuring repeating data from multiple observations to a single observation. This paper will use real-world examples from grad students to demonstrate useful functions, FIRST. and LAST. variables, and how to transform datasets using arrays, DATA step programming, and PROC TRANSPOSE.
This paper is the fifth in the "Grad Student How-To" series and gives graduate students useful tools for working with longitudinal data.
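A small taste of those tools, assuming a hypothetical WORK.VISITS dataset with SUBJECT_ID, VISIT_DATE, and SCORE:

    proc sort data=work.visits; by subject_id visit_date; run;

    data work.visits_numbered;
      set work.visits;
      by subject_id;
      if first.subject_id then visit_num = 0;  /* restart the count per subject */
      visit_num + 1;
    run;

    /* One row per subject, one column per visit */
    proc transpose data=work.visits_numbered out=work.wide prefix=score_visit;
      by subject_id;
      id visit_num;
      var score;
    run;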
Finding Oracle® Table Metadata: When PROC CONTENTS Is Not Enough
Jeremy W. Poling – B&W Y-12 L.L.C
Complex Oracle® databases often contain hundreds of linked tables. For SAS/ACCESS® Interface to Oracle software users who are unfamiliar with a database, finding the data they need and extracting it efficiently can be a daunting task. For this reason, a tool that extracts and summarizes database metadata can be invaluable. This paper describes how Oracle table metadata can be extracted from the Oracle data dictionary using the SAS/ACCESS LIBNAME statement in conjunction with traditional SAS® DATA step programming techniques. For a given Oracle table, we discuss how to identify the number of observations and variables in the table, comments, variable names and attributes, constraints, indexes, and linked tables. A macro is presented which can be used to extract all the aforementioned metadata for an Oracle table and produce an HTML report. Once stored as an autocall macro, the macro can be used to quickly identify helpful information about an Oracle table that cannot be seen from the output of PROC CONTENTS.
This paper assumes that the reader has a basic understanding of DATA step programming, the macro language, and the SAS/ACCESS LIBNAME statement. A basic understanding of relational database management system concepts is also helpful, but not required.
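A hedged sketch of the dictionary lookup (connection details are placeholders; whether the ALL_* views are reachable through a libref depends on your Oracle privileges, and SQL pass-through is an alternative route to the same views):

    libname oradict oracle user=myuser password=XXXXXXXX path=proddb schema=sys;

    data work.emp_columns;
      set oradict.all_tab_columns;        /* Oracle data dictionary view */
      where owner = 'HR' and table_name = 'EMPLOYEES';
      keep column_name data_type data_length nullable;
    run;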
They're Closing Down the Office in Kalamazoo and You’ve Been Tapped to Run Their In-house “Production” Reporting System, OH MY!
David Cherry Welch – Citi
The author describes his experience running a home-grown reporting system for a major corporation, along with these features of the system (the first of which is sketched after this abstract):
• Setting the SAS Environment Using autoexec.sas and Command Line Switches
• Controlling Program Start Times with the SAS Data Step
• Creating and Maintaining Report Distribution Lists Using SAS Formats
• Modularizing Reports Using the SAS Macro Language and %include
The benefits of this ad hoc system will be compared to other alternatives. The author will discuss the lessons learned from this assignment and suggestions for making projects easily supportable for others who may follow.
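For flavor, a hedged sketch of the first bullet (all paths, file names, and options shown are placeholders):

    rem nightly.bat : batch invocation with a controlled environment
    "C:\SAS\SASFoundation\9.2\sas.exe" -sysin "D:\prod\daily_report.sas" ^
      -autoexec "D:\prod\prod_autoexec.sas" -log "D:\prod\logs\daily_report.log" ^
      -noterminal -nosplash

    /* prod_autoexec.sas : set the environment before the SYSIN program runs */
    libname prod 'D:\prod\data' access=readonly;
    options compress=yes missing=' ';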
The easy way to include XML formatted data into your SAS reports
Mary Grace Crissey – Pearson
Originally designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. Many heterogeneous information systems have chosen XML as their preferred method of data exchange, especially in pharmaceutical and medical environments, with usage growing in the financial and educational domains. With the stand-alone utility SAS XML Mapper 9.21, we can unlock the mystery of “foreign” XML data and “see” the hidden metadata structure. I will show you how easy it is to install (and where to find the free GUI) in this short tutorial on how to make sense of the tags and XPaths embedded as XML values. With XML Mapper, we can display your XML values visually in a hierarchical tree structure and produce the schemas and syntax necessary to feed into the SAS libname engine. Examples from my educational testing assessment reporting project for the State of Wyoming will be presented. This talk presents a drag-and-drop way of exploring ALL your data – be they arriving as flat files, MS Excel spreadsheets, mainframe files, SAS data, or eXtensible Markup Language.
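Once XML Mapper has generated a .map file, feeding it to the libname engine takes only a few lines (the paths and the mapped table name below are placeholders):

    libname scores xml 'C:\wyoming\scores.xml'
                   xmlmap='C:\wyoming\scores.map' access=readonly;

    data work.scores;
      set scores.student_scores;   /* table name comes from the .map file */
    run;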
Mastering Non-Standard Data Sources with SAS: Using Patterns, Parsing, and Text to Handle Difficult Files
Glezen, Georgeanna N. *, Independent Consultant
The availability of many new data and file sources (web pages, transaction logs, free-form reports) challenges our ability to extract and process information for use in SAS. Learn how to identify patterns and utilize SAS programming to parse and capture non-standard data. We will walk through examples and checklists on how to identify the appropriate processing and learn how to successfully handle complex file layouts.
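A small sketch of the pattern-first approach (the file path and layout are hypothetical): read each line raw, recognize detail lines by pattern, and parse only those.

    data work.detail;
      infile 'C:\logs\settlement_report.txt' truncover;
      input line $char200.;
      /* detail lines begin with a date such as 09/27/2012 */
      if prxmatch('/^\d{2}\/\d{2}\/\d{4}/', line) then do;
        tran_date = input(scan(line, 1, ' '), mmddyy10.);
        amount    = input(scan(line, -1, ' '), comma14.2);
        output;
      end;
      format tran_date date9.;
    run;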
Tips & Tricks: Creation of Regional Maps via Enterprise Guide
Brandon Vaughn – Dell
Although SAS Enterprise Guide does not have a built-in task to create map graphs, this is still possible via coding. This paper shows you how to use the GMAP procedure to create different types of maps: choropleth, prism, surface, and block. Use of the maps in a real business context (e.g., cycle time) will be presented, with a special emphasis on prism graphs. Special detail will be given to a prism graph in which the color intensity and height represent two different attributes in the data.
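A minimal choropleth sketch (the response dataset and its CYCLE_TIME variable are hypothetical); swapping CHORO for PRISM adds the height dimension discussed above:

    proc gmap data=work.cycle map=maps.us;
      id state;                          /* matches the STATE code in MAPS.US */
      choro cycle_time / levels=5 coutline=gray;
    run;
    quit;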
Win With SAS®, JMP®, and Special Interest User Groups
Shipp, Charles Edwin – JMP 2 Consulting, Inc.
Lafler, Kirk Paul – Software Intelligence Corporation
Have you considered an in-house group for SAS®, JMP®, or special interests? We discuss how to start and maintain an in-house group. Benefits include leadership opportunity, peer-to-peer interaction, tutorials, collaboration of users and also departments, a focal point for proper requests, and getting to know other users and providers. This presentation discusses the differences in corporate cultures and provides examples of successful school, company, and government user groups. We then summarize "key" critical success factors.
An Introduction to SAS® Hash Programming Techniques
Lafler, Kirk Paul – Software Intelligence Corporation
Beginning in Version 9, SAS software supports a DATA step programming technique known as the hash object, which provides faster table lookup, search, merge/join, and sort operations. This presentation introduces what a hash object is, how it works, and the syntax required. Essential programming techniques will be illustrated to sort data and to search memory-resident data using a simple key to find a single value, along with more complex techniques that use a composite key to search for multiple values.
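A minimal simple-key lookup, using SASHELP.CLASS as the lookup table and a hypothetical WORK.ROSTER as the file being matched:

    data work.matched;
      if _n_ = 1 then do;
        if 0 then set sashelp.class(keep=name age);  /* defines host variables */
        declare hash h(dataset: 'sashelp.class');
        h.defineKey('name');
        h.defineData('age');
        h.defineDone();
      end;
      set work.roster;                  /* must contain a NAME variable */
      if h.find() = 0 then output;      /* AGE is filled in on a match */
    run;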
Foundations & Fundamentals
Take a Fresh Look at SAS® Enterprise Guide®: From point-and-click ad hocs to robust enterprise solutions
Chris Schacherer – Clinical Data Management Systems, LLC
Early versions of SAS Enterprise Guide (EG) met with only lukewarm acceptance among many SAS programmers. As EG has matured, however, it has proven to be a powerful tool not only for end-users less familiar with SAS programming constructs, but also for experienced SAS programmers performing complex ad hoc analyses and building enterprise-class solutions. Still, many experienced SAS programmers fail to add EG to their SAS toolkit. They face the barriers of an unfamiliar interface, new nomenclature, and uncertainty that the benefits of using EG outweigh the time spent mastering it. Especially for this group (but also for analysts new to SAS), the present work attempts to orient new EG users to the interface and nomenclature while teaching them how to achieve the common data management and analytic tasks they already perform with ease in SAS. In addition, EG concepts and techniques that focus on using EG as a development environment for producing end-user analytic solutions are described.
Traffic Lighting: The Next Generation
VanBuskirk, J and Harper, J – Baylor Health Care System
Traffic lighting is a tool intended to let a reader quickly evaluate data and sort out good versus bad performance at a glance. As the name implies, traditional traffic lighting generally separates data into three categories highlighted with red, yellow, and green based on performance. However, if all the cells in a table are boldly colored in primary colors, the reader is unable to easily sort good results from bad. In addition, colorblind readers may not be able to distinguish the colors at all.
In this paper we will present the visual design changes made to our output and the SAS techniques behind them. We sought to reduce the amount of decoration in our tables and focus on what was important: the data! We employed a more subtle use of cell shading and borders to effectively draw the reader's eye to the results that most need their attention. Typically, traffic lighting is done using PROC FORMAT; however, our data cells contained text strings, which required a more complex use of style options, COMPUTE blocks, flag variables, and macros to implement. All of this is quite doable in SAS with ODS, and the result was a much more effective table delivered to our partners.
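A hedged sketch of the style-driven shading (the dataset, its FLAG variable, and the pastel colors are illustrative, not the authors' production code):

    proc format;          /* subtle fills instead of primary colors */
      value $flagbg 'BAD'='cxF2DCDB' 'WARN'='cxFDE9D9' 'GOOD'='cxEBF1DE';
    run;

    proc report data=work.results nowd;
      column measure flag result;
      define measure / display 'Measure';
      define flag    / display noprint;       /* drives the shading only */
      define result  / display 'Result';
      compute result;
        call define(_col_, 'style',
                    cats('style={background=', put(flag, $flagbg.), '}'));
      endcomp;
    run;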
The MEANS/SUMMARY Procedure: Doing More
Art Carpenter – CALOXY
The MEANS/SUMMARY procedure is a workhorse for most data analysts. It is used to create tables of summary statistics as well as complex summary data sets. The user has a great many options which can be used to customize what the procedure is to produce. Unfortunately most analysts rely on only a few of the simpler basic ways of setting up the PROC step, never realizing that a number of less commonly used options and statements exist that can greatly simplify the procedure code, the analysis steps, and the resulting output.
This tutorial introduces a number of important and useful options and statements that can provide the analyst with much needed tools. Some of these tools are new, others have application beyond MEANS/SUMMARY, all have a practical utility. With this practical knowledge, you can greatly enhance the usability of the procedure and then you too will be doing more with MEANS/SUMMARY.
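A few of those less commonly used statements and options in one hedged sketch:

    proc means data=sashelp.cars noprint;
      class origin type;
      ways 1 2;                    /* one- and two-way summaries, no overall row */
      var msrp invoice;
      output out=work.stats mean= median= / autoname;  /* MSRP_Mean, etc. */
    run;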
PROC TABULATE: Getting Started
Art Carpenter – CALOXY
Although PROC TABULATE has been a part of Base SAS® for a very long time, this powerful analytical and reporting procedure is very underutilized. TABULATE is different; its statement structure is unlike that of any other procedure. Because the programmer who wishes to learn the procedure must essentially learn a new programming language, one with radically different statement structure than elsewhere within SAS, many do not make the effort.
The basic statements will be introduced, and more importantly the introduction will provide a strategy for learning the statement structure. The statement structure relies on building blocks that can be identified and learned individually and in concert with others. Learn how these building blocks form the structure of the statement, how they fit together, and how they are used to design and create the final report.
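A first taste of those building blocks: CLASS and VAR variables combined with the comma that separates the row dimension from the column dimension.

    proc tabulate data=sashelp.class;
      class sex age;
      var height;
      table sex*age,                   /* row dimension */
            height*(n mean*f=8.1);     /* column dimension with a format */
    run;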
The Way of the Semicolon or Three Things I Wish I Had Known Before I Ever Keyed One
Patricia Hettinger – Independent Consultant
Learning SAS, or teaching it to someone else, can be very difficult. The author has found that understanding three main aspects of SAS is very helpful. These are, of course, the semicolon, the physical nature of a SAS data set, and the two major data types. The intended audience is those who are new to SAS or new to teaching it.
Looking Beneath the Surface of Sorting
Kuligowski, Andrew T.
Many things that appear to be simple turn out to be a mask for various complexities. For example, as we all learned early in school, a simple drop of pond water reveals a complete and complex ecosystem when viewed under a microscope. A single snowflake contains a delicate crystalline pattern. Similarly, the decision to use data in a sorted order can conceal an unexpectedly involved series of processing and decisions.
This presentation will examine multiple facets of the process of sorting data, starting with the most basic use of PROC SORT and progressing into options that can be used to extend its flexibility. It will then look at some potential uses of sorted data and contrast them with alternatives that do not require sorted data. For example, we will compare the use of the BY statement vs. the CLASS statement in certain PROCs, and investigate alternatives to the MERGE statement for combining multiple datasets.
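The BY-versus-CLASS contrast is quick to sketch:

    /* BY requires sorted input ... */
    proc sort data=sashelp.class out=work.sorted; by sex; run;
    proc means data=work.sorted mean; by sex; var height; run;

    /* ... CLASS produces the same grouping with no sort at all */
    proc means data=sashelp.class mean; class sex; var height; run;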
Type Less, Do More: Have SAS do the typing for you
Jeanina Worden – PPD
Typing less when you're a SAS® programmer seems counterintuitive; however, when repetitive tasks leave you with the realization that only five words differentiate the last twenty lines of code, the concept becomes clearer. There are numerous ways to accomplish these tasks, such as “hardcoding” and copy-and-paste; however, they carry with them increased risk in terms of additional time required for updates and the lack of assurance that all constraints are accounted for. Therefore the most common “go to” solution is the macro; however, that too can quickly result in a hand-cramping amount of code. This paper shows how CALL EXECUTE can instead be used to dynamically code repetitive tasks, populating the required constraints from the actual metadata, ensuring all available constraints are accounted for, reducing the need for updates if the database changes, and doing it all with less coding. For simplicity, PROC PRINT is used in the examples; however, the code can be changed to perform any function where the repeating code differs by a single dataset or variable name.
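A minimal sketch of the idea, driving PROC PRINT from the SASHELP.VTABLE metadata view:

    data _null_;
      set sashelp.vtable;
      where libname = 'SASHELP' and memname in ('CLASS', 'CARS', 'HEART');
      /* each row generates one PROC PRINT step */
      call execute(cats('proc print data=', libname, '.', memname, '(obs=5); run;'));
    run;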
Using SASHELP to Validate Attributes
Sadia Abdullah
The SASHELP library contains a group of catalogs and files containing metadata used to control various aspects of a SAS session. Using this valuable information in SAS programming can lead to robust and dynamic code. This presentation lists the different views in the SASHELP library and describes what kind of information each one of these views holds. As an example of how the information from SASHELP can be used in day-to-day programming, this presentation will show how SASHELP metadata can be used to validate the attributes of an SDTM dataset against its specifications.
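For example, a quick attribute pull from SASHELP.VCOLUMN (the SDTM libref is hypothetical):

    proc sql;
      select name, type, length, label, format
        from sashelp.vcolumn
       where libname = 'SDTM' and memname = 'DM'
       order by varnum;
    quit;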
Ethnicity and Race: When Your Output Isn’t What You Expected
Philamer Atienza, MS – Alcon Laboratories, Inc.
In SAS, when a classification variable is used to group observations with the same values and a formatted value is used for grouping data, unexpected results may come out of the procedure. If more than one unformatted value maps to the same format label, SAS uses the lowest unformatted value to represent the group in the output.
Understanding the behavior of SAS when storing the unformatted values will help avoid potential mistakes in using formats and nested classification variables. This paper examines two scenarios when a variable for both ethnicity and race is used in Proc Tabulate to create an output data set: (1) with and (2) without the use of a format.
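The behavior is easy to reproduce in a hedged sketch (the dataset and coding scheme are invented):

    proc format;                        /* two codes share one label */
      value ethfmt 1, 2 = 'Hispanic or Latino'
                   3    = 'Not Hispanic or Latino';
    run;

    proc tabulate data=work.subjects out=work.counts;
      class ethnic;
      format ethnic ethfmt.;
      table ethnic, n;
    run;

    /* In WORK.COUNTS, the ETHNIC column holds the lowest unformatted
       value (1) for the combined group: the behavior examined here */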
Assigning a User-defined Macro to a Function Key
Mary Rosenbloom – Edwards Lifesciences, LLC
Kirk Paul Lafler – Software Intelligence Corporation
Are you entering one or more of the same SAS Display Manager System (DMS) commands repeatedly during a session? The DMS offers a convenient way of capturing and saving frequently entered commands in a user-defined macro, and then assigning the macro to a function key of your choosing. This paper illustrates the purpose and the steps one would use to assign a user-defined macro to a function key.
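The mechanics are compact enough to sketch (the macro body is arbitrary):

    %macro cleanup;
      dm 'clear log; clear output;';   /* DMS commands via the DM statement */
    %mend cleanup;

    /* In the KEYS window (Tools -> Options -> Keys), assign to F12: */
    /*   gsubmit '%cleanup;'                                         */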
Best Practices – Clean House to Avoid Hangovers
Mary Rosenbloom – Edwards Lifesciences, LLC
Kirk Paul Lafler – Software Intelligence Corporation
In a production environment, where dozens of programs are run in sequence, often monthly or quarterly, and where logs can span thousands of lines, it's easy to overlook the small stuff. Maybe a DATA step fails to execute, but a dataset of the same name already exists in the temp library from a previous program. Maybe a global macro variable assignment is missed or fails to execute, but a global macro variable of the same name already exists from a previous program. The same can happen with macro definitions. The list goes on. This paper offers suggestions for housekeeping steps that can be taken at the end of each SAS program to minimize the chance of a hangover.
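An end-of-program housekeeping block might look roughly like this (the macro variable names are placeholders):

    proc datasets lib=work kill nolist;       /* drop leftover WORK datasets */
    quit;
    %symdel start_dt end_dt region / nowarn;  /* remove this program's globals */
    filename _all_ clear;                     /* release file and library refs */
    libname _all_ clear;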
Point-and-Click Programming Using SAS® Enterprise Guide®
Kirk Paul Lafler and Mira Shapiro – Software Intelligence Corporation
SAS® Enterprise Guide® empowers organizations with all the capabilities that SAS has to offer. It provides programmers, business analysts, statisticians and end-users with built-in wizards to perform reporting and analytical tasks, access multi-platform enterprise data sources, deliver data and results to a variety of mediums and outlets, perform data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements. Attendees learn how to use the graphical user interface (GUI) to access tab-delimited and Excel input files; subset and summarize data; join two or more tables together; flexibly export results to HTML, PDF and Excel; and visually manage projects using flowcharts and diagrams.
Basic SAS® PROCedures for Producing Quick Results
Kirk Paul Lafler – Software Intelligence Corporation
As IT professionals, saving time is critical. Delivering timely, quality reports and information to management, end users, and customers is essential. The SAS System provides numerous "canned" PROCedures for generating quick results to take care of these needs … and more. Attendees acquire basic insights into the power and flexibility offered by SAS PROCedures, using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and DATASETS to manage data libraries. Additional topics include techniques for informing the SAS System which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing.
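Two of those PROCedures and subsetting techniques in miniature:

    proc print data=sashelp.class(where=(sex = 'F')) noobs;  /* WHERE= option */
      var name age height;
    run;

    proc means data=sashelp.class n mean min max;
      where age between 12 and 15;      /* WHERE statement */
      class sex;
      var height weight;
    run;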
How to Read Data into SAS® with the DATA Step
Toby Dunn
Kirk Paul Lafler – Software Intelligence Corporation
The SAS® System has all the tools users need to read data from a variety of external sources. This has been, perhaps, one of its most important and powerful features since its introduction in the mid-1970s. The cornerstone of this power begins with the INFILE and INPUT statements, the use of the single- and double-trailing @ signs, and the ability to read data using a predictable form or pattern. This paper will provide insights into the INFILE statement, the various styles of INPUT statements, and numerous examples of how data can be read into SAS with the DATA step. We will show how to use the features of the INFILE statement along with the inherent functionality of the DATA step to read not only well-formed external files but also extreme cases, such as reading all the files in a directory and reading data that is scattered over multiple lines.
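The two trailing-@ styles in brief (the mixed-record layout is hypothetical):

    /* Single trailing @ : read a tag, hold the line, then branch */
    data work.mixed;
      infile 'C:\data\mixed.txt' truncover;
      input rectype $1. @;
      if rectype = 'H' then input @3 run_date mmddyy10.;
      else if rectype = 'D' then input @3 id 5. amount comma12.2;
    run;

    /* Double trailing @@ : several observations per line */
    data work.pairs;
      input x y @@;
      datalines;
    1 2  3 4  5 6
    ;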
Quick intro to SQL
Joe Celko
This is a quick overview of SQL for statisticians from an SQL expert who used to do stats.
Potpourri
Reading and Processing Mystery Files
Jimmy DeFoor – Diversant
What are the best coding methods for comparing unknown files with the same layouts? Answer: coding methods that automatically adjust to the number of fields and to different field formats, while performing the same comparisons regardless of those formats.
This paper discusses one method of comparing unknown files with the same layouts. It uses SAS macro variables, arrays, and the VCOLUMN view to efficiently process three credit bureau files without knowing the names of their variables.
The technique uses the CALL SYMPUT routine to create SAS macro variables from the name and type fields in the VCOLUMN view. Then, it uses a SAS macro to retrieve those macro variables and load them into LENGTH statements and SAS arrays. Furthermore, the SAS macro creates new variables that use the old variable names as the root of the new names, such as Attr46 being used to create Attr46_tot and Attr46_ck, and then assigns those variables to other SAS arrays in the same relative positions as the original variables. This allows the new field to be updated in a DO loop when the original field is being investigated by that loop.
The SAS techniques used in this paper include macro variable double resolution, macro variable concatenation, SAS variable concatenation, SAS array processing, user formats, Call SYMPUT and the dictionary content retrieved from the vcolumn view.
The program created by the macros reads a consolidated bureau file built from the three bureau files, evaluates all character variables from the three bureaus for their similarity in content, evaluates all numeric variables for their similarity in content, and then sums the findings for each field.
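The dictionary-driven setup can be sketched in hedged form (the WORK.BUREAU file and the check logic are placeholders):

    proc sql noprint;                /* variable names from the dictionary */
      select name into :charvars separated by ' '
        from sashelp.vcolumn
       where libname = 'WORK' and memname = 'BUREAU' and type = 'char';
    quit;
    %let nchar = &sqlobs;            /* rows returned by the last query */

    data work.checked;
      set work.bureau;
      array cv{*} $ &charvars;
      array ck{&nchar} ck1-ck&nchar; /* parallel flags, same positions */
      do i = 1 to dim(cv);
        ck{i} = not missing(cv{i});  /* stand-in for the real comparisons */
      end;
      drop i;
    run;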
Staying Relevant in the Ever-Changing Pharmaceutical Industry
Aiming Yang* and Robert Hoffman – Merck & Co., Inc.
In recent years, the pharmaceutical industry has undergone dramatic changes. For SAS programming professionals, the challenge is how to adapt to these changes, remain effective, and thus stay relevant in the industry. In this paper the authors share some experiences gained in large pharmaceutical companies. The major thoughts shared in this paper include the following: solid, up-to-date, diverse SAS programming skills and a good understanding of statistics and clinical trials are required, since these skills are how the industry defines us. Additionally, being a sensible, good team member is essential for fulfilling our roles and functions. Finally, the ability to work effectively with cross-functional department personnel and external vendors is a must for experienced programming analysts. These abilities will help us stay relevant amidst the ever-changing processes and trends, and thus define who will stay and thrive within this industry.
Annotate your SG Procedure Graphs
Mekhala Acharya – EMMES Corp
In SAS® 9.2, the SAS/GRAPH SG procedures offered a plethora of new options, but not the Annotate facility. Graphs are often needed with additional informative text and enhancements, which are frequently data driven. The use of the ODS Graphics Editor is not recommended for annotation in clinical research, although it can be used effectively to edit and annotate graphs that are created by a wide variety of SAS procedures.
This paper shows the effectiveness of using alternate SAS procedures to annotate. Sample code is provided to show how much of the annotate code used with traditional SAS/GRAPH output can be replaced with very little code. Reductions to nearly one-third of the original code have been seen when using these alternate methods.
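As an illustration of the kind of replacement meant here, SGPLOT statements such as REFLINE and INSET cover many classic annotation jobs in a line or two (the dataset and labels are hypothetical):

    proc sgplot data=work.labdata;
      series x=visit y=mean_value / group=treatment;
      refline 4 / axis=x label='Dose escalation' lineattrs=(pattern=dash);
      inset 'Data as of 01JUL2011' / position=topright border;
    run;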
You Could Be a SAS® Nerd If You . . .
Kirk Paul Lafler – Software Intelligence Corporation
Are you a SAS® nerd? The Wiktionary (a wiki-based Open Content dictionary) definition of "nerd" is a person who has good technical or scientific skills, but is generally introspective or introverted. Another definition is a person who is intelligent but socially and physically awkward. Obviously there are many other definitions for "nerd", many of which are associated with derogatory terms or stereotypes. This presentation intentionally focuses not on the negative descriptions, but on the positive aspects and traits many SAS users possess. So let's see how nerdy you actually are using the mostly unscientific, but fun, "Nerd" detector.
Benefits of sasCommunity.org for JMP® Coders
Charles Edwin Shipp – JMP 2 Consulting
Kirk Paul Lafler – Software Intelligence Corporation
The benefits of sasCommunity.org to SAS® users are available to JMP® users also, but participation has lagged. This is partly due to excellent JMP websites, including their discussion groups. Reasons for increased JMP participation on sasCommunity.org are illustrated and discussed, including the interchange of data, statistics, and graphics between SAS and JMP software. The benefits of a community for making your work known and for helping newer users will become increasingly important as JMP users support sasCommunity.org and its popularity grows.
Connect with SAS® Professionals Around the World with LinkedIn and sasCommunity.org
Charles Edwin Shipp – JMP 2 Consulting
Kirk Paul Lafler – Software Intelligence Corporation
Accelerate your career and professional development with LinkedIn and sasCommunity.org. Establish and manage a professional network of trusted contacts, colleagues and experts. These exciting social networking and collaborative online communities enable users to connect with millions of SAS users worldwide, anytime and anywhere. This presentation explores how to create a LinkedIn profile and social networking content, develop a professional network of friends and colleagues, join special-interest groups, access a Wiki-based web site where anyone can add or change content on any page on the web site, share biographical information between both communities using a built-in widget, exchange ideas in Bloggers Corner, view scheduled and unscheduled events, use a built-in search facility to search for desired wiki-content, collaborate on projects and file sharing, read and respond to specific forum topics, and more.
CDISC ADaM Application: One-Record-Per-Subject Data That Doesn't Belong in ADSL
Sandra Minjoe – Octagon Research Solutions, Inc.
It can be tempting to push a lot of analysis data into ADSL because of its simple and convenient one-record-per-subject structure. However, ADSL was designed to hold only information used in other analysis datasets, such as population flags, treatment variables, and basic demographics. So where should all the other one-record-per-subject information, such as date of disease progression or total amount of study drug received, go? This paper and presentation will show examples, weigh the pros/cons of different dataset structure options, and help attendees answer this question for their own data.
A Well Designed Process and QC Tool for Integrated Summary of Safety Reports
Chen, H. – Merck Sharp & Dohme Corp., Rahway, NJ
The ISS (Integrated Summary of Safety) is a critical component in submissions for drug approvals in the pharmaceutical industry. This report consists of multiple reports from clinical studies that focus on drug safety and are generally programmed in SAS. The ISS uses the relevant data from one or more clinical studies to generate tables and figures from the integrated data. Various methods are used to verify the results found in the ISS reports. One method focuses on whether the ISS analysis results from the integrated data are consistent with the results from each of the individual studies. This paper introduces a well-designed process and validation tool to ensure the consistency and integrity of the ISS reports with respect to the individual underlying studies.
What’s Hot, What’s Not – Skills for SAS® Professionals
Lafler, Kirk Paul – Software Intelligence Corporation
Shipp, Charles Edwin – JMP 2 Consulting, Inc.
As a new generation of SAS® user emerges, current and prior generations of users have an extensive array of procedures, programming tools, approaches and techniques to choose from. This presentation identifies and explores the areas that are hot and not-so-hot in the world of the professional SAS user. Topics include Enterprise Guide, PROC SQL, PROC REPORT, PROC FORMAT, Macro Language, ODS, DATA step programming techniques such as arrays and hashing, sasCommunity.org®, LexJansen.com, JMP®, SAS/GRAPH®, SAS/STAT®, and SAS/AF®.
Consulting: Critical Success Factors
Lafler, Kirk Paul – Software Intelligence Corporation
Shipp, Charles Edwin – JMP 2 Consulting, Inc.
The Internet age has changed the way many companies, and individuals, do business – as well as the type of consultant that is needed. The consultants of today and tomorrow will require different skills than the consultants of yesterday. Today's consultant may just as likely have graduated with an MBA degree as with a technical degree. As hired advisers to a company, a consultant often tackles a wide variety of business and technical problems and provides solutions for their clients. In many cases a consultant chooses this path as an attractive career alternative after toiling in industry, government and/or academia for a number of years. This presentation describes the consulting industry from the perspective of the different types of organizations (e.g., elite, Big Five accounting firms, boutique, IT, and independent) that they comprise. Specific attention will be given to the critical success factors needed by today's and tomorrow's consultant.
Top Ten SAS® Sites for Programmers: A Review
Lafler, Kirk Paul – Software Intelligence Corporation
Shipp, Charles Edwin – JMP 2 Consulting, Inc.
We review the top ten SAS® sites for coders, beginning with sas.com and jmp.com. We then expand to sasCommunity.org, support.sas.com, and six other popular sites that assist you in training and programming. If you use Google to search for SAS Web sites, you will get over a million hits. In this paper, we present the results from an unscientific, but interesting, survey and analysis we conducted about the SAS sites visited by those who answered our survey. From nearly 400 invited to respond, more than 60 SAS users shared their insights, along with comments, for 65 SAS-related websites. Finally, we narrow the list down to ten.
Reporting & Statistics
An Introductory Look at the Situational Context Inherent in the RBI Using Two Modeling Approaches
Ryan Sides – Baylor University
The RBI (run batted in) is a popular statistic in Major League Baseball that is extremely dependent on the situational context (i.e., which bases are occupied by runners along with the number of outs in an inning) experienced by the hitter. This presentation offers insight into how much this situational context affects the RBI, providing two related modeling approaches that account for this information and, thus, an approach for improving a player’s evaluation. The first model used to accomplish this goal is a standard multiple regression model, while the second is an intuitive approach based on years of experience as a player by the author. Various statistical tools are utilized to check assumptions and to compare the models; further, the resulting statistics are compared to those frequently used in baseball. A discussion of the use of SAS to do this modeling and analysis along with a demonstration of the developed GUI is included.
Prediction of Diabetes
Repalli Pardha Saradhi – Oklahoma State University
The main purpose of this paper is to forecast how likely people in different age groups (young, middle-aged, and older) are to be affected by diabetes, based on their daily activities and food habits. The goals are to predict whether an individual is affected by diabetes and, if so, to determine the different factors affecting each of the three segments. The statistical techniques used in this paper are segmentation and cluster analysis. The main goal of this presentation is to prepare a customized list of food items to eat that would be useful in avoiding diabetes.
Kass adjustments in decision trees on Binary vs. continuous
Immadi, Manoj Kumar – Oklahoma State University
The paper explains how the split-search algorithm works and how the Kass adjustment is made in order to maximize the independence between the two branches after a split. Kass adjustments will always improve the independence between the two branches, but there is little evidence of how they behave for binary versus continuous target variables. After explaining how Kass adjustments are made, my goal is to compare the advantages of Kass adjustments for binary and continuous target variables.
Statistical comparison of relative proportions of bacterial genetic sequences
Jose F. Garcia-Mazcorro, Jan S. Suchodolski, Joerg M. Steiner, and Bradley J. Barney – Texas A&M University
The intestinal tract is inhabited by hundreds of different types of bacteria, which have the potential of enhancing health or disease in the host. Several current technologies are capable of identifying these bacteria by determining the order of nucleotides (sequencing) in their DNA with unprecedented coverage. These technologies can provide two types of data sets: 1) the raw genetic sequences (not discussed here), and 2) the relative proportions of sequences, which are calculated by dividing the number of sequences obtained from a given bacterial group by the total number of sequences obtained. This dependent variable (relative proportions of sequences) is continuous but constrained between 0 and 100%, and has a nested architecture (bacterial species within a genus within a family within an order within a class within a phylum). I discuss different alternatives (both parametric and non-parametric) to analyze this data set, with emphasis on the use of SAS 9.2. PROC MIXED can be used, but skewed residuals are commonly encountered (the data are usually not normally distributed). PROC GLIMMIX with a beta distribution can also be used; however, the beta distribution assumes that the total proportion of 100 is divided between two groups. The Dirichlet distribution is a generalization of the beta distribution that allows a proportion to be divided among two or more groups, but SAS does not currently provide this option. Further analyses are ongoing to empirically determine the most appropriate statistical method for comparing relative proportions of bacterial genetic sequences.
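For reference, the beta-distribution fit mentioned above looks roughly like this (dataset and variable names are hypothetical, and the proportions must first be rescaled to the open interval between 0 and 1):

    proc glimmix data=work.microbiota;
      class treatment dog;
      model prop = treatment / dist=beta link=logit solution;
      random intercept / subject=dog;   /* repeated samples within animal */
    run;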
Use of Decision, Cut-off and SAS code node in SAS Enterprise Miner while scoring to adjust prior probabilities and prediction cutoff for separate sampling
Yogen Shah – Oklahoma State University
It is common practice, when building predictive models for a binary target variable, to use a sample whose primary-outcome proportion is different from the actual proportion in the population. This kind of separate sampling, or balanced sampling, works effectively when the ratio of the primary outcome to the secondary outcome is very small.
Building a predictive model from such a balanced sample offers advantages such as reduced bias toward a particular sample outcome and improved performance. However, model fit statistics and assessment plots are closely tied to the outcome proportion in the training and validation data sets. As a result, the model cannot predict well when scoring the score data set, because the primary-outcome proportion in the scoring data set resembles the population rather than the balanced sample.
This presentation illustrates the effective use of the Decision, Cut-off, and SAS Code nodes in SAS EM to resolve the above problem. The Decision node specifies the actual proportions in the population (the prior probabilities) that applied when the sample dataset was drawn for model building. By default, SAS uses a cutoff value of 0.5 when predicting a binary outcome from the predicted probabilities, which assumes the chance of the primary outcome equals that of the secondary outcome. This is not true when the proportion of the primary outcome in the population is very small. SAS provides the Cut-off node to adjust this cutoff value based on the model's ability to predict true positives, false positives, and true negatives. Specific code must also be added under the “score” section of a SAS Code node to apply the changed cutoff when scoring.
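The adjustment itself is small. In the score section of a SAS Code node it looks roughly like this, where the P_/I_ names follow Enterprise Miner's convention for a binary target named TARGET and the cutoff value is illustrative:

    %let cutoff = 0.08;                  /* chosen from the Cut-off node */
    if P_target1 ge &cutoff then I_target = '1';
    else I_target = '0';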
A SAS Macro Tool for Selecting Differentially Expressed Genes in Microarray Data
Huanying Qin*, Laia Alsina, Hui Xu, and Elisa Priest – Baylor Health Care System
DNA microarrays measure the expression of thousands of genes simultaneously. Commercial software such as JMP® Genomics and GeneSpring® uses t-tests, ANOVA, or mixed models for statistical analysis to identify differentially expressed genes. These methods are valid for larger sample sizes. We work with an immunology laboratory that often needs to analyze data from experiments with fewer than 10 samples. The researchers developed an Excel-based process to select differentially expressed genes from microarray experiments with a small sample size. This process required complex, manual manipulation of data and could take weeks to complete. We created a SAS macro to automate the whole process. The program reads microarray data from Excel and provides a summary report in Excel. The researchers can easily modify the parameters and repeat the analysis. The program made it possible to reduce data processing time from weeks to minutes, with no mistakes related to manual manipulation. In addition, it provides more output information for further analysis. This paper describes the tool and uses real data to demonstrate that it is valid and efficient.
When It's Not Random Chance: Creating Propensity Scores Using SAS EG
Josie Brunner – Austin Independent School District
While randomized samples are ideal for hypothesis testing, they are not always possible, especially when evaluating programs in which participants select themselves into the treatment or control group. One quasi-experimental design approach is to use propensity scores to match treatment and control units to reduce selection bias in observable pre-treatment characteristics. This presentation will focus on why and when propensity score analysis (PSA) should be included in a research design and will demonstrate how a propensity score can be created very simply using SAS EG 4.3.
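Under the hood, the score is simply the predicted probability of treatment from a logistic model on pre-treatment characteristics (the dataset and covariates below are hypothetical):

    proc logistic data=work.students;
      model treated(event='1') = pretest attendance ses;
      output out=work.ps pred=pscore;    /* propensity score per student */
    run;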
A Curriculum for Training Statisticians to Program
Cano, Stephanie L * – University of Texas at San Antonio Dept of Management Science and Statistics
As the demand for applied statisticians increases across the nation, the ability of these statisticians to write effective code for data manipulation becomes an increasingly important prerequisite. Many applied statistics programs use SAS or other statistical packages in the classroom, but not as many have coursework specifically designed to develop strong programming skills. At UTSA, a two-semester sequence of courses was implemented with the goal of graduating statisticians with strong SAS programming skills.
The first course focuses on general programming and data management techniques and the second focuses on statistical applications using real and "difficult" data. Projects involve complex manipulations for reading, cleaning and processing data for analysis. Coursework emphasizes the importance of getting a complete understanding of data before analysis, to ensure that both an appropriate analysis is performed and results are correctly interpreted.
This presentation will include a discussion of textbooks, course materials and other issues encountered in the teaching of these courses.
Proc Surveyfreq: Why Do a Three Way Table in SAS When We Want Two Way Table Information?
Mehta, Hemalkumar B.* and Johnson, Michael L. – Department of Clinical Sciences and Administration, College of Pharmacy, University of Houston
The SURVEYFREQ procedure in SAS® has an advantage over PROC FREQ in that it incorporates a multi-stage probability sampling design into the analysis. Several nationally representative datasets have multi-stage probability sampling designs. Most of the time we need two-way table information for the group of interest, e.g., patients with a certain disease. There are two ways to get group-specific results in PROC SURVEYFREQ: (i) use a BY statement, or (ii) do a three-way tabulation. A BY statement will provide group-specific results, but it will not give a valid domain analysis and it will not preserve the sampling design; hence, the results will not be generalizable to the population level. Three-way tables will provide group-specific results with a valid domain analysis while preserving the sampling design. In the current paper, using Medical Expenditure Panel Survey (MEPS) data, we show that three-way tables should be used when we need two-way table information, primarily for valid domain analysis and for extrapolating results to the population level. This paper can serve as a guide to researchers who deal with single-stage or multi-stage probability survey data that uses clustering, stratification and weighting.
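In sketch form, with the subgroup as the leading table variable (the survey design variables follow common MEPS naming, and the DIABETIC flag is hypothetical):

    proc surveyfreq data=work.meps;
      strata varstr;
      cluster varpsu;
      weight perwt;
      tables diabetic*sex*insured / row;  /* SEX*INSURED within each level
                                             of the DIABETIC domain */
    run;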
Advice to Health Services Researchers: Be Cautious Using the “Where” Statement in SAS Programs for Nationally Representative Complex Survey Data
Mehta, Hemalkumar B.* and Johnson, Michael L. – Department of Clinical Sciences and Administration, College of Pharmacy, University of Houston
Health services researchers often conduct research with nationally representative survey data in which participants or patients are not sampled randomly but are sampled using complex stratified multistage probability designs. Such datasets include cluster, strata and weight information, which are essential for extrapolation of results to a national level. Several PROC SURVEY procedures are available in SAS® 9.2 that enable analysis of such data while preserving the complex sampling design and supporting extrapolation of results. The first step researchers often perform is the selection of a population of interest, i.e., participants meeting certain inclusion criteria, from the main dataset. This can be accomplished in SAS® using the WHERE statement in DATA steps. However, using the WHERE statement to select a population of interest can defeat the purpose of the sampling design and limits the researcher's ability to generalize results. In the current paper, using Medical Expenditure Panel Survey (MEPS) data, a nationally representative multistage probability survey, we show how to analyze such data while preserving the sampling design and without using the WHERE statement. The principles and techniques explained in this paper can be extended to any other discipline where the researcher deals with complex survey data involving cluster, strata and weight information in the sampling design.
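For procedures that support it, the DOMAIN statement expresses the same idea directly; either way, the full dataset stays in the analysis (the design variables and the flag are placeholders):

    data work.meps2;
      set work.meps;
      diabetic = (diabdx = 1);           /* hypothetical condition flag */
    run;

    proc surveymeans data=work.meps2;
      strata varstr;
      cluster varpsu;
      weight perwt;
      domain diabetic;                   /* subgroup analysis, no WHERE */
      var totexp;
    run;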
A Simulation Study for Power Analysis in a Longitudinal Study Using SAS
Fenghsiu Su, MSBA, Ravi Lingineni, Philamer Atienza, MS, Subash Aryal, PhD, Karan P. Singh, PhD, Sejong Bae, PhD – University of North Texas Health Science Center
Background: In a longitudinal study, it is unlikely that every subject’s information can be obtained at each time point. To study the incomplete data across time and the influence of subjects on their repeated observations as a random effect, mixed-effect regression models (MRMs) can be used. The purpose of this research is to study the power characteristics of the likelihood ratio test for hierarchical correlated data.
Methods: We conducted a simulation study based upon 10,000 replicates of data. The MRM is constructed for different sample sizes in 3 situations: (1) with five time points as a fixed factor and the intercept and trend as random variables, (2) with various time points and fixed variances for large and small sample sizes, and (3) for correlation between time points. By using the likelihood ratio test, we determined an appropriate model to estimate the parameters. Data were based on a random normal distribution with mean 0 and a pre-specified error variance. To simulate realistic missing data, we assumed a 20% drop out rate.
Results: Fixing factors constant (other than the parameter of interest) in the scenarios above, we observed that the power increases: (1) as the number of time points increases, (2) as the sample size increases, (3) as the variance decreases, and (4) as the correlation between the time points decreases. All results were consistent with previous studies on statistical power characteristics.
Evaluation of Promotional Campaigns in the Utilities Industry Using a Transfer Function Time-Series Model
Wen, Fujiang – City of Dallas Water Utilities
Promotional campaigns are often used by the utilities industry to increase the total sales level of their products or services. Evaluating the effectiveness of campaigns is a key issue for utilities seeking to use their resources effectively, because campaigns incur expense. A transfer-function time-series model is applied to analyze a direct bill-insert advertising campaign promoting a new toilet replacement program for Dallas Water Utilities customers. The analysis was based on the numbers of customers who participated in the program from August 2007 to September 2010. A point intervention function was used to indicate the three rounds of the advertising campaign, with the model quantifying the promotional effect. An exponential transfer function was identified to describe the effect. The study shows that, after a promotional campaign, the number of participating customers increases significantly and then quickly shrinks with an exponentially decreasing trend. The findings can be used to forecast future demand under proposed promotional campaigns.
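In PROC ARIMA terms, the model sketched above might be specified like this (dataset and variable names are hypothetical; the denominator factor (1) produces the exponentially decaying response):

    proc arima data=work.toilet;
      identify var=participants crosscorr=(campaign);
      estimate input=( / (1) campaign ) method=ml;
      forecast lead=12 out=work.fcst;
    run;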
Introduction to Multi-Level Modeling
Brandon Vaughn – Dell
SAS PROC MIXED is a flexible program suitable for fitting multilevel models. Since the program was developed from the perspective of a “mixed” statistical model with both random and fixed effects, its syntax and programming may be unfamiliar to those wishing to fit a multilevel or hierarchical model. The purpose of this presentation is to introduce the concept of multilevel models and demonstrate the use of SAS PROC MIXED to fit such models.
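A two-level random-intercept sketch (students within schools; the dataset and variables are hypothetical):

    proc mixed data=work.achieve covtest;
      class school;
      model score = ses / solution ddfm=kr;
      random intercept / subject=school;  /* level-2 random effect */
    run;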
Using Proc Logistic, SAS Macros and ODS Output to evaluate the consistency of independent variables during the development of logistic regression models. An example from the retail banking industry
Vidras, Alexandros* and Tysinger, David – Merkle Inc
Predictive models are used extensively in customer relationship management analytics and data mining to increase the effectiveness of marketing campaigns. Logistic regression remains at the forefront of analytics as the most popular technique used to predict customer behavior. Particularly with direct mail marketing, logistic regression models are built using previous campaigns that span several months, posing a major challenge to statisticians: devising a way not only to capture seasonality across these campaigns but also to evaluate the stability of the models. Millions of dollars are spent annually on marketing activities that utilize logistic regression models; therefore, the predictive ability and robustness of logistic models is essential for executing a successful direct mail campaign. This paper shows how PROC LOGISTIC, ODS OUTPUT and SAS macros can be used to proactively identify structures in the input data that may affect the stability of logistic regression models and to allow for well-informed preemptive adjustments when necessary. Thus, we introduce a standardized process that industry analysts can use to formally evaluate the impact and statistical significance of predictors within logistic regression models across multiple campaigns and forecasting cycles.
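The skeleton of such a process might look like this (the campaign datasets, predictors, and wave names are hypothetical):

    %macro fit_wave(wave);
      ods output ParameterEstimates=work.pe_&wave;
      proc logistic data=work.camp_&wave;
        model respond(event='1') = recency frequency monetary;
      run;
    %mend fit_wave;

    %fit_wave(2011q1);
    %fit_wave(2011q2);

    data work.stability;                 /* stack and compare across waves */
      set work.pe_2011q1(in=a) work.pe_2011q2;
      length wave $6;
      wave = ifc(a, '2011Q1', '2011Q2');
    run;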
Putting Analytics into Business Context: The Analytics Value Chain
Jain Piyanka – PayPal, eBay Inc.
In a product/services company, analytics generates its greatest value when a certain line-up of best practices is performed, ranging from gross intelligence to a more detailed understanding. This is achieved with a “three pillar” analytical approach: Measurement Framework, Portfolio Analysis, and Customer Analysis. Within each of these components, we move from a simpler “20,000-foot” view, to deeper, more comprehensive analytics.
In this talk, Piyanka Jain will cover these components in detail, along with the tools and techniques required and the gotchas to look out for. Auxiliary intelligence such as VOC (Voice of the Customer) and competitor/industry/economic landscape analysis, which delivers an outside-in view of the business, will also be covered.
What you will walk away with is:
1. An understanding of the analytics value chain, which puts predictive analytics into an impactful context
2. Analytics your organization needs, to better understand your business
3. Tools and methodologies best suited for the “three pillars” of analysis
4. Challenges to prepare for, as you embark on these analyses
5. Organizational support needed for analytics execution.