Greg Stanek |
Nesting Multiple Box Plots and Blockplots using GTL and Lattice Overlay
The objective was to provide a summary table and graph for several quality improvement measures on a single page so that leadership could monitor the performance of care over time. The challenges were integrating multiple SAS procedures to generate the plots on one page and deciding whether to use box plots or series plots of means or medians. The solution was developed using Graph Template Language (GTL) and the SGRENDER procedure. For each measure we used the BOXPLOTPARM statement to display a series of box plots and the BLOCKPLOT statement for a summary table, and then used the LAYOUT OVERLAY statement to combine the box plots and summary tables on one page. The results display a summary table (BLOCKPLOT) above each box plot series for each measure on a single page. Within each box plot series there is an overlay of a system-level benchmark value and a series line connecting the median values of each box plot. The BLOCKPLOT contains descriptive statistics for each time period illustrated in the associated box plot. The discussion will focus on techniques for nesting the lattice overlay with box plots and BLOCKPLOTs in GTL and some reasons for choosing box plots versus series plots of medians or means.
|
Julie VanBuskirk |
Picture Perfect Graphing with Statistical Graphics Procedures
Do you have reports based on SAS/GRAPH procedures, customized with multiple GOPTIONS? Do you dream of those same graphs existing in a GOPTIONS- and Annotate-free world? Recreating complex graphs using Statistical Graphics (SG) procedures is not only possible, but much easier than you think! Using before and after examples, I will discuss how the graphs were created using a combination of PROC TEMPLATE, Graph Template Language (GTL), and the SG procedures. This method produces graphs that are nearly indistinguishable from the originals, with code that has proven to be reusable across projects and is based on one central style template, which allows style changes to cascade effortlessly.
Automating the Creation of Complex PowerPoints
The creation of production reports for our organization has historically been a labor-intensive process. Each month our team produced around 650 SAS graphs and 30 tables, which were then copied and pasted into 16 custom PowerPoint presentations, each between 20 and 30 pages. With stored processes and the SAS Add-In for Microsoft Office (AMO), we now simply refresh those 16 PowerPoint presentations by using AMO to run SAS stored processes. The stored processes generate the graphs and tables, while AMO refreshes the document with updated graphs already sized and positioned on the slides just as we need them. With this new process we are realizing the dream of reducing the amount of time spent on a single monthly production process. |
Elisa Priest |
Keep it Organized: A Grad Student "How To" Paper
Grad students learn the basics of SAS programming in class or on their own. The classroom lessons focus on statistical procedures, and the datasets are usually ready for analysis. However, independent research projects may have data organized in many complex structures and may require many different SAS programs for data organization, exploration, and analysis. At the end of a project, you generally end up with multiple SAS programs and data sets. During the project, students may not realize the need to organize and document their work. However, that need may become painfully clear if modifications must be made to the completed analysis.
This paper will provide students with tools that can help them survive the challenges of organizing and documenting SAS code for a research project. I will present examples to help students keep their SAS code and data sets organized using Base SAS as well as Enterprise Guide 5.1. The primary topics to be covered include: documentation of SAS code, using comments, program headers, tables of contents, organization of SAS files, and organization of Enterprise Guide projects. This will allow student researchers to remember what they did, how they did it, and where the results are. This diligence will be a critical timesaver in the final revisions of a research project or dissertation. |
Robert Wickham | A SAS MACRO for estimating bootstrapped confidence intervals in dyadic regression models. The actor-partner interdependence model (APIM; Kenny, Kashy, & Cook, 2006) is a popular procedure for analyzing dyadic data (e.g., married couples, twin-siblings). Recent work (Kenny & Ledermann, 2010; Wickham & Knee, 2012) demonstrates that the ratios of regression coefficients generated by the APIM are often of substantive interest to researchers. Unfortunately, the sampling distributions for these ratios are non-normal, which renders standard parametric tests of significance (e.g., t-test) unreliable. This talk presents a SAS MACRO for performing a non-parametric bootstrapping procedure that will provide unbiased confidence intervals for several of the key coefficient ratios estimated by the APIM. Extensions to this general model are discussed. |
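The resampling idea in this abstract can be sketched outside SAS. The following is only an illustration in Python, not the author's macro: it assumes a simplified single-outcome regression with one actor and one partner predictor, resamples whole rows (one row per dyad) with replacement, and reports a percentile interval for the partner/actor coefficient ratio. The function names and the choice of a percentile interval are assumptions made for this sketch.

```python
import random


def ols_two_predictors(rows):
    """Least-squares slopes for y ~ intercept + actor + partner.

    rows is a list of (actor_x, partner_x, y) tuples, one per dyad
    (a hypothetical layout chosen for this sketch).
    """
    n = len(rows)
    mean_a = sum(a for a, _, _ in rows) / n
    mean_p = sum(p for _, p, _ in rows) / n
    mean_y = sum(y for _, _, y in rows) / n
    saa = sum((a - mean_a) ** 2 for a, _, _ in rows)
    spp = sum((p - mean_p) ** 2 for _, p, _ in rows)
    sap = sum((a - mean_a) * (p - mean_p) for a, p, _ in rows)
    say = sum((a - mean_a) * (y - mean_y) for a, _, y in rows)
    spy = sum((p - mean_p) * (y - mean_y) for _, p, y in rows)
    det = saa * spp - sap ** 2  # zero only if the predictors are collinear
    b_actor = (spp * say - sap * spy) / det
    b_partner = (saa * spy - sap * say) / det
    return b_actor, b_partner


def bootstrap_ratio_ci(rows, n_boot=2000, alpha=0.05, seed=12345):
    """Percentile bootstrap interval for the ratio b_partner / b_actor."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample dyads with replacement, keeping each row intact.
        sample = [rng.choice(rows) for _ in rows]
        b_actor, b_partner = ols_two_predictors(sample)
        ratios.append(b_partner / b_actor)
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Simulated data for demonstration only.
    gen = random.Random(1)
    data = []
    for _ in range(200):
        a, p = gen.gauss(0, 1), gen.gauss(0, 1)
        data.append((a, p, 0.6 * a + 0.3 * p + gen.gauss(0, 0.5)))
    print(bootstrap_ratio_ci(data))
```

A percentile interval makes no normality assumption about the ratio, which is the motivation the abstract gives for bootstrapping. A full APIM would also model the non-independence within dyads, which this sketch leaves out.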
Gabriela Cantu |
Dirty Data? Clean it up with SAS.
Clinical trials data can be complex and integrate multiple data elements including demographic, laboratory, clinical, medication, and medical history. Although extremely valuable to the study, the completeness and cleanliness of clinical trials data is often less than ideal. In order to be successful, clinical researchers must strategize methods to maintain data integrity and cleanliness. This presentation will focus on planning for and performing clinical trials data edit checks, cleaning, and documentation. Through a comprehensive planning process and a series of simple SAS® procedures, dirty data can be transformed into usable and clinically informative datasets. A simple ARRAY can be used to reassign tricky variables into more useful formats. Using PROC UNIVARIATE to produce statistics for continuous variables allows researchers to identify out-of-range and unexpected values in clinical data. Additionally, PROC FREQ allows researchers to check for inappropriate or incorrect values in categorical variables. Through a series of MACROS, the clinical researcher is able to execute these SAS procedures with minimal keystrokes and repetition. Accurate documentation of data cleaning and variable verification can be obtained by using the SAS log and ODS statements. SAS provides clinical researchers with real-time documentation of both data cleaning procedures and results.
|
Lu Gan |
Using SAS to locate and rename external files
When reading external data into SAS, a program needs to specify the data file location and file name in the related import statements. If the raw data file is named with a timestamp, the import program requires an extra step: updating the file name each time before it's run. This is a hassle if the import is needed on a routine schedule and/or there are multiple external data files. A SAS program can therefore be written to eliminate the repeated manual update. The approach has two steps: first, searching for the latest data file in the specified directory; then, renaming it by removing the timestamp. After these steps are executed, the import program can read in the data files directly. The examples and syntax illustrated are in SAS 9.2. This paper assumes that the reader has a basic understanding of DATA step programming and the macro language. |
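The two steps described above (find the latest timestamped file, then strip the timestamp from its name) can be sketched as follows. This is a language-neutral illustration in Python rather than the paper's SAS code, and it assumes a hypothetical naming pattern of `<stem>_YYYYMMDD.<ext>`; the abstract does not state the actual timestamp format.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: "<stem>_YYYYMMDD.<ext>", e.g. "sales_20130215.csv".
STAMPED = re.compile(r"^(?P<stem>.+)_(?P<stamp>\d{8})(?P<ext>\.\w+)$")


def strip_latest_timestamp(directory, stem):
    """Rename the newest '<stem>_YYYYMMDD.<ext>' file to '<stem>.<ext>'.

    Returns the new path, or None if no matching file exists.
    An existing '<stem>.<ext>' is overwritten.
    """
    candidates = []
    for path in Path(directory).iterdir():
        match = STAMPED.match(path.name)
        if path.is_file() and match and match.group("stem") == stem:
            candidates.append((match.group("stamp"), path, match.group("ext")))
    if not candidates:
        return None
    # YYYYMMDD strings sort chronologically, so max() picks the latest file.
    _, latest, ext = max(candidates, key=lambda item: item[0])
    target = latest.with_name(stem + ext)
    latest.replace(target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("sales_20130101.csv", "sales_20130215.csv"):
            (Path(folder) / name).write_text(name)
        print(strip_latest_timestamp(folder, "sales"))
```

After the rename, a downstream import step can always refer to the fixed name (here `sales.csv`) without being edited before each run.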
Jose Garcia-Mazcorro | Use and Applications of JMP® in Microbial Ecology
Microbial ecology is the ecology of microorganisms; that is, the study of the relationships among microorganisms and between microorganisms and their environment. Millions of dollars and countless hours are currently spent by enthusiastic scientists, microbiologists, and bioinformaticians from all over the world to elucidate the characteristics and biological mechanisms behind the ecology of microorganisms. |
Akkarapol Sa-ngasoongsor | An analysis of customer preference of automobile products using SAS.
This paper presents an analysis of customer preference for automobile products in terms of Willingness-To-Pay (WTP) for vehicle attributes using SAS®, including customer segmentation based on WTP for the Body Type (BT) attribute and dimensionality reduction of the WTP vector. This analysis addresses issues related to the preparation of customer inputs for effective prediction of customer preferences. Our analysis shows that respondents show maximum variability in specifying their WTP for the BT attribute, and hence classification based on BT associations is the most suitable way to differentiate between respondents from the available data. For dimensionality reduction of the WTP vector, we propose a twofold criterion for WTP rankings. The first criterion addresses the importance of WTP for an attribute level from the standpoint of a respondent's utility maximization. The second criterion addresses the predictability of WTP from the available data using the Signal-to-Noise Ratio (SNR). The combination of both criteria provides the analyst with insights regarding the importance and predictability of WTP data, and also provides a tractable approach to the dynamic modeling of customer preferences. |
Paulina Kulesz | MULTI-PANEL SCATTER PLOTS AND SCATTER PLOT MATRICES
A scatterplot is one of the most common tools used in the visual exploration of data. The scatterplot helps researchers examine the relation between two variables X and Y, and reveals the degree of symmetry, the concentration of data, and possible outliers. A bivariate scatterplot is the simplest form of representing a relation between two variables. More advanced forms such as panel scatter plots and scatterplot matrices allow the degree of dependence between multiple variables (taken two at a time) to be represented in a comparative way. |
Peter Eberhardt |
The SAS® DATA Step: Where Your Input Matters Before the warehouse is stocked, before the stats are computed and the reports
A Cup of Coffee and PROC FCMP: I Cannot Function Without Them How much grief have you put yourself through trying to create macro functions If any of these statements describe you, then the new features of PROC FCMP are |
Chris Schacherer |
SAS® Data Management Techniques: Cleaning and transforming data for delivery of analytic datasets
The SAS® Programmer's Guide to XML and Web Services
Extensible Markup Language (XML) provides a flexible means of organizing and transmitting data for consumption by both humans and computers. As such, it has come to be widely used in rendering data within web applications as well as for data transmission via the internet. Because of XML's growing role in data interchange, it is increasingly important for SAS programmers to become familiar with SAS technologies and techniques for creating XML output, importing data from XML files, and interacting with web services, which commonly use XML file structures for transmission of data requests and responses. The current work provides detailed examples of techniques you can use to perform these tasks using XML Mapper®, the XML LIBNAME engine, Output Delivery System®, the FILENAME statement, and new SOAP functions available beginning in SAS 9.3. |
Charles Minard |
Overlaying Scatter Plots on Box Plots in the Presence of Ties
Graphical methods are important for efficient and effective visualization of data. Many graphical methods are available, and combining different types of graphics can yield interesting and effective results. Box plots are commonly used to compare the distribution of continuously measured variables across two or more groups, and overlaying a scatter plot on a box plot can provide additional information about distributional differences or similarities. However, ties may be common when the response variable is not measured with high fidelity, such as when time is measured in days as an integer value. Random jitter could be used to create separation between points when ties are present in the response variable. However, this can create a figure that is distorted and somewhat difficult to interpret. Evenly distributing ties across the horizontal scale creates a figure that is clearer and more informative. The purpose of this paper is to present SAS code for generating a scatter plot overlaid on a box plot in the presence of ties. The data set requirements to generate this figure are discussed, and the SAS procedures TEMPLATE and SGRENDER are used to produce the final graphic. A data set comparing the number of days from hospital discharge to readmission stratified by groups of patients is used as an example throughout. |
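The contrast between random jitter and evenly distributed ties can be illustrated with a short sketch. This is not the paper's SAS code; it is a hedged Python illustration of one plausible way to compute horizontal positions, assuming each group sits at a numeric axis position and that tied points are spread symmetrically within a fixed half-width (the `half_width` parameter and function name are assumptions for this example).

```python
from collections import defaultdict


def spread_ties(points, half_width=0.3):
    """Spread tied observations evenly around their group's axis position.

    points is a list of (group_position, y) pairs. Observations that share
    both group and y value are placed at equally spaced x offsets within
    +/- half_width; untied observations stay on the group position.
    The input order is preserved in the returned list of (x, y) pairs.
    """
    buckets = defaultdict(list)
    for index, (position, y) in enumerate(points):
        buckets[(position, y)].append(index)

    placed = [None] * len(points)
    for (position, y), indexes in buckets.items():
        count = len(indexes)
        for rank, index in enumerate(indexes):
            if count == 1:
                offset = 0.0
            else:
                # Equally spaced from -half_width to +half_width.
                offset = -half_width + 2 * half_width * rank / (count - 1)
            placed[index] = (position + offset, y)
    return placed


if __name__ == "__main__":
    # Days to readmission (integer-valued, so ties are common); made-up values.
    observations = [(1, 5), (1, 5), (1, 5), (1, 7), (2, 5)]
    for x, y in spread_ties(observations):
        print(round(x, 2), y)
```

Because the offsets are deterministic, the figure looks the same on every run, and the width of each row of points reflects only where ties occur rather than random noise.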
Daniel Sakya |
HASH programming, available since SAS 9, is a hot topic in the industry. This paper is intended to provide more exposure to novice or experienced SAS programmers who are looking for alternatives to DATA step programming. The concept of HASH programming is similar to the definition of an array in SAS. Several SAS users have benefited from HASH programming by considerably reducing the processing time for compound data merging tasks. This paper introduces HASH programming, its syntax, a few examples, and, last but not least, the benefits of using HASH programming versus the regular DATA step. The examples and syntax illustrated are in SAS 9.2. |
Fujiang Wen |
Intervention Analysis of Water Consumption for Utilities Using Different Time-Series Models
In December 2011, mandatory restrictions on outdoor water use were implemented to reduce total consumption in the city of Dallas, limiting the use of sprinkler systems to twice weekly for the City's customers. Months later, the restrictions became a permanent water conservation measure. An intervention analysis was used to estimate the impact of the restrictions on water consumption using time-series models, which include a transfer function in ARIMA models and time-varying regression effects in the RANDOMREG statement of Unobserved Components Models (UCM). The study indicated that PROC ARIMA provides the flexibility to easily model transfer functions for intervention effects, and that the UCM model is a convenient way of additively decomposing a time series into trend, season, cycle, and irregular components. Both procedures were demonstrated in several steps, and the simultaneous effects of environmental factors were also captured by the models. |
Steve Fleming |
Enterprise Guide for New and Experienced SAS Users
SAS Enterprise Guide (EG) has revolutionized the way SAS is used to process, model, and visualize data. This presentation will look at the advantages and disadvantages of EG from the perspective of both a user new to the SAS environment and an experienced SAS programmer. We will cover tips for using EG more efficiently, how EG leads to better project documentation, and when to not use EG. Data used to illustrate the application of EG will be taken from actual educational research projects. |
Jingjing Qu |
Net Present Value Model Approach
A special feature of an insurance product is that its sale generates a sequence of premium payments up to a future time. Unlike the sale of consumption goods, the revenue of an insurance product at the time of sale has to include an assessment of the potential future premium payments. Can an effective model predict beyond the initial response to a sufficient degree that it would positively impact revenue? This paper presents the business value as well as the methodology of Net Present Value (NPV) models. The paper covers the data requirements and statistical procedures, as well as the benefit of NPV models compared to a response model or a RIP (response/issue/paid) model. The statistical methods are presented from a technical perspective and concentrate on developing an NPV model by using SAS. |
Harjanto Djunaidi |
Predicting Students Enrollment Using SAS
Data-driven strategic decisions have been used for many years in areas other than education. For example, most financial or manufacturing companies have used simulation, mathematical programming, and statistical approaches to improve their ability to make sound strategic investment and resource allocation decisions. Recent dynamic changes in the US and global economy, as well as in the competitive environments where higher education institutions operate, have forced colleges and universities in the US to find ways to optimize their services to students given the new realities. One possible application of statistical analysis for these institutions is to use a logistic regression model to predict student yield. This approach helps maximize yield more efficiently and guides the Office of Students Recruitments in predicting how many students who were offered admission will accept and attend. The logistic regression model generates a probability that can be used in the decision-making process. Modeling a specific student group that has relatively similar traits should improve the ability to predict candidates' probability of accepting an admission offer. As a result, decision makers may be able to allocate resources efficiently and make better strategic decisions on financial awards, classroom and course management, and dorm assignment. Decision makers will also be able to better plan how many students should be put on the wait list and how many applicants will be sent a rejection letter.
Institutional Research Intelligent: Go Beyond Reporting
Higher education institutions in the US are required to complete various reports and submit school-related data to the government through IPEDS. School-related information such as student enrollment, financial aid, and graduation and retention rates is among the reports that must be completed by each degree-granting institution that has received some form of government funding or support. These data sets contain very valuable information that decision makers can use for strategic planning or for decisions that enable them to outsmart their competitors or to increase program efficiency. The majority of colleges and universities, though they have completed the data gathering process, fall short of using the information to improve their strategic decisions. Increasing competition in the industry, decreasing funding from the state, federal agencies, and alumni, and increasing college tuition have left higher education institutions with no choice but to make data-driven strategic decisions. For example, colleges and universities might apply statistical approaches such as categorical and/or multivariate analysis to predict student enrollment and to make strategic decisions on financial awards, market penetration, and other matters. Unless this new mindset is embedded in the decision process, higher education institutions in the US may continue to downsize their programs due to smaller education budgets. |
Sandra Minjoe |
POSTER: ADaM Implications of the CDER Data Standards Common Issues Document
Over the past few years, the United States Food and Drug Administration (US FDA), specifically the Center for Drug Evaluation and Research (CDER), has been receiving more data from sponsors in a Clinical Data Interchange Standards Consortium (CDISC) or CDISC-like structure. Reviewers have had tools built and received training, but there are some technical issues with many submissions that are hindering their review process and thus their full adoption of CDISC. This prompted the issuance of a document entitled "CDER Common Data Standards Issues Document".
Working closely with CDER to address many of these issues, the CDISC Submission Data Standards (SDS) team created an Amendment to the Study Data Tabulation Model (SDTM) version 1.2 and the SDTM Implementation Guide (IG) version 3.1.2. The CDISC Analysis Data Model (ADaM) team has not created a corresponding amendment, but many issues noted in the CDER document have implications for ADaM. This poster examines issues that could affect ADaM and describes how to handle them so that data and supporting documents submitted to FDA CDER are reviewer-friendly.
|
Kirk Lafler |
An Introduction to SAS® Hash Programming Techniques
Beginning in Version 9, SAS software supports a DATA step programming technique known as hash that enables faster table lookup, search, merge/join, and sort operations. This presentation introduces what a hash object is, how it works, and the syntax required. Essential programming techniques will be illustrated to define a simple key, sort data, search memory-resident data using a simple key, match-merge (or join) two data sets, and handle and resolve collision scenarios where two distinct pieces of data have the same hash value, as well as more complex programming techniques that use a composite key to search for multiple values.
Google® Search Tips and Techniques for SAS® and JMP® Users
Google (www.google.com) is the world's most popular and widely used search engine. Because it is the premier search tool on the Internet today, SAS® and JMP® users frequently turn to it to identify and locate SAS and JMP content wherever and in whatever form it resides. This paper provides insights into how Google works and illustrates numerous search tips and techniques for searching articles of interest, including reference works, information tools, directories, PDFs, images, current news stories, user groups, and much more to get the best search results quickly and easily.
SAS® Programming Tips, Tricks and Techniques
The Base SAS® System offers users a comprehensive DATA step programming language, an assortment of powerful PROCs, a macro language that extends the capabilities of the SAS System, and user-friendly interfaces including SAS Display Manager and Enterprise Guide. This presentation explores a collection of proven tips, tricks, and techniques related to effectively using the SAS System and its many features. Attendees will examine keyboard shortcuts to aid in improved productivity; the use of subroutines and copy libraries to standardize and manage code inventories; data summarization techniques; the application of simple reusable coding techniques using the macro language; troubleshooting and code debugging techniques; along with other topics.
Exploring DICTIONARY Tables and SASHELP Views
SAS® users can quickly access useful information about their SAS session with a number of read-only SAS data sets called DICTIONARY tables or SASHELP views. During a SAS session, information (or metadata) about system options, librefs, table names, column names and attributes, formats, indexes, and more can be accessed. This presentation explores DICTIONARY tables and SASHELP views, how they are accessed, what type of information is available, and their application using real-world scenarios.
Output Delivery System (ODS) - Simply the Basics
Are you looking for ways to improve or enhance the way your SAS® output appears? Output Delivery System (ODS) can help turn tired-looking output into great-looking information with a purpose. Gone are the days when the only available formatting choice is boring output listings containing lifeless monospace fonts. ODS introduces exciting features for your output. Using built-in format engines, ODS provides SAS users with a powerhouse of exciting capabilities to produce "quality" and publishable output. This presentation emphasizes an introduction to ODS including basic operation and syntax; how specialized output can be created including RTF, PDF, MS-Excel spreadsheets, SAS data sets, and HTML; and how selection (or exclusion) lists can be constructed to handle basic content customizations.
Basic SAS® PROCedures for Producing Quick Results
As IT professionals, saving time is critical. Delivering timely and quality-looking reports and information to management, end users, and customers is essential. The SAS System provides numerous "canned" PROCedures for generating quick results to take care of these needs … and more. Attendees acquire basic insights into the power and flexibility offered by SAS PROCedures using PRINT, FORMS, and SQL to produce detail output; FREQ, MEANS, and UNIVARIATE to summarize and create tabular and statistical output; and DATASETS to manage data libraries. Additional topics include techniques for informing the SAS System which data set to use as input to a procedure, how to subset data using a WHERE statement (or WHERE= data set option), and how to perform BY-group processing. |
Rob Caudill |
JavaScript menu that submits SAS/IntrNet Report code
The Texas Education Agency (TEA), like many organizations, uses SAS/IntrNet to deliver dynamic web reports to its customers. The TEA's Office of School Finance is responsible for administering the Foundation School Program (FSP) and for producing reports and other data related to the FSP. School Finance's primary report is the Summary of Finance (SOF), which is available via SAS/IntrNet in ten separate district-level reports covering four years and up to three versions per year, and in four state-level reports. The fourteen SOFs take up a significant amount of vertical space on their web page because there is a separate district selection/submit box for each SOF positioned down the page. This paper describes a JavaScript-based menu system that provides access to the SOF reports in less space by using the customer's selections to return the requested report to the customer's computer screen.
The TEA, like many other organizations, has two websites for dynamic web reporting: a test website that is available behind the TEA's firewall and enables the content owner to view the web page before it is put into production, and a production website that is available to the public. Having SAS/IntrNet code that is the same for the test and production web reporting environments makes managing web reports less complex because the data on which the web report is based becomes the primary variable. |
Toby Dunn |
The Good, The Bad, and The Ugly
The SAS® System has all the tools users need to read data from a variety of external sources. This has been, perhaps, one of the most important and powerful features since its introduction in the mid-1970s. The cornerstone of this power begins with the INFILE and INPUT statements, the use of a single- and double-trailing @ sign, and the ability to read data using a predictable form or pattern. This paper provides insights into the INFILE statement, the various styles of INPUT statements, and illustrates numerous examples of how data can be read into SAS with the DATA step. |
Philip Easterling |
The Effective Use of Business Analytics
As the use of analytics continues to increase across all industries, many organizations have analytical projects and initiatives at various levels of maturity. A large percentage of organizations are still at the early stages of figuring out how to use analytics to derive information about customers, markets, competitors, suppliers, employees, and operations in order to better communicate, plan, and execute strategies, allocate funds, and manage staff. Maximum value can be achieved when an organization embraces the use of analytics, combined with effective information management practices and policies, to support a broad fact-based decision-making process within and across organizational boundaries. This presentation will start by defining the role of analytics and business intelligence. The discussion will then explore the organizational challenges to promote the proper use of analytics and the critical role which an analytics “Center of Excellence” can play to support their efforts. The presentation will conclude by providing guiding principles to assist organizations in identifying starting points to apply analytics. |
Hemalkumar Mehta |
Risk adjustment models such as the Charlson Comorbidity Score (CCS) and the Chronic Disease Score (CDS) are used to control for confounding and to predict outcomes in epidemiologic studies. A traditional statistical measure, the concordance (c) statistic, has been used widely in the literature for comparison of different risk adjustment models. Recently, new measures have been introduced for comparison of such models; these include reclassification methods such as net reclassification improvement (NRI) and integrated discrimination improvement (IDI). In the current study, in addition to c-statistics, we show the application of NRI and IDI in comparing risk adjustment models. We compared CCS and CCS + CDS models in predicting one-year mortality in type II diabetes patients using the Clinical Practice Research Datalink (CPRD). Descriptive statistics were used to describe the cohort, and logistic regression models were used to predict mortality. All analyses were adjusted for age and gender. All data manipulation and statistical analyses were performed using SAS 9.1. Results of this study showed that both CCS and CDS were predictive of one-year mortality (c-statistic: 0.791, 0.803, respectively). The addition of CDS to a model with the CCS score improved c-statistics slightly. The NRI and IDI values were positive, demonstrating that the CCS + CDS model performed significantly better than the CCS model alone. These results suggest that the combined use of the CCS and CDS may be useful to adjust for comorbidity in outcome models of mortality in patients with type II diabetes.
Enhancing SAS Output with Output Delivery System (ODS)
This presentation will introduce the audience to HTML, PDF, RTF, MS Excel spreadsheet, and other types of SAS output. With the help of a few examples, the presentation will also show how to add traffic lighting to the output and how to change the appearance of titles and footnotes. Various tips, tricks, and techniques for handling SAS ODS output will be shown to the attendees. |
Lisa Mendez |
Two Methods to Merge Data onto Every Observation in Another Dataset
There are times when you just can’t seem to find a PROC that will do exactly what you want. We came across a scenario where we needed to calculate the mean of a student data file and then flag student observations that were more than three standard deviations from the mean. We came across two methods to do what we needed to do. One method uses a combination of Data steps and Procs, and utilizes the If _N_ then set method. Another method utilizes Proc SQL. This paper will outline both methods step-by-step. This paper illustrates two different ways to do the same thing. Personal preference dictates which method to use. |
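The underlying task in this abstract is to attach one set of summary statistics to every observation and then flag the outliers. The sketch below shows only that logic in Python; it is not either of the paper's two SAS methods, and the field names and the use of the sample standard deviation are assumptions for illustration.

```python
from statistics import mean, stdev


def flag_outliers(scores, n_sd=3.0):
    """Attach the overall mean and SD to every score and flag outliers.

    Returns one dict per observation, in input order, so that each row
    carries the same summary values, mirroring a merge of a one-row
    summary onto every observation.
    """
    overall_mean = mean(scores)
    overall_sd = stdev(scores)  # sample standard deviation (assumption)
    return [
        {
            "score": score,
            "mean": overall_mean,
            "sd": overall_sd,
            "flag": abs(score - overall_mean) > n_sd * overall_sd,
        }
        for score in scores
    ]


if __name__ == "__main__":
    # Made-up scores: twenty typical values and one extreme value.
    rows = flag_outliers([50] * 20 + [100])
    print([row["score"] for row in rows if row["flag"]])
```

Computing the summary once and reusing it for every row is the point of both approaches the abstract names; the choice between them is, as the authors say, a matter of preference.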
Drew Turner |
When was my data last updated? How to automate data monitoring and notification.
Knowing when data was last updated can prove invaluable when verifying data integrity. If you have ever spent time troubleshooting a process only to find out that your root problem was outdated data, then you know the importance of keeping track of how recent your data is. Having that information about your data, namely how much there is and how recently it was updated, awaiting you first thing in the morning increases efficiency. Additionally, having the ability to analyze and compare past results enables you to recognize trends and better predict future activity.
|
Steve Yan |
Application of SAS in Product Testing in Retail Business
Testing of new products is a daily task for many retail businesses. A reliable and accurate evaluation of the product is important to the business because both over- and under-assessment can bring a significant loss to the company. However, in practice most tests are done on a relatively small scale due to limitations on resources (space, labor, and other expenses), and the results are affected by many internal and external factors as well as randomness. At Zale we have developed and implemented a SAS application that performs the test in a systematic way, including product segmentation and scoring, store selection, and scoring and forecasting of new products. This paper will briefly describe our methodology, but the focus is on several SAS techniques that make our approach possible and efficient. Finally, the performance of the method is evaluated by simulations under several scenarios. |
Carl Raish |
ERCOT procures Emergency Response Service (ERS) from providers in Texas using an EXCEL submission form which has several different tab layouts and with data that is not organized in columns. The presentation will give a brief overview of ERS, show how submission forms are input into SAS using the EXCEL libname engine, and then, following processing, written out for return to the submitter using custom built pre-defined EXCEL templates, the EXCEL libname engine, DDE and EXCEL macros. |
Debbie Buck |
SAS® Dates: Facts, Formats, and Functions Among the first features of SAS that users learn is that SAS dates (and times) have unique characteristics. A SAS
Despite SAS dates being part of the initial learning curve, there are a number of factors concerning dates that are In this presentation, we focus on displaying/outputting SAS dates (formats), reading dates (informats), importing/ |