2016 SCSUG Educational Forum
San Antonio, TX
The 2016 SCSUG Educational Forum will be hosting the 4th annual Student Symposium in San Antonio, TX.
Analytics is becoming a major tool for generating maximum value from data and supporting business decisions. The education and training of students in the methodology and software application is critical in filling the demand for such analytical expertise. This technical training must be accompanied by opportunities to enhance the soft skills as well.
Louisiana State University, Oklahoma State University and The University of Alabama are jointly hosting a Graduate Student Symposium in Analytics as a venue for communicating analytical ideas, sharing SAS software techniques, and learning from analytics professionals. The symposium will consist of nine student presenters (three from each university) and will cover a wide variety of analytics topics, including ways in which SAS software is used to analyze data. An industry expert from a sponsoring company will be assigned to each student presenter and will serve as a mentor, providing recommendations and positive commentary after the presentation on how the student’s work contributes to the body of knowledge on methodology and software application.
The Student Symposium will be held on Monday and Tuesday mornings.
See below for a listing of the scheduled abstracts. For more information, please contact Joni Shreve at jnunner@lsu.edu.
Louisiana State University
Predicting Loan Defaults in the Wild West of Peer-to-Peer Lending --- Daniel Wood
Kickstarter and GoFundMe are two of the more popular crowdfunding platforms where ideas find the money they need to move from concept to reality. Ideas are presented by their owners and interested supporters can financially support the causes for which they have a passion. Many projects are funded by small contributions from thousands of supporters. This crowdfunding concept has picked up steam in the personal banking arena. Businesses like Lending Club and Prosper bring together borrowers needing an unsecured personal loan and individual lenders seeking investments with higher returns. With those higher returns comes a risk of high default rates and losses for the investor. With over 168,000 new loans granted by Lending Club in Q2, in which ones should investors be investing? Predictive modeling methods are explored with SAS® Enterprise Miner and SAS® Enterprise Guide to help answer this question. Working with loan applications and loan performance data from Lending Club, the purpose of this analysis is to identify borrowers most likely to default as a means of steering investors to fund loans having lower risk and potentially higher returns.
Getting Involved: Using SAS® to Analyze Voter Participation in Louisiana Elections --- Fred Shumate
Louisiana democracy faces challenges caused by gaps in voter participation tied to age, ethnicity, and other factors. Decisions about elected representation, government policies, and other public matters are diminished when large segments of the voting populace are absent from the discussion. Historically, the Louisiana electorate has skewed older, whiter and more affluent. Thus, the political participation, or lack thereof, of young voters and minorities has received increased attention and scrutiny. By using SAS® tools for data manipulation, analysis, and modeling, we are able to provide a basic analysis of Louisiana voting trends to pinpoint underrepresented populations, evaluate trends in voter participation, and recommend starting points for further study and outreach strategies. SAS® Enterprise Guide and Enterprise Miner are used to build decision trees to better understand what factors most impact a registered voter’s likelihood to participate and to build logistic regression models to forecast participation by subgroups in anticipation of this fall’s Presidential Election.
The Pace and Space Era of NBA Basketball --- Joshua Nix
In recent years, NBA General Managers have been taking a much more analytically driven approach in constructing rosters by searching for players that fit a certain style of play. These styles of play are generally defined by the pace at which a team plays and the selection of shots a team usually takes. By analyzing data over the past five to ten seasons, this presentation will describe the trends of all thirty NBA teams in terms of style of play, offensive and defensive efficiencies, shot selection, and other factors. This presentation will also describe correlations between these metrics and win/loss percentages over this time period to see what has worked over the years and how teams should go about building teams in the future based on these trends.
Oklahoma State University
Determining the Functionality of Water Pumps in Tanzania Using SAS® EM and VA --- Indra kiran Chowdavarapu and Vivek Manikandan Damodaran
Access to clean, hygienic drinking water is a basic need that every human being deserves. In Tanzania, there are 23 million people who do not have access to safe water and are forced to walk miles in order to fetch water for daily needs. The prevailing problem is largely a result of poor maintenance and inefficient functioning of existing infrastructure such as hand pumps. To solve the current water crisis and ensure accessibility to safe water, there is a need to locate non-functional pumps and functional pumps that need repair so that they can be repaired or replaced. It is highly cost ineffective and impractical to manually inspect the functionality of over 74,251 water points in a country like Tanzania. The objective of this study is to build a model to predict which pumps are functional, which need some repair, and which do not work at all by using data from the Tanzania Ministry of Water. After pre-processing, the final data consists of 39 variables and 74,251 observations. We used SAS Bridge for ESRI and SAS VA to illustrate spatial variation of functional water points at the regional level of Tanzania along with other socioeconomic variables. Among decision tree, neural network, logistic regression and HP random forest models, the random forest model was found to be the best model. The classification of water pumps using the champion model will expedite maintenance operations of water points, which will ensure clean and accessible water across Tanzania at low cost and in a short period of time.
How to stop Stephen Curry? --- Wei Gao and Vrushali Walde
Basketball is one of the most popular games in America next to football and baseball. Stephen Curry is a professional basketball player for the Golden State Warriors of the National Basketball Association (NBA). With 30.2 points per game for the 2015-2016 season, he is the top NBA 3-point shooter and seems to be an unstoppable player in the NBA. He led the Golden State Warriors to an astonishing 73-win, 9-loss record for 2015-2016. So the focus of this research is how to defend him. In the 2015-2016 season, among the 73 winning games, his 3-point percentage was 46.6%. However, among the nine losing games, his 3-point percentage dropped to 35.4%. What makes such a huge difference between his performance in winning and losing games? Some of the questions answered through this research are:
1. Can anybody effectively defend Curry?
2. Is help defense effective when guarding Curry at various ranges?
3. Will taller and stronger players better defend Curry?
4. Does Curry have any shooting pattern?
The NBA official website offers statistics that include almost all the variables needed, such as players’ height, weight, speed, total rebounds, and statistics by shooting area. In addition, Curry’s shooting statistics were recorded manually by watching games played during the study timeframe. Shooting statistics included how Curry shot (catch-and-shoot, lay-up, etc.), distance from the rim, and shot outcome. All the games that Golden State lost and won against the same opponent were included in the data. Logistic regression and decision tree were used for data modeling.
Analyzing Sentiments in Tweets for Tesla Model 3 --- Tejaswi Jha and Praneeth Guggilla
Tesla Model 3 is making news in the history of automobiles as never seen before. The new electric car already has more than 400,000 reservations and counting. We carried out a descriptive analysis of sales of all Tesla models and found that the number of reservations to date is more than three times the sales of all previous Tesla cars combined. Clearly there is a lot of buzz surrounding this car, and such buzz influences consumers’ opinions and sentiments, which in turn lead to bookings. This paper aims to summarize findings about people’s opinions, reviews and sentiments about Tesla’s new car, the Model 3, using textual analysis of tweets collected since February 2016, when people’s interest in this model spiked suddenly. For this, we will use the live streaming data from Twitter over time and study its pattern based on the booking timeline. Currently, we are analyzing about 10,000 tweets. We are using SAS® Enterprise Miner and SAS® Sentiment Analysis Studio to evaluate key questions pertaining to the analysis, such as the following. What features do people think about? What are the factors that motivate people to reserve a Tesla? What factors are discouraging them (e.g., waiting period)? Is the nature of sentiment in comments (positive or negative) changing over time?
The University of Alabama
The Effect of Chef Curry: Determining the Advanced Statistics Associated with the Three Point Shot Using SAS® --- Taylor Larkin
This past year, the Golden State Warriors achieved a record-breaking 73 regular season wins. This accomplishment would not have been possible without their reigning Most Valuable Player (MVP) Stephen Curry and his historic shooting performance. Shattering his previous National Basketball Association (NBA) record of 286 three point shots made during the regular season last year, he accrued an astounding 402 this season. With increased emphasis on the advantages of the three point shot and guard-heavy offenses in the NBA today, organizations are naturally eager to investigate player statistics related to the ability to shoot at long ranges, especially for the best of shooters. Furthermore, the addition of more advanced data collecting entities such as SportVU invites an incredible opportunity for data analysis, moving beyond simply using aggregated box scores. This work uses quantile regression within SAS® 9.4 to explore the relationships between the three point shot and other relevant advanced statistics, including some SportVU player tracking data, for the top percentile of three point shooters from the 2015-2016 NBA regular season. [Presented as an ePoster at SAS Analytics Conference, 2016]
Know Thyself: Diabetes Trend Analysis --- Caroline Bell
Throughout history, the phrase “know thyself” has been the aspiration of many. The trend of wearable technologies has certainly provided the opportunity to collect personal data, thus allowing individuals to “know thyself” on a more sophisticated level. Specifically, wearable technologies that can track a patient’s medical profile in a web-based environment, such as blood glucose monitors, are saving lives. The main goal for diabetics is to replicate the functions of the pancreas in a manner that allows them to live a normal, functioning lifestyle. Many diabetics have access to a visual analytics website to track their blood glucose readings; however, these sites are often unreadable and overloaded with information. By analyzing the readings from the glucose monitor and insulin pump with SAS®, diabetics can parse their own information into simpler, more readable graphs. This presentation demonstrates the ease of creating these visualizations. This is beneficial not only for diabetics, but also for the doctors who prescribe the necessary basal and bolus levels of insulin for one’s insulin pump.
Working Out with SAS®: How Fitness Meets Analytics --- Addison Arnold
How many times do you work out per week? It is no secret that working out is a healthy habit, but how do we make time for it? Students at the University of Alabama incorporate exercise into their schedules by visiting the University Recreation Center (UREC), a fully equipped fitness facility. In the past five years, students have swiped into the main UREC facility over 2 million times – meaning that, on average, approximately 1,100 students work out in the UREC every day. With such a wealth of data, many questions arise: How do students fit working out into their schedules? Do freshmen exercise more than seniors in order to avoid the infamous “freshman fifteen”? Are students exercising heavily in February to prepare for the Spring Break holiday? With the help of SAS®, this project investigates these questions and forecasts future participation rates by analyzing data from the UREC. By evaluating these results, perhaps insight will be gained for the management of an individual’s fitness schedule.