Wharton School Professor Linda Zhao Gives a Lecture on "Data Science in Action" at PHBS
2017-07-12 15:49:47
Data science has become one of the most rapidly growing interdisciplinary fields. The discipline combines computer science, statistics, and domain knowledge of a specific application area, enabling us to extract actionable information from data. Hal Varian, the chief economist at Google, is known to have said, “The sexy job in the next 10 years will be statisticians.”
 
If “sexy” means having rare qualities that are much in demand, data scientists are already there. There simply aren’t a lot of people with their combination of scientific background and computational and analytical skills. So what is data science actually used for, and what are its prospects?

 The academic lecture "Data science in action"

Linda Zhao, professor of statistics at the Wharton School of the University of Pennsylvania, was invited to give a lecture on data science at Peking University HSBC Business School (PHBS), sharing her insights and practical experience with PHBS students and faculty. Her research areas include Bayesian analysis, nonparametric analysis and numerical computation, and she publishes mainly in leading international journals. Professor Zhao’s ongoing projects include forecasting house prices, inference for high-dimensional data, data with measurement errors and post-model-selection inference.

 Professor Zhao takes the audience to the frontier of data science through several case studies

During her lecture, titled “Data science in action”, Professor Zhao took the audience to the frontier of the field through several case studies conducted by her Wharton research team. The first case concerned a call center based in Boston; the study aimed to improve the center’s efficiency and provide customers with better service. Taking the perspective of queueing theory, Professor Zhao and her coauthors decomposed the service process into three fundamental components: arrivals, customer abandonment behavior and service durations. Each component involves a different underlying mathematical structure and requires a different style of statistical analysis.

Professor Zhao analyzes the call center case
 
Analysis of the collected data showed that the average call lasted 185 seconds, yet 7% of calls lasted only 10 seconds, which indicated that some operators hung up quickly, leaving no possibility of good communication with customers. Because the call center assessed operators’ performance by the number of calls they took, operators had an incentive to end calls as soon as possible, a typical case of moral hazard. By digging out the truth behind the data, the team identified the underperforming operators and made improvements to the call center's management strategy.
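
The kind of check described above can be illustrated with a minimal Python sketch. It is not the research team's actual method: the call records are made up, the 10-second threshold is the only figure taken from the article, and the 25% cutoff for flagging an operator is an arbitrary assumption chosen for illustration.

```python
from collections import defaultdict

SHORT_CALL_SECONDS = 10  # threshold mentioned in the article; all else is assumed


def short_call_share(calls, threshold=SHORT_CALL_SECONDS):
    """Return {operator: fraction of calls lasting at most `threshold` seconds}."""
    totals = defaultdict(int)
    shorts = defaultdict(int)
    for operator, duration in calls:
        totals[operator] += 1
        if duration <= threshold:
            shorts[operator] += 1
    return {op: shorts[op] / totals[op] for op in totals}


def flag_operators(calls, max_share=0.25, threshold=SHORT_CALL_SECONDS):
    """Return sorted operators whose short-call share exceeds `max_share`.

    `max_share` is a hypothetical cutoff, not a value from the study.
    """
    shares = short_call_share(calls, threshold)
    return sorted(op for op, share in shares.items() if share > max_share)


# Made-up records: (operator id, call duration in seconds)
calls = [
    ("A", 190), ("A", 210), ("A", 8), ("A", 175),
    ("B", 5), ("B", 9), ("B", 200), ("B", 7),
    ("C", 180), ("C", 240), ("C", 160), ("C", 195),
]

shares = short_call_share(calls)
flagged = flag_operators(calls)
print(shares)
print(flagged)
```

In this toy data, only operator B ends most calls within 10 seconds, so B is the one flagged for review. A real analysis would also need to rule out legitimate reasons for short calls, such as wrong numbers.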
 
The second case concerned predicting and evaluating a broadcast audience. Professor Zhao first introduced the satellite radio broadcaster Sirius XM, and then a business and economics program that the Wharton School launched on the platform in January 2014. Zhao and her research team reported some interesting findings from analyzing several characteristics of 1,362 Sirius XM listeners who tuned in to the Wharton program, including their age, gender, income and educational level.

 Professor Zhao shares her findings on Lending Club
 
Then, Professor Zhao elaborated on a case study of Lending Club, a US peer-to-peer (P2P) lending company headquartered in San Francisco, California. It was the first P2P lender to register its offerings as securities with the Securities and Exchange Commission (SEC) and to offer loan trading on a secondary market. Lending Club makes money by charging borrowers an origination fee and investors a service fee. Though viewed as a pioneer in the fintech industry and one of the largest such firms, Lending Club experienced problems in early 2016, including difficulties in attracting investors and a scandal over some of the firm's loans.
 
Professor Zhao and her research team collected all of Lending Club's data from 2008 to 2014, a period in which only 9% of applicants were accepted as borrowers by the platform. The team cleaned the data by detecting missing values and handling inconsistent data formats and abnormal variables, and then modeled the risk factors for Lending Club’s borrowers and lenders. The study found that Lending Club's main advantage lay in its ability to provide customers with risk-diversified loan options and to ensure a better rate of return on the loans. Based on the study, her team improved the performance of lenders’ portfolios and optimized returns on investment.
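
The cleaning steps mentioned above (missing values, data formats, abnormal values) might look roughly like the following Python sketch. It only illustrates the general idea and is not the team's actual pipeline: the field names `int_rate` and `annual_inc`, the sample rows, and the income bound are all assumptions made for this example.

```python
def parse_rate(text):
    """Convert a string such as ' 13.5% ' to 0.135; return None if blank."""
    text = (text or "").strip()
    if not text:
        return None
    return float(text.rstrip("%")) / 100


def clean_loans(rows, max_income=1_000_000):
    """Drop rows with missing fields or implausible income; normalize formats.

    `max_income` is a hypothetical bound used to treat a value as abnormal.
    """
    cleaned = []
    for row in rows:
        rate = parse_rate(row.get("int_rate"))
        income_text = (row.get("annual_inc") or "").strip()
        if rate is None or not income_text:
            continue  # missing value: skip the record
        income = float(income_text.replace(",", ""))
        if income <= 0 or income > max_income:
            continue  # abnormal value under the assumed bound
        cleaned.append({"int_rate": rate, "annual_inc": income})
    return cleaned


# Made-up sample rows; field names are assumed for illustration
rows = [
    {"int_rate": "13.5%", "annual_inc": "55,000"},
    {"int_rate": "", "annual_inc": "40,000"},           # missing rate
    {"int_rate": "7.9%", "annual_inc": None},           # missing income
    {"int_rate": "10.0%", "annual_inc": "99,999,999"},  # implausible income
    {"int_rate": " 6.0% ", "annual_inc": "82000"},
]

cleaned = clean_loans(rows)
print(cleaned)
```

Dropping incomplete records is the simplest choice for a sketch. Depending on the analysis, imputing missing values or capping extreme ones may be preferable.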

 PHBS Professor David Ong raises a question

“Economics often involves setting up theoretical frameworks before it seeks to find empirical evidence, while data science tends to let the data speak for itself,” commented Professor Zhao when asked about the difference between economic data analysis and data science. PHBS faculty also exchanged views with Professor Zhao on P2P financing methods, data collection, and risk control. Further discussion covered how Chinese and overseas P2P platforms compare in terms of regulatory rules and risk management.
 
In the era of big data, data will continue to play an important role, as our daily lives are closely tied to its applications. Toward the end of the lecture, Professor Zhao also briefly shared studies of lung cancer imaging, Google flu data and online CPI data. Perhaps, in the next ten years, data will further shape the course of human social development, penetrating all aspects of our lives.

By Annie Jin 

