Social and Political Data Science: Introduction

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Data Science Roadmap: Scope and Methods of an Emerging Discipline

Presentation prepared for the National Central University, Taoyuan, Taiwan, ROC, June 14th, 2018

What is Data Science?

  1. Science of Data

  2. Understand Data Scientifically

  • Data is ever growing...

    • Moore's Law

    • Parkinson's Law

  • Science?

 

 

Moore's Law

The number of transistors on a chip doubles about every two years; in the popular version, computing power doubles every 18 months.
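As a rough illustration of what a fixed doubling period implies, the sketch below computes the growth factor after a given number of years. The function name and the two periods tried (18 months and two years) are illustrative choices, not part of the original slides.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplicative growth after `years` if capacity doubles every period."""
    return 2 ** (years / doubling_period_years)

# Fifteen years at an 18-month period is ten doublings: 2**10 = 1024.
print(growth_factor(15))
# The same fifteen years at a two-year period is 7.5 doublings, roughly 181x.
print(round(growth_factor(15, 2.0), 1))
```

The gap between the two results shows how sensitive long-run projections are to the assumed doubling period.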

Parkinson's Law of Data

“Data expands to fill the space available for storage.”

The key word in "Data Science" is not Data, it is Science.

- Jeff Leek

The long term impact of Data Science will be measured by the scientific questions we can answer with the data.

- Jeff Leek

The big data revolution is transforming almost all disciplines and challenging academia to rethink research and teaching.

 

How do we ready researchers and students to understand the science of data and to scientifically understand data?

Lazer et al. 2009 Life in the network

This figure summarizes the link structure within a community of political blogs (from 2004). Red nodes indicate conservative blogs and blue nodes liberal ones. Orange links go from liberal to conservative blogs, and purple links from conservative to liberal. The size of each blog reflects the number of other blogs that link to it.
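The node size in the figure corresponds to what network analysis calls in-degree. A minimal sketch of that count follows; the blog names and links are hypothetical and are not the 2004 data.

```python
from collections import Counter

# Hypothetical directed links (source blog -> target blog); not the 2004 data.
links = [
    ("blog_a", "blog_b"),
    ("blog_c", "blog_b"),
    ("blog_a", "blog_c"),
    ("blog_b", "blog_c"),
    ("blog_d", "blog_b"),
]

def in_degree(edges):
    """Count, for each blog, how many distinct other blogs link to it."""
    distinct = {(src, dst) for src, dst in edges if src != dst}
    return Counter(dst for _, dst in distinct)

sizes = in_degree(links)
print(sizes.most_common())
```

Blogs that nobody links to do not appear in the counter, and looking them up returns zero.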

Cumulated/Repeated Data

A Theory of Data: Understanding Data Generation Process

Data Generation

Administrative Data

Administrative data are defined as data which derive from the operation of administrative systems, typically by public sector agencies

 

- Connelly et al. 2016

A Taxonomy of Data

  1. Numbers

  2. Text

  3. Images

  4. Audio

  5. Video

  6. Signals

  7. Data of data: Metadata and Paradata

     

What is Data?

Data is everything.

  • Examples

Data Methods

  1. Survey

  2. Experiments

  3. Qualitative Data

  4. Text Data

  5. Web Data

  6. Complex Data

    1. Network Data

    2. Multiple-source linked Data

Statistical Modeling:
The Two Cultures 

Leo Breiman 2001: Statistical Science 

  • Data Model: assumes that the data are generated by a given stochastic data model. Typical setting: small data.

  • Algorithmic Model: uses algorithmic models and treats the data mechanism as unknown. Typical setting: complex, big data.

Theory:
Data Generation Process

Data are generated in many ways. Picture this: an independent variable x goes into one side of a box (call it nature for now) and a dependent variable y comes out the other side.

Theory:
Data Generation Process

Data Model

The analysis in this culture starts by assuming a stochastic data model for the inside of the black box. For example, a common data model is that data are generated by independent draws from:

Response variable = f(predictor variables, random noise, parameters)

Read: the response variable is a function of a set of predictor (independent) variables, random noise (often assumed to be normally distributed errors) and parameters.

Theory:
Data Generation Process

Data Model

The values of the parameters are estimated from the data and the model then used for information and/or prediction.
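To make the data model culture concrete, here is a minimal sketch that assumes one specific form of f, a straight line with normally distributed noise, simulates data from it, and then estimates the parameters by ordinary least squares. The linear form, the parameter values and the function names are assumptions chosen for illustration.

```python
import random

def simulate(n, intercept, slope, noise_sd, seed=0):
    """Draw n independent (x, y) pairs from y = intercept + slope * x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares estimates of (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# "Nature" is known here only because we wrote it ourselves.
xs, ys = simulate(n=2000, intercept=1.0, slope=2.0, noise_sd=1.0)
b0, b1 = fit_line(xs, ys)
print(f"estimated intercept={b0:.2f}, slope={b1:.2f}")
```

With real data the true parameters are unknown, so the estimates are only as trustworthy as the assumed model.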

Theory:
Data Generation Process

 Algorithmic Modeling

The analysis in this culture considers the inside of the box complex and unknown. The approach is to find a function f(x), an algorithm that operates on x to predict the responses y.

The goal is to find an algorithm that accurately predicts y.
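One simple instance of this idea is k-nearest-neighbors prediction, which assumes nothing about the mechanism inside the box. The sketch below uses it purely as an illustration; the slides do not name a specific algorithm, and the toy data are made up.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict y at `query` as the mean y of the k nearest training points."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data with a nonlinear pattern (y = x squared) that the method is not told about.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [x ** 2 for x in train_x]

# The three nearest neighbors of 3.1 are x = 3, 4 and 2, so the
# prediction is the mean of 9, 16 and 4.
print(knn_predict(train_x, train_y, query=3.1))
```

The prediction is judged only by how close it comes to the observed y, not by whether a parametric model of the data is correct.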

Theory:
Data Generation Process

 Algorithmic Modeling

Supervised Learning vs. Unsupervised Learning

Source: https://www.mathworks.com
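Supervised learning uses observed responses y to learn a predictor, while unsupervised learning looks for structure in x alone. As a hedged sketch of the unsupervised side, the code below runs a one-dimensional k-means (Lloyd's algorithm) on made-up values; the choice of k-means and the starting centers are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm in one dimension, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its closest current center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# No labels are supplied: the two clusters are recovered from the values alone.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, groups = kmeans_1d(values, centers=[0.0, 6.0])
print(centers, groups)
```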

Algorithm and Inference

Very broadly speaking, algorithms are what statisticians do while inference says why they do them.

 

- Efron and Hastie 2017

Social (Data) Scientist's mission

Two major areas to which social scientists can contribute, based on decades of experience and work with end users, are:

  1. Inference                                   

  2. Data quality. 

- Foster et al. 2016

Let the dataset change your mindset.

 

- Hans Rosling

Hans Rosling

Swedish physician and statistician

  • Founded Gapminder Foundation
  • Visualize historical data on public health and poverty

 

Data is the new oil.

Data is the new soil.

Hal Varian

Chief Economist, Google

Professor of Economics, University of California, Berkeley.

Big Data: New Tricks for Econometrics

Machine Learning and Econometrics

 

- Hal Varian

The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.

- Hal Varian

“The Three Sexy Skills of Data Geeks”: “…with the Age of Data upon us, those who can model, munge, and visually communicate data...”

- Mike Driscoll

Nathan Yau (2009) calls them: Data Scientists

Data Science Roadmap

  1. Introduction - Data theory

  2. Data methods

  3. Statistics

  4. Programming

  5. Data Visualization

  6. Information Management

  7. Data Curation

  8. Spatial Models and Methods

  9. Machine Learning

  10. NLP/Text mining

Data Science Roadmap

  1. Introduction - Data theory

    1. Fundamentals

      1. Data concepts

      2. Data Generation Process (DGP)

    2. Algorithm-based vs. Data-based approaches

    3. Taxonomy

Data Science Roadmap

  2. Data methods

    1. Passive data

    2. Data at will

    3. Qualitative data

    4. Complex data

    5. Text data

Data Science Roadmap

  3. Statistics

    1. Sample and Population

    2. Inference

    3. Size and power

    4. Representation

Data Science Roadmap

  4. Programming

    1. R

    2. Python

    3. HTML

    4. JavaScript

Data Science Roadmap

  5. Data Visualization

    1. Tableau

    2. ggplot2

    3. Shiny

    4. D3.js

    5. Animation

Data Science Roadmap

  6. Information Management

    1. MapReduce

    2. Hadoop

    3. Cassandra

    4. MongoDB

    5. NoSQL

Data Science Roadmap

  7. Data Curation

    1. OpenRefine (formerly Google Refine)

    2. Sampling

    3. Missing value concepts and management

Data Science Roadmap

  8. Spatial Models and Methods

    1. GIS

    2. R/Leaflet

    3. Python Map

    4. Remote Sensing

Data Science Roadmap

  9. Machine Learning

    1. Supervised

    2. Unsupervised

    3. Regression methods

    4. Neural Networks

Data Science Roadmap

  10. NLP/Text Mining

    1. Corpus

    2. Text Analysis

    3. Sentiment Analysis

    4. Natural Language Processing

Millions of open data-related jobs are seeking candidates who can deal with complex big data and statistical models, and they call on not just business, computer science and mathematics programs but also the social sciences.

 

Jobs, jobs and jobs

Sample job

Machine Learning

Machines excel at frequent, high-volume tasks, at a much faster rate and with lower variance.

Humans can tackle novel situations.

Quick Data: Taiwan Climate

Spatial Data: United States

JavaScript: D3 Library

Sentiment Analysis

"1","RT @RealJack: *Last year*

Democrats: “TRUMP IS SUCH A TERRIBLE PRESIDENT HE WILL GET US NUKED BY NORTH KOREA!!”

*Trump meets with Kim*

D…"
"2","Trump Kim summit: US wants 'major N Korea disarmament' by 2020 https://t.co/htY2r4eXXj"
"3","RT @thehill: JUST IN: Norwegian lawmakers nominate Trump for Nobel Peace Prize after summit with Kim Jong Un https://t.co/Uer56GgE2A https:…"
"4","RT @JRubinBlogger: Pompeo is acting exactly like Kerry -- indignant, caught up in process. Convinced concessions aren'[t concessions. Pathe…"
"5","RT @SykesCharlie: On Wednesday morning, Chosun Ilbo, South Korea’s paper of record, published a bleak editorial: “Kim Jong-un Got Everythin…"
"6","RT @chuckwoolery: Yesterday Shepard Smith, gave a scathing report on Trump/Kim Singapore summit. Following the Lefts lead."
"7","RT @WhiteHouse: Leaders the world over spoke of the powerful significance of President Trump’s summit with Kim Jong Un this week.

Read mor…"
"8","RT @PalmerReport: Fuck Donald Trump

Fuck Kim Jong Un

Fuck their fake summit

Fuck Vladimir Putin

Fuck Dennis Rodman

Fuck the media for…"
"9","Dennis Rodman has been the link between Kim Jong-Un and Donald Trump. Very Scary times we are in @StephMillerShow… https://t.co/UR0JWs0hcp"
"10","RT @TomSteyer: .@realDonaldTrump repeatedly said, Kim Jong Un ""loves his people."" This is what love looks like to Trump:

Over 100,000 poli…"
"11","RT @TheUSASingers: I’m gonna lay it on the line.

- Obama isn’t a Muslim
- Hillary doesn’t eat babies
- Socialists aren’t Nazis
- Nazis are…"
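A minimal sketch of how records like these could be scored: parse the "id","text" rows with the standard csv module, then count hits against a word list. The tiny lexicon below is hypothetical and far too small for real use, and lexicon counting is only one simple approach; the slides do not say which method was used.

```python
import csv
import io
import re

# Two records copied from the sample above, in the same "id","text" layout.
raw = '''"2","Trump Kim summit: US wants 'major N Korea disarmament' by 2020 https://t.co/htY2r4eXXj"
"3","RT @thehill: JUST IN: Norwegian lawmakers nominate Trump for Nobel Peace Prize after summit with Kim Jong Un https://t.co/Uer56GgE2A https:…"
'''

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"peace", "prize", "love", "powerful"}
NEGATIVE = {"terrible", "scary", "bleak", "fake"}

def sentiment_score(text):
    """Positive minus negative word count, after dropping URLs and @mentions."""
    cleaned = re.sub(r"https?:\S*|@\w+", " ", text.lower())
    words = re.findall(r"[a-z']+", cleaned)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# csv.reader handles the quoted fields, including embedded newlines and doubled quotes.
scores = {tweet_id: sentiment_score(text) for tweet_id, text in csv.reader(io.StringIO(raw))}
print(scores)
```

A score above zero leans positive, below zero negative, and zero means no lexicon word was found, which is common for truncated tweets.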

Thank you!

Wait... are you sure you have no questions?