Big Data Abstract
Big Data is a rapidly growing field at universities and in companies around the world. However, very few high schools worldwide offer Big Data as a class. One obstacle is that there is no formal curriculum for Big Data in a high school setting. The goal of this work is to create one. The concept of offering Big Data as a high school class originated at Concordia International School in Shanghai with my coauthor Peter Tong, who developed the class with support from IBM. I teach this course at Concordia International School in Hanoi. Together with my colleague, and alongside graduate-level work in Data Science at Harvard University, we have created a formal high school curriculum for Big Data that includes Standards and Benchmarks. A formal curriculum is a key step toward Big Data being offered by a large number of high schools, and it should facilitate the spread of Big Data as a course offering in high schools around the world.
Neil Whitehead
Course Description:
What is “Big Data”? In the simplest of terms, it refers to the tools, processes and procedures that allow an organization to create, manipulate, and manage very large data sets and storage facilities. As of the beginning of 2013, the world is creating more data each day than in the past four decades combined. Sifting through such sheer quantities of data is a demanding task for any person. Big Data is a new course that encompasses information technology, science and mathematics. This course will focus on conceptual understanding and the application theory behind Big Data Analytics rather than explicit formulas and technical jargon. The main objective of this course is to create “awareness” of, and exposure to, the realm of Big Data and the hidden dangers it might bring. This course will include some hands-on experience using Big Data analytics in practical, real-life projects. Upon completion, you will be more aware of the Big Data phenomenon.
By Dr. Peter Tong
Standards and Benchmarks Big Data – High School Curriculum
By Neil Whitehead and Dr. Peter Tong
Standard 1
Recognize how Big Data influences our world in ways that we do not necessarily notice.
Benchmark 1.1 Understand how having more data fundamentally changes what kinds of patterns we can see. We can see patterns that were previously invisible.
Benchmark 1.2 Understand how Big Data is behind the scenes in many aspects of our lives, from credit ratings to insurance premiums.
Standard 2
Data Wrangling: Understand the challenges of collecting and cleaning data, including joining data sets that do not match or that contain conflicting information.
Benchmark 2.1 Understands how data sets can be messy
Benchmark 2.2 Understands the challenges of joining data sets
Benchmark 2.3 Understands the challenges of data sets with conflicting data.
Benchmark 2.4 Structured and unstructured data. Understand the challenges of transforming data, such as changing the units of one data set to connect it with another data set, working with sentiment data, and converting picture/video data for analysis. (Students may use tools such as Data Monarch)
Benchmark 2.5 Can successfully take data and organize it in a format that can be accepted by software such as Watson Analytics or Tableau
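The wrangling challenges in Benchmarks 2.1 to 2.4 can be illustrated with a minimal Python sketch. The two small data sets, the city names, and the helper functions below are hypothetical examples invented for illustration; they are not part of the course materials. The sketch assumes the only problems are inconsistent spelling of the join key and mismatched temperature units.

```python
# Hypothetical sample data: two tiny "data sets" that do not line up cleanly.
temps_f = [
    {"city": " Hanoi", "temp_f": 86.0},      # stray leading space
    {"city": "shanghai", "temp_f": 68.0},    # lowercase name
    {"city": "Boston", "temp_f": 50.0},      # no match in the other data set
]
humidity = [
    {"city": "Hanoi", "humidity_pct": 80},
    {"city": "Shanghai ", "humidity_pct": 70},  # stray trailing space
    {"city": "Chicago", "humidity_pct": 55},    # no match in the other data set
]


def clean_city(name):
    """Normalize a city name so keys from both data sets can match."""
    return name.strip().title()


def fahrenheit_to_celsius(temp_f):
    """Convert units so the joined data uses one temperature scale."""
    return round((temp_f - 32) * 5 / 9, 1)


def join_on_city(left, right):
    """Inner join: keep only the cities that appear in both data sets."""
    right_by_city = {clean_city(row["city"]): row for row in right}
    joined = []
    for row in left:
        city = clean_city(row["city"])
        if city in right_by_city:
            joined.append({
                "city": city,
                "temp_c": fahrenheit_to_celsius(row["temp_f"]),
                "humidity_pct": right_by_city[city]["humidity_pct"],
            })
    return joined


combined = join_on_city(temps_f, humidity)
for row in combined:
    print(row)
```

Boston and Chicago are dropped because each appears in only one data set, which is a useful prompt for discussing what information is lost when data sets are joined.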
Standard 3
Understand how data storage has changed over time and how this has contributed to the rise of Big Data.
Benchmark 3.1 Understands how the cost per memory unit has dropped over time
Benchmark 3.2 Understands how computers are now networked so that a data set can be stored on many computers.
Standard 4
Understand the role of data production in the rise of Big Data
Benchmark 4.1 Understands the sources of data. This includes machines and devices producing data
Benchmark 4.2 Students are able to find good sources of data to work with in a project format
Benchmark 4.3 Students can understand case studies of how data such as GPS is collected and used
Benchmark 4.4 Students can understand how the volume of data has increased due to devices creating data
Standard 5
Understand how companies are deriving value from Big Data
Benchmark 5.1 Understand how Big Data is used by companies such as Walmart for product placement
Benchmark 5.2 Understand how companies such as travel companies leverage Big Data to earn a profit
Benchmark 5.3 Understand how Big Data is used by companies such as Amazon to make recommendations.
Standard 6
Develop an understanding of how changes in data processing have contributed to the rise of Big Data and of AI techniques such as machine learning (examples: Hadoop, etc.)
Benchmark 6.1 Understands how distributed processing frameworks such as Hadoop have contributed to the rise of Big Data
Benchmark 6.2 Understand how processing speed has changed in recent years
Benchmark 6.3 Understands the necessary interrelationship between the growth in Storage, Processing, Bandwidth and Security
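The idea behind Hadoop-style processing can be sketched in a few lines of plain Python. Hadoop's processing model is MapReduce: each computer works on its own chunk of the data (the map step), and the partial results are then combined (the reduce step). The sketch below is only an illustration that runs on a single machine and does not use Hadoop itself; the text chunks and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: pretend each string is stored on a different computer.
chunks = [
    "big data is big",
    "data is everywhere",
    "big ideas need data",
]


def map_chunk(text):
    """Map step: count the words in one chunk, independently of the others."""
    return Counter(text.split())


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


# Each chunk could be processed on a separate machine at the same time.
partial_counts = [map_chunk(chunk) for chunk in chunks]

# The partial results are merged into a single answer.
total_counts = reduce(merge_counts, partial_counts, Counter())

for word, count in sorted(total_counts.items()):
    print(word, count)
```

Because no chunk depends on any other during the map step, adding more computers lets the same job handle a larger data set, which connects storage (Standard 3) to processing (Standard 6).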
Standard 7
Develop an understanding of the likely future of Big Data, particularly with regard to the Internet of Things
Benchmark 7.1 Understands the history of the Internet of Things including RFID and how it gained traction
Benchmark 7.2 Understand how Moore’s Law has contributed to the progression from mainframe computers to the Internet of Things.
Benchmark 7.3 Understands sensing in the Big Data context (examples include the health benefits of sensors such as Fitbit and the safety implications of sensors in car seats)
Benchmark 7.4 Understand how Big Data will give rise to smart appliances
Benchmark 7.5 Understands how Big Data will give rise to smart homes
Benchmark 7.6 Understands how Big Data will give rise to smart cities
Benchmark 7.7 Understands how AI will give rise to self-driving cars
Standard 8
Develop an understanding of the risks associated with Big Data.
Benchmark 8.1 Understands privacy concerns
Benchmark 8.2 Understands predictive risks, such as the possibility of people being accused of crimes that they are predicted to be about to commit
Benchmark 8.3 Understands hacking risks, such as someone hacking into a device such as a self-driving car to cause a crash, or into a smart toaster to cause a fire.
Standard 9
Find correlations between data sets and do analysis. This includes finding good sources of data, cleaning data, asking good questions of the data and making models. Students may use software products such as Watson Analytics and spreadsheets to help with this analysis.
Benchmark 9.1 Is able to ask meaningful questions about the data to obtain insights (make and test hypotheses)
Benchmark 9.2 Is able to find patterns and relationships such as correlation in a data set.
Benchmark 9.3 Understands the limitations of correlation (correlation does not mean causation, and correlations can lead to misunderstandings such as Simpson’s Paradox)
Benchmark 9.5 Uses correlations to draw conclusions
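Benchmarks 9.2 and 9.3 can be made concrete with a short Python sketch. The numbers below are hypothetical values chosen to produce Simpson’s Paradox; they are not real data. The sketch computes the Pearson correlation coefficient by hand and shows that two groups can each have a negative correlation while the pooled data shows a positive one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical measurements for two groups (invented for illustration).
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [6, 7, 8], [8, 7, 6]

# Within each group, y falls as x rises.
r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)

# Pooling the groups hides the grouping and reverses the trend.
r_all = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print("group A:", round(r_a, 2))
print("group B:", round(r_b, 2))
print("pooled :", round(r_all, 2))
```

The sign of the correlation depends on whether the groups are analyzed separately or together, so a correlation alone does not justify a conclusion about cause.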
Communication and Presentation
Standard 10
Create meaningful visual displays of data.
Benchmark 10.1 Understands how different visual variables work better for different types of data
Benchmark 10.2 Understands how visualizations can be misleading
Benchmark 10.3 Can successfully use technology to make meaningful visual displays of a large data set
Standard 11
Use good presentation skills to explain findings to an audience
Benchmark 11.1 Can successfully make a presentation to an audience using good eye contact and voice projection
Benchmark 11.2 Can successfully make a presentation to an audience with use of good visual display
Benchmark 11.3 Can successfully make a presentation to an audience with good explanation of content related to Big Data
References
Cognitive Class, IBM, https://cognitiveclass.ai/. Accessed 23 Sept. 2017.
Common Core State Standards Mathematics, National Governors Association Center for Best Practices, 2010, www.corestandards.org/Math/Practice/. Accessed 23 Sept. 2017.
Cukier, Kenneth, and Viktor Mayer-Schönberger. Big Data. New York, Houghton Mifflin Harcourt, 2013.
Finlay, Steven. Predictive Analytics, Data Mining and Big Data: Myths, Misconceptions and Methods. New York, Palgrave Macmillan, 2014.
Kilback, Brent, Ph.D. Personal interview.
Next Generation Science Standards: For States, By States, National Science Teachers Association, 2013, nextgenscience.org/. Accessed 23 Sept. 2017.
Siegel, Eric. Predictive Analytics. Hoboken, New Jersey, John Wiley and Sons Inc., 2016.
Thomas, Rob. Big Data Revolution. John Wiley and Sons Ltd, 2015.
Tong, Peter, Ph.D. Personal interview.