Modular Course Elements (MCEs)
MCEs address one of the biggest structural barriers to becoming a Data Professional: few students can put their disciplinary research and degree requirements on hold to take additional courses. MCEs provide key skills that would otherwise be found within larger courses, but are presented as modules (⅓ to ½ of a full course credit) delivered at a sub-semester scale. For example, the current semester-long course Time Series Modeling and Spatial Statistics (CEE294) covers both time series and spatial analysis and modeling. Restructured to use MCEs, CEE294 would comprise Time Series Analysis (MCE1), worth half the credits and taught in half the time, and Spatial Statistics (MCE2), which makes up the other half. A student may take MCE1, MCE2, or both, based on their needs. MCEs can also be integrated into other existing courses. For example, MCE2 may be adapted for a Remote Sensing course where spatial statistics are needed. This modular delivery and integration into the university course catalog gives students the flexibility to use MCEs to satisfy program- or degree-required electives without lengthening their time to graduation. Students may also choose when to take these modules: for example, one may “front load” a semester with one or more MCEs to reduce competing deadlines in the final weeks of the semester.
To be an effective Data Professional, one needs to be cognizant of and have exposure to the four Integrative Skill Areas (ISAs) on which our MCE structure is based. Formal Knowledge includes foundational skills in probability, statistics, and modeling upon which data science methods and tools are based. Practiced Techniques include modern innovations and tools currently used in practice. Practical Wisdom includes communication skills and the ethics of using data and protecting privacy for effective decision making. Thoughtful Practice involves the development and implementation of decision frameworks that leverage data-informed insight. These four skill areas are woven into this program and are essential components of MCEs. While not every student will become an expert in each core competency of all four ISAs, a Data Professional should be cognizant of each and take at least one MCE from each skill area. These MCEs will also be available to a broader population of students beyond D3M@Tufts students, are adaptable for use by other institutions, and are sustainable beyond the life span of this program through the DISC and participating departments.
- MCE-Hypothesis Testing – In-depth discussion of techniques and tools, using case examples to highlight both good and bad practices seen in research and practice. Biostatistics (BIO132; Crone)
- MCE-Bayesian Statistics – Example-driven, practical applications of Bayesian analysis. Statistical Pattern Recognition (COMP136; Hughes)
- MCE-Causality and Correlation – Addressing logical fallacies and distinguishing causality from correlation through mechanistic explanations. Biostatistics (BIO132; Crone), Time Series Modeling and Spatial Statistics (CEE294; Islam)
- MCE-Time Series Analysis – Fundamental and empirical tradeoffs among various approaches and implementations including stationary time series analysis, issues of non-stationarity, and linkages between time and frequency domain. Time Series Modeling and Spatial Statistics (CEE294; Islam), Machine Learning and Data Mining (COMP135; Allen)
- MCE-Spatial Analysis and Modeling – Introduction to concepts, vocabulary, techniques, and common tools (GIS, R, Python). Time Series Modeling and Spatial Statistics (CEE294; Islam), Advanced GIS (UEP 294; Sumeeta)
- MCE-Data Collection, Cleaning, and Mining for Big and Small Data – An introduction to big data, data mining techniques, and how to ask good questions of data. Data Mining (COMP135; Allen)
- MCE-Machine Learning – Principles and techniques for applied machine learning. Machine Learning (COMP135 and 136; Allen, Liu, and Hughes)
- MCE-Database Design and Data Visualization – A practical, applied approach to database design, visualization of data, and tools for visual analytics. Visualization (COMP150-VIZ; Chang)
- MCE-Communication with Broad Audiences – Effectively communicating scientific findings to policy makers and to the public in open forums. The Art of Communications (DHP 215; Mankad)
- MCE-Practicing Science in a Politicized World (Role Play Simulation) – A realistic role-play simulation and reflection that prepares scientists and engineers to interact effectively with decision makers, especially in highly politicized situations. Science Diplomacy: Environmental Security in the Arctic (DHP 259; Berkman), Water Diplomacy (CEE194; Islam)
- MCE-Decision Making Under Uncertainty – Applications of decision theory focused on complexity science with different sources of uncertainty, nonlinearity and feedback. Environmental and Water Resources Systems (CEE215; Lamontagne)
- MCE-Data Analytics to Drive Decisions – Overview of analytical decision making, joint fact-finding processes, synthesis of multiple perspectives of competing stakeholders, tools for deliberation and generating understanding of creative options, examination of embedded values and bias. Water Diplomacy I (CEE194; Islam), Environmental and Water Resources Systems (CEE215; Lamontagne)
Concepts, methodologies, and tools covered in these four ISAs are interconnected and overlapping. We expect students to be able to differentiate between theory (formal knowledge; e.g., probability) and practice (e.g., machine learning). Our goal is to expose students to a range of data science topics so that they understand and appreciate motivations and methodologies from multiple perspectives, and can work in interdisciplinary teams by developing competencies within each ISA. We will restructure courses into 3-4 MCEs per semester for the first two years to create at least 12 MCEs. Each MCE leverages aspects of a shared program-wide database (see below), regardless of the core competency covered. In each MCE, students have the opportunity to explore the shared database from different perspectives using different tools. The benefits of this are twofold: first, students gain the experience of focusing very deeply on a single dataset, a critical skill to acquire and sharpen during graduate studies; second, students come to appreciate the diversity of conclusions that can arise from a single dataset by asking different questions and using different research methods and tools.
Integration of a Common Database into MCEs
A unique component of D3M@Tufts is the use of a common database across the MCEs, akin to the adoption of a common book in writing-across-the-curriculum programs. The use of a common dataset encourages cohort building, and individual students will be exposed to the same data from several disciplinary perspectives and will use different tools to analyze it. In addition to increasing the efficiency of MCE delivery, this structure illustrates to students that many, sometimes contradictory, narratives may emerge from a single dataset, highlighting the value of practical wisdom.
The initial shared dataset will be a 35-TB database of global change scenarios developed by Lamontagne, a member of the leadership team, using the Global Change Assessment Model. This database represents an easily accessible extension of the new Shared Socio-economic Pathways. It includes many globally contextualized scenarios of regional land use change, evolution in energy and transportation systems, demographics, food supply and demand, water demand, and various greenhouse gas emissions tied to specific human activities.
In cases where the magnitude of the data is an integral aspect of the MCE content, students will be provided access to and training on Tufts’ 6,400-core high-performance computing cluster. To simulate workflow and queuing experiences on a massively shared computing resource, Tufts Technology Services (TTS) will work with MCE instructors to limit class allocations (core counts and availability windows).
By interrogating these data, students explore questions relevant to their interests. For instance, what global socio-economic factors contribute most to potential deforestation in Brazil? What are the consequences of international climate mitigation efforts on groundwater depletion in the Middle East and how might this be tied to food security? What might national and regional energy and transportation portfolios look like under various mitigation scenarios? These questions will serve as foundational inquiries for various MCEs. The shared problem-space and data will also enable the D3M@Tufts cohort to form deep partnerships and learn from each other. In courses where the database size and format may detract from the learning objective of the MCE, D3M@Tufts faculty will work with MCE instructors to identify relevant subsets of the data, extract them from the main database, and place them in an appropriate repository for access and use.
An MCE in a course on water resources management (like CEE214) might explore the planning implications of groundwater depletion in various global aquifers. First, students would apply statistical downscaling techniques to derive data at the relevant scale for the aquifer in question. Students would then apply various statistical scenario discovery algorithms to identify global drivers tied to regional aquifer impacts.
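As an illustration of the scenario discovery step, the sketch below searches for a single-driver rule that isolates the scenarios of interest. It is a deliberately simplified stand-in for established scenario discovery algorithms, not the method used in any particular course. The driver names, the toy values, the 0.5 depletion cutoff, and the helper `discover_box` are all hypothetical, and the downscaling step is assumed to have already been done.

```python
from dataclasses import dataclass


@dataclass
class Box:
    driver: str       # name of the global driver used in the rule
    threshold: float  # rule is "driver >= threshold"
    coverage: float   # share of all cases of interest captured by the rule
    density: float    # share of scenarios inside the rule that are of interest


def discover_box(scenarios, of_interest, min_density=0.8):
    """Return the rule 'driver >= threshold' with the highest coverage
    among rules whose density is at least min_density (None if none qualify)."""
    total_interest = sum(of_interest)
    best = None
    if total_interest == 0:
        return best
    for driver in scenarios[0]:
        for threshold in sorted({s[driver] for s in scenarios}):
            inside = [flag for s, flag in zip(scenarios, of_interest)
                      if s[driver] >= threshold]
            hits = sum(inside)
            density = hits / len(inside)
            coverage = hits / total_interest
            if density >= min_density and (best is None or coverage > best.coverage):
                best = Box(driver, threshold, coverage, density)
    return best


# Hypothetical toy scenarios: two global drivers per scenario.
scenarios = [
    {"population_growth": 0.5, "irrigation_demand": 10},
    {"population_growth": 1.5, "irrigation_demand": 12},
    {"population_growth": 0.8, "irrigation_demand": 30},
    {"population_growth": 1.2, "irrigation_demand": 35},
    {"population_growth": 0.6, "irrigation_demand": 40},
    {"population_growth": 1.9, "irrigation_demand": 20},
]
# Hypothetical downscaled aquifer depletion index for each scenario.
depletion = [0.1, 0.2, 0.7, 0.8, 0.9, 0.3]
of_interest = [d > 0.5 for d in depletion]

best = discover_box(scenarios, of_interest)
print(best)
```

On this toy data the rule that emerges is based on irrigation demand, since every scenario at or above a demand of 30 is a case of interest. In a classroom setting, students could compare such a simple rule against the multi-driver rules produced by a full scenario discovery method.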
An MCE in a machine learning and data mining course (COMP135) might explore unintended consequences of international mitigation regimes as a motivating example for the use of unsupervised learning algorithms (see, for example, Lamontagne et al., 2018).
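A minimal sketch of what such an unsupervised learning exercise might look like is given below, using k-means clustering as one example of the family of algorithms. The two outcome variables, their values, and the fixed starting centroids are hypothetical and chosen only to keep the example small and deterministic; they are not drawn from the shared database.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
            labels.append(idx)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return labels, centroids


# Hypothetical scenario outcomes: (emissions reduction %, cropland change %).
points = [(10, 1), (12, 2), (11, 0), (60, 15), (65, 18), (62, 14)]

# Fixed starting centroids make the result reproducible.
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)
print(centroids)
```

Here the scenarios separate into a low-mitigation group with little cropland change and a high-mitigation group with substantial cropland change, which is the kind of pattern students could then interrogate for unintended consequences.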
MCEs in both diplomacy courses (CEE194) and data visualization courses (COMP150) could use the shared dataset to explore effective communication of the tradeoffs inherent in mitigation strategies.