Introduction to Data Science

  • CMPS-3160/6160: Introduction to Data Science Fall 2023
  • Tulane University
  • 3 Credit Hours
  • Prerequisite Courses: CMPS 1100 Foundations of Programming or CMPS 1500 Intro to Computer Science I, or consent of instructor.

  • Instructor: Dr. Saad Hassan (saadhassan at tulane.edu)
  • Lecture Times: TR 2:00 - 3:15
  • Room: Stanley Thomas 302
  • Office: Paul Hall 307
  • Office Hours: Thursday 3:30 PM to 4:30 PM and by appointment; see link in Tulane Canvas
  • Webpage: https://nmattei.github.io/cmps3160/
  • GitHub: https://github.com/nmattei/cmps3160

The instructor and TAs have office hours and are available by appointment. Please reach out to us directly to set up extra time if you need more support during the semester. If you need help, please check the discussion board on Canvas! We check it regularly to answer common questions on projects and homeworks. The solution to your question might already be there!

  • Teaching Assistants

  • Graduate Assistant Yunbei Zhang (yzhang111 at tulane.edu), Office: ST 309
    • Office Hours Monday (3:00-5:00 PM), Wednesday (3:00-5:00 PM), Friday (4:00 to 6:00 PM), By appointment via email
  • Undergraduate Assistant Lorraine L Steigner (lsteigner@tulane.edu), Location: ST 316
    • Office Hours Monday (5:00-7:00 PM), Tuesday (5:00-6:00 PM), Wednesday (5:00-6:00 PM), By appointment via email

Course Communication Policy

There are a variety of methods you can use to get in touch with us, and we expect to be able to get in touch with you. A few general policies:

  • When emailing, please email all TAs and the professor of your section. We will respond within 24 hours. Turnaround may be faster, but do not rely on it.
  • We expect the same from you: that you will check your email/Canvas every 24 hours. All major announcements will be distributed via the Announcements function of Canvas.
  • We all hold drop-in office hours and are available by appointment. Please reach out to us directly to set up extra time if you need more support during the semester.
  • If you have a general question, please check or post on the Course Slack. We check it regularly to answer common questions on projects and homeworks. The solution to your question might already be there!

Catalog / Course Description

The aim of this course is to provide the student with an introduction to the main concepts and techniques required for collecting, processing, and deriving insight into data. Data Science is an interdisciplinary set of topics that includes everything you need to create data driven answers and solutions to specific business, scientific, or sociological questions. Topics typically covered include an introduction to one or more data collection and management systems, e.g., SQL, web scraping, and various data repositories; exploratory and statistical data analysis, e.g., bootstrapping, measures of central tendency, hypothesis testing and machine learning techniques including linear regression and clustering; data and information visualization, e.g., plotting and interactive charts using various technologies; and presentation and communication of the results of these analyses.

Prerequisite: CMPS 1100 Foundations of Programming or CMPS 1500 Intro to Computer Science I

Note: This is a programming- and mathematics-intensive class; you will be programming every day. While this course only assumes introductory programming knowledge, the assignments will require extensive programming in Python as well as mathematical maturity. Helpful (but not required) background includes the ability to navigate a Linux/Unix command prompt, a working understanding of graph theory and probability theory, and an understanding of basic algorithms.

Course Goals, Objectives, and Overview

Data Science is an interdisciplinary set of topics that includes everything you need to create data driven answers and solutions to specific business, scientific, or sociological questions. The goal of data science is to improve decision making based on insights from data. As a field, Data Science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from datasets.

This course will cover:

  1. Data management systems, e.g., SQL, web scraping, and various data repositories;
  2. Exploratory and statistical data analysis, e.g., bootstrapping, measures of central tendency, hypothesis testing and machine learning techniques including linear regression and clustering;
  3. Data and information visualization, e.g., plotting and interactive charts using various technologies; and
  4. Presentation and communication of results.

The course will use Python and be largely project and case study driven with students expected to analyze real datasets and post an analysis/tutorial publicly on GitHub at the end of the course. Note that this is a programming intensive course and familiarity with Python is expected.

Students should be comfortable with programming in at least one language (preferably Python) and have had a reasonable amount of math background, including one college-level math course. There are many helpful tutorials and background materials at the Links & Resources Page; please make use of them.

We’ll be using Google Colaboratory and a number of packages including NumPy, SciPy, scikit-learn, and Pandas.

Course Learning Outcomes

At the conclusion of this course students will be able to:

  • Open, load, and manipulate data from various sources using industry standard tools.
  • Use one or more data management and storage systems to load, explore, and clean data.
  • Perform basic statistical analysis of the data, including generating summary tables of various statistics and visualization.
  • Express trends and implications present in data through the use of statistical hypothesis testing, including p-values and confidence intervals.
  • Use one or more machine learning algorithms for classification and regression.
  • Present the results of a complete data analysis in written, visual, and presentation form.

Program-Level Outcomes

This course fulfills the requirement of one of the CMPS 3000-level or above courses required for the coordinate major in computer science. Students need to complete three such courses in order to complete the requirements for a coordinate major. For more information on the coordinate major, please see the requirements at the Registrar’s Website.

Required and Suggested Student Resources

There is no required textbook for this course; however, there are two texts that are very strongly suggested for purchase. We will make extensive use of online textbooks and articles for the required reading that you will be quizzed on. You will also need access to a computer to complete the assignments. If you do not have access to a computer, please see the instructor ASAP.

Books Highly Encouraged:

  • Data Science from Scratch: First Principles with Python, Second Edition, Joel Grus. O’Reilly Media, 2019. Code but not text available on GitHub.

  • Practical Statistics for Data Scientists: 50 Essential Concepts, Peter Bruce and Andrew Bruce. O’Reilly Media, 2017. Text of 2nd Printing available online. Code available on GitHub.

Other Good Online Books:


Evaluation Procedures and Grading Criteria

This course will consist of six distinct grading areas. Note that all point values described below for individual assignments are subject to change. Grades are tracked in points; the relative percentages are given in parentheses. More information about all the assignments can be found on the Assignments Page and the Final Tutorial Page. All assignments and due dates are posted in Tulane Canvas.

  • 20 (5%) In Class Activities, Participation, Attendance.
  • 120 (30%) Labs
  • 24 (6%) Pre-Lab Quiz Questions
  • 55 (14%) Test 1
  • 55 (14%) Test 2
  • 125 (31%) Project (Milestone 1: 20, Milestone 2: 20, Project Presentation: 20, Final Notebook: 65)
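
The percentages above follow from the point values: each area's share is its points divided by the 399-point total, rounded to the nearest whole percent. A minimal sketch of that arithmetic is below. The names `POINTS` and `weights` are illustrative only and are not part of any course tooling, and the point values remain subject to change as noted above.

```python
# Point values copied from the grading list above; names are illustrative only.
POINTS = {
    "In Class Activities, Participation, Attendance": 20,
    "Labs": 120,
    "Pre-Lab Quiz Questions": 24,
    "Test 1": 55,
    "Test 2": 55,
    "Project": 125,
}


def weights(points):
    """Return each area's share of the total points as a whole percent."""
    total = sum(points.values())
    return {area: round(100 * value / total) for area, value in points.items()}


if __name__ == "__main__":
    for area, pct in weights(POINTS).items():
        print(f"{area}: {POINTS[area]} points ({pct}%)")
```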

An important aspect of this course is becoming a better coder. Hence all coding assignments will consist of a Professionalism component. One handy resource for this is Arie’s Coding Guide. Note that this was written for Intro. to AI CMPS 3140 but many of the same issues apply to this course.

In Class Activities, Participation, and Attendance

Attendance will be monitored through in-class activities. Throughout the semester we will regularly complete short in-class exercises such as brainstorming activities, think-pair-share, pre/post questions, and short answer writing. This may also include presenting/explaining answers to labs in class.

Labs and Pre-Lab Questions

Every Friday in class will be interactive lab time. Before Friday you will be required to answer a few short questions in the form of a Canvas Quiz. In class you will have the opportunity to work on the Labs with the instructor and TAs. On these days it will be important to bring a laptop to class to participate in the work. Labs will be worth 10 points each and graded based on the rubric in Canvas.

The best way to learn is by doing, so these will largely be applied assignments that provide hands-on experience with the basic skills a data scientist needs in industry. Posting solutions publicly online without the staff’s express consent is a direct violation of our academic integrity policy. Late assignments will not be accepted.

Test 1 and Test 2

Each test will be a written, closed-book, in-class exam. You are allowed one cheat sheet: a single sheet of 8.5x11 inch paper, front and back. You will be required to turn in this cheat sheet with the actual exam and it will be graded for completeness and neatness. Test 2 is cumulative. This semester both exams will take place in person. You must earn at least a 60% average between the two exams to pass the course.
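
One way to read the 60% rule is sketched below. It assumes the average is taken over the combined exam points (55 per test, per the grading breakdown, and subject to change); the function name `meets_exam_minimum` is purely illustrative and not an official course tool.

```python
EXAM_POINTS = 55       # points per test, from the grading breakdown (assumed fixed here)
MIN_EXAM_AVERAGE = 60  # required average across both tests, in percent


def meets_exam_minimum(test1, test2):
    """Return True if the two test scores average at least 60%."""
    earned = test1 + test2
    possible = 2 * EXAM_POINTS
    # Cross-multiplied comparison avoids floating-point rounding at the boundary.
    return 100 * earned >= MIN_EXAM_AVERAGE * possible
```

Under this reading, scores of 40 and 26 total 66 of 110 points, exactly 60%, which meets the minimum; scores of 40 and 25 do not.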

Final Project

In the interest of building students’ public portfolios, and in the spirit of “learning by doing”, students will create a self-contained online tutorial to be posted publicly and a 5-minute presentation in class. This tutorial can be created individually or in a small group (Max 2 people, except graduate students). This assignment will be a publicly-accessible website that provides an end-to-end walk-through of identifying and scraping a specific data source, performing some exploratory analysis, and providing some sort of managerial or operational insight from that data. We will have several milestones associated with the final project:

  1. Identifying a dataset and establishing a GitHub.io site; Extract, Transform, and Load (ETL).
  2. Exploratory Data Analysis (EDA): your notebook from Part 1, expanded to include graphs, visualizations, and stats that show you can manipulate your data and understand the dataset you are working with.
  3. A final, in class presentation.
  4. A final tutorial and website which must include in addition to ETL and EDA a statistical or predictive model and testing.

A complete version of the assignment as well as all the past assignments can be found at the Final Tutorial Page.

Additional Work for Graduate Students

Graduate students who enroll in this course will be required to complete their final project/tutorial as an individual.

  • All work will be turned in on Canvas. All work will be distributed either via Canvas or, in the case of Labs and examples, via the CMPS 3160 GitHub.

  • All work will be due at class time on the day assigned. This means turning things in during or after class is considered late. This will be consistent throughout the semester.

Late Work Policy

All work must be turned in on time unless explicit consent for extenuating circumstances is given beforehand (or in the case of illness, with a documented absence after). Any late work without prior authorization will not be accepted and will count as a 0.

Final Grade Policy

The weighted average will determine your letter grade roughly as follows; +/- grades will be given for borderline cases.

  • A >= 90%
  • B >= 80%
  • C >= 70%
  • D >= 60%
  • F < 60%
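
As a rough sketch, the cutoffs above can be expressed as a simple lookup. The names `CUTOFFS` and `letter_grade` are illustrative, and the sketch deliberately leaves out the plus/minus adjustments for borderline cases and the separate 60% exam-average requirement, both of which are decided by the instructor.

```python
# Cutoffs from the list above, highest first. Plus/minus adjustments for
# borderline cases are at the instructor's discretion and are not modeled.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(percent):
    """Map a weighted average (0-100) to a base letter grade."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"
```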

All grades will be posted on Canvas throughout the semester.


Schedule and Workload

See the Schedule Page for the schedule and assignments.

This is an upper-division computer science course. It is hard, and there will be a lot of work. You will sometimes have multiple assignments at a time and be responsible for managing the deadlines. Expect to spend 4-6 hours per week outside of class on this course (Tulane policy is 1-2 hours per hour in class).

If you need help, please check the discussion board on Canvas! We check it regularly to answer common questions on projects and homeworks. The solution to your question might already be there!

Students are reminded to make use of office hours. Please reach out to any of the course staff whenever you need and we can make appointments to meet if you require it.

Attendance

Students are required to attend all classes and labs unless they are ill or prevented from attending by exceptional circumstances and with a valid excuse note. Students are responsible for notifying instructors about absences that result from serious illnesses, injuries, or critical personal problems. Students with frequent absences will be reported and/or removed from the course according to university policy. If a student cannot attend class for any reason, the student is responsible for communicating with their instructors to make up any work they may miss.


Other Course and Tulane Policies

Use of Electronic Devices

Please silence your cellphones during class. If you want to use a laptop or other device with a large screen for note taking, please sit in the back rows of the classroom, as screens are distracting to other students: https://www.scientificamerican.com/article/students-are-better-off-without-a-laptop-in-the-classroom/

Note: On Lab days (typically Fridays) you will need to bring your laptop to class to work on the labs and engage with the work for the day.

Student Support Services

As we move to remote/hybrid teaching Tulane has moved a number of student success resources online. Please visit the Virtual Learning Student Support Pages for more information.

Please come talk to us if you feel you are behind or overwhelmed in this class. We can work with you and Tulane provides a suite of services to help you succeed in this course including the following. For more information please visit the student support services webpage.

  • Academic Advising - Advising maximizes student potential by sharing information, tools, and resources that empower them to make informed decisions about creating appropriate plans to achieve their academic goals.
  • Academic Learning & Tutoring - The ALTC supports students through supplemental instruction, peer tutoring, writing and presentation consultations, pop-up review sessions, study space, and online learning resources.
  • Case Management and Counseling - Students can leverage support services such as CAPS for Counseling Services, Case Management and Victim Support Services, and Goldman Center for Student Accessibility.
  • Success Coaching - Coaches help students create actionable steps to meet goals on topics such as college transition, time-management, motivation, testing anxiety, stress management, and decision-making.

ADA / Accessibility Statement

Tulane University strives to make all learning experiences as accessible as possible. If you anticipate or experience academic barriers based on your disability, please let me know immediately so that we can privately discuss options. I will never ask for medical documentation from you to support potential accommodation needs. Instead, to establish reasonable accommodations, I may request that you register with the Goldman Center for Student Accessibility. After registration, make arrangements with me as soon as possible to discuss your accommodations so that they may be implemented in a timely fashion. Goldman Center contact information: goldman@tulane.edu; (504) 862-8433; http://accessibility.tulane.edu.

Code of Academic Conduct and Academic Integrity

This course will follow Tulane’s Code of Academic Conduct. Cheating will be reported to the Associate Dean of Newcomb-Tulane College. Discussion is encouraged. However, what you turn in must be your own. You may not read another classmate’s solutions or copy a solution from the web. I will be running checks on the code turned in for plagiarism. If plagiarism is detected, the minimum penalty is a 0 on the assignment and a report to the Associate Dean; however, you may automatically fail this course at the discretion of the instructor or Honor Board.

For more information about the honor board process and the code of academic conduct please see the NTC Academic Integrity Website.

To be more clear (text from Hal Daumé III): Any assignment or exam that is handed in must be your own work (unless otherwise stated). However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else’s solution, you are cheating. If you let someone else copy your solution, you are cheating (this includes posting solutions online in a public place). If someone dictates a solution to you, you are cheating.

Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. We also encourage the use of online resources to understand and clarify things, but not taking results verbatim. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an F in the course and referred to the University Office of Student Conduct. Please don’t take that chance: if you’re having trouble understanding the material, please let me know and I will be more than happy to help.

The Code of Academic Conduct applies to all undergraduate students, full-time and part-time, at Tulane University. Tulane University expects and requires behavior compatible with its high standards of scholarship. By accepting admission to the university, a student accepts its regulations (i.e., Code of Academic Conduct and the Code of Student Conduct) and acknowledges the right of the university to take disciplinary action, including suspension or expulsion, for conduct judged unsatisfactory or disruptive.

Equity, Diversity, and Inclusion Statement (EDI)

Equity, diversity, and inclusion (EDI) are important Tulane values that are key drivers of academic excellence in our learning environments. In our drive for academic excellence, we seek to ensure that students, faculty, and staff across diverse social identities, cultural backgrounds, and lived experiences can thrive, especially those from underrepresented and underserved communities (e.g., race/ethnicity, gender identity and expression, sexual orientation, disability, social class, international, veterans, religious minorities, age, and any other classification protected by applicable law; see Tulane’s Nondiscrimination Policy). In order to build a supportive culture and climate for every member of our community, we recognize that each of us has unique EDI strengths to share with others and that we also have areas for EDI growth, learning, and change. This EDI commitment and cultural humility help us collectively build a university community and culture where everyone experiences belonging.

Religious Accommodation Policy

Per Tulane’s religious accommodation policy as stated at the bottom of Tulane’s academic calendar, we will make every reasonable effort to ensure that students are able to observe religious holidays without jeopardizing their ability to fulfill their academic obligations. Excused absences do not relieve the student from the responsibility for any course work required during the period of absence. Students should notify the instructor within the first two weeks of the semester about their intent to observe any holidays that fall on a class day or on the day of the final exam.

Title IX

Tulane University recognizes the inherent dignity of all individuals and promotes respect for all people. As such, Tulane is committed to providing an environment free of all forms of discrimination including sexual and gender-based discrimination, harassment, and violence like sexual assault, intimate partner violence, and stalking. If you (or someone you know) has experienced or is experiencing these types of behaviors, know that you are not alone. Resources and support are available: you can learn more at http://allin.tulane.edu. Any and all of your communications on these matters will be treated as either “Confidential” or “Private” as explained in the chart below. Please know that if you choose to confide in me I am mandated by the university to report to the Title IX Coordinator, as Tulane and I want to be sure you are connected with all the support the university can offer. You do not need to respond to outreach from the university if you do not want. You can also make a report yourself, including an anonymous report, through the form http://tulane.edu/concerns.

Confidential: Except in extreme circumstances, involving imminent danger to one’s self or others, nothing will be shared without your explicit permission.

  • Counseling and Psychological Services (CAPS): (504) 314-2277 or The Line (24/7): (504) 264-6074
  • Student Health Center: (504) 865-5255
  • Sexual Aggression Peer Hotline and Education (SAPHE): (504) 654-9543

Private: Conversations are kept as confidential as possible, but information is shared with key staff members so the University can offer resources and accommodations and take action if necessary for safety reasons.

  • Case Management and Victim Support Services: (504) 314-2160 or srss@tulane.edu
  • Tulane University Police (TUPD): Uptown - (504) 865-5911. Downtown - (504) 988-5531
  • Title IX Coordinator: (504) 314-2160 or msmith76@tulane.edu