class: center, middle, inverse, title-slide

# Introduction
## DATA 606 - Statistics & Probability for Data Analytics
### Jason Bryer, Ph.D.
### Spring 2021

---
class: hide-logo, bottom, right, title-slide
background-image: url(images/Greetings_from_Statistics.jpeg)
background-size: contain

.font70[
[@skyetetra](https://twitter.com/ChelseaParlett/status/1340463322118856705)
]

---
# Agenda

* About your instructor
* Syllabus
* Class meetups
* Course Schedule
* Assignments (how you will be graded)
    * Participation
    * Homework
    * Labs
    * Data Project
    * Final exam
* Software
    * The `DATA606` R Package
    * Using R Markdown

---
# Introduction

A little about me:

* Assistant Professor at CUNY in Data Science and Information Systems
* Principal Investigator for a Department of Education Grant (part of their FIPSE First in the World program) to develop a Diagnostic Assessment and Achievement of College Skills ([www.DAACS.net](http://www.daacs.net))
* Authored over a dozen R packages including:
    * [likert](http://github.com/jbryer/likert)
    * [sqlutils](http://github.com/jbryer/sqlutils)
    * [timeline](http://github.com/jbryer/timeline)
* Specialize in propensity score methods. Three new methods/R packages developed include:
    * [multilevelPSA](http://github.com/jbryer/multilevelPSA)
    * [TriMatch](http://github.com/jbryer/TriMatch)
    * [PSAboot](http://github.com/jbryer/PSAboot)
* Developer of a data dashboard for the NYS Office of Special Education and TAP for Data at Cornell University: https://data.osepartnership.org

---
# Also a Father...

<img src="images/BoysFall2019.jpg" width="65%" style="display: block; margin: auto;" />

---
# Runner...

<table border='0' width='100%'><tr><td>
<center><img src='images/2020Dopey.jpg' height='450'></center>
</td><td>
<center><img src='images/2019NYCMarathon.jpg' height='450'></center>
</td></tr></table>

---
# And photographer.
<img src="images/Sleeping_Empire.jpg" width="80%" style="display: block; margin: auto;" />

---
# Syllabus <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/blogdown.png" class="title-hex">

Syllabus and course materials are here: [https://spring2021.data606.net](https://spring2021.data606.net)

The site is built using the [Blogdown](https://bookdown.org/yihui/blogdown/) R package and hosted on [Github](https://github.com/jbryer/DATA606Spring2021). Each page of the site has an "Improve this page" link at the bottom right; use that to start a pull request on Github.

We will use Blackboard primarily for submitting assignments. Please submit:

* A PDF or link to the built HTML (e.g. Rpubs, [Github](http://htmlpreview.github.io/))

PDFs are preferred for the homework as there is some LaTeX formatting in the R markdown files. The `tinytex` R package helps with installing LaTeX, but you can also install LaTeX using [MiKTeX](http://miktex.org) (for Windows) or [BasicTeX](http://www.tug.org/mactex/morepackages.html) (for Mac).

See this page for more information: https://spring2021.data606.net/course-overview/software/

---
# Meetups

We will have meetups on Wednesday evenings at 8:30pm. Meetups will be recorded and made available the next day on the [course website](https://spring2021.data606.net/course-overview/meetups/).

Though attending live is not strictly required, **I expect everyone to watch the lectures during the week.** I use the class meetups to convey important information and announcements. Very often I will cover some topics not in the textbook. Students who attend the meetups tend to do well on the assignments.

**One Minute Papers** - Complete the one minute paper after each Meetup (whether you watch live or watch the recordings). It should take approximately one to two minutes to complete. This allows me to 1) verify you have attended/watched the meetup and 2) get feedback about what you learned and what may still be unclear.
Link: https://forms.gle/gY9SeBCPggHEtZYw6

.font60[
**Please note:** *Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded. If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the "chat" feature, which allows students to type questions and comments live.*
]

---
# Schedule

<table>
<thead>
<tr> <th style="text-align:left;"> Start </th> <th style="text-align:left;"> End </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Friday, January 29, 2021 </td> <td style="text-align:left;"> Sunday, February 07, 2021 </td> <td style="text-align:left;"> Chapter 1 - Intro to Data, R, and RStudio </td> </tr>
<tr> <td style="text-align:left;"> Monday, February 08, 2021 </td> <td style="text-align:left;"> Sunday, February 14, 2021 </td> <td style="text-align:left;"> Chapter 2 - Summarizing Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, February 15, 2021 </td> <td style="text-align:left;"> Sunday, February 21, 2021 </td> <td style="text-align:left;"> Chapter 3 - Probability </td> </tr>
<tr> <td style="text-align:left;"> Monday, February 22, 2021 </td> <td style="text-align:left;"> Sunday, March 07, 2021 </td> <td style="text-align:left;"> Chapter 4 - Distributions </td> </tr>
<tr> <td style="text-align:left;"> Monday, March 08, 2021 </td> <td style="text-align:left;"> Sunday, March 14, 2021 </td> <td style="text-align:left;"> Chapter 5
- Foundation for Inference </td> </tr>
<tr> <td style="text-align:left;"> Monday, March 15, 2021 </td> <td style="text-align:left;"> Sunday, March 21, 2021 </td> <td style="text-align:left;"> Chapter 6 - Inference for Categorical Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, March 22, 2021 </td> <td style="text-align:left;"> Friday, March 26, 2021 </td> <td style="text-align:left;"> Chapter 7 - Inference for Numerical Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, April 05, 2021 </td> <td style="text-align:left;"> Sunday, April 18, 2021 </td> <td style="text-align:left;"> Chapter 8 - Linear Regression </td> </tr>
<tr> <td style="text-align:left;"> Monday, April 19, 2021 </td> <td style="text-align:left;"> Sunday, May 02, 2021 </td> <td style="text-align:left;"> Chapter 9 - Multiple & Logistic Regression </td> </tr>
<tr> <td style="text-align:left;"> Monday, May 03, 2021 </td> <td style="text-align:left;"> Monday, May 17, 2021 </td> <td style="text-align:left;"> Intro to Bayesian Analysis </td> </tr>
<tr> <td style="text-align:left;"> Wednesday, May 19, 2021 </td> <td style="text-align:left;"> Sunday, May 23, 2021 </td> <td style="text-align:left;"> Final Exam </td> </tr>
</tbody>
</table>

---
# Textbooks <img src="images/hex/openintro.png" class="title-hex">

.pull-left[
Diez, D.M., Barr, C.D., & Çetinkaya-Rundel, M. (2019). *OpenIntro Statistics (4th Ed)*.

.font70[
This will be our primary textbook for most of the semester. Our goal is to cover all the chapters.
]

.center[
<a href = "https://github.com/jbryer/DATA606Spring2021/blob/master/Resources/Textbooks/os4.pdf"><img src = 'images/openintro.jpeg' alt = 'Open Intro Statistics' height = '375px' /></a>
]
]

.pull-right[
Navarro, D. (2018, version 0.6). *Learning Statistics with R*

.font70[
This textbook has a chapter on Bayesian analysis that we will use at the end of the semester.
]

.center[
<a href = "https://github.com/jbryer/DATA606Spring2021/blob/master/Resources/Textbooks/lsr-0.6.pdf"><img src = 'images/lsr.png' alt = 'Learning Statistics with R' height = '375px' /></a>
]
]

---
# Assignments

* [DAACS](https://spring2021.data606.net/assignments/daacs) (6%)
* [Participation](https://spring2021.data606.net/assignments/participation) (10%)
    * One Minute Papers
    * Meetup Presentation - Present one practice problem during our weekly meetups. Sign up using the [Google Spreadsheet](https://spring2021.data606.net/course-overview/meetups). **Please select odd-numbered questions only!**
* [Homework](https://spring2021.data606.net/assignments/homework) (18%)
* [Labs](https://spring2021.data606.net/assignments/labs) (36%)
    * Labs are designed to introduce you to doing statistics with R.
    * Answer the questions in the main text as well as the "On Your Own" section.
* [Data Project](https://spring2021.data606.net/assignments/project) (20%)
    * This allows you to analyze a dataset of your choosing. Projects will be shared with the class. This provides an opportunity for everyone to see different approaches to analyzing different datasets.
* [Final exam](https://spring2021.data606.net/assignments/final/) (10%)

---
# Communication

* Slack Channel: https://data606spring2021.slack.com
    * [Click here to join the group](https://join.slack.com/t/data606spring2021/shared_invite/zt-gwto1eyo-r8tQGf_0V77AW4ey6rZKyA)
    * There is also a general CUNY MSDS Slack channel; [click here](https://join.slack.com/t/data606spring2021/shared_invite/zt-gwto1eyo-r8tQGf_0V77AW4ey6rZKyA) to join it.
* Github Issues - Use this for issues or problems with the course or `DATA606` package: https://github.com/jbryer/DATA606spring2021/issues
* Email: [jason.bryer@sps.cuny.edu](mailto:jason.bryer@sps.cuny.edu)
* Phone/Zoom: Please email to schedule a time to meet.
* Office hours will typically be:
    * Fridays from 12:00pm to 1:00pm
    * I will use the same Zoom link that we use for the Wednesday night meetups.

---
# Software <img src="images/hex/tinytex.png" class="title-hex"><img src="images/hex/RStudio.png" class="title-hex"><img src="images/hex/rmarkdown.png" class="title-hex">

This is an applied statistics course so we will make extensive use of the [R statistical programming language](https://www.r-project.org). You have two options for using R in this course:

* CUNY SPS has an RStudio Server that you can access using a browser: https://rstudio.sps.cuny.edu You will use your CUNY login credentials to log in.
* Install [R](https://cran.r-project.org) and [RStudio](https://rstudio.com) on your own computer. I encourage everyone to do this at some point by the end of the semester. I have instructions on the course website here: https://spring2021.data606.net/course-overview/software/

You will also need to have [LaTeX](https://www.latex-project.org) installed in order to create PDFs. The [`tinytex`](https://yihui.org/tinytex/) R package helps with this process:

```
install.packages('tinytex')
tinytex::install_tinytex()
```

---
# DATA 606 Package <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/devtools.png" class="title-hex">

The [`DATA606`](https://github.com/jbryer/DATA606) R package contains many data sets and functions we will use throughout the semester. It also has a `startLab` function that will copy each of the labs to your current working directory. Use the following commands to install the package (only necessary once per R installation):

```
install.packages('remotes') # provides install_github(); skip if already installed
remotes::install_github('jbryer/DATA606')
```

To start the first lab...

```
DATA606::startLab('Lab1')
```

This will copy the R markdown file and any supporting files to your current working directory. Use the "Knit" button in RStudio to build a PDF of the document.

---
# Next steps...
<img src="images/hex/DAACS.png" class="title-hex">

Before Monday:

* Complete this Google form: https://forms.gle/FGsUWy61k8A3ujYH9
* Create an account at https://my.daacs.net and complete the self-regulated learning assessment
* [Join the Slack channel](https://join.slack.com/t/bryer/shared_invite/zt-kdk4bypz-x5zJS0oTypT3TEPilfy~IA)

Then:

* Attend the meetup on Wednesday (February 3rd) at 8:30 pm (or watch the recording)
* Start Lab 1 (due February 7th)
* Start Homework 1 (due February 7th)

---
class: inverse, right, middle, hide-logo

<!--img src="images/hex/DATA606.png" width="150px"/-->

# Good luck with the semester!

[<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> jason.bryer@cuny.edu](mailto:jason.bryer@cuny.edu)
[<svg viewBox="0 0 448 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M94.12 315.1c0 25.9-21.16 47.06-47.06 47.06S0 341 0 315.1c0-25.9 21.16-47.06 47.06-47.06h47.06v47.06zm23.72 0c0-25.9 21.16-47.06 47.06-47.06s47.06 21.16 47.06 47.06v117.84c0 25.9-21.16 47.06-47.06 47.06s-47.06-21.16-47.06-47.06V315.1zm47.06-188.98c-25.9 0-47.06-21.16-47.06-47.06S139 32 164.9 32s47.06 21.16 47.06 47.06v47.06H164.9zm0 23.72c25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06H47.06C21.16 243.96 0 222.8 0 196.9s21.16-47.06 47.06-47.06H164.9zm188.98 47.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06h-47.06V196.9zm-23.72 0c0 25.9-21.16
47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06V79.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06V196.9zM283.1 385.88c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06v-47.06h47.06zm0-23.72c-25.9 0-47.06-21.16-47.06-47.06 0-25.9 21.16-47.06 47.06-47.06h117.84c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06H283.1z"></path></svg> data606spring2021.slack.com](https://data606spring2021.slack.com) [<svg viewBox="0 0 496 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @jbryer](https://github.com/jbryer) 
[<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"></path></svg> @jbryer](https://twitter.com/jbryer) [<svg viewBox="0 0 448 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"></path></svg> @jasonbryer](https://www.linkedin.com/in/jasonbryer/) [<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 
0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> spring2021.data606.net](https://spring2021.data606.net) [<svg viewBox="0 0 512 512" xmlns="http://www.w3.org/2000/svg" style="height:1em;fill:currentColor;position:relative;display:inline-block;top:.1em;"> [ comment ] <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 
44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> bryer.org](https://bryer.org)
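
---

As a rough illustration of how the percentages on the Assignments slide could combine into a course grade, here is a minimal R sketch. The weights come from that slide; the component scores are made up, and the rule used (each component scored from 0 to 100, combined as a simple weighted average) is an assumption for illustration rather than something the syllabus states.

```r
# Weights (percent of the final grade) as listed on the Assignments slide
weights <- c("DAACS" = 6, "Participation" = 10, "Homework" = 18,
             "Labs" = 36, "Data Project" = 20, "Final exam" = 10)

# Sanity check: the weights should account for the whole grade
stopifnot(sum(weights) == 100)

# Hypothetical component scores on a 0-100 scale (illustrative only)
scores <- c("DAACS" = 100, "Participation" = 90, "Homework" = 85,
            "Labs" = 95, "Data Project" = 80, "Final exam" = 75)

# Assumed rule: weighted average of the component scores,
# matching scores to weights by name rather than by position
course_grade <- sum(weights * scores[names(weights)]) / sum(weights)
course_grade
```

Because the labs carry the largest weight (36%), a change in the lab score moves this average more than the same change in any other component.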