Winter 2017 Workshops, tutorials, etc.

Events Co-hosted by the Data Science Initiative and Digital Scholarship

All events are in the Data Science Initiative seminar room, 360 Shields Library, from 12 to 1:30 pm on Fridays, unless otherwise noted.

There are 4 related workshops on text processing and mining. These assume familiarity with Git, which is covered in the first workshop. The remaining tutorials/workshops are unrelated to each other and to the text processing series.

Please suggest topics you would like to have a workshop on. Send an email to datascience@ucdavis.edu.

  • February 3, 10:30-12 Slash and Burn Command Line and Git
    Duncan Temple Lang

    Materials from Workshop

    A hands-on practicum on the UNIX/Linux command line and Git version control, during which we'll learn the basics of getting around in a command line shell and using Git to manage files. We'll download and install all the needed software on our machines together, have some directed play on the command line, connect to a remote Git repository, and push commits to our own remote Git repositories. We'll spend very little time talking about what's under the hood or investigating the many powerful tools that Git offers for advanced users. The workshop is a boots-on-the-ground, get-your-hands-dirty practicum designed to get you up and running on your own computer and give you the skills necessary to start using command line Git. (Note: if you don't know what the command line is, you should definitely take this workshop.) No previous programming or Unix command line experience is necessary. This is a hands-on workshop, so participants must bring a laptop on which they have administrative privileges (the ability to install software).

  • February 3, 12-1:30 Text Mining Fundamentals
    Carl Stahmer

    This workshop will introduce basic concepts in text mining and Natural Language Processing through discussion and hands-on coding of text processing functions that lay the groundwork for nearly all text mining processes. No prior programming experience is necessary; however, familiarity with the command line and Git is a prerequisite. (Participation in the February 3rd workshop on command line and Git will prepare you well for this workshop.) We'll code together as a group, leaving no text-miner behind. Topics covered will include word frequency analysis, basic chunking/tokenization, token distribution, and keyword-in-context (KWIC) analysis. Please come to the workshop with a working R development environment and Git already installed and operational on your system.
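
    To give a flavor of these topics, here is a minimal base-R sketch of tokenization, word frequency counting, and keyword-in-context lookup; the sample sentence and the kwic() helper are illustrative assumptions, not workshop materials.

        # Tokenize a toy text, count word frequencies, and show a keyword in context.
        text <- "The quick brown fox jumps over the lazy dog. The dog barks."

        # Basic tokenization: lowercase, strip punctuation, split on whitespace.
        tokens <- strsplit(tolower(gsub("[[:punct:]]", "", text)), "\\s+")[[1]]

        # Word frequency analysis: count and rank token occurrences.
        freq <- sort(table(tokens), decreasing = TRUE)
        head(freq)

        # Keyword-in-context (KWIC): each occurrence of a keyword shown with a
        # small window of neighboring tokens on either side.
        kwic <- function(tokens, keyword, window = 2) {
          hits <- which(tokens == keyword)
          sapply(hits, function(i) {
            paste(tokens[max(1, i - window):min(length(tokens), i + window)],
                  collapse = " ")
          })
        }
        kwic(tokens, "dog")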

  • February 10 Introduction to Bayesian Modeling Using Stan
    Speaker: Matt Espe, Postdoctoral Fellow in Data Sciences & Plant Sciences

    This talk will explore Bayesian modeling using the probabilistic programming language Stan. We will start with a crash course in Bayesian modeling, show when and why MCMC methods are needed, and then touch on the pros and cons of various MCMC samplers with hands-on examples. The talk will end with a demonstration of using Stan through multiple interfaces (R, command line, etc.).
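
    As a rough illustration of what fitting a model through the R interface can look like, here is a minimal RStan sketch that estimates the mean and standard deviation of simulated data. The model, data, and settings are invented for illustration and assume the rstan package is installed; they are not the speaker's examples.

        # Fit a simple normal model to simulated data via RStan (illustrative only).
        library(rstan)

        model_code <- "
        data {
          int<lower=0> N;
          vector[N] y;
        }
        parameters {
          real mu;
          real<lower=0> sigma;
        }
        model {
          y ~ normal(mu, sigma);
        }
        "

        set.seed(1)
        y <- rnorm(50, mean = 3, sd = 1)

        fit <- stan(model_code = model_code,
                    data = list(N = length(y), y = y),
                    chains = 4, iter = 2000)

        print(fit)  # posterior summaries for mu and sigma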

  • February 17 Natural Language Processing: Text Normalization and Entity Extraction
    Carl Stahmer

    This hands-on workshop will introduce a collection of basic practices in Natural Language Processing. Topics covered will include named entity extraction, part-of-speech tagging, stemming, and lemmatization. Participants must have some programming experience in R and familiarity with the Unix command line and Git. (Participation in the February 3rd workshops on Slash and Burn Command Line and Git and Text Mining Fundamentals will prepare you well for this workshop.) Please come to the workshop with a working R development environment and Git already installed and operational on your system.
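
    As a small taste of one of these topics, here is a stemming sketch using the SnowballC package; the word list is made up and the package choice is an assumption rather than the workshop's tooling. Part-of-speech tagging and named entity extraction typically require additional packages and trained models, so they are not shown here.

        # Porter-style stemming with SnowballC (illustrative; not workshop materials).
        library(SnowballC)

        words <- c("running", "runs", "ran", "easily", "studies", "studying")

        # Stemming reduces words to a crude root form, e.g. "running" and "runs"
        # both become "run"; note the stems need not be dictionary words.
        wordStem(words, language = "english")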

  • February 24 Debugging in R
    Speaker: Duncan Temple Lang (R-core development team member, ...)

    We'll discuss strategies and tools for debugging R code, cover some examples, and answer lots of questions.
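
    For orientation, here is a small sketch of R's built-in debugging tools; the toy functions are invented for illustration and are not the workshop's examples.

        # A quick tour of base R debugging tools (illustrative only).
        f <- function(x) g(x)
        g <- function(x) log(x)   # errors for non-numeric input

        # traceback(): after an error, shows the call stack that produced it.
        # f("a"); traceback()

        # debug(): flags a function so the next call steps through it line by line.
        debug(g)
        # f(10)      # drops into the interactive browser inside g()
        undebug(g)

        # browser(): pauses execution at a chosen point inside a function.
        h <- function(x) { browser(); x^2 }

        # options(error = recover): opens a browser at the point of any error.
        # options(error = recover)

        # tryCatch(): handle errors programmatically instead of stopping.
        tryCatch(log("a"), error = function(e) message("caught: ", conditionMessage(e)))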

  • March 3 Natural Language Processing: Form and Meaning
    Carl Stahmer

    This workshop will cover NLP methods for assessing syntactic and semantic complexity and for identifying key concepts represented in texts. Specific topics covered will include hapax richness, author attribution, and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Participants must have some programming experience in R and familiarity with the Unix command line and Git. (Participation in the February 3rd workshops on Slash and Burn Command Line and Git and Text Mining Fundamentals will prepare you well for this workshop.) Please come to the workshop with a working R development environment and Git already installed and operational on your system.
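
    To make the measures concrete, here is a minimal base-R sketch of hapax richness and TF-IDF weighting on a toy three-document "corpus"; the documents and the exact formulations are illustrative assumptions, not the workshop's materials.

        # Hapax richness and TF-IDF on a toy corpus (illustrative only).
        docs <- c(doc1 = "the cat sat on the mat",
                  doc2 = "the dog sat on the log",
                  doc3 = "cats and dogs and cats")

        tokens <- lapply(docs, function(d) strsplit(tolower(d), "\\s+")[[1]])

        # Hapax richness (one common formulation): number of hapax legomena
        # (word types occurring exactly once) divided by total token count.
        sapply(tokens, function(tk) sum(table(tk) == 1) / length(tk))

        # TF-IDF: term frequency weighted by inverse document frequency.
        vocab <- sort(unique(unlist(tokens)))
        tf    <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))
        idf   <- log(length(docs) / colSums(tf > 0))
        round(sweep(tf, 2, idf, `*`), 2)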

  • March 10 Natural Language Processing: Text Classification
    Carl Stahmer

    This hands-on workshop will cover the theory and practice of Topic Modeling as a method of unsupervised (untrained) text classification. We will run a variety of models on the same corpus, identifying and discussing the function of model parameters and their effect on output. Validation practices will also be discussed. Participants should have moderate experience with R and Git. Familiarity with the R "tm" package will be beneficial but is not required. (Participation in the previous four workshops in this series will prepare you well for this workshop.) Please come to the workshop with a working R development environment and Git already installed and operational on your system.
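
    As a minimal sketch of what an unsupervised topic model looks like in R, here is an LDA example using the tm and topicmodels packages; the toy documents, the choice of k, and the seed are illustrative assumptions, not the workshop's corpus or settings.

        # LDA topic modeling on a toy corpus with tm + topicmodels (illustrative only).
        library(tm)
        library(topicmodels)

        docs <- c("cats and dogs are popular pets",
                  "stan and r are tools for bayesian modeling",
                  "dogs chase cats around the yard",
                  "mcmc samplers explore the posterior distribution")

        # Build a document-term matrix with minimal preprocessing.
        corpus <- VCorpus(VectorSource(docs))
        corpus <- tm_map(corpus, removeWords, stopwords("en"))
        dtm    <- DocumentTermMatrix(corpus)

        # Fit an LDA model with k topics; k and the seed are arbitrary here.
        fit <- LDA(dtm, k = 2, control = list(seed = 42))

        terms(fit, 5)   # top terms per topic
        topics(fit)     # most likely topic per document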