Visualization & Reporting
Files for today: functions | base plots | ggplot2 | quarto | solutions
Functions and Control Flow
- Use `function()` to create your own function
- Functions have:
  - Formals, the list of arguments that controls how you call the function
  - A body, which returns the last evaluated expression or whatever you put in `return()`
  - An environment, which determines scoping (i.e., how the function finds the values associated with named objects)
- There are some R-specific things to be aware of:
  - Lazy evaluation
  - Things that appear to have the same name, e.g., `fun(x=x)`
  - Anonymous “lambda” functions
- Control Flow
  - `if() {...} else {...}` and `ifelse()`, with assignment
  - loops via `foreach()` and `while()`
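
The points in this section can be illustrated with a minimal base R sketch. The names `scale_by`, `lazy_demo`, and the small example values are hypothetical, chosen only for illustration, and the loops shown are base R's `for` and `while`.

```r
# A function has formals (arguments), a body, and an environment.
scale_by <- function(x, factor = 2) {
  x * factor                     # the last evaluated expression is returned
}
names(formals(scale_by))         # the argument names
body(scale_by)                   # the code inside the braces
environment(scale_by)            # where free variables are looked up

# Lazy evaluation: an argument is only evaluated when it is used.
lazy_demo <- function(a, b) {
  a                              # b is never touched, so stop() never runs
}
lazy_result <- lazy_demo(1, stop("never evaluated"))

# Same name on both sides: the left x is the argument name,
# the right x is a variable in the calling environment.
x <- 1:3
same_name <- scale_by(x = x)

# Anonymous ("lambda") function passed directly to sapply().
squares <- sapply(1:3, function(i) i^2)

# if/else is an expression, so its value can be assigned.
n <- 7
parity <- if (n %% 2 == 0) "even" else "odd"

# ifelse() is the vectorized counterpart.
size_labels <- ifelse(x > 1, "big", "small")

# for and while loops.
total <- 0
for (i in x) {
  total <- total + i
}
countdown <- 3
while (countdown > 0) {
  countdown <- countdown - 1
}
```

Because `if` returns a value, the assignment form avoids repeating the variable name in each branch; `ifelse()` is the better fit when the condition is a vector.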
Base Plots
- R has excellent plotting capabilities, but often requires lots of code to fully customize the plot
- Base plots are static
  - No resizing or zooming
  - Adding elements to a plot is done by layering on top, not modifying what exists
- You’ll often want to change:
- Add elements to your plot with:
  - `lines()`, `curve()`, `text()`, or `points()`
  - or customize the `axis()`
  - and add a `legend()`
- To combine multiple plots:
  - Use `par(mfrow=c(x,y))`
  - Control outer margins with `par(oma=c(...))`
  - Place a title via `title()`
  - `layout()` offers richer arrangements of subplots
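
As a rough sketch of layering and arrangement in base graphics, the example below uses the built-in `cars` dataset. The output file, labels, colors, and coordinates are assumptions made for illustration; drawing to a temporary PDF is only a convenience so the code runs without a display.

```r
# Draw to a temporary PDF so the sketch also runs without a display.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 4)

# Two plots side by side, with room in the top outer margin for a title.
old_par <- par(mfrow = c(1, 2), oma = c(0, 0, 2, 0))

# Left panel: a scatterplot with elements layered on top.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
fit <- lm(dist ~ speed, data = cars)
lines(cars$speed, fitted(fit), col = "red")            # fitted line
points(mean(cars$speed), mean(cars$dist),
       pch = 19, col = "blue")                         # highlight the mean
text(10, 100, "layered text")                          # annotation at (10, 100)
legend("topleft", legend = c("linear fit", "mean"),
       col = c("red", "blue"), lty = c(1, NA), pch = c(NA, 19))

# Right panel: suppress the default x axis, then draw a custom one.
curve(sin, from = 0, to = 2 * pi, xaxt = "n", ylab = "sin(x)")
axis(1, at = c(0, pi, 2 * pi), labels = c("0", "pi", "2*pi"))

# A title in the outer margin spans both panels.
title("Base graphics: layering and arrangement", outer = TRUE)

par(old_par)
invisible(dev.off())
```

Each call after `plot()` draws over what is already on the device, which is why the order of the calls matters.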
ggplot2
- An introduction:
  - The ggplot2 package is Hadley Wickham’s implementation of the Grammar of Graphics described in Leland Wilkinson’s 2005 book
  - It is immensely popular, with re-implementations in Python and Julia
  - It is a powerful and expressive method of creating graphics, and is well documented with a website, a book, and many other resources
- Every ggplot involves:
  - A dataset, usually via `ggplot(data)`
  - Aesthetic mappings from variables in the dataset to plot elements, e.g., `aes(x=height, y=weight)`
  - One or more geometric objects (e.g., points in a scatterplot via `geom_point()`)
- You’ll often want to annotate basic elements with `labs()`, such as `x`, `y`, `title`, and `subtitle`
- More advanced aspects of the plotting system involve
  - A statistical transformation of the data
  - A scale to control the mapping from data to aesthetic attributes
  - A coordinate system
  - Facets, for creating small multiples of plots
- The theme system provides fine control over the non-data elements of the plot
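
The components listed in this section can be seen together in one short sketch, assuming the ggplot2 package is installed. The `people` data frame and its values are made up for illustration, and the particular scale, coordinate, and theme choices are just one possible combination.

```r
library(ggplot2)

# A small made-up dataset; the values are illustrative only.
people <- data.frame(
  height = c(150, 160, 165, 170, 175, 180, 185, 190),
  weight = c(52, 58, 63, 68, 72, 80, 85, 92),
  group  = rep(c("A", "B"), times = 4)
)

p <- ggplot(people, aes(x = height, y = weight)) +   # dataset and aesthetic mappings
  geom_point() +                                     # geometric object
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                          # statistical transformation
  scale_x_continuous(breaks = seq(150, 190, 10)) +   # scale
  coord_cartesian(ylim = c(40, 100)) +               # coordinate system
  facet_wrap(~ group) +                              # facets (small multiples)
  labs(x = "Height (cm)", y = "Weight (kg)",
       title = "Weight by height",
       subtitle = "Made-up data") +                  # annotations
  theme_minimal()                                    # theme
```

Nothing is drawn until the plot object is printed, for example with `print(p)` or by evaluating `p` at the console, so a plot can be built up and modified in steps before it is rendered.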
Quarto
- Quarto is an open-source scientific and technical publishing system created by the team at Posit
- A Quarto document:
  - Can be a Jupyter Notebook or a plain text .qmd file with a YAML header
  - Consists of Pandoc markdown and code chunks
  - Renders to multiple output formats, including PDF, MS Word, and HTML
  - Enables reproducible research and literate programming
  - Will likely be the mode via which you compose your homework assignments
- Pandoc markdown syntax:
  - Is quite simple, and the basics can be learned in an hour
  - Includes LaTeX math equations, citations, cross-references, and more
- Code chunks:
  - Begin and end with three backticks
  - Support a variety of languages, including R, Python, Julia, C++, and Bash
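
To make the pieces concrete, here is a minimal sketch of what a .qmd file might look like. The title, heading, chunk label, and caption are hypothetical; the structure (YAML header, Pandoc markdown with LaTeX math, an R code chunk, and a cross-reference) follows the description above.

````markdown
---
title: "Homework 1"
format: html
---

## Question 1

The sample mean is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

```{r}
#| label: fig-cars
#| fig-cap: "Stopping distance versus speed."
plot(cars$speed, cars$dist)
```

See @fig-cars for the scatterplot.
````

Assuming the file is saved as `homework.qmd` (a hypothetical name), running `quarto render homework.qmd` from a terminal should produce the HTML output; changing `format` in the header selects a different output format.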