Assignment #9: Visualization in R – Base Graphics, Lattice, and ggplot2
VaShay Carpenter
Objectives
- Compare three visualization systems in R: base graphics, lattice, and ggplot2.
- Apply each system to the same dataset and observe similarities and differences.
- Develop clear, reproducible code and articulate your insights.
Dataset
Choose one dataset from the Rdatasets collection. Load it in R with:
data("DatasetName", package = "PackageName")
head(DatasetName)
Tasks
- Base R Graphics
Create at least two plots using base R functions. Examples:
# Scatter plot
plot(DatasetName$x, DatasetName$y, main = "Base: x vs. y", xlab = "x", ylab = "y")
# Histogram
hist(DatasetName$z, main = "Base: Distribution of z", xlab = "z")
- Lattice Graphics
Use the lattice package to produce conditioned or multivariate plots. Examples:
library(lattice)
# Conditional scatter plot (small multiples)
xyplot(y ~ x | factor(group), data = DatasetName, main = "Lattice: y vs. x by group")
# Box-and-whisker plot
bwplot(z ~ factor(category), data = DatasetName, main = "Lattice: z by category")
- ggplot2
Use ggplot2’s grammar of graphics to create layered visuals. Examples:
library(ggplot2)
# Scatter plot with smoothing
ggplot(DatasetName, aes(x = x, y = y, color = factor(group))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "ggplot2: y vs. x with trend by group")
# Faceted histogram
ggplot(DatasetName, aes(z)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ category) +
  labs(title = "ggplot2: z distribution by category")
Discussion
On your blog, embed your three visualizations (one from each system) and address:
- How do the syntax and workflow differ between base, lattice, and ggplot2?
- Which system gave you the most control or produced the most “publication‑quality” output with minimal code?
- Any challenges or surprises you encountered when switching between systems.
How the syntax and workflow differ:
Base R: Uses a pen-and-paper model. You call a function and it draws immediately; later calls add to what is already on the page. Adding a legend requires a separate, manual command. It is fast for a quick glance but tedious for complex layering.
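As a rough illustration of the manual legend step, here is a minimal sketch. It assumes the built-in mtcars dataset (the variables hp, mpg, and cyl are my choice for illustration), and the color names are arbitrary.

```r
# Base R sketch (assumes the built-in mtcars dataset; colors are arbitrary).
data("mtcars", package = "datasets")

# Map each cylinder count to a color by hand.
cyl_levels <- sort(unique(mtcars$cyl))
cyl_colors <- c("steelblue", "darkorange", "firebrick")
point_cols <- cyl_colors[match(mtcars$cyl, cyl_levels)]

# The plot is drawn as soon as plot() is called.
plot(mtcars$hp, mtcars$mpg,
     col = point_cols, pch = 19,
     main = "Base: mpg vs. hp",
     xlab = "Horsepower (hp)", ylab = "Miles per gallon (mpg)")

# The legend is a separate call; labels and colors must be kept in sync manually.
legend("topright", legend = paste(cyl_levels, "cyl"),
       col = cyl_colors, pch = 19, title = "Cylinders")
```

Nothing links the legend to the plotted points, so changing the colors in one place without the other would silently produce a wrong legend.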
Lattice: Uses a formula interface (y ~ x | z). It is designed for multi-panel trellis displays. It is more automated than Base R, but the syntax is less intuitive than ggplot2's.
ggplot2: Uses the "Grammar of Graphics." You build the plot in layers (+). It is the most powerful for mapping multiple variables (Color, Size, Shape) simultaneously.
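To make the contrast concrete, the sketch below draws roughly the same picture both ways. It assumes the built-in mtcars dataset and the variable choices (hp, mpg, cyl) are illustrative, not necessarily the ones used in the assignment script.

```r
# Formula interface vs. layers (assumes the built-in mtcars dataset).
library(lattice)
library(ggplot2)

data("mtcars", package = "datasets")

# Lattice: one formula describes the whole display;
# "| factor(cyl)" asks for one panel per cylinder count.
p_lattice <- xyplot(mpg ~ hp | factor(cyl), data = mtcars,
                    main = "Lattice: mpg vs. hp by cylinders")

# ggplot2: start from data and aesthetic mappings, then add layers with "+".
p_gg <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "ggplot2: mpg vs. hp by cylinders", color = "Cylinders")

# Both calls return plot objects; printing them is what draws the figure.
print(p_lattice)
print(p_gg)
```

In the lattice call the grouping lives inside the formula, while in ggplot2 it is an aesthetic mapping that generates the legend automatically.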
Which system gave the most control?
ggplot2 provided the most 'publication-quality' output with minimal code. It let me map a fourth variable (weight, shown as point size) and add a fifth element (regression lines) within a single coherent block of code, which I could not do as concisely in Base R or Lattice.
Challenges encountered:
One challenge was a warning that the size aesthetic was being dropped during statistical transformation. I resolved it by moving the size mapping into the geom_point() layer and using the newer linewidth parameter for geom_smooth(). This keeps the regression line focused on the relationship between Horsepower and MPG, without trying to use the 'Weight' variable, which is meant only for the individual observations.
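The fix described above might look like the following sketch. It assumes the built-in mtcars dataset (hp, mpg, wt, cyl), which matches the variables named here but may differ from the actual assignment script. The linewidth argument assumes ggplot2 3.4.0 or later.

```r
# Sketch of the size/linewidth fix (assumes mtcars and ggplot2 >= 3.4.0).
library(ggplot2)

data("mtcars", package = "datasets")

p <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  # size is mapped only in this layer, so the smoother never sees it
  geom_point(aes(size = wt), alpha = 0.7) +
  # linewidth (not size) controls line thickness for the fitted lines
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linewidth = 0.8) +
  labs(title = "ggplot2: mpg vs. hp",
       x = "Horsepower (hp)", y = "Miles per gallon (mpg)",
       color = "Cylinders", size = "Weight (1000 lbs)")

print(p)
```

Aesthetics placed in the top-level ggplot() call are inherited by every layer, so keeping size inside geom_point() is what prevents the statistical layer from receiving it.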
The primary challenge was the 'mental shift' between the formula-based approach of Lattice and the layered approach of ggplot2. In Lattice, you define the panels up front in the formula; in ggplot2, you can add facets or layers at any point in the process. Additionally, Base R does not build legends for multiple mapped variables automatically, which helps explain why much modern data analysis has moved toward ggplot2.
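The "add facets or layers at any point" idea can be sketched as follows, again assuming the built-in mtcars dataset. The object names are hypothetical and chosen only for illustration.

```r
# Adding facets and layers to a stored ggplot object (assumes mtcars).
library(ggplot2)

data("mtcars", package = "datasets")

# Start with a plain scatter plot and keep it as an object.
base_plot <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Facets can be added afterwards, without rewriting the original call.
faceted <- base_plot + facet_wrap(~ cyl)

# Further layers can still be stacked on top of the faceted plot.
with_trend <- faceted +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(with_trend)
```

With lattice, the equivalent change would mean editing the formula itself (for example, adding "| factor(cyl)") and calling xyplot() again.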
Submission
- Push your R script (assignment9.R) containing all code to GitHub.
- Create a blog post that includes:
- Code snippets and generated plots.
- Discussion of your observations and preferences.
- Submit the URLs for your GitHub repository and blog post in Canvas under “Assignment #9.”
GitHub Link: https://github.com/cryo-cell/r-programming-assignments/blob/main/assignment9.Rmd
Disclaimer:
Generative AI is integrated into my professional workflow for drafting, structural organization, and code optimization. To avoid redundancy, this statement serves as a standing disclaimer for all entries. Generative AI has been utilized to ensure technical accuracy and to facilitate the very documentation requirements mandated by the curriculum available within the course syllabus.