
Final Project

Problem Description: I wanted to explore a dataset that was rich enough to support several different types of visual analysis yet simple enough to interpret. I selected the diamonds dataset built into R (via ggplot2). It has over 53,000 observations and ten variables describing attributes such as carat weight, cut, color, clarity and price. My overall research question was: How do the physical and quality attributes of a diamond affect its price, and what patterns emerge when they are considered individually and together? From that question I broke the analysis into several connected ideas. I wanted to know how the distributions of price and carat size vary across cut categories. I also wondered which cut categories command higher prices and whether there were significant deviations from the average price across the dataset. Last but not least, I wanted to test whether price was strongly associated with carat weight and how multiple attributes, such as cut and color, together affect average price. Such small que...

Assignment: Using ChatGPT to Generate and Share Visualizations

Producing my visualization with ChatGPT went very smoothly. I asked it for a scatterplot of Sepal Length versus Sepal Width from the iris dataset, and the code it supplied ran almost immediately in RStudio. I liked how quickly I could change the plot's colors, title and theme: a few minor adjustments, and it worked. My main concern was making sure the saved image exported at a sufficient resolution, and adding the ggsave() function solved that problem. ChatGPT served as a quick assistant. It wrote a clean first draft, so I could finish the design without losing control of it.

Assignment #12: Using Python with NetworkX and plotnine

I carried out this task in Python. I used NetworkX to build the graph and plotnine to draw it. Once the right Python version and packages were installed, the steps ran without trouble. The hardest part was the initial installation: Windows could not find pip or the correct Python runtime. After I fixed that, the script ran and produced a clear network plot. Python needs more setup than R at the start, but NetworkX and plotnine leave ample room to adjust the picture. I would definitely use this approach again for network visualizations, especially in a Python-based workflow.

Module #11 Assignment

I rebuilt the dot-dash (rug) scatterplot from Lukasz Piwek's Tufte in R guide because it shows how to keep the ink that carries data and drop the ink that does not. The chart removes grid lines, thick borders and redundant labels, so only the marks that carry information remain. Short tick marks along both axes reveal where values cluster and where they sit alone. The result is a calm, spare graph that follows Tufte's rule: show the data and nothing extra. I opened RStudio, loaded the built-in mtcars dataset and installed ggplot2 and ggthemes. I then typed the supplied commands. geom_point() drew the scatterplot, geom_rug() added the small tick marks along the axes, and theme_tufte() gave it the plain Tufte look. The resulting graph plotted weight (in thousands of pounds) against miles per gallon and matched the requested minimalist style.

Module #10 Assignment

The charts were drawn in R with the ggplot2 package and its built-in economics dataset. They show the relationship between the share of the population that is unemployed (unemploy divided by pop) and the median number of weeks people stay unemployed (uempmed) in the United States from 1967 through 2010. The first chart draws a line that connects the two indicators month by month; the second colors each point by its year. Both charts show that when the unemployed share rises, people also stay unemployed longer, and the jump in duration is sharpest during recessions. A continuous line and a color scale running from light to dark turn decades of data into a single picture that exposes long-run shifts and repeated downturns at a glance. A table of numbers cannot convey as immediately how the labor market contracts and expands year after year. Reference: Yau, N. (2011). Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley.

Module #9 Assignment

I used the built-in mtcars dataset because it has many continuous and categorical car performance variables. It is also a familiar dataset for exploring relationships between horsepower, engine displacement, fuel efficiency and other performance measures. Because it contains both numeric and factor variables, it is a good test of whether multivariate design techniques in ggplot2 reveal more than two-variable plots do. My plot maps engine displacement (disp) against quarter-mile time (qsec), with horsepower (hp) encoded by color, weight (wt) by point size and number of cylinders (cyl) by shape. The plot is also split into panels by transmission type, automatic versus manual. This layout reveals a few things. Cars with larger engines and more horsepower complete the quarter mile faster than heavier cars do. Manual transmissions also generally give better acceleration at the same displacement. A multivariate visualization worked well here because I could see how engine powe...

Module #8 Correlation Analysis and ggplot2

The visualizations I created with ggplot2 show the relationship between miles per gallon (mpg) and ten predictors from the mtcars-style dataset. The scatterplots revealed strong negative associations with the wt, disp, cyl and hp variables: heavier cars with larger engines and more horsepower generally have poorer fuel economy. Positive relationships, meanwhile, appeared for drat, gear, vs and am. Manual transmissions, higher rear-axle ratios and certain engine configurations correspond to better mpg. A regression line in each facet confirmed these trends and showed how steep or mild each relationship was. Arranging the plots in a faceted grid made interpretation easier. Displaying each predictor as a small plot with a consistent mpg axis let me compare the direction and strength of each correlation very quickly. It turned the analysis into a visual "dashboard" of relationships instead of forcing me to switch between several charts. The consistent theme, light gridlines and limited colour pa...