Introduction

A series of brief explorations of the Fugazi Live Series data.

What was Fugazi’s biggest show?

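# Packages used throughout; Repeatr supplies the data sets
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(Repeatr)

# Keep only the shows with a recorded attendance figure
attendancedata <- othervariables %>%
  filter(!is.na(attendance)) %>%
  mutate(attendance = as.integer(attendance)) %>%
  mutate(date = as.Date(date, "%d-%m-%Y")) %>%
  mutate(year = lubridate::year(date)) %>%
  select(year, date, venue, attendance)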

maxattendance <- max(attendancedata$attendance)

maxattendance
#> [1] 15000

attendancedata %>%
  filter(attendance == maxattendance)
#> # A tibble: 1 × 4
#>    year date       venue                attendance
#>   <dbl> <date>     <chr>                     <int>
#> 1  2000 2000-06-04 Mission Dolores Park      15000

The biggest show was the Food Not Bombs 20th Anniversary on the 4th of June 2000 at Mission Dolores Park in San Francisco, with an estimated attendance of 15,000 people. It seems that the show was recorded, but it is not yet available as part of the Fugazi Live Series. There is a video of the show on YouTube.
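As an alternative sketch, `dplyr::slice_max()` can pick out the largest show in one step instead of computing the maximum first and filtering afterwards. The tibble below is made-up illustration data (the venue names and figures are hypothetical); only the column names follow `attendancedata`.

```r
library(dplyr)

# Hypothetical toy data with the same column names as attendancedata
toy_attendance <- tibble(
  date = as.Date(c("1990-05-01", "1995-08-12", "1999-03-20")),
  venue = c("Venue A", "Venue B", "Venue C"),
  attendance = c(800L, 1200L, 5000L)
)

# slice_max() keeps the row(s) with the largest attendance;
# ties are kept by default, so several rows can come back
biggest_show <- toy_attendance %>%
  slice_max(attendance, n = 1)

biggest_show
```

Because ties are kept by default, this would return every show that shares the top attendance figure, which matches the behaviour of filtering on the maximum.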

What was Fugazi’s longest tour?


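# Mean attendance per year, used below to fill in missing attendance figures;
# shows with no recorded attendance are counted as 100 people at this step
meanattendance <- othervariables %>%
  filter(!is.na(tour)) %>%
  mutate(attendance = ifelse(is.na(attendance), 100, attendance)) %>%
  group_by(year) %>%
  summarise(meanattendance = mean(attendance)) %>%
  ungroup()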

toursdata <- othervariables %>%
  filter(is.na(tour)==FALSE) %>%
  left_join(meanattendance) %>%
  mutate(attendance = ifelse(is.na(attendance)==TRUE,meanattendance,attendance)) %>%
  group_by(tour) %>%
  filter(is.na(date)==FALSE) %>%
  summarise(start = min(date), end = max(date), shows = n(), duration = as.numeric((end - start)), attendance=sum(attendance)) %>%
  ungroup() %>%
  arrange(desc(shows))
#> Joining with `by = join_by(year)`

# Add the derived columns in one step, then sort by number of shows
# (ties are ordered by start date)
toursdata <- toursdata %>%
  mutate(
    meanattendance = as.integer(attendance / shows),
    start = as.Date(start, "%d-%m-%Y"),
    end = as.Date(end, "%d-%m-%Y"),
    startyear = lubridate::year(start),
    endyear = lubridate::year(end),
    duration = as.integer(duration),
    attendance = as.integer(attendance)
  ) %>%
  arrange(desc(shows), start)

head(toursdata, n=10)
#> # A tibble: 10 × 9
#>    tour           start      end        shows duration attendance meanattendance
#>    <chr>          <date>     <date>     <int>    <int>      <int>          <int>
#>  1 1990 Fall Eur… 1990-09-01 1990-11-07    60       67      43476            724
#>  2 1995 Spring/S… 1995-05-04 1995-07-14    59       71      72134           1222
#>  3 1992 Spring E… 1992-05-01 1992-07-11    56       71      55412            989
#>  4 1995 Fall USA… 1995-09-16 1995-11-20    50       65      68903           1378
#>  5 1993 Spring U… 1993-04-02 1993-05-31    48       59      74550           1553
#>  6 1990 Spring/S… 1990-05-02 1990-06-30    43       59      24080            560
#>  7 1988 Fall Eur… 1988-10-14 1988-12-16    39       63       7376            189
#>  8 1993 Fall USA… 1993-08-16 1993-09-29    39       44      58075           1489
#>  9 1991 Spring U… 1991-05-01 1991-06-14    38       44      27273            717
#> 10 1989 Spring U… 1989-04-05 1989-06-16    35       72      11162            318
#> # ℹ 2 more variables: startyear <dbl>, endyear <dbl>

On the 1990 Fall European Tour, between 1990-09-01 and 1990-11-07, Fugazi played 60 shows over 67 days, with an estimated total attendance of 43,476 people (missing attendance figures are filled in with the yearly mean). This tour wasn’t the longest in terms of the number of days, or the biggest in terms of total attendance (the 1993 Spring USA tour had a total attendance of 74,550 people), but it was the longest tour in terms of the number of shows.
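One way to compare touring intensity is to divide the number of shows by the tour duration. The sketch below uses the figures from the first three rows of the table above, identified by start date because the tour names are truncated in the output. Here `duration` is the number of days between the first and last show, as in `toursdata`; `shows_per_day` is a name introduced only for this illustration.

```r
# Figures copied from the first three rows of the tours table
tours <- data.frame(
  start = as.Date(c("1990-09-01", "1995-05-04", "1992-05-01")),
  shows = c(60, 59, 56),
  duration = c(67, 71, 71)
)

# Shows per day between the first and last date of each tour
tours$shows_per_day <- round(tours$shows / tours$duration, 2)

tours
```

On these figures the 1990 tour works out at roughly nine shows every ten days, the densest schedule of the three.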

Leads and lags

Most Fugazi songs were performed live for some time before being released on an album or EP. These lead times were often very considerable, measured in months or years. Were there any exceptions, songs whose live launch dates lagged behind the corresponding release dates? To find out, let’s start by getting the data on the releases and the corresponding release dates.

releasedates <- releasesdatalookup %>% 
  select(releaseid, releasedate) %>%
  mutate(releasedate = as.Date(releasedate, "%d/%m/%Y"))

mydf <- songvarslookup %>% 
  left_join(releasedates) %>%
  left_join(songidlookup)
#> Joining with `by = join_by(releaseid)`
#> Joining with `by = join_by(song, songid)`
mydf <- mydf %>% 
  select(songid, song, releaseid, releasedate) %>%
  arrange(songid)
head(mydf)
#>   songid         song releaseid releasedate
#> 1      1 23 beats off         6  1993-06-18
#> 2      2 and the same         2  1989-06-15
#> 3      3     argument         9  2001-10-16
#> 4      4  arpeggiator         8  1998-04-24
#> 5      5 back to base         7  1995-05-12
#> 6      6    bad mouth         1  1988-11-19

Now let’s calculate leads and lags by getting summary data on the songs and comparing the song launch dates to the corresponding release dates.

mysummary <- Repeatr::summary %>% 
  left_join(mydf) %>%
  mutate(lead = releasedate - launchdate) %>%
  select(song, launchdate, releasedate, lead) %>%
  arrange(lead)
#> Joining with `by = join_by(songid, song, releaseid, releasedate)`

head(mysummary, n = 10)
#> # A tibble: 10 × 4
#>    song                   launchdate releasedate lead    
#>    <chr>                  <date>     <date>      <drtn>  
#>  1 styrofoam              1990-05-17 1990-03-01  -77 days
#>  2 foreman's dog          1998-05-01 1998-04-24   -7 days
#>  3 blueprint              1989-11-25 1990-03-01   96 days
#>  4 steady diet            1991-04-12 1991-08-01  111 days
#>  5 life and limb          2001-06-21 2001-10-16  117 days
#>  6 public witness program 1993-02-05 1993-06-18  133 days
#>  7 polish                 1991-03-06 1991-08-01  148 days
#>  8 bulldog front          1988-06-15 1988-11-19  157 days
#>  9 nice new outfit        1991-02-20 1991-08-01  162 days
#> 10 combination lock       1994-11-27 1995-05-12  166 days

Surprisingly, there seem to be only 2 songs whose live debuts lagged behind the corresponding release dates: Styrofoam, which was first played live 77 days after the release of Repeater, and Foreman’s Dog, which was first played live 7 days after the release of End Hits. What was the average lead time for all Fugazi songs with a corresponding release?

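mean(mysummary$lead)
#> Time difference of NA days

# Missing lead times make the result NA, so exclude them
mean(mysummary$lead, na.rm = TRUE)

With the missing values excluded, the mean lead time is over 2 years, but perhaps the mean is biased upwards by a few extreme values…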

mysummary <- mysummary %>%
  select(song, launchdate, releasedate, lead) %>%
  arrange(desc(lead))

head(mysummary, n = 10)
#> # A tibble: 10 × 4
#>    song                 launchdate releasedate lead     
#>    <chr>                <date>     <date>      <drtn>   
#>  1 the word             1987-09-03 2014-11-18  9938 days
#>  2 turn off your guns   1987-09-03 2014-11-18  9938 days
#>  3 in defense of humans 1987-09-03 2014-11-18  9938 days
#>  4 furniture            1987-09-03 2001-10-16  5157 days
#>  5 kyeo                 1987-10-07 1991-08-01  1394 days
#>  6 number 5             1998-11-21 2001-10-16  1060 days
#>  7 oh                   1998-11-29 2001-10-16  1052 days
#>  8 merchandise          1987-09-03 1990-03-01   910 days
#>  9 long division        1989-04-09 1991-08-01   844 days
#> 10 song #1              1987-09-03 1989-12-01   820 days

The median lead time is probably a more reliable indicator of how long Fugazi would tend to play a song live before it featured on a release.

median(mysummary$lead)
#> Time difference of NA days

# Missing lead times make the result NA, so exclude them
median(mysummary$lead, na.rm = TRUE)
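The difference between the two summaries can be illustrated with a minimal sketch. The lead times below are illustrative values in days, with one `NA` standing in for a song that has no release date; they are not the full data set. The names `lead`, `mean_lead` and `median_lead` are used only for this example.

```r
# Illustrative lead times in days; NA stands for a song with no release date
lead <- as.difftime(c(-77, 96, 360, 844, 9938, NA), units = "days")

# Without na.rm = TRUE a single missing value makes the result NA
mean(lead)

# With missing values removed, the 9938-day outlier pulls the mean far above the median
mean_lead <- mean(lead, na.rm = TRUE)
median_lead <- median(lead, na.rm = TRUE)

mean_lead
median_lead
```

In this toy example the median stays at the middle observation, while the mean is several times larger because of the single extreme value.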

We have discovered several interesting things:

  1. Styrofoam was the only Fugazi song whose live debut significantly lagged behind the corresponding release (by 77 days), although the live debut of Foreman’s Dog came 7 days after the release of End Hits.

  2. The median lead time between a song’s live debut and its corresponding release was approximately 1 year: 360 days.

At which venues did Fugazi play the most?

Listening to the Fugazi Live Series in chronological order, you hear the band return to some venues again and again over the years. Let’s have a look at the venues with the largest numbers of Fugazi shows.

venuesdata <- othervariables %>%
  mutate(year = year(date)) %>%
  mutate(city = ifelse(flsid=="FLS1053", "Bremen", city)) %>%
  mutate(country = ifelse(flsid=="FLS1053", "Germany", country)) %>%
  filter(is.na(venue)==FALSE & is.na(city)==FALSE & is.na(country)==FALSE) %>%
  group_by(venue, city, country) %>%
  summarize(shows = n(), from=min(year), to = max(year)) %>%
  select(venue, city, country, shows, from, to) %>%
  arrange(desc(shows)) %>%
  ungroup()
#> `summarise()` has grouped output by 'venue', 'city'. You can override using the
#> `.groups` argument.

head(venuesdata, n = 10)
#> # A tibble: 10 × 6
#>    venue                 city        country shows  from    to
#>    <chr>                 <chr>       <chr>   <int> <dbl> <dbl>
#>  1 Fort Reno             Washington  USA        12  1988  2002
#>  2 Liberty Lunch         Austin      USA         9  1988  1998
#>  3 40 Watt               Athens      USA         8  1988  1999
#>  4 9:30 Club (1980-1995) Washington  USA         8  1988  1994
#>  5 First Avenue          Minneapolis USA         8  1991  2001
#>  6 Maxwell's             Hoboken     USA         8  1988  1998
#>  7 Wilson Center         Washington  USA         8  1987  1997
#>  8 Masquerade            Atlanta     USA         7  1990  1999
#>  9 Cat's Cradle          Chapel Hill USA         6  1987  1993
#> 10 Hollywood Palladium   Los Angeles USA         6  1991  1993

The top 10 venues are all in the USA, and three of them are in Washington DC: Fort Reno, the 9:30 Club (1980-1995) and the Wilson Center. Fort Reno is the only venue with more than 10 shows. Fugazi played there 12 times between 1988 and 2002, missing only 3 years (1990, 1992 and 1995).

Let’s have a look at the top 10 overseas venues.

overseas_venuesdata <- venuesdata %>%
  filter(country!="USA" & shows>=4) %>%
  arrange(desc(shows))

head(overseas_venuesdata, n = 20)
#> # A tibble: 10 × 6
#>    venue            city                country     shows  from    to
#>    <chr>            <chr>               <chr>       <int> <dbl> <dbl>
#>  1 Forte Prenestino Rome                Italy           5  1988  1999
#>  2 Paradiso         Amsterdam           Netherlands     5  1990  1999
#>  3 92 Graus         Curitiba            Brazil          4  1994  1997
#>  4 Effenaar         Eindhoven           Netherlands     4  1988  1995
#>  5 Fabrik           Hamburg             Germany         4  1990  1999
#>  6 Powerstation     Auckland            New Zealand     4  1991  1997
#>  7 Riverside        Newcastle-Upon-Tyne England         4  1989  1999
#>  8 Rote Fabrik      Zurich              Switzerland     4  1990  1999
#>  9 Schlachthof      Bremen              Germany         4  1990  1999
#> 10 Vera             Groningen           Netherlands     4  1989  1995

Overseas, the venues with the most Fugazi shows were Forte Prenestino in Italy and Paradiso in the Netherlands, each with 5 shows. There were 8 other overseas venues with 4 shows. Proud to see that the number 1 Fugazi venue in the UK was the Newcastle Riverside, in my home town, which was where I saw them play in 1990!

Let’s have a quick look at the frequency distribution of the number of shows at each venue.

number_venues <- nrow(venuesdata)

cat(paste0("\n \n There are ", number_venues, " venues in the Fugazi Live Series data. \n \n"))
#> 
#>  
#>  There are 754 venues in the Fugazi Live Series data. 
#> 

overview_venuesdata <- venuesdata %>%
  group_by(shows) %>%
  summarize(venues = n()) %>%
  mutate(percentage = round(100*venues/number_venues, digits = 3)) %>%
  arrange(desc(shows)) %>%
  ungroup()

head(overview_venuesdata, n = 11)
#> # A tibble: 10 × 3
#>    shows venues percentage
#>    <int>  <int>      <dbl>
#>  1    12      1      0.133
#>  2     9      1      0.133
#>  3     8      5      0.663
#>  4     7      1      0.133
#>  5     6      3      0.398
#>  6     5      5      0.663
#>  7     4     16      2.12 
#>  8     3     27      3.58 
#>  9     2     97     12.9  
#> 10     1    598     79.3

Fugazi played at 754 venues but played at 79.3% of them only once, twice at 12.9% of venues, 3 shows at 3.6% of venues, and 4 shows at 2.1% of venues. Only 2.1% of venues (16 in total) had 5 or more shows.
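The percentages can be re-derived directly from the frequency table, as in the following sketch. The counts are copied from the output above; the variable names (`share`, `five_plus`) are introduced only for this illustration.

```r
# Venue counts copied from the frequency table above
shows  <- c(12, 9, 8, 7, 6, 5, 4, 3, 2, 1)
venues <- c(1, 1, 5, 1, 3, 5, 16, 27, 97, 598)

total_venues <- sum(venues)

# Share of venues for each number of shows, rounded to one decimal place
share <- round(100 * venues / total_venues, 1)
names(share) <- shows

# Share of venues that hosted five or more shows
five_plus <- round(100 * sum(venues[shows >= 5]) / total_venues, 1)

share[c("1", "2", "3", "4")]
five_plus
```

Summing the venue counts first is a quick check that the table and the headline number of venues agree.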

In which city did Fugazi play at the most venues?

mydf <- othervariables

venues <- mydf %>% 
  group_by(city, venue) %>% 
  summarize(shows = n()) %>% 
  ungroup()
#> `summarise()` has grouped output by 'city'. You can override using the
#> `.groups` argument.

venues_per_city <- venues %>% 
  group_by(city) %>% 
  summarize(venues = n()) %>% 
  arrange(desc(venues)) %>% 
  ungroup()

venues_per_city
#> # A tibble: 406 × 2
#>    city          venues
#>    <chr>          <int>
#>  1 Washington        22
#>  2 New York          10
#>  3 Sydney             8
#>  4 Chicago            7
#>  5 Houston            7
#>  6 Berlin             6
#>  7 London             6
#>  8 Richmond           6
#>  9 San Francisco      6
#> 10 Birmingham         5
#> # ℹ 396 more rows

The city where Fugazi played at the most venues was Washington DC (22 venues), followed by New York (10) and Sydney (8).
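A more compact way to get the same kind of count is `n_distinct()`, sketched here on made-up data (the city and venue names are hypothetical). Note that grouping by city name alone pools any cities that share a name, so adding `country` to the grouping may be worth considering for the real data.

```r
library(dplyr)

# Hypothetical toy data: one row per show
toy_shows <- tibble(
  city  = c("City A", "City A", "City A", "City B", "City B"),
  venue = c("Hall 1", "Hall 1", "Hall 2", "Club 1", "Club 1")
)

# n_distinct() counts each venue once per city, however many shows it hosted
toy_venues_per_city <- toy_shows %>%
  group_by(city) %>%
  summarise(venues = n_distinct(venue), shows = n(), .groups = "drop") %>%
  arrange(desc(venues))

toy_venues_per_city
```

This avoids the intermediate table of shows per venue and keeps the show count alongside the venue count.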

Did Fugazi pick songs to perform randomly?

If songs were picked at random from all available songs, the performance count for each song should build up in a similar way over time. This can be checked by plotting the cumulative performance counts for a selection of songs. First, let’s get the data into a suitable format, so that it will be easy to count the number of times each song was played.


mydf <- Repeatr1 %>% select(date, song)

mydf <- mydf %>% 
  group_by(date, song) %>% 
  summarize(count=n()) %>% ungroup()
#> `summarise()` has grouped output by 'date'. You can override using the
#> `.groups` argument.

mydf_wide <- mydf %>% 
  pivot_wider(names_from = song, values_from = count, values_fill = 0)

head(mydf_wide)
#> # A tibble: 6 × 134
#>   date       furniture `in defense of humans` intro `joe #1` merchandise
#>   <date>         <int>                  <int> <int>    <int>       <int>
#> 1 1987-09-03         1                      1     1        1           1
#> 2 1987-09-26         1                      1     1        0           1
#> 3 1987-10-07         1                      1     0        1           1
#> 4 1987-10-16         1                      0     0        1           1
#> 5 1987-11-25         1                      0     1        1           1
#> 6 1987-12-03         1                      1     1        0           1
#> # ℹ 128 more variables: `song #1` <int>, `the word` <int>,
#> #   `turn off your guns` <int>, `waiting room` <int>, `and the same` <int>,
#> #   `interlude 1` <int>, `interlude 2` <int>, `interlude 3` <int>,
#> #   `interlude 4` <int>, outro <int>, `interlude 5` <int>, `interlude 6` <int>,
#> #   kyeo <int>, `bad mouth` <int>, `break-in` <int>, `opening remarks` <int>,
#> #   encore <int>, lockdown <int>, suggestion <int>, `lock dug` <int>,
#> #   `interlude 7` <int>, burning <int>, `encore 1` <int>, …

Next, let’s transform the variable for each song into a cumulative count of the number of times the song was played.


mydf_wide2 <- mydf_wide

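# Loop over every song column (all columns after date) rather than a hard-coded range
for(colindex in 2:ncol(mydf_wide2)) {
  
  mydf_wide2[[colindex]] <- cumsum(mydf_wide2[[colindex]])
  
}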

head(mydf_wide2)
#> # A tibble: 6 × 134
#>   date       furniture `in defense of humans` intro `joe #1` merchandise
#>   <date>         <int>                  <int> <int>    <int>       <int>
#> 1 1987-09-03         1                      1     1        1           1
#> 2 1987-09-26         2                      2     2        1           2
#> 3 1987-10-07         3                      3     2        2           3
#> 4 1987-10-16         4                      3     2        3           4
#> 5 1987-11-25         5                      3     3        4           5
#> 6 1987-12-03         6                      4     4        4           6
#> # ℹ 128 more variables: `song #1` <int>, `the word` <int>,
#> #   `turn off your guns` <int>, `waiting room` <int>, `and the same` <int>,
#> #   `interlude 1` <int>, `interlude 2` <int>, `interlude 3` <int>,
#> #   `interlude 4` <int>, outro <int>, `interlude 5` <int>, `interlude 6` <int>,
#> #   kyeo <int>, `bad mouth` <int>, `break-in` <int>, `opening remarks` <int>,
#> #   encore <int>, lockdown <int>, suggestion <int>, `lock dug` <int>,
#> #   `interlude 7` <int>, burning <int>, `encore 1` <int>, …
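The same transformation can be sketched without an explicit loop by using `dplyr::across()`. The tibble below is made-up illustration data in the same wide layout (`song_a` and `song_b` are hypothetical column names).

```r
library(dplyr)

# Hypothetical toy data in the same wide layout: one row per show date
toy_wide <- tibble(
  date = as.Date(c("1987-09-03", "1987-09-26", "1987-10-07")),
  song_a = c(1L, 1L, 0L),
  song_b = c(0L, 1L, 1L)
)

toy_cumulative <- toy_wide %>%
  arrange(date) %>%               # cumulative sums only make sense in date order
  mutate(across(-date, cumsum))   # apply cumsum() to every column except date

toy_cumulative
```

Selecting columns with `-date` means the code keeps working if songs are added to or removed from the data.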

Now, we can reformat the data again to make it a long list of song counts that show how the number of times each song was performed built up over time.


mydf_long <- mydf_wide2 %>%
  pivot_longer(!date, names_to = "song", values_to = "count") %>%
  filter(count>0)

head(mydf_long)
#> # A tibble: 6 × 3
#>   date       song                 count
#>   <date>     <chr>                <int>
#> 1 1987-09-03 furniture                1
#> 2 1987-09-03 in defense of humans     1
#> 3 1987-09-03 intro                    1
#> 4 1987-09-03 joe #1                   1
#> 5 1987-09-03 merchandise              1
#> 6 1987-09-03 song #1                  1

Now we are in a position to graph the data to see how the performance counts evolved over time.

Here is a selection of interesting songs to look at. It would be too cluttered to plot all the songs at once!


mydf_long %>%
  filter(song %in% c("furniture", "waiting room", "shut the door", "kyeo", "polish",
                     "steady diet", "smallpox champion", "birthday pony", "break", "nightshop")) %>%
  ggplot(aes(date, count, color = song)) +
  geom_line()

Another approach is to group the songs by release.

releases_lookup <- Repeatr1 %>%
  group_by(song, release) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  select(song, release)
#> `summarise()` has grouped output by 'song'. You can override using the
#> `.groups` argument.

mydf_long <- mydf_long %>%
  left_join(releases_lookup)
#> Joining with `by = join_by(song)`

mydf_long %>%
  filter(release=="steady diet of nothing") %>%
  ggplot(aes(date, count, color = song)) +
  geom_line()


cumulative_song_counts <- mydf_long %>%
  select(date, song, release, count)

Returning to the initial question of whether Fugazi picked songs randomly, the short answer is no, it doesn’t look like it. Their choices indicate preferences that became clearer over time, as can be seen in the above graphs. They sometimes stopped playing certain songs for prolonged periods. For instance, there were lengthy periods when KYEO and Furniture were not played at all before being brought back. Other songs, such as Polish, were dropped and did not return. Some decisions about which songs to play were probably spontaneous, but spontaneity is not the same as randomness, although some randomness might have been involved. Fugazi tried a different system for picking songs once but it only lasted about one song!
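For comparison, here is a small simulation of what the cumulative counts would look like if songs really were picked at random. It is only a sketch under simplifying assumptions: a fixed repertoire, a fixed set length, and every song equally likely at every show. All of the numbers (`n_shows`, `n_songs`, `set_length`) are hypothetical and not taken from the data.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_shows <- 200     # hypothetical number of shows
n_songs <- 20      # hypothetical size of the repertoire
set_length <- 10   # hypothetical number of songs per show

# Each show draws set_length distinct songs uniformly at random;
# the result has one row per show and one 0/1 column per song
played <- t(replicate(n_shows, {
  picks <- sample(n_songs, set_length)
  as.integer(seq_len(n_songs) %in% picks)
}))

# Cumulative count per song: one column per song, one row per show
cumulative <- apply(played, 2, cumsum)

# Under random picking, final counts cluster around n_shows * set_length / n_songs
range(cumulative[n_shows, ])
```

Plotting the result, for example with `matplot(cumulative, type = "l")`, gives a bundle of lines that rise at similar rates, without the long flat stretches seen for songs such as KYEO and Furniture.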

There is an interactive version of the above graph available here - have fun!

Mapping the Fugazi Live Series

I have tried my best to locate all the shows on an interactive, online map. I started with the information on the Fugazi Live Series website, including any flyers and comments. Google Maps would find some places quickly but others would be more difficult - many of the venues are no longer there. The search was broadened using sites like setlist.fm, which sometimes have addresses for old venues, reading online publications, and reaching out to people who might remember where the venues were located.

To the best of my knowledge, all of the shows have been located to a reasonable degree of accuracy. The results can be found here.

If you see an error that needs correcting, please open an issue on the Repeatr GitHub.

Sound Quality

Here is a summary of the sound quality ratings of shows in the Fugazi Live Series. A few of the shows are on archive.org but not on the main site at this time. The Fort Reno shows include recordings for every level on the scale from Poor to Excellent.

sound_quality_ratings <- shows_data %>% 
  filter(is.na(sound_quality)==FALSE) %>% 
  group_by(sound_quality) %>% 
  summarize(shows = n()) %>% 
  ungroup()

totalshows <- sum(sound_quality_ratings$shows)

totalrow <- as.data.frame(t(c("Total", totalshows)))

names(totalrow) <- c("sound_quality", "shows")

sound_quality_ratings <- as.data.frame(rbind(sound_quality_ratings, totalrow))

sound_quality_ratings <- sound_quality_ratings %>%
  mutate(shows = as.numeric(shows)) %>%
  mutate(index = case_when(
    sound_quality == "Excellent" ~ 1,
    sound_quality == "Very Good" ~ 2,
    sound_quality == "Good" ~ 3,
    sound_quality == "Poor" ~ 4,
    sound_quality == "Total" ~ 5
  )) %>%
  arrange(index) %>%
  relocate(index)

sound_quality_ratings <- sound_quality_ratings %>%
  mutate(percentage = round(100*(shows / totalshows),1))

sound_quality_ratings
#>   index sound_quality shows percentage
#> 1     1     Excellent    47        5.2
#> 2     2     Very Good   443       49.3
#> 3     3          Good   344       38.3
#> 4     4          Poor    64        7.1
#> 5     5         Total   898      100.0
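An alternative to the helper `index` column is to store the rating scale as an ordered factor, so that sorting follows the scale directly. The sketch below uses the four counts from the table above, entered in arbitrary order; `quality_levels` is a name introduced only for this illustration.

```r
# Counts copied from the table above, deliberately out of order
ratings <- data.frame(
  sound_quality = c("Good", "Poor", "Excellent", "Very Good"),
  shows = c(344, 64, 47, 443)
)

# An ordered factor encodes the rating scale from best to worst
quality_levels <- c("Excellent", "Very Good", "Good", "Poor")
ratings$sound_quality <- factor(ratings$sound_quality,
                                levels = quality_levels, ordered = TRUE)

# Sorting by the factor follows the scale rather than the alphabet
ratings <- ratings[order(ratings$sound_quality), ]
ratings$percentage <- round(100 * ratings$shows / sum(ratings$shows), 1)

ratings
```

Keeping the total out of the data frame also means the `shows` column stays numeric throughout, with no need to convert it back after binding a "Total" row.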

Three Repeats, but Only One Two for Tuesdays

There is a small number of shows where the same song was performed twice, for different reasons. Some are easier to find than others and there are a few false positives. Let’s have a look, shall we?

two_for_tuesdays <- Repeatr1 %>% 
  filter(tracktype==1) %>% 
  group_by(date, gid, song) %>% 
  summarize(count = n()) %>% 
  ungroup() %>% 
  filter(count>=2)
#> `summarise()` has grouped output by 'date', 'gid'. You can override using the
#> `.groups` argument.

two_for_tuesdays
#> # A tibble: 6 × 4
#>   date       gid                       song                 count
#>   <date>     <chr>                     <chr>                <int>
#> 1 1988-02-06 annapolis-md-usa-20688    break-in                 2
#> 2 1993-11-17 canberra-australia-111793 reclamation              2
#> 3 1995-10-09 peoria-il-usa-100995      bed for the scraping     2
#> 4 1998-05-11 richmond-va-usa-51198     great cop                2
#> 5 1998-07-31 washington-dc-usa-73198   closed captioned         2
#> 6 1998-07-31 washington-dc-usa-73198   foreman's dog            2

‘Break In’ was played twice in Annapolis in 1988 for musical reasons: Ian broke a string the first time through. Neither ‘Reclamation’ nor ‘Bed for the Scraping’ is a true duplicate: in both cases the song was interrupted so badly that the interruption was treated as a separate track, splitting the song into two MP3 tracks. ‘Great Cop’ really was played twice, due to a conflict with security staff at the 1998 gig in Richmond, Virginia. The 1998 Sanctuary Theater show in Washington DC features two versions of both ‘Closed Captioned’ and ‘Foreman’s Dog’, but only because two songs from the soundcheck are included in the MP3 download as a bonus.
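When checking cases like these by hand, it can help to keep the individual tracks visible rather than collapsing them into a count. The following sketch uses `dplyr::add_count()` on made-up data; `gid` and `song` follow `Repeatr1`, while the `track` position column is an assumption made for this illustration.

```r
library(dplyr)

# Hypothetical toy track listing; track is an assumed position-in-set column
toy_tracks <- tibble(
  gid = c("show-1", "show-1", "show-1", "show-2", "show-2"),
  track = c(1L, 2L, 3L, 1L, 2L),
  song = c("song a", "song b", "song a", "song a", "song b")
)

# add_count() attaches the per-show count to every track instead of
# collapsing the rows, so the positions of the repeated song stay visible
toy_repeats <- toy_tracks %>%
  add_count(gid, song, name = "count") %>%
  filter(count >= 2)

toy_repeats
```

Seeing where in the set each repeat falls makes it easier to tell a split track or a soundcheck bonus from a song that was genuinely played twice.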

The real “Two for Tuesdays” was at a 1991 gig in Birmingham, Alabama, where the show notes comment “Featuring the one-time attempt of our ‘Two for Tuesday’ gag. No one appeared to notice, so we shelved the idea.” On that occasion, the song “Greed” was played twice, and the gag was announced immediately after the double performance: “Two for Tuesday, Two for Tuesday”. Perhaps this was a little too subtle? ‘Greed’ is such a short song you can fit two renditions into the time it would take to play a normal song. I was specifically looking for the shows where the same song was played twice and this one didn’t come up on the above list because ‘Greed’ followed by ‘Greed’ was treated as one song in the track listing and one MP3 track - perhaps because the person who mastered the recording didn’t notice either, or perhaps they wanted the gag to live on in the digital realm? I also missed it the first time I listened to the show. I bet more people would have noticed if it had been ‘Suggestion’!

Thanks.