Interact with Sciencebase.gov using sbtools: Finding Dam Removal Data

I was searching for data on US dam removal trends and stumbled upon a sciencebase.gov page with data that looked promising. However, I wasn’t sure if this was the best source of data given the mixed results of my earlier search engine queries. I decided to try out the sbtools package as a way of seeing all the files associated with multiple dam removal entries in Sciencebase. I started by passing a query to the query_sb_text function. Although the function returns a list of twenty items by default, I only show the first two to illustrate how it is not obvious which entry is best for my purposes.

library(sbtools)
query_sb_text("Dam Removal Database")[1:2]
## [[1]]
## <ScienceBase Item> 
##   Title: National Dam Removal Database: A living database for information on dying dams
##   Creator/LastUpdatedBy:      / 
##   Provenance (Created / Updated):   / 
##   Children: 
##   Item ID: 552448fce4b027f0aee3d3d4
##   Parent ID: 550c9621e4b02e76d759d7a2
## 
## [[2]]
## <ScienceBase Item> 
##   Title: USGS Dam Removal Science Database
##   Creator/LastUpdatedBy:      / 
##   Provenance (Created / Updated):   / 
##   Children: 
##   Item ID: 55071bf9e4b02e76d757c076
##   Parent ID: 53f630a3e4b09d12e0e9bd1e

Next, I found the subset of the results that sounded most promising and looped over it with item_list_files.

query_names <- c("American Rivers Dam Removals",
                 "Dam Removal Information Portal (DRIP)",
                 "National Dam Removal Science Database",
                 "USGS Dam Removal Science Database",
                 "National Dam Removal Database: A living database for information on dying dams")
res_ids <- lapply(query_names, function(x) query_sb_text(x)[[1]]$id)
lapply(res_ids, function(x) item_list_files(x[[1]]))
## [[1]]
## data frame with 0 columns and 0 rows
## 
## [[2]]
##                             fname size
## 1 metadata2365201133950090499.xml 3840
##                                                                                                                                       url
## 1 https://www.sciencebase.gov/catalog/file/get/57b6ce28e4b03fd6b7d919cf?f=__disk__15%2Fac%2F72%2F15ac724223c8464c77e8615543dc0c9a07411d00
## 
## [[3]]
## data frame with 0 columns and 0 rows
## 
## [[4]]
##                                                      fname    size
## 1                        20150309_DamRemovalDatabase.accdb 1196032
## 2        20150309_DamRemovalDatabase_Accession Numbers.xml   62542
## 3                20150309_DamRemovalDatabase_Citations.xml  106793
## 4                     20150309_DamRemovalDatabase_Dams.xml  178601
## 5        20150309_DamRemovalDatabase_Study Design Data.xml  322143
## 6            20150309_DamRemovalDatabase_Study Results.xml  373008
## 7           USGS_Dam_Removal_Science_Database_METADATA.xml   77789
## 8                      20150527_Dam Removal Database.accdb 1302528
## 9      20150527_Dam Removal Database_Accession Numbers.xml   62542
## 10             20150527_Dam Removal Database_Citations.xml  107128
## 11                  20150527_Dam Removal Database_Dams.xml  178958
## 12     20150527_Dam Removal Database_Study Design Data.xml  322143
## 13          20150527_Dam Removal DatabaseStudy Results.xml  373008
## 14                     20150710_Dam_Removal_Database.accdb 1777664
## 15     20150710_Dam_Removal_Database_Accession_Numbers.xml   69881
## 16             20150710_Dam_Removal_Database_Citations.xml  107128
## 17                  20150710_Dam_Removal_Database_Dams.xml  195634
## 18     20150710_Dam_Removal_Database_Study_Design_Data.xml  354367
## 19         20150710_Dam_Removal_Database_Study_Results.xml  373008
## 20 20150710_USGS_Dam_Removal_Science_Database_metadata.xml   76303
##                                                                                                                                        url
## 1  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2Ffb%2F81%2F43fb810a41ebeb792d5f1dad87809e35b8a79354
## 2  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__93%2F76%2Fef%2F9376ef25e9e1e9f70420448ed98d50a66660c0bd
## 3  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2F3a%2Fb5%2F433ab51df501fb9065584e22a8459a5abe4ad52c
## 4  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__54%2Ffc%2Fae%2F54fcaea6e89924e24cedb15bb6d31aac13f41d23
## 5  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__59%2Fd0%2Fb8%2F59d0b8df94974951b02fe7081b56e1ad5086123f
## 6  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__1a%2F2d%2F30%2F1a2d307c3286790c7818fceca9972cca37466891
## 7  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__86%2Fc1%2F41%2F86c1410e5197a5da45a7c2cffb516c99fddb1407
## 8  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5f%2Fda%2Fc7%2F5fdac70a07b9908c9e23acf313113f0fff9266e7
## 9  https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__b7%2F88%2F4b%2Fb7884b9f771b65ac655a88d27874b8511534efec
## 10 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__15%2F64%2Fea%2F1564ead29d2c5228a19de8f63d32c36d960f19ba
## 11 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__14%2F1c%2F4c%2F141c4cab303202778b4a5b4ee1a21711a522eed4
## 12 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0b%2F63%2F0b%2F0b630b5a4f5e924ee447879f3b87786dbc9368cf
## 13 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6b%2F82%2Fb2%2F6b82b20c4c64fbb96473f29dd6c8988f0038b138
## 14 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0d%2Fc5%2F78%2F0dc578f5dfd8c03968e1847ce15846bed89c0635
## 15 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__d5%2F4e%2F7e%2Fd54e7eb3c8d04b419d71761f3c7bbc99076183d9
## 16 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__f7%2F7b%2Fd4%2Ff77bd4e124f85e8689ca11a99112ce3f7877964f
## 17 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__fc%2F1c%2Fa3%2Ffc1ca3b96366654f311d3b2abf1da3fdff35b61b
## 18 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5d%2F66%2F2e%2F5d662ede94fb288e9291672f1e9838b9dff1f2e7
## 19 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__65%2F20%2F30%2F652030ca51fc3d33b5f8f052c8afb14f5fa5a5db
## 20 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6a%2F1f%2Fa8%2F6a1fa8987c5773a9aca05fdb74de1b6b3beab674
## 
## [[5]]
##          fname   size
## 1     Dams.PNG 187555
## 2 figure13.PNG 424553
## 3 figure14.png 169371
##                                                                                                                                       url
## 1 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__35%2F0d%2F58%2F350d58bf13bd184f88f24e5f61c92ab76cc4c577
## 2 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__e3%2F66%2F6b%2Fe3666bf48aa91fbad53ac3813dffcff0d6922888
## 3 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__87%2Fb1%2Fa1%2F87b1a16e3f65d765bdea0c00a2b9f32173704a77

It looks like only the fourth term is associated with any available data files. Next, I narrowed my query to this term and used the item_file_download function to return the largest file which I will assume is the actual database.

query_names <- c("USGS Dam Removal Science Database")
query_ids   <- query_sb_text(query_names)[[1]]$id
query_files <- item_list_files(query_ids)
big_file <- query_files[order(query_files$size, decreasing = TRUE),][1,]

item_file_download(query_ids, names = big_file$fname,
                   destinations = big_file$fname, dest_dir = ".", overwrite_file = TRUE)
## [1] "20150710_Dam_Removal_Database.accdb"

Since I am on a Linux machine without access to MS Access this will make it a bit difficult to read the Access database file. Luckily I can use mdbtools command line program (it can be installed with apt-get). First, I identified the Dams table using the mdb-tables command. Finally, I pipe the output of mdb-export to read.csv.

system(paste0("mdb-tables ", big_file$fname))
res <- read.csv(pipe(paste0("mdb-export ", big_file$fname, " Dams")))
head(res[,1:8])
##   DamAccessionNumber          DamName    DamRiverName
## 1                  1 Murphy Creek Dam    Murphy Creek
## 2                  2       Croton Dam       Big Creek
## 3                  3  Hawkesville Dam Conestoga River
## 4                  4  Huttonville Dam    Credit River
## 5                  5     Chilligo Dam     Ellis Creek
## 6                  6   Greenfield Dam      Nith River
##                      DamLocation DamState_Province DamCountry DamLatitude
## 1 San Joaquin County, California                CA        USA    38.23394
## 2         Delhi, Ontario, Canada           Ontario     Canada    42.82025
## 3   Hawkesville, Ontario, Canada           Ontario     Canada    43.56758
## 4   Huttonville, Ontario, Canada           Ontario     Canada    43.64485
## 5     Cambridge, Ontario, Canada           Ontario     Canada    43.43619
## 6    Greenfield, Ontario, Canada           Ontario     Canada    43.29834
##   DamLongitude
## 1   -121.02740
## 2    -80.50807
## 3    -80.63648
## 4    -79.80412
## 5    -80.33448
## 6    -80.47440