I was searching for data on US dam removal trends and stumbled upon a sciencebase.gov page with data that looked promising. However, I wasn’t sure if this was the best source of data given the mixed results of my earlier search engine queries. I decided to try out the sbtools
package as a way of seeing all the files associated with multiple dam removal entries in Sciencebase. I started by passing a query to the query_sb_text
function. Although the function returns a list of twenty items by default, I only show the first two to illustrate how it is not obvious which entry is best for my purposes.
library(sbtools)
query_sb_text("Dam Removal Database")[1:2]
## [[1]]
## <ScienceBase Item>
## Title: National Dam Removal Database: A living database for information on dying dams
## Creator/LastUpdatedBy: /
## Provenance (Created / Updated): /
## Children:
## Item ID: 552448fce4b027f0aee3d3d4
## Parent ID: 550c9621e4b02e76d759d7a2
##
## [[2]]
## <ScienceBase Item>
## Title: USGS Dam Removal Science Database
## Creator/LastUpdatedBy: /
## Provenance (Created / Updated): /
## Children:
## Item ID: 55071bf9e4b02e76d757c076
## Parent ID: 53f630a3e4b09d12e0e9bd1e
Next, I found the subset of the results that sounded most promising and looped over it with item_list_files
.
query_names <- c("American Rivers Dam Removals",
"Dam Removal Information Portal (DRIP)",
"National Dam Removal Science Database",
"USGS Dam Removal Science Database",
"National Dam Removal Database: A living database for information on dying dams")
res_ids <- lapply(query_names, function(x) query_sb_text(x)[[1]]$id)
lapply(res_ids, function(x) item_list_files(x[[1]]))
## [[1]]
## data frame with 0 columns and 0 rows
##
## [[2]]
## fname size
## 1 metadata2365201133950090499.xml 3840
## url
## 1 https://www.sciencebase.gov/catalog/file/get/57b6ce28e4b03fd6b7d919cf?f=__disk__15%2Fac%2F72%2F15ac724223c8464c77e8615543dc0c9a07411d00
##
## [[3]]
## data frame with 0 columns and 0 rows
##
## [[4]]
## fname size
## 1 20150309_DamRemovalDatabase.accdb 1196032
## 2 20150309_DamRemovalDatabase_Accession Numbers.xml 62542
## 3 20150309_DamRemovalDatabase_Citations.xml 106793
## 4 20150309_DamRemovalDatabase_Dams.xml 178601
## 5 20150309_DamRemovalDatabase_Study Design Data.xml 322143
## 6 20150309_DamRemovalDatabase_Study Results.xml 373008
## 7 USGS_Dam_Removal_Science_Database_METADATA.xml 77789
## 8 20150527_Dam Removal Database.accdb 1302528
## 9 20150527_Dam Removal Database_Accession Numbers.xml 62542
## 10 20150527_Dam Removal Database_Citations.xml 107128
## 11 20150527_Dam Removal Database_Dams.xml 178958
## 12 20150527_Dam Removal Database_Study Design Data.xml 322143
## 13 20150527_Dam Removal DatabaseStudy Results.xml 373008
## 14 20150710_Dam_Removal_Database.accdb 1777664
## 15 20150710_Dam_Removal_Database_Accession_Numbers.xml 69881
## 16 20150710_Dam_Removal_Database_Citations.xml 107128
## 17 20150710_Dam_Removal_Database_Dams.xml 195634
## 18 20150710_Dam_Removal_Database_Study_Design_Data.xml 354367
## 19 20150710_Dam_Removal_Database_Study_Results.xml 373008
## 20 20150710_USGS_Dam_Removal_Science_Database_metadata.xml 76303
## url
## 1 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2Ffb%2F81%2F43fb810a41ebeb792d5f1dad87809e35b8a79354
## 2 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__93%2F76%2Fef%2F9376ef25e9e1e9f70420448ed98d50a66660c0bd
## 3 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__43%2F3a%2Fb5%2F433ab51df501fb9065584e22a8459a5abe4ad52c
## 4 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__54%2Ffc%2Fae%2F54fcaea6e89924e24cedb15bb6d31aac13f41d23
## 5 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__59%2Fd0%2Fb8%2F59d0b8df94974951b02fe7081b56e1ad5086123f
## 6 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__1a%2F2d%2F30%2F1a2d307c3286790c7818fceca9972cca37466891
## 7 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__86%2Fc1%2F41%2F86c1410e5197a5da45a7c2cffb516c99fddb1407
## 8 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5f%2Fda%2Fc7%2F5fdac70a07b9908c9e23acf313113f0fff9266e7
## 9 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__b7%2F88%2F4b%2Fb7884b9f771b65ac655a88d27874b8511534efec
## 10 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__15%2F64%2Fea%2F1564ead29d2c5228a19de8f63d32c36d960f19ba
## 11 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__14%2F1c%2F4c%2F141c4cab303202778b4a5b4ee1a21711a522eed4
## 12 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0b%2F63%2F0b%2F0b630b5a4f5e924ee447879f3b87786dbc9368cf
## 13 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6b%2F82%2Fb2%2F6b82b20c4c64fbb96473f29dd6c8988f0038b138
## 14 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__0d%2Fc5%2F78%2F0dc578f5dfd8c03968e1847ce15846bed89c0635
## 15 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__d5%2F4e%2F7e%2Fd54e7eb3c8d04b419d71761f3c7bbc99076183d9
## 16 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__f7%2F7b%2Fd4%2Ff77bd4e124f85e8689ca11a99112ce3f7877964f
## 17 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__fc%2F1c%2Fa3%2Ffc1ca3b96366654f311d3b2abf1da3fdff35b61b
## 18 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__5d%2F66%2F2e%2F5d662ede94fb288e9291672f1e9838b9dff1f2e7
## 19 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__65%2F20%2F30%2F652030ca51fc3d33b5f8f052c8afb14f5fa5a5db
## 20 https://www.sciencebase.gov/catalog/file/get/55071bf9e4b02e76d757c076?f=__disk__6a%2F1f%2Fa8%2F6a1fa8987c5773a9aca05fdb74de1b6b3beab674
##
## [[5]]
## fname size
## 1 Dams.PNG 187555
## 2 figure13.PNG 424553
## 3 figure14.png 169371
## url
## 1 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__35%2F0d%2F58%2F350d58bf13bd184f88f24e5f61c92ab76cc4c577
## 2 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__e3%2F66%2F6b%2Fe3666bf48aa91fbad53ac3813dffcff0d6922888
## 3 https://www.sciencebase.gov/catalog/file/get/552448fce4b027f0aee3d3d4?f=__disk__87%2Fb1%2Fa1%2F87b1a16e3f65d765bdea0c00a2b9f32173704a77
It looks like only the fourth term is associated with any available data files. Next, I narrowed my query to this term and used the item_file_download
function to return the largest file which I will assume is the actual database.
query_names <- c("USGS Dam Removal Science Database")
query_ids <- query_sb_text(query_names)[[1]]$id
query_files <- item_list_files(query_ids)
big_file <- query_files[order(query_files$size, decreasing = TRUE),][1,]
item_file_download(query_ids, names = big_file$fname,
destinations = big_file$fname, dest_dir = ".", overwrite_file = TRUE)
## [1] "20150710_Dam_Removal_Database.accdb"
Since I am on a Linux machine without access to MS Access this will make it a bit difficult to read the Access database file. Luckily I can use mdbtools command line program (it can be installed with apt-get
). First, I identified the Dams
table using the mdb-tables
command. Finally, I pipe
the output of mdb-export
to read.csv
.
system(paste0("mdb-tables ", big_file$fname))
res <- read.csv(pipe(paste0("mdb-export ", big_file$fname, " Dams")))
head(res[,1:8])
## DamAccessionNumber DamName DamRiverName
## 1 1 Murphy Creek Dam Murphy Creek
## 2 2 Croton Dam Big Creek
## 3 3 Hawkesville Dam Conestoga River
## 4 4 Huttonville Dam Credit River
## 5 5 Chilligo Dam Ellis Creek
## 6 6 Greenfield Dam Nith River
## DamLocation DamState_Province DamCountry DamLatitude
## 1 San Joaquin County, California CA USA 38.23394
## 2 Delhi, Ontario, Canada Ontario Canada 42.82025
## 3 Hawkesville, Ontario, Canada Ontario Canada 43.56758
## 4 Huttonville, Ontario, Canada Ontario Canada 43.64485
## 5 Cambridge, Ontario, Canada Ontario Canada 43.43619
## 6 Greenfield, Ontario, Canada Ontario Canada 43.29834
## DamLongitude
## 1 -121.02740
## 2 -80.50807
## 3 -80.63648
## 4 -79.80412
## 5 -80.33448
## 6 -80.47440