Global PowerPlant Database - Extraction

Wanted to get the latest list of all the electrical powerplants in the USA and thanks to the World Resources Institutes Global PowerPlant Database source.  They do a great service in collecting and collating the various datasets available into a single updated source and all their methods and code is open sourced on GitHub

First get some helper tools

sudo apt-get install cargo csvkit busybox lynx;
cargo install xsv;
  • Line 1: sudo apt-get install cargo csvkit busybox; Gets the Rust package manager cargo, a nice csv manipulator tool csvkit and the packaged utility set busybox and finally get the excellent original text browser lynx to extract and manipulate html
  • Line 2: Install the xsv tool from the Rust cargo package registry

Add this line to your .bashrc file if you don't have it already for Rust cargo binaries downloaded

export PATH="$HOME/.cargo/bin:$PATH"

Now I can download, extract, and convert

wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -;
rm -v *.txt *.pdf;
xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
rm *.csv;
  • Line 1: wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -; Find the dataset link with lynx, then get the zip file, and finally extract it  Using the wget -q quiet and O- Output file to stream options here instead of file
  • Line 2: rm -v *.txt *.pdf; Delete the unnecessary files
  • Line 3: xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
    csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
     Extract the columns from the globalpowerplantsdabase name,capacity_mw,latitude,longitude,primary_fuel and save as USA.csv
  • Line 4: csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv; Convert the USA.csv file into a sqlite database called USA.sqlite
  • Line 5: rm *.csv; Cleanup the *.csv files

Complete script

sudo apt-get install cargo csvkit busybox;
cargo install xsv;
PATH=$PATH:~/.cargo/bin;
wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -;
rm -v *.txt *.pdf;
xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
rm *.csv;