Global PowerPlant Database - Extraction
Wanted to get the latest list of all the electrical powerplants in the USA
and thanks to the World Resources Institutes Global PowerPlant Database source. They do a great service in collecting and collating the various datasets available into a single updated source and all their methods and code is open sourced on GitHub
First get some helper tools
sudo apt-get install cargo csvkit busybox lynx;
cargo install xsv;
- Line 1:
sudo apt-get install cargo csvkit busybox;
Gets the Rust package manager cargo, a nice csv manipulator tool csvkit and the packaged utility set busybox and finally get the excellent original text browser lynx to extract and manipulate html - Line 2: Install the xsv tool from the Rust cargo package registry
Add this line to your .bashrc file if you don't have it already for Rust cargo binaries downloaded
export PATH="$HOME/.cargo/bin:$PATH"
Now I can download, extract, and convert
wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -;
rm -v *.txt *.pdf;
xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
rm *.csv;
- Line 1:
wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -;
Find the dataset link with lynx, then get the zip file, and finally extract it Using the wget -q quiet and O- Output file to stream options here instead of file - Line 2:
rm -v *.txt *.pdf;
Delete the unnecessary files - Line 3:
xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
Extract the columns from the globalpowerplantsdabase name,capacity_mw,latitude,longitude,primary_fuel and save as
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;USA.csv
- Line 4:
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
Convert theUSA.csv
file into a sqlite database calledUSA.sqlite
- Line 5:
rm *.csv;
Cleanup the*.csv
files
Complete script
sudo apt-get install cargo csvkit busybox;
cargo install xsv;
PATH=$PATH:~/.cargo/bin;
wget -qO- $(lynx -listonly -nonumbers -dump http://datasets.wri.org/dataset | grep "globalpowerplantdatabasev") | busybox unzip -;
rm -v *.txt *.pdf;
xsv search -s country 'USA' global_power_plant_database.csv | xsv select name,capacity_mw,latitude,longitude,primary_fuel | xsv slice -o USA.csv
csvsql --db sqlite:///USA.sqlite --tables powerplants --insert USA.csv;
rm *.csv;