Web Scraping with lynx + cURL
I wanted to grab data from a web page and process the results; in this case the result is a CSV file dump.
Install the utilities used here
sudo apt-get install curl lynx cargo
cargo install xsv
Set up xsv with Cargo (make sure Cargo's bin directory is on your PATH)
export PATH="$HOME/.cargo/bin:$PATH"
Save the contents of the web page to a local HTML file
curl -s https://support.spatialkey.com/spatialkey-sample-csv-data/ -o sample.html
Convert the HTML to plain text for extraction (lynx -dump appends a numbered list of the page's links)
lynx -dump sample.html > sample.txt
Extract the URLs from the page text. lynx prefixes each link with a reference number, and cut -c 7- strips that prefix to leave the bare URL
grep -e '\.csv$' sample.txt | cut -c 7- > urls.txt
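The cut -c 7- approach depends on lynx's exact column layout. A sketch of a more position-independent alternative, using grep -oE to print only the matching URL (the sample text below imitates lynx's numbered References section; the file names and URLs are hypothetical):

```shell
# Fake lynx -dump output: a numbered "References" list of links.
cat > demo.txt <<'EOF'
References

   1. https://example.com/data/sales.csv
   2. https://example.com/about.html
  10. https://example.com/data/realestate.csv
EOF

# -o prints only the matched text, so indentation and the reference
# number no longer matter; the pattern keeps lines ending in .csv.
grep -oE 'https?://[^[:space:]]+\.csv$' demo.txt > demo-urls.txt
cat demo-urls.txt
```

This prints just the two .csv URLs, regardless of how wide lynx pads the reference numbers.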
Take the first URL from the list and fetch it with cURL, saving the data to a CSV file
head -n 1 urls.txt | xargs curl -so realestate.csv
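head -n 1 grabs only the first link. If you wanted every CSV on the page, a simple loop works; shown here as a dry run (echo prints the command instead of executing it, so nothing is downloaded) over a hypothetical URL list:

```shell
# Hypothetical list of CSV links, one per line.
cat > all-urls.txt <<'EOF'
https://example.com/data/sales.csv
https://example.com/data/realestate.csv
EOF

# curl -O saves each file under the URL's basename; -s keeps it quiet.
# Drop the leading echo to actually download.
while read -r url; do
  echo curl -sO "$url"
done < all-urls.txt | tee fetch-commands.txt
```

Each line of output is the curl invocation that would run for that URL.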
View the extracted and formatted CSV
xsv table realestate.csv
Full script
curl -s https://support.spatialkey.com/spatialkey-sample-csv-data/ -o sample.html
lynx -dump sample.html > sample.txt
grep -e '\.csv$' sample.txt | cut -c 7- > urls.txt
head -n 1 urls.txt | xargs curl -so realestate.csv
xsv table realestate.csv
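If one step fails (a network hiccup, a page layout change), the later steps happily run on stale or empty files. A sketch of the same pipeline saved as a script with basic error handling, where set -euo pipefail aborts on the first failed command (the script is written to a file here rather than executed, since it hits the network):

```shell
# Write the hardened version of the pipeline to scrape.sh.
cat > scrape.sh <<'EOF'
#!/usr/bin/env bash
# Abort on any error, unset variable, or failure inside a pipe.
set -euo pipefail

page='https://support.spatialkey.com/spatialkey-sample-csv-data/'

curl -s "$page" -o sample.html
lynx -dump sample.html > sample.txt
grep -e '\.csv$' sample.txt | cut -c 7- > urls.txt
head -n 1 urls.txt | xargs curl -so realestate.csv
xsv table realestate.csv
EOF
chmod +x scrape.sh
```

Run it with ./scrape.sh; any failing step stops the script instead of cascading.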