# Pastebin RoxTYaHo

Scraping Everyboty
==================

Everyboty was a network of bots that collected images from 4chan and reddit and aggregated them for users to view, vote on, tag, and comment on. However, the site is dogged by an absolutely abysmal gallery view system; it seems to be based on some sort of client-side jQuery-to-REST scheme, and it is slow and clunky. There may be a delicious REST API hiding behind it all, though. The goal is to extract the good from the bad.

## Structure

While there are many different components of the Everyboty Network, there is a unified system of IDs, and all pictures can be viewed together on .

* **ID** - There were 100061 images obtained before the network stopped saving images.
  * Infuriatingly, the actual image URLs may be numbered differently. This forces us to scrape each view page directly every time.
* **base_url** - `http://everyboty.net`
* **view_url** - `http://everyboty.net/?perm=100061`
* **image_url** - `http://everyboty.net/shared/post_media/images/full_sized/100061.png`

## Parsing the Image View System

The site uses some weird client-side jQuery code to generate the view. By mucking around in the Firefox Web Console, I managed to find these JavaScript files:

* [The layout_home_v3b.js file](http://everyboty.net/javascript/layout_home_v3b.js) - Apparently has something to do with uploading images
* [The home.js file](http://everyboty.net/javascript/home.js) - Has something to do with displaying images

What we can do is hack this up to spit out JSON for every image in a loop.

### What is AJAX?

One of the innovations in web development that made Web 2.0 possible is AJAX (Asynchronous JavaScript and XML). It lets parts of a page's content be fetched on the fly instead of being delivered with the initial page load. We can exploit this to bypass their ugly HTML and get the raw XML data out. But how?
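Until the real AJAX endpoint is dug out of home.js, the fallback is the per-ID scraping loop described above: fetch each `?perm=` view page and pull the full-sized image path out of the HTML. Here is a minimal Python sketch of that idea; the regex, the helper names, and the assumption that the image path appears verbatim in the page markup are all guesses to be adjusted after inspecting a real page.

```python
import re
from urllib.request import urlopen  # used only in the network-facing helper

BASE_URL = "http://everyboty.net"

def view_url(post_id):
    """Build the permalink view URL for a given post ID."""
    return f"{BASE_URL}/?perm={post_id}"

# Assumed pattern: the full-sized image path shows up literally somewhere in
# the view page's HTML. Tweak after checking a real page in the Web Console.
IMAGE_RE = re.compile(
    r"(/shared/post_media/images/full_sized/\d+\.(?:png|jpe?g|gif))"
)

def extract_image_url(html):
    """Return the first full-sized image URL found in a view page, or None."""
    match = IMAGE_RE.search(html)
    return BASE_URL + match.group(1) if match else None

def scrape(post_id):
    """Fetch one view page and extract its image URL (needs network access)."""
    with urlopen(view_url(post_id)) as resp:
        return extract_image_url(resp.read().decode("utf-8", "replace"))

# Looping scrape(1) .. scrape(100061) would then cover every known ID,
# ideally with a polite delay between requests.
```

If the AJAX calls in home.js turn out to return clean JSON or XML, `scrape` can be swapped for a direct request to that endpoint and the HTML regex dropped entirely.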