
Re: HTTrack project 20GB

Posted: Sat Aug 18, 2012 5:08 am
by llarson
LexieTheFox wrote:
VirtLands wrote:The following attachment is a sorted list of member names, IDs and emails, compiled from 241 member webpages. Click on the attached download.

Enjoy.
OK, um... I'm not OK with my ID or email being given out on that mirror website. Please, when it's up, display my name ONLY.
Agreed :| Only show my username.


Posted: Sat Aug 18, 2012 5:22 am
by tyteen4a03
LexieTheFox wrote:
VirtLands wrote:The following attachment is a sorted list of member names, IDs and emails, compiled from 241 member webpages. Click on the attached download.

Enjoy.
OK, um... I'm not OK with my ID or email being given out on that mirror website. Please, when it's up, display my name ONLY.
Firstly, the information exposed by VirtLands has nothing to do with me. Secondly, if you chose to display your email address publicly, the bot will grab it, but it will never be shown to the public. If you did not, the bot will not attempt to grab it.
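To illustrate the "only grab what is public" rule, here is a minimal Python sketch. It assumes a publicly displayed email appears as a `mailto:` link in the profile page HTML; that markup and the name `extract_public_email` are hypothetical, not taken from the actual bot.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes a public email is rendered as <a href="mailto:...">.
MAILTO_RE = re.compile(r'href="mailto:([^"]+)"', re.IGNORECASE)


def extract_public_email(profile_html: str) -> Optional[str]:
    """Return the email only if the profile page exposes it publicly."""
    match = MAILTO_RE.search(profile_html)
    # No mailto link means the member hid their email, so nothing is grabbed.
    return match.group(1) if match else None


print(extract_public_email('<a href="mailto:fox@example.com">Email</a>'))  # fox@example.com
print(extract_public_email('<span>Email: hidden</span>'))                  # None
```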

(For anybody worried about emails being exposed in the Off-Topic area: the bot is not configured to access the Off-Topic area in any way, and there is no code to scrape information that only a moderator can access.)
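A minimal Python sketch of how a spider can be kept out of one forum and away from moderator-only pages, assuming phpBB-style URLs where the forum ID is the `f` query parameter. The blocked forum ID and script names below are placeholders; the thread does not give the real values.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical values: the real Off-Topic forum ID and moderator-only
# script names on the board are not given in the thread.
BLOCKED_FORUM_IDS = {"99"}
BLOCKED_SCRIPTS = {"mcp.php", "modcp.php"}


def should_crawl(url: str) -> bool:
    """Return False for URLs the spider must never visit."""
    parsed = urlparse(url)
    script = parsed.path.rsplit("/", 1)[-1]  # e.g. "viewforum.php"
    if script in BLOCKED_SCRIPTS:
        return False
    forum_ids = parse_qs(parsed.query).get("f", [])
    return not any(fid in BLOCKED_FORUM_IDS for fid in forum_ids)


print(should_crawl("http://example.com/forum/viewforum.php?f=3"))   # True
print(should_crawl("http://example.com/forum/viewforum.php?f=99"))  # False
```

A check like this would run before the spider follows each new link, so blocked pages are never requested at all.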

Posted: Sat Aug 18, 2012 2:07 pm
by Wonderman109
I don't want anything but my username being given away either, please. :| :!: :o

Posted: Sat Aug 18, 2012 3:30 pm
by jdl
Guys, your ID/Email/other personal stuff isn't going to be given away. :?

Posted: Sat Aug 18, 2012 4:59 pm
by tyteen4a03
Wonderman109 wrote:I don't want anything but my username being given away either, please. :| :!: :o
If you did not make your personal information public, the bot won't be able to get it.

emails list

Posted: Sat Aug 18, 2012 5:58 pm
by VirtLands
I have removed the PCPuzzle Member's List.TXT attachment from page 1.

Wonderman109 wrote: "I don't want anything but my username being given away either, please."



Posted: Sat Aug 18, 2012 7:08 pm
by tyteen4a03
That is what I mean by "public information".

EDIT: I am happy to announce that the spider is finished. Now I am testing the spider, then moving on to the Pipeline (the component that stores items in the database). The trickiest part of the Pipeline will be the html2bbcode function, which turns raw HTML back into BBCode.
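For anyone new to Scrapy, here is a minimal sketch of what a Pipeline stage does. Scrapy calls `open_spider`, `process_item` and `close_spider` on pipeline objects (enabled through the `ITEM_PIPELINES` setting). The class name, the in-memory SQLite database and the table layout are hypothetical, not the project's actual schema.

```python
import sqlite3


class SQLiteUserPipeline:
    """Sketch of a Scrapy-style item pipeline that stores user items in SQLite."""

    def __init__(self, db_path: str = ":memory:"):
        self.db_path = db_path  # hypothetical default: a throwaway in-memory DB
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "userID INTEGER PRIMARY KEY, username TEXT, signature TEXT)"
        )

    def process_item(self, item, spider):
        # Items behave like dicts, so .get() returns None for unscraped fields.
        self.conn.execute(
            "INSERT OR REPLACE INTO users (userID, username, signature) VALUES (?, ?, ?)",
            (item.get("userID"), item.get("username"), item.get("signature")),
        )
        self.conn.commit()
        return item  # hand the item on to any later pipeline stage

    def close_spider(self, spider):
        self.conn.close()


pipeline = SQLiteUserPipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"userID": 1, "username": "example_user"}, spider=None)
print(pipeline.conn.execute("SELECT * FROM users").fetchall())  # [(1, 'example_user', None)]
pipeline.close_spider(spider=None)
```

An html2bbcode step would naturally sit inside `process_item`, converting fields such as the signature before the insert.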

Posted: Wed Aug 22, 2012 10:37 am
by tyteen4a03
Debugging took more time than I thought it would. It seems I misunderstood some of the concepts, so now I have to redo some parts of the scraping.

(And as part of the test, I now have almost everything shown to the public)

Posted: Thu Aug 23, 2012 5:45 am
by VirtLands
Great progress.

I would be glad to help with it, except that I don't understand any of that Python-ish stuff.
I have Python 3.2 installed, but I never use it.

Send us a screenshot, or data sample. ;)

Posted: Tue Sep 04, 2012 5:01 am
by tyteen4a03
Not much has happened in the meantime, but the scraping bot is taking shape. Chris (christ) has offered to work with me on the website; his experience in PHP scraping will definitely come in handy.

I now have school and have to apply to university (eek), so I won't be able to work on this as much as I could in the summer holidays. Hopefully I have the willpower to keep this project alive... :eek:

os.fork

Posted: Tue Sep 04, 2012 5:29 am
by VirtLands
Wherever you are, may the os.fork be with you.

Looks like I'll have to pick up where he (Tyteen) left off.

This isn't going to be easy.

While you're in university, you can help us, right?

Posted: Tue Sep 04, 2012 9:34 am
by tyteen4a03
I'm aiming for MIT, and you know how people there like to work day after night after day.

So I'm not sure.

MIT

Posted: Tue Sep 04, 2012 7:01 pm
by VirtLands
I see. Congratulations on this plan.

Posted: Mon Sep 17, 2012 6:46 pm
by tyteen4a03
Development has halted because I had a bit of school work and unexpected projects coming in.

It will restart in... *insert restart time here*

Posted: Tue Nov 20, 2012 2:15 am
by tyteen4a03
Because development isn't going anywhere, I'm pushing the code to GitHub for everybody to see. Hopefully I'll have more time over the Christmas break.

Note: This code is broken.

https://github.com/tyteen4a03/wlf_scrapy


Posted: Tue Nov 20, 2012 8:27 pm
by VirtLands
tyteen4a03 wrote:...Note: This code is broken....
So, is it broken as in incomplete, or broken as in shattered?
---------------------------------------------------

Well, this part was interesting anyway:
Sample code from TyTeen's items.py:

from scrapy.item import Item, Field


class User(Item):
    """
    A user. Each Field holds one attribute scraped from a member profile page.
    """
    userID = Field()
    username = Field()
    joinDate = Field()
    totalPosts = Field()
    avatarName = Field()
    location = Field()
    website = Field()
    occupation = Field()
    interests = Field()
    email = Field()
    # Instant-messenger screen names
    msn = Field()
    aim = Field()
    yahoo = Field()
    icq = Field()
    signature = Field()

Rest in peace, code...

Posted: Tue Nov 20, 2012 9:28 pm
by tyteen4a03
Incomplete. Sorry if I wasn't clear enough :P

A major reason it's incomplete is pipeline.py: I haven't finished the htmlToBBCode method yet (and it's hacky, I tell ya). Otherwise, the code runs (with a lot of exceptions yet to be fixed).
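For readers wondering what an HTML-to-BBCode step involves, here is a minimal Python sketch built on the standard library's `html.parser`. It is not the htmlToBBCode method from pipeline.py; it only handles bold, italic, underline, links, images and line breaks, and the class and function names are illustrative.

```python
from html.parser import HTMLParser

# Simple one-to-one tag mappings; real forum HTML would need many more cases.
SIMPLE_TAGS = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}


class HtmlToBBCode(HTMLParser):
    """Converts a small subset of HTML back into BBCode."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into plain text
        self.out = []
        self.open_links = []  # remembers which <a> tags emitted a [url=...]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SIMPLE_TAGS:
            self.out.append(f"[{SIMPLE_TAGS[tag]}]")
        elif tag == "a" and "href" in attrs:
            self.out.append(f"[url={attrs['href']}]")
            self.open_links.append(True)
        elif tag == "a":
            self.open_links.append(False)  # anchor without href: no [url]
        elif tag == "img" and "src" in attrs:
            self.out.append(f"[img]{attrs['src']}[/img]")
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in SIMPLE_TAGS:
            self.out.append(f"[/{SIMPLE_TAGS[tag]}]")
        elif tag == "a" and self.open_links and self.open_links.pop():
            self.out.append("[/url]")

    def handle_data(self, data):
        self.out.append(data)


def html_to_bbcode(html: str) -> str:
    parser = HtmlToBBCode()
    parser.feed(html)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.out)


print(html_to_bbcode('<b>Hi</b> see <a href="http://example.com">this</a>'))
# [b]Hi[/b] see [url=http://example.com]this[/url]
```

Anything outside the mapped tags is silently dropped, which is exactly where such a converter gets "hacky": every extra construct (quotes, colours, smilies) needs its own rule.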

Posted: Fri Jun 07, 2013 4:00 pm
by jdl
So is this still a thing, Tyteen?

How's everything been doing the past few months? :)

Posted: Mon Jun 24, 2013 8:56 pm
by myuacc1studios
Technos72 wrote:My answer to good idea:
AGH TOO MUCH PONY AHHHH

Posted: Sat Jul 20, 2013 3:33 pm
by tyteen4a03
jdl wrote:So is this still a thing, Tyteen?

How's everything been doing the past few months? :)
Nothing has really happened in the past few months; I had freelance projects writing XenForo add-ons, exams, and a very big video project (think full-length movies). Now I'm finishing a music video that is very important to me, and then I'm moving to Denmark for a year as an exchange student.

My coding skills have grown a lot since I started developing XenForo add-ons, and I can see the new Wonderland Archive being based on XenForo's Resource Manager add-on. It would (as always) be a lot, lot easier if Patrick switched to XenForo and gave me direct access to the database for post conversion, but that's not happening soon, if ever.

This project will come out of hiatus fairly soon, but it is low on my priority list: I need to start coding for profit so I can make a living. As for when exactly the hiatus will be over, I honestly don't know. Feel free to take the current WLF scraping code, expand it into a full bot, and scrape the forum; once we have the data, the rest will be a piece of cake.

Posted: Sat Jul 20, 2013 4:11 pm
by Muzozavr
tyteen4a03 wrote:a very big video project (think full-length movies
*twitch*

As a wannabe filmmaker (who has neither a camera nor the money for one, and hasn't written a finished script), I'm highly interested in knowing what the project is and what your role in it was.

Posted: Sat Jul 20, 2013 4:39 pm
by tyteen4a03
Muzozavr wrote:
tyteen4a03 wrote:a very big video project (think full-length movies
*twitch*

As a wannabe filmmaker (who has neither a camera nor the money for one, and hasn't written a finished script), I'm highly interested in knowing what the project is and what your role in it was.
Everything. It's not even a real movie, just a lot of photos plus heaps of funny content featuring my 100 friends at school, put together Ponies: The Anthology style. It seems to have been a success; my friends enjoyed it a lot.

However, my upcoming music video will be a real professional production. Stay tuned. :)

Posted: Sun Jul 21, 2013 5:06 pm
by tyteen4a03
A lot has happened in the past 48 hours, and I'm sad to announce that the MV will need to be postponed by at least a year. However, this means I will have a little more free time to work on this.

I've offered MS my spare XenForo license and am now patiently waiting for a response. If MS gives switching to XenForo the green light, I'll bump the priority of this project and build in an API that lets future Wonderland games upload levels and adventures directly. If not, I'll continue my work on the scraping bot. It's quite close to completion, since I plan to skip some very outdated data (like IM screen names: who uses any of the IMs listed in the profile anymore?) and to slim down the functionality (like the built-in HTML-to-BBCode conversion; I can fix that HTML mess by hand).
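One way to skip the outdated IM data is to drop those fields from each scraped record before it is stored. A minimal Python sketch, assuming records are plain dicts keyed by the field names in items.py (`msn`, `aim`, `yahoo`, `icq`); the function name is hypothetical, and removing those `Field()` entries from the User item altogether would achieve the same thing at the source.

```python
# Fields from the User item in items.py that hold IM screen names.
OUTDATED_FIELDS = ("msn", "aim", "yahoo", "icq")


def slim_user(user: dict) -> dict:
    """Return a copy of a scraped user record without the IM screen-name fields."""
    return {key: value for key, value in user.items() if key not in OUTDATED_FIELDS}


print(slim_user({"username": "example_user", "msn": "someone@example.com", "icq": "12345"}))
# {'username': 'example_user'}
```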