
new project, need to pull text from a webpage,

Posted: Sun Mar 22, 2009 7:30 am
by riddlebox
I want to create an app that pulls text from a couple of websites: baseball players and their fantasy baseball rankings. I would then like to apply an extra number, my own "rating", depending on the website, like -1 if injured or +1 if from cbssportsline.com, and have the app build a new list of players ranked by position. That way I could select pitchers and it would give me the top 10 still available during a draft.
I'm sure I can do most of it, but I don't know how to pull the text from the sites I want to use. Any ideas?
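A minimal Python sketch of the ranking step described above. It assumes the scraped data has already been turned into player records. `Player`, `my_rating` and `best_available` are hypothetical names, and treating the rating as a shift of the rank number is an assumption. The rating follows the post's convention (+1 good, -1 bad) and is subtracted from the rank, so a bonus moves a player up the list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Player:
    # Hypothetical record built from the scraped rankings.
    name: str
    position: str        # e.g. "SP", "RP", "1B"
    rank: int            # site ranking, 1 is best
    injured: bool = False
    source: str = ""     # site the ranking came from


def my_rating(p: Player) -> int:
    """Personal adjustment from the post: -1 if injured, +1 if from cbssportsline.com."""
    rating = 0
    if p.injured:
        rating -= 1
    if p.source == "cbssportsline.com":
        rating += 1
    return rating


def best_available(players: Iterable[Player], position: str,
                   drafted: Set[str], n: int = 10) -> List[Player]:
    """Top n undrafted players at a position, sorted by rank minus rating (lower is better)."""
    pool = [p for p in players if p.position == position and p.name not in drafted]
    pool.sort(key=lambda p: p.rank - my_rating(p))
    return pool[:n]
```

During a draft you would add each picked name to `drafted` and call something like `best_available(players, "SP", drafted)` to see the top 10 pitchers left.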

Re: new project, need to pull text from a webpage,

Posted: Sun Mar 22, 2009 12:56 pm
by hellonorman
I can't tell from your question whether you're stuck on how to retrieve a web page or on how to parse a text file.

As for parsing the text file, you will have to examine its structure to figure out how to extract the data you are interested in. Once you can extract the pitcher data, you could put it in an array or a hashtable (does Python have hashtables?). Or you could create a class for pitchers that has a rank element.
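To answer the hashtable question: Python's built-in `dict` is a hash table. A small sketch combining both suggestions, a `Pitcher` class with a rank plus a dict keyed by name; the field names and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pitcher:
    # Illustrative fields; use whatever the scraped page actually provides.
    name: str
    team: str
    rank: int


def index_by_name(pitchers: List[Pitcher]) -> Dict[str, Pitcher]:
    """Build a dict (Python's hash table) for fast lookup by player name."""
    return {p.name: p for p in pitchers}


def rank_of(index: Dict[str, Pitcher], name: str) -> Optional[int]:
    """Look up a pitcher's rank, or None if the name was not scraped."""
    p = index.get(name)
    return p.rank if p is not None else None
```

The plain list keeps the pitchers in page order, while the dict answers "where is this guy ranked?" without scanning the whole list.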

Retrieving the page from the website should be well documented. Once you have the pitcher data, there is also plenty of documentation on working with collections of data. As for finding the pitcher data in the text, that depends on examining the specific page.
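For the retrieval part, a minimal sketch using only the Python 3 standard library (`urllib.request`). The User-Agent string is a made-up placeholder, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
import urllib.request


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it to text."""
    # Hypothetical User-Agent; some sites reject the default Python one.
    req = urllib.request.Request(url, headers={"User-Agent": "fantasy-draft-helper/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, else assume UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Call it as `html = fetch_page(url)` with the ranking page's URL, then work on the returned string.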

Re: new project, need to pull text from a webpage,

Posted: Wed Mar 25, 2009 11:29 pm
by eddie
A.
wget file(s)
html2text file(s)
a simple script (in Python or whatever) to extract what you need.
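Steps 2 and 3 can also be done together in Python. This sketch uses the standard `html.parser.HTMLParser` as a rough stand-in for html2text (it is not html2text itself), then applies a regex. It assumes a hypothetical line format like "12. Player Name, SP"; the real pattern has to come from looking at the actual page.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple


class TextExtractor(HTMLParser):
    """Keep non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_lines(html: str) -> List[str]:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


# Hypothetical format: "<rank>. <name>, <position>"
LINE_RE = re.compile(r"^(\d+)\.\s+(.+?),\s+([A-Z0-9]{1,2})$")


def extract_rankings(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Return (rank, name, position) for every line matching LINE_RE."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), m.group(3)))
    return rows
```

Lines that don't match (headings, navigation, ads) are simply ignored, which is usually what you want with scraped pages.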

B.
You can grep a web page via its URL by piping the download into grep, e.g. wget -qO- URL | grep PATTERN.
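If you'd rather stay in Python, here is a rough equivalent of that pipeline: stream the page line by line and keep the lines matching a regex. `grep_url` is a hypothetical helper, and the UTF-8 fallback is an assumption.

```python
import re
import urllib.request
from typing import Iterator


def grep_url(url: str, pattern: str, timeout: float = 10.0) -> Iterator[str]:
    """Yield lines of the page at url that match pattern, like wget -qO- URL | grep PATTERN."""
    regex = re.compile(pattern)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        for raw in resp:  # the response iterates over lines as bytes
            line = raw.decode(charset, errors="replace").rstrip("\r\n")
            if regex.search(line):
                yield line
```

To mimic cut -d, -f2 on each match, use `line.split(",")[1]` (cut counts fields from 1, Python indexes from 0).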

Never forget grep, awk, cut, and sed.
http://www.linuxconfig.org/Fgrep
http://ubuntuforums.org/showthread.php?p=6708426
http://ubuntuforums.org/showthread.php?t=906804
Also look at bashpodder.

Re: new project, need to pull text from a webpage,

Posted: Mon Mar 30, 2009 9:45 pm
by brian_X7
You might also want to look at an O'Reilly book called Baseball Hacks. The book uses Perl for most of the examples, but they are pretty simple and shouldn't be too difficult to convert to Python.

Re: new project, need to pull text from a webpage,

Posted: Mon Aug 24, 2009 11:19 am
by jstgtpaid
This sounds interesting... Did you finish this project? What method did you end up using?