Adrian Malacoda | 3f4eecc238 | use dateutil to parse rfc3339 datetime strings in <time> elements, if they are present. | 2016-11-27 01:10:04 -06:00
Adrian Malacoda | c800312423 | add python-dateutil dep | 2016-11-27 01:01:52 -06:00
Adrian Malacoda | 5bcb6e8884 | add extra post & user info | 2016-11-27 00:48:55 -06:00
Adrian Malacoda | 71a4f8c5a4 | add extra user metadata such as title and avatar | 2016-11-27 00:43:11 -06:00
Adrian Malacoda | de89ddb350 | Add timestamp to post model | 2016-11-27 00:42:27 -06:00
Adrian Malacoda | 61e25fe9d9 | example of large thread | 2016-11-27 00:34:59 -06:00
Adrian Malacoda | c83d4a9916 | for now, limit to forumer forums (fr.yuku.com) as I'm not sure if this scraper will support non-forumer ones | 2016-11-27 00:18:39 -06:00
Adrian Malacoda | 55176e4596 | more examples in readme | 2016-11-27 00:17:04 -06:00
Adrian Malacoda | 741573d30a | only want first h1/h2 etc | 2016-11-27 00:16:21 -06:00
Adrian Malacoda | ea46ae8853 | .text() not text | 2016-11-27 00:14:16 -06:00
Adrian Malacoda | 9c401cbfb1 | need to use .items() grumble grumble | 2016-11-27 00:11:42 -06:00
Adrian Malacoda | b304297019 | fix signature parsing, use html instead of text. Unfortunately there's a lot of garbage here we'll have to clean up | 2016-11-27 00:03:30 -06:00
Adrian Malacoda | 6fb7980218 | make threads subdir under board so we can put an index.json there with board metadata | 2016-11-26 23:54:05 -06:00
Adrian Malacoda | eabf099f47 | fix for yuku's broken postbit markup | 2016-11-26 23:42:30 -06:00
Adrian Malacoda | f4540d4030 | expand readme | 2016-11-26 23:16:06 -06:00
Adrian Malacoda | c04c030540 | add user object | 2016-11-26 23:14:09 -06:00
Adrian Malacoda | 933e178ce5 | initial commit for the-great-escape yuku scraper | 2016-11-26 23:09:12 -06:00
Adrian Malacoda | e5fb7e5c9a | initial commit | 2016-11-26 21:02:28 -06:00