Bubs

joined 7 months ago
[–] Bubs@lemm.ee 37 points 1 day ago (1 children)
 
[–] Bubs@lemm.ee 5 points 3 days ago

"The anarchy could be in this very room!"

"It could be you! It could be me!"

"It could even be..."

*BLAM*

"WOAH!"

"What? It was obvious! He was the anarchy."

[–] Bubs@lemm.ee 11 points 4 days ago

Have I got news for you!

Bacon and Hobbes

A fan comic about Hobbes' daughter. The link includes the entire original fan comic, plus some fan-comics of the fan comic (not as high quality, but still there).

[–] Bubs@lemm.ee 2 points 5 days ago

Yeah, I'm really excited for three of the games, but can't justify the purchase at launch. I'll probably look into it some time next year.

(I'm waiting for the Valve Deckard as my next big purchase lol)

14
We Both Love It (files.catbox.moe)
 
[–] Bubs@lemm.ee 6 points 1 week ago

Looked it up, we're about 10,000 years too late

[–] Bubs@lemm.ee 2 points 1 week ago

After reading some of the other comments, I'm definitely going to separate the systems. I'll use something like JSON or YAML as the output for the raw scraped data, and some sort of database for the final program.
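A minimal sketch of that two-stage split, assuming the scraper produces one Python dict per story. Stage one appends each record as a single line of JSON (the "JSON Lines" convention, which is easy to append to and easy to re-read after a crash); stage two loads that dump into SQLite. The field names (`id`, `title`, `author`, `body`), file name, and helper functions are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def dump_raw(records, path):
    """Stage 1 (hypothetical helper): write each scraped record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def load_into_db(path, conn):
    """Stage 2 (hypothetical helper): read the raw dump and insert it into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories ("
        "id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT)"
    )
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    # INSERT OR REPLACE lets a re-scrape overwrite an older copy of the same story
    conn.executemany(
        "INSERT OR REPLACE INTO stories (id, title, author, body) "
        "VALUES (:id, :title, :author, :body)",
        rows,
    )
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "First Story", "author": "alice", "body": "Once upon a time..."},
        {"id": 2, "title": "Second Story", "author": "bob", "body": "It was a dark night..."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_stories.jsonl"
        dump_raw(sample, raw)
        conn = sqlite3.connect(":memory:")
        print(load_into_db(raw, conn))  # 2
        print(conn.execute("SELECT title FROM stories ORDER BY id").fetchall())
```

Keeping the raw dump around means the database can be rebuilt from scratch whenever the table layout changes, without re-scraping the site.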

[–] Bubs@lemm.ee 1 points 1 week ago

That's an interesting read. I'll definitely give JSON a try too.

50
submitted 1 week ago* (last edited 1 week ago) by Bubs@lemm.ee to c/TwoGoobers@lemm.ee
 
[–] Bubs@lemm.ee 7 points 1 week ago

Glad I could brighten up your day!

[–] Bubs@lemm.ee 2 points 1 week ago

That's good to know.

[–] Bubs@lemm.ee 34 points 1 week ago (4 children)

WhAt'S a CoMpUtEr?

[–] Bubs@lemm.ee 1 points 1 week ago

Gonna be honest, I'll need to do a bit more research on what validating against a schema actually involves, but I get the general idea, and I like it.

For initial testing and prototypes, I probably won't worry about validation, but once I get to the point of refining the system, validation like that would be a good idea.
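For reference, "validating against a schema" means checking each scraped record against a declared description of which fields must exist and what type each one must have, before the record goes into storage. Libraries such as `jsonschema` do this for JSON Schema documents; the stdlib-only sketch below hand-rolls the same idea, using a hypothetical `STORY_SCHEMA` that maps field names to expected Python types.

```python
# Hypothetical schema: required field name -> expected Python type
STORY_SCHEMA = {
    "id": int,
    "title": str,
    "author": str,
    "date": str,
    "tags": list,
    "body": str,
}

def validate(record, schema=STORY_SCHEMA):
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

if __name__ == "__main__":
    good = {"id": 1, "title": "T", "author": "a", "date": "2004-05-01",
            "tags": ["drama"], "body": "..."}
    bad = {"id": "1", "title": "T", "author": "a", "tags": "drama", "body": "..."}
    print(validate(good))  # []
    print(validate(bad))   # ['id: expected int, got str', 'missing field: date', 'tags: expected list, got str']
```

Returning a list of errors instead of raising on the first one makes it easy to log every problem with a badly scraped page in one pass.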

[–] Bubs@lemm.ee 2 points 1 week ago (2 children)

One concern I'm seeing from other comments is that I may have more data than SQLite is ideal for. I have thousands of stories (my estimate is between 10 and 40 thousand), and many of the stories can be several pages long.
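A back-of-the-envelope check helps here. The sketch below assumes (hypothetically) about 3,000 characters per page and a pessimistic 10 pages per story; since the text is ASCII, each character is one byte. Under those assumptions even the high estimate of 40,000 stories comes to roughly a gigabyte of raw text, while SQLite's documented maximum database size is in the hundreds of terabytes, so volume alone probably isn't a reason to avoid it.

```python
# Hypothetical assumptions: adjust to match the real site
CHARS_PER_PAGE = 3_000   # rough size of one page of prose
PAGES_PER_STORY = 10     # pessimistic: most stories are likely shorter
BYTES_PER_CHAR = 1       # ASCII-only text is one byte per character

def estimate_bytes(num_stories,
                   pages=PAGES_PER_STORY,
                   chars_per_page=CHARS_PER_PAGE):
    """Upper-bound estimate of raw story text, ignoring metadata and indexes."""
    return num_stories * pages * chars_per_page * BYTES_PER_CHAR

if __name__ == "__main__":
    for n in (10_000, 40_000):
        print(f"{n} stories: ~{estimate_bytes(n) / 1e9:.1f} GB")
    # 10000 stories: ~0.3 GB
    # 40000 stories: ~1.2 GB
```

Indexes and metadata add some overhead on top of this, but not enough to change the order of magnitude.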

 

Short version of the situation is that I have an old site I frequent for user-written stories. The site is ancient (think early 2000s), and has terrible tools for sorting and searching the stories. Half of the time, stories disappear from author profiles. Thousands of stories and you can only sort by top, new, and 30-day top.

I'm in the process of programming a scraper tool so I can archive the stories and give myself a library to better find forgotten stories on the site. I'll be storing tags, dates, authors, etc, as well as the full body of the text.

Concerning the data, there are a few thousand stories (ASCII only), with various data points for each story, and the bodies of many stories run several pages long.

Currently, I'm using Python to compile the data and would like to know which storage solution is ideal for my situation. I have a little familiarity with SQL, JSON, and YAML, but not enough to know what might be best. I'm also open to any other solutions that work well with Python.
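One possible layout, sketched with Python's built-in `sqlite3` module under some assumptions: the table and column names are hypothetical, tags go in their own table because one story can have many, and dates are stored as ISO 8601 strings (`YYYY-MM-DD`) so they sort correctly as plain text. Body search here uses `LIKE`, which scans every row; that should be tolerable at this size, and SQLite's FTS5 full-text extension is an option if the local SQLite build includes it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS stories (
    id        INTEGER PRIMARY KEY,   -- story ID from the site (hypothetical)
    title     TEXT NOT NULL,
    author    TEXT NOT NULL,
    published TEXT,                  -- ISO 8601 date string, e.g. '2003-07-14'
    body      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS story_tags (
    story_id INTEGER NOT NULL REFERENCES stories(id),
    tag      TEXT NOT NULL,
    PRIMARY KEY (story_id, tag)
);
CREATE INDEX IF NOT EXISTS idx_stories_author ON stories(author);
CREATE INDEX IF NOT EXISTS idx_tags_tag ON story_tags(tag);
"""

def add_story(conn, story):
    """Insert one story and its tags in a single transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO stories VALUES (?, ?, ?, ?, ?)",
            (story["id"], story["title"], story["author"],
             story["published"], story["body"]),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO story_tags VALUES (?, ?)",
            [(story["id"], tag) for tag in story["tags"]],
        )

def find(conn, tag=None, author=None, text=None):
    """Filter by any combination of tag, author, and body substring; newest first."""
    clauses, params = [], []
    if tag is not None:
        clauses.append("id IN (SELECT story_id FROM story_tags WHERE tag = ?)")
        params.append(tag)
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if text is not None:
        clauses.append("body LIKE ?")  # case-insensitive for ASCII by default
        params.append(f"%{text}%")
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT title FROM stories" + where + " ORDER BY published DESC"
    return [row[0] for row in conn.execute(sql, params)]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_story(conn, {"id": 1, "title": "Old Tale", "author": "alice",
                     "published": "2002-03-01", "tags": ["fantasy"], "body": "A dragon..."})
    add_story(conn, {"id": 2, "title": "New Tale", "author": "bob",
                     "published": "2004-08-15", "tags": ["fantasy", "humor"], "body": "A goose..."})
    print(find(conn, tag="fantasy"))   # ['New Tale', 'Old Tale']
    print(find(conn, author="alice"))  # ['Old Tale']
    print(find(conn, text="goose"))    # ['New Tale']
```

Because every filter is optional, the same function covers "all stories by this author", "everything tagged X", and combinations of the two, which the site itself can't do.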

 
39
Everything I Need (files.catbox.moe)
 
 
 
74
Right Here with You (files.catbox.moe)
 
 
 
 