
mySQL from RAM Drive? Is it possible?


Comments

  • these are tables... database tables.

  • yokowasis said: I think I will go for varchar (10K). Is that better?

    json_encode the two arrays and store them as TEXT, or even better, use the JSON field support recently added to MySQL or Postgres.
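    For example, a minimal sketch using the native JSON type (MySQL 5.7+; the table and column names here are made up for illustration):

    CREATE TABLE test_sessions (
        session_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        student_id INT UNSIGNED NOT NULL,
        answers    JSON NOT NULL  -- e.g. '{"1": "A", "2": "C"}'
    );

    -- Overwrite the whole document on each save...
    UPDATE test_sessions SET answers = '{"1": "A", "2": "C"}' WHERE session_id = 1;

    -- ...or patch a single answer in place with JSON_SET.
    UPDATE test_sessions SET answers = JSON_SET(answers, '$."3"', 'B') WHERE session_id = 1;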

  • You all do realize that he doesn't realize this is sarcasm, right?

  • trewq Administrator, Patron Provider

    @yokowasis have you considered rewriting your reporting? Take a look at xAPI, for example. I work in the elearning industry; feel free to message me if you need a hand.

  • yokowasis Member
    edited December 2017

    @wwabbit said:

    yokowasis said: I think I will go for varchar (10K). Is that better?

    json_encode the two arrays and store them as TEXT, or even better, use the JSON field support recently added to MySQL or Postgres.

    Is TEXT better than VARCHAR?

    @trewq said:
    @yokowasis have you considered rewriting your reporting? Take a look at xAPI, for example. I work in the elearning industry; feel free to message me if you need a hand.

    I am open to suggestions. Do you have a better way to save student answers? They need to be saved every time a student answers a question.

    So saving all the answers at once after a student finishes is not a solution.

    The reason is that sometimes the internet connection drops or the laptop/PC crashes. The students need to restart the laptop or switch to another one, and I need all of their answers to still be there when they log back in.

    From this thread I have 3 possible solutions.

    1. Convert all the answer fields into 1 field that holds the array/JSON.
      The downside of this method is that I need extra CPU power to parse and manipulate the JSON/array, and extra power to put a LONG string into the database. I don't know how much power, because obviously I need to run some tests. And I don't know if it would actually perform better than what I have currently.

    2. Divide the 100++ fields into 10 tables with 10 fields each, or 20 tables with 5 fields each. People in this thread seem to agree that a lot of fields is BAD, so let's have a lot of tables with fewer fields each. Is having a lot of tables equally BAD in MySQL?

    3. Instead of storing the answers in fields, store each one in a row. My concern with this method is that the table will hit 1 or 2 million rows easily. When that happens, will updates perform better than they do right now? Because MySQL needs to find 1 record among MILLIONS of records to update.

    Any advice on which route I should go?

    An SQL fiddle would be very much appreciated.

  • If the hard drive is the bottleneck, then dividing the tables will be the fastest approach. The more tables the better, since you're updating 1 question at a time.

  • trewq Administrator, Patron Provider
    edited December 2017

    @yokowasis Your best bet is to record actions instead of updating rows in a relational DB. Take a look at xAPI and some learning record stores (LRS), for example Learning Locker.

    Once you do that it's very easy to query your LRS for data matching specific verbs and actors (xAPI terms).

    This will also give you amazing reporting, for example being able to see when people select the wrong answer and then change it to the correct one.

    Using xAPI (also called Tin Can API and Experience API) is super powerful, and if you're building a new learning platform you should definitely be using it.
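    A rough relational sketch of that append-only idea (real xAPI statements are JSON sent over HTTP to an LRS; this table and all the names in it are just an illustration of the shape):

    CREATE TABLE statements (
        id        BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        actor     VARCHAR(64) NOT NULL,  -- who acted, e.g. 'student:57'
        verb      VARCHAR(64) NOT NULL,  -- e.g. 'answered', 'changed-answer'
        object    VARCHAR(64) NOT NULL,  -- what was acted on, e.g. 'question:124'
        result    VARCHAR(16) NULL,      -- e.g. the chosen answer
        stored_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Every student action is an INSERT, never an UPDATE:
    INSERT INTO statements (actor, verb, object, result)
    VALUES ('student:57', 'answered', 'question:124', 'A');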

  • Thanks. I will look into this xAPI. This is the first time I've heard the term.

  • A separate row per (student, question) seems better than the other options. It saves the overhead of serializing back and forth to JSON or whatever. It's also less complex than having multiple tables. Just create a separate index for both student id and question id (CREATE INDEX student_index ON student_table (student_id) and CREATE INDEX question_index ON student_table (question_id)).
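    A minimal sketch of that layout (table and column names follow the inline example; the types are assumptions):

    CREATE TABLE student_table (
        student_id  INT UNSIGNED NOT NULL,
        question_id INT UNSIGNED NOT NULL,
        answer      CHAR(1) NOT NULL
    );
    CREATE INDEX student_index  ON student_table (student_id);
    CREATE INDEX question_index ON student_table (question_id);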

    Use a write-efficient storage engine like RocksDB instead of InnoDB; see http://myrocks.io/. LSM trees have lower write amplification than B+ trees.
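    Switching the table above over is then a one-liner (assuming the MyRocks plugin is installed; see the link for the actual setup):

    ALTER TABLE student_table ENGINE=ROCKSDB;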

    Memory storage would still need to persist updates to the disk, so I don't see a big advantage. Batching updates in memory storage comes with a risk of data loss. It works better for caches or infrequently updated data, I think.

  • It's not like I designed your database for you already...

    @teamacc said:

    @yokowasis said:

    To summarize: columns 1 to 5 are used for student identity (id, username, name of the test, time the test started) and columns 6 to 105 hold the student's answer to each question. There are 100 questions.

    Maybe there is a better way to approach this? I am open to suggestions.

    Flatten your mysql tables.

    Have 1 table with student name/number/etc. (primary key studentnumber)

    Have 1 table with "first test", "second test" etc. (primary key testnumber)

    Have 1 table with "first test question 1", "second test question 2" etc (primary key questionnumber, foreign key for testnumber, unique key combo "testnumber+questionnumber")

    Have 1 table with "first test question 1 student 1 answer: A" (no real primary key (although some sources advise you to do so, might wanna look into that), but unique key combo "questionnumber+studentnumber" and foreign key answernumber)

    Then, instead of having to update a 105-column row, you only insert/update a tiny row. (INSERT INTO testanswers (questionnumber, studentnumber, answer) VALUES (124, 1, 2) ON DUPLICATE KEY UPDATE answer=2; studentnumber has to be in the INSERT so the unique key combo can match.)
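    In SQL, a minimal sketch of those four tables (column types are guesses, and the answernumber foreign key table is left out for brevity):

    CREATE TABLE students (
        studentnumber INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name          VARCHAR(100) NOT NULL
    );

    CREATE TABLE tests (
        testnumber INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        title      VARCHAR(100) NOT NULL
    );

    CREATE TABLE questions (
        questionnumber INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        testnumber     INT UNSIGNED NOT NULL,
        UNIQUE KEY test_question (testnumber, questionnumber),
        FOREIGN KEY (testnumber) REFERENCES tests (testnumber)
    );

    CREATE TABLE testanswers (
        questionnumber INT UNSIGNED NOT NULL,
        studentnumber  INT UNSIGNED NOT NULL,
        answer         TINYINT UNSIGNED NOT NULL,  -- numeric answer id, per the INSERT above
        UNIQUE KEY question_student (questionnumber, studentnumber),
        FOREIGN KEY (questionnumber) REFERENCES questions (questionnumber),
        FOREIGN KEY (studentnumber)  REFERENCES students (studentnumber)
    );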

  • DB normalisation is a skill that takes a week or two to understand and is totally worth it if you intend to keep working with databases.

    Minimise redundancy, query on keys, avoid having too many indexes if your app is UPDATE/INSERT heavy.

  • @teamacc said:
    It's not like I designed your database for you already...

    @teamacc said:

    @yokowasis said:

    To summarize: columns 1 to 5 are used for student identity (id, username, name of the test, time the test started) and columns 6 to 105 hold the student's answer to each question. There are 100 questions.

    Maybe there is a better way to approach this? I am open to suggestions.

    Flatten your mysql tables.

    Have 1 table with student name/number/etc. (primary key studentnumber)

    Have 1 table with "first test", "second test" etc. (primary key testnumber)

    Have 1 table with "first test question 1", "second test question 2" etc (primary key questionnumber, foreign key for testnumber, unique key combo "testnumber+questionnumber")

    Have 1 table with "first test question 1 student 1 answer: A" (no real primary key (although some sources advise you to do so, might wanna look into that), but unique key combo "questionnumber+studentnumber" and foreign key answernumber)

    Then, instead of having to update a 105-column row, you only insert/update a tiny row. (INSERT INTO testanswers (questionnumber, studentnumber, answer) VALUES (124, 1, 2) ON DUPLICATE KEY UPDATE answer=2; studentnumber has to be in the INSERT so the unique key combo can match.)

    I like studentnumber as the table's primary key, but this can, and will, fail. It's OK to match on it, but don't use it as the person's primary key.

  • @WSS said:

    @teamacc said:
    It's not like I designed your database for you already...

    @teamacc said:

    @yokowasis said:

    To summarize: columns 1 to 5 are used for student identity (id, username, name of the test, time the test started) and columns 6 to 105 hold the student's answer to each question. There are 100 questions.

    Maybe there is a better way to approach this? I am open to suggestions.

    Flatten your mysql tables.

    Have 1 table with student name/number/etc. (primary key studentnumber)

    Have 1 table with "first test", "second test" etc. (primary key testnumber)

    Have 1 table with "first test question 1", "second test question 2" etc (primary key questionnumber, foreign key for testnumber, unique key combo "testnumber+questionnumber")

    Have 1 table with "first test question 1 student 1 answer: A" (no real primary key (although some sources advise you to do so, might wanna look into that), but unique key combo "questionnumber+studentnumber" and foreign key answernumber)

    Then, instead of having to update a 105-column row, you only insert/update a tiny row. (INSERT INTO testanswers (questionnumber, studentnumber, answer) VALUES (124, 1, 2) ON DUPLICATE KEY UPDATE answer=2; studentnumber has to be in the INSERT so the unique key combo can match.)

    I like studentnumber as the table's primary key, but this can, and will, fail. It's OK to match on it, but don't use it as the person's primary key.

    That's fair and probably correct. I wrote the above design in about 5 minutes' time, and it would already blow OP's current setup out of the water big time.

  • @teamacc yes you did. Your solution is number 3. My concern is that the rows will hit millions of records in no time.

    20k students with 100 answers each will result in 2 million rows, and that terrifies me. I've never gone past 100k records before. Are you sure the update performance won't suffer, given that it needs to find one record to update among millions of records?

  • @yokowasis said:
    @teamacc yes you did. Your solution is number 3. My concern is that the rows will hit millions of records in no time.

    20k students with 100 answers each will result in 2 million rows, and that terrifies me. I've never gone past 100k records before. Are you sure the update performance won't suffer, given that it needs to find one record to update among millions of records?

    To be as polite as possible: you really don't understand how databases work, and you need to spend some time on that. TeamACC's solution is pretty good, but I disagree with the student number key, because there are always fuckups in academia. God forbid you don't force a unique key; and it can't be numeric, because there will eventually be a '204304a' thanks to a clerical error, and someone will try to do an edit as an insert and have it fail on the unique constraint, etc., etc.

    Plus, it'll make it easier to find cheaters.
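    In other words: keep an internal surrogate key and store the issued student number as a unique text column (a sketch; the names and sizes are made up):

    CREATE TABLE students (
        id            INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- internal key, used everywhere else
        studentnumber VARCHAR(16) NOT NULL,                     -- whatever the school issues, e.g. '204304a'
        UNIQUE KEY (studentnumber)
    );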

  • @WSS said:

    @yokowasis said:
    @teamacc yes you did. Your solution is number 3. My concern is that the rows will hit millions of records in no time.

    20k students with 100 answers each will result in 2 million rows, and that terrifies me. I've never gone past 100k records before. Are you sure the update performance won't suffer, given that it needs to find one record to update among millions of records?

    To be as polite as possible: you really don't understand how databases work, and you need to spend some time on that. TeamACC's solution is pretty good, but I disagree with the student number key, because there are always fuckups in academia. God forbid you don't force a unique key; and it can't be numeric, because there will eventually be a '204304a' thanks to a clerical error, and someone will try to do an edit as an insert and have it fail on the unique constraint, etc., etc.

    Plus, it'll make it easier to find cheaters.

    Yes, that's because whenever I look anything up, e.g. big columns vs. big rows, the answer is always "it depends", which is why I am so fucking confused about which route to take.

    So, tl;dr, I should go with the few-columns, millions-of-rows route?

  • You don't really need to do that.

    Just make your design extendable.

    The ID primary key of the test matches testID, the secondary key is userID, and answers can be set up as a subtable if you really want to do that, or just stored as blobs. Each test/section/etc. gets its own entry in the column.

    Here's the smart bit: you don't need to make that static or keep it there forever. You can always move it to an offline status, or only allow searching for sub-testIDs within a timeframe (stored above), et al.

    You really need to diagram out what you want to achieve, and the schema will follow.

  • @yokowasis said:
    @teamacc yes you did. Your solution is number 3. My concern is that the rows will hit millions of records in no time.

    20k students with 100 answers each will result in 2 million rows, and that terrifies me. I've never gone past 100k records before. Are you sure the update performance won't suffer, given that it needs to find one record to update among millions of records?

    Currently running an online gaming community where I have a database of well over 8M rows in one table, and 36M+ in another.

    Responses from the database are still under 50ms, no matter what I throw at it.

    If properly designed, rowcount does not matter.

  • agentmishra Member, Host Rep

    why not try the memory engine in mysql...

  • @agentmishra said:
    why not try the memory engine in mysql...

    Thanks for reading the thread.

  • elgs Member
    edited December 2017

    My instinct is to mount /var/lib/mysql/your_db on a tmpfs filesystem and rsync the tmpfs to another location periodically, given that: (1) you have enough RAM for your data, which you probably do, since that's the whole point of this thread; and (2) you are fine with losing whatever was written after the last rsync.

  • teamacc said: If properly designed, rowcount does not matter.

    TBF, insert order and keys have a lot to do with that.

  • yokowasis said: I think I will go for varchar (10K). Is that better?

    No, use TEXT.

    yokowasis said: Convert all the answer fields into 1 field that holds the array/JSON. The downside of this method is that I need extra CPU power to parse and manipulate the JSON/array, and extra power to put a LONG string into the database. I don't know how much power, because obviously I need to run some tests. And I don't know if it would actually perform better than what I have currently.

    It will use effectively ZERO CPU. Serializing or unserializing a 100-item array where each item is a single character takes about a microsecond.

    Don't confuse yourself any further. Use a single TEXT column, serialize/unserialize on each update, and enjoy 1% CPU load.
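    A sketch of that single-TEXT-column shape (the names are made up; the stored string can be PHP serialize() output or JSON, whichever you already use):

    CREATE TABLE test_sessions (
        student_id INT UNSIGNED NOT NULL,
        test_id    INT UNSIGNED NOT NULL,
        answers    TEXT NOT NULL,
        PRIMARY KEY (student_id, test_id)
    );

    -- Rewrite the whole serialized string on every answer:
    UPDATE test_sessions
    SET answers = '{"1":"A","2":"C","3":"B"}'
    WHERE student_id = 57 AND test_id = 3;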
