COUNT(*) or MAX(id) - which is faster?









14















I have a web server on which I've implemented my own messaging system.
I am at a phase where I need to create an API that checks if the user has new messages.



My DB table is simple:



ID - Auto Increment, Primary Key (Bigint)
Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table
Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table
Message - Varchar (256) //UTF8 BIN


I am considering making an API that will estimate if there are new messages for a given user. I am thinking of using one of these methods:



A) Select count(*) of messages where sender or recipient is me.

(if this number > previous number, I have a new message)



B) Select max(ID) of messages where sender or recipient is me.

(if max(ID) > previous number, I have a new message)



My question is: can I somehow calculate which method will consume fewer server resources? Is there an article on this? Or maybe another method I didn't mention?
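For reference, the two candidate queries can be written out as a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the question is about MySQL, so only the SQL carries over, not the performance characteristics. The sample rows and the helper names `count_for` and `max_id_for` are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the messages table described in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    VARCHAR(32) NOT NULL,
        recipient VARCHAR(32) NOT NULL,
        message   VARCHAR(256) NOT NULL
    )
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    [("alice", "bob", "hi"), ("bob", "alice", "hello"), ("carol", "dave", "yo")],
)


def count_for(user):
    """Method A: number of messages sent or received by `user`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE sender = ? OR recipient = ?",
        (user, user),
    ).fetchone()
    return row[0]


def max_id_for(user):
    """Method B: highest message id sent or received by `user` (None if none)."""
    row = conn.execute(
        "SELECT MAX(id) FROM messages WHERE sender = ? OR recipient = ?",
        (user, user),
    ).fetchone()
    return row[0]
```

In both cases the client stores the previous result and compares it with the new one on the next poll.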





























  • 3





    I think you would be better off by adding a timestamp column and checking against that value to see if there are newer records.

    – Dharman
    Apr 8 at 20:17











  • Either querying a timestamp or the ID, use MAX() on that column, and make sure it's indexed with (user_id, timestamp).

    – The Impaler
    Apr 8 at 20:19











  • @Dharman I was thinking of it, but it costs extra DB space, and I am not sure it will be faster than either of my methods. I am storing a simple number (the current message count) in the usernames table.

    – FeHora
    Apr 8 at 20:19






  • 1





    Calculate? No idea. But you can measure it. Fire off a few thousand of each query and watch machine metrics (cpu%, mem%, load average, etc.)

    – Sergio Tulentsev
    Apr 8 at 20:20






  • 2





    While there is a good answer to this question below, I suspect you might be optimizing on something that turns out not to be important. And unless you anticipate having literally millions of messages, I wouldn't worry about disk space, especially because the timestamp is small compared to your other fields. If you add timestamps, your table will be about 5MB larger for each million messages. That's really nothing.

    – Jerry
    Apr 8 at 20:57

















php mysql performance














edited Apr 9 at 15:35 by Boann

asked Apr 8 at 20:15 by FeHora



















4 Answers


















16














In MySQL InnoDB, SELECT COUNT(*) WHERE secondary_index = ? is an expensive operation, and when the user has a lot of messages this query might take a long time. Even when using an index, the engine still needs to count all matching records, so performance degrades as the number of matching messages grows.



On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently by doing a so-called loose index scan. The performance will stay almost constant.



If you want to understand why, consider looking up the B+Tree data structure which InnoDB uses to organise its data.



I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).
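The queries above filter on a single indexed column, while the question filters on sender OR recipient. One way to keep each lookup on its own index is to take the larger of two MAX(id) lookups. The following is a minimal sketch under assumptions: it runs on Python's sqlite3 rather than MySQL, and the index names, sample rows and helper names are hypothetical. Whether MySQL resolves each branch with a single index lookup should be confirmed with EXPLAIN on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    VARCHAR(32) NOT NULL,
        recipient VARCHAR(32) NOT NULL,
        message   VARCHAR(256) NOT NULL
    );
    -- Hypothetical index names: one index per filtered column, followed by id.
    CREATE INDEX idx_messages_sender ON messages (sender, id);
    CREATE INDEX idx_messages_recipient ON messages (recipient, id);
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    [
        ("alice", "bob", "hi"),
        ("bob", "alice", "hello"),
        ("carol", "dave", "yo"),
        ("alice", "carol", "ping"),
    ],
)


def latest_message_id(user):
    """Highest id among messages sent or received by `user`, or None."""
    row = conn.execute(
        """
        SELECT MAX(m) FROM (
            SELECT MAX(id) AS m FROM messages WHERE sender = ?
            UNION ALL
            SELECT MAX(id) AS m FROM messages WHERE recipient = ?
        ) AS per_column
        """,
        (user, user),
    ).fetchone()
    return row[0]


def has_new_messages(user, last_seen_id):
    """True if `user` has a message with an id above the one seen last time."""
    latest = latest_message_id(user)
    return latest is not None and latest > last_seen_id
```

The client only has to remember the last id it saw, for example `has_new_messages("bob", 1)`.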



Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?
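The scenario in that last paragraph can be reproduced in a few lines. This is a sketch on Python's sqlite3 with made-up rows, assuming rows are physically deleted (which, per the comments below, is not the case in the asker's system): one delete plus one insert between two polls leaves the count unchanged, while the highest id still moves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    VARCHAR(32) NOT NULL,
        recipient VARCHAR(32) NOT NULL,
        message   VARCHAR(256) NOT NULL
    )
    """
)
INSERT = "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)"
conn.execute(INSERT, ("alice", "bob", "first"))
conn.execute(INSERT, ("carol", "bob", "second"))


def snapshot(user):
    """Return (message count, highest id) for messages received by `user`."""
    return conn.execute(
        "SELECT COUNT(*), MAX(id) FROM messages WHERE recipient = ?", (user,)
    ).fetchone()


before = snapshot("bob")  # (2, 2)

# Between two polls: one message is deleted and a new one arrives.
conn.execute("DELETE FROM messages WHERE id = 1")
conn.execute(INSERT, ("dave", "bob", "third"))

after = snapshot("bob")  # (2, 3)

print(before[0] == after[0])  # True: the count hides the new message
print(after[1] > before[1])   # True: the max id reveals it
```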


























  • 1





    "SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

    – Sergio Tulentsev
    Apr 8 at 20:21











  • @SergioTulentsev I forgot to mention in my main post: sender and recipient are foreign keys to the user hash (ID), the primary key in the users table. So they will always be indexed.

    – FeHora
    Apr 8 at 20:22












  • @Kaii "Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?" If the user deletes a message, it just becomes hidden for security reasons (it gets the value hidden:true), but the count will not change.

    – FeHora
    Apr 8 at 20:31






  • 5





    If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

    – O. Jones
    Apr 8 at 20:43







  • 1





    @FeHora I strongly suggest setting up some sort of test environment: a database with generated records for you to play with.

    – Kaii
    Apr 8 at 20:56



















3














To have the information that someone has new messages - do exactly that. Update the field in users table (I'm assuming that's the name) when a new message is recorded in the system. You have the recipient's ID, that's all you need. You can create an after insert trigger (assumption: there's users2messages table) that updates users table with a boolean flag indicating there's a message.



This approach is by far faster than counting index entries, be the index primary or secondary. When the user performs an action, you update the users table with has_messages = 0; when a new message arrives, you update it with has_messages = 1. It's simple, it works, it scales, and using triggers to maintain it makes it easy and seamless.
I'm sure there will be nay-sayers who don't like triggers; in that case you can do it manually at the point of associating a user with a new message.
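A minimal sketch of the flag-plus-trigger idea, again on Python's sqlite3 so that it is runnable as-is. Assumptions: the trigger sits directly on a messages table rather than on the users2messages table mentioned above, and the trigger name and helper functions are hypothetical. MySQL's trigger syntax differs slightly (it needs FOR EACH ROW, for example), so treat this as an outline rather than ready-to-use DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id           VARCHAR(32) PRIMARY KEY,
        has_messages INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    VARCHAR(32) NOT NULL,
        recipient VARCHAR(32) NOT NULL,
        message   VARCHAR(256) NOT NULL
    );
    -- Hypothetical trigger: flag the recipient whenever a message is stored.
    CREATE TRIGGER messages_after_insert AFTER INSERT ON messages
    BEGIN
        UPDATE users SET has_messages = 1 WHERE id = NEW.recipient;
    END;
    """
)
conn.executemany("INSERT INTO users (id) VALUES (?)", [("alice",), ("bob",)])


def send(sender, recipient, text):
    """Store a message; the trigger sets the recipient's flag."""
    conn.execute(
        "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
        (sender, recipient, text),
    )


def has_messages(user):
    """Primary-key lookup of the flag."""
    row = conn.execute(
        "SELECT has_messages FROM users WHERE id = ?", (user,)
    ).fetchone()
    return bool(row[0])


def mark_read(user):
    """Reset the flag once the user has opened their messages."""
    conn.execute("UPDATE users SET has_messages = 0 WHERE id = ?", (user,))
```

The poll then becomes `has_messages("bob")`, and `mark_read("bob")` is called when the user opens the messages tab.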





























  • Triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag, because of the low cardinality, even if you index that field. Sorry to tell you that, but you have a misunderstanding there.

    – Kaii
    Apr 8 at 21:01












  • @Mjh I know about that, but it's definitely more expensive than my suggested methods, because it involves (at least) 1x update + 1x select.

    – FeHora
    Apr 8 at 21:23






  • 3





    @Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref which is infinitely faster than counting a number of records in the table. The boolean field is not in the WHERE clause, the primary key is. Please, assume better next time. In regards to updating the table: the update is fast as well, it handles a single row located using the primary key. If the field is already containing the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty. Much less than counting the records. You can measure.

    – Mjh
    Apr 8 at 21:36












  • Well ... Taking into account that we're talking about [1 trigger including 1 lookup and 1 update to set the flag + 1 lookup and 1 update to unset the flag] vs [1 loose index scan], i think it's obvious what's more overhead. But sure, you can measure. ;-) You are right that eq_ref is the fastest kind of lookup, but doing it four times including two updates just doesn't compare to a single, very simple operation.

    – Kaii
    Apr 12 at 22:31












  • @Kaii "loose" index scan means you have to go through the dataset (which can be in RAM but doesn't have to be) every time you want the data, or you perform a simple lookup and a simple operation that takes less CPU time and incurs less I/O wait. Bottom line being that you obtain the data faster every time you need it (are there messages or not) opposed to counting every single record every time you want a yes/no. For some reason, you can't seem to grasp that simple optimization step. I can't explain it easier, I doubt you're even reading.

    – Mjh
    2 days ago



















-2














If you need to know the number of new messages then using
Select count(*) from Messages where user_id in (sender, recipient) and id > last_seen_id would be your best option.



I'm a fan of using exists where possible, so to determine IF there are new messages, my query would be Select exists(Select 1 from Messages where user_id in (sender, recipient) and id > last_seen_id). The benefit of exists is that as soon as it finds 1 record it returns true.



Edit: To avoid any confusion in reading this answer, both of those queries would also include a check for other_user_id in (sender, recipient) in order to only return the messages between 2 specific users.
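Both queries from this answer can be tried out with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL; the sample rows and the helper names `has_new` and `count_new` are hypothetical, and the extra other_user_id check mentioned in the edit is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        sender    VARCHAR(32) NOT NULL,
        recipient VARCHAR(32) NOT NULL,
        message   VARCHAR(256) NOT NULL
    )
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO messages (sender, recipient, message) VALUES (?, ?, ?)",
    [("alice", "bob", "hi"), ("bob", "alice", "hello"), ("carol", "bob", "yo")],
)


def has_new(user, last_seen_id):
    """EXISTS variant: can stop at the first matching row."""
    row = conn.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM messages WHERE ? IN (sender, recipient) AND id > ?)",
        (user, last_seen_id),
    ).fetchone()
    return bool(row[0])


def count_new(user, last_seen_id):
    """COUNT variant: number of messages newer than last_seen_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages "
        "WHERE ? IN (sender, recipient) AND id > ?",
        (user, last_seen_id),
    ).fetchone()
    return row[0]
```

Here `last_seen_id` is whatever id the client stored after its previous poll.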






































    -2














    @FeHora You talk about not using keys to save db space. The table design wastes more db space.



    ID - Auto Increment, Primary Key (Bigint)


    Is bigint really necessary? Let us assume a message is sent every second. Then an int unsigned is enough for about 136 years. And if you really have that many messages, a key is mandatory.
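    A quick check of that arithmetic, as a sketch: a 4-byte unsigned integer holds values up to 2^32 - 1, and the helper below (a hypothetical name) divides that by a steady insert rate. It assumes ids are handed out strictly one per insert with no gaps.

```python
UINT32_MAX = 2**32 - 1                    # largest 4-byte unsigned integer
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, including leap days


def years_until_exhausted(inserts_per_second):
    """Years until an unsigned 32-bit auto-increment id runs out."""
    return UINT32_MAX / (inserts_per_second * SECONDS_PER_YEAR)


print(round(years_until_exhausted(1), 1))  # 136.1
```

    At higher insert rates the headroom shrinks proportionally, so the rate assumption matters more than the column type by itself.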



    Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table
    Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table


    Why not use the UserID (usually an int unsigned)?



    Then I would add a seen flag. By the way, you can add the NOT NULL attribute to all fields.



    seen tinyint not NULL.


    Last but not least, I recommend @Mjh's variant: define a flag has_messages, or new_messages, or both in the user record. Usually the user record is loaded anyway, so it is NOT an additional database query.
























    • 1





      This messaging system is for a government-ish organization; 90% of messages are sent to users from systems (e.g. the temperature in a room is above 30C). It can generate millions of messages per hour, which is why I need to optimize every microsecond of server time. I cannot use the UserID key because of reverse engineering and GDPR (an EU thing). Long story short: I need to have everything encrypted and fast. Every additional data field can cause a lot of unwanted extra database storage.

      – FeHora
      Apr 9 at 6:16











    • @FeHora if what you wrote is true, then the accepted answer is exactly what you want to avoid. A million records per hour is only 278 inserts per second. Even old mechanical drives were able to pull off ~400 IOPS, current SSDs start at 5k IOPS, and getting a 250k IOPS drive is not expensive any more. If it's a government asset, I take it you won't run it on a Raspberry Pi but on a server with sufficient RAM and storage (128GB of RAM, a few TB of SSD). That just means that your micro-optimizations aren't worth it. However, suggesting a varchar(32) key for a foreign key is just bad.

      – Mjh
      Apr 9 at 11:14












    • Why, @Mjh? The change (whether there is new mail) is written only once (in the end-user Android app's cookies), so it's not torturing the database system or its performance. There is only one select until the user opens the new messages tab. The app has mobile notifications and works only in the local area (intranet app). So the accepted answer is exactly what costs minimal server resources. Right now I have 800+ users logged in and the DB server/web server load is ~2%. I am using failover, so the data must also be shipped to the backup server in real time. 2% is really not much.

      – FeHora
      Apr 9 at 11:21











    • @FeHora because it's not the fastest solution. Your load will remain low, but concluding that the chosen method is the best because server load is low is a false conclusion. Currently, you're unaware of whether you're I/O or CPU bound (you'd be I/O bound; 99.9% of DB operations are I/O bound). Designing your database while avoiding foreign key constraints is awful, and proof that premature optimization is the root of all evil. You chose to have inconsistent data under the pretense of performance. You never measured what your server can do and where it shows signs of slowing down.

      – Mjh
      Apr 9 at 11:26












    • It's clear that you're conscious about what you're doing, but you went about it entirely wrong. Neither will one server be sufficient, nor can you choose to leave features out and ignore consistency because you think it contributes to downgraded performance. Even now, when your thing is running - you saw that your load is abysmal.

      – Mjh
      Apr 9 at 11:27











    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55581114%2fcount-or-maxid-which-is-faster%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    16














    In MySQL InnoDB, SELECT COUNT(*) WHERE secondary_index = ? is an expensive operation and when the user has a lot of messages, this query might take a long time. Even when using an index, the engine still needs to count all matching records. The performance will degrade with growing total message count.



    On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently by doing a so-called loose index scan. The performance will stay almost constant.



    If you want to understand why, consider looking up the B+Tree data structure which InnoDB uses to organise its data.



    I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).



    Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?






    share|improve this answer




















    • 1





      "SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

      – Sergio Tulentsev
      Apr 8 at 20:21











    • @SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

      – FeHora
      Apr 8 at 20:22












    • @Kaii "Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?" if the user deletes the message it just become hidden for security reasons, it will have a value hidden:true. but the count will not change

      – FeHora
      Apr 8 at 20:31






    • 5





      If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

      – O. Jones
      Apr 8 at 20:43







    • 1





      @FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

      – Kaii
      Apr 8 at 20:56
















    16














    In MySQL InnoDB, SELECT COUNT(*) WHERE secondary_index = ? is an expensive operation and when the user has a lot of messages, this query might take a long time. Even when using an index, the engine still needs to count all matching records. The performance will degrade with growing total message count.



    On the other hand, SELECT MAX(id) WHERE secondary_index = ? can deliver the highest id in that index very efficiently by doing a so-called loose index scan. The performance will stay almost constant.



    If you want to understand why, consider looking up the B+Tree data structure which InnoDB uses to organise its data.



    I suggest you go with SELECT MAX(id), if the requirement is only to check if there are new messages (and not the count of them).



    Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?






    share|improve this answer




















    • 1





      "SELECT MAX(id) will always use the primary index" - yeah, except for the cases when there's a where on an unindexed field.

      – Sergio Tulentsev
      Apr 8 at 20:21











    • @SergioTulentsev i forgot to mention in my main post, sender and recipient are foreign keys to user-hash (ID) - primary key in users table. So it will be indexed always.

      – FeHora
      Apr 8 at 20:22












    • @Kaii "Also, if you rely on the message count you might open a gap for race conditions. What if the user deletes a message and receives a new one between two polling intervals?" if the user deletes the message it just become hidden for security reasons, it will have a value hidden:true. but the count will not change

      – FeHora
      Apr 8 at 20:31






    • 5





      If there's an index on a, then SELECT MAX(id) FROM tbl WHERE a=constant uses a so-called loose index scan. Those are almost miraculously fast. SELECT COUNT(*) FROM tbl WHERE a=constant does a tight index scan, which is not as fast.

      – O. Jones
      Apr 8 at 20:43







    • 1





      @FeHora i strongly suggest to setup some sort of test environment, a database with generated records for you to play with.

      – Kaii
      Apr 8 at 20:56






















    edited yesterday

























    answered Apr 8 at 20:19









    Kaii





















    3














    To have the information that someone has new messages, store exactly that. Update a field in the users table (I'm assuming that's the name) when a new message is recorded in the system. You have the recipient's ID, and that's all you need. You can create an after insert trigger (assumption: there's a users2messages table) that updates the users table with a boolean flag indicating there's a message.



    This approach is by far faster than counting index entries, be the index primary or secondary. When the user performs an action, you update the users table with has_messages = 0; when a new message arrives, you update it with has_messages = 1. It's simple, it works, it scales, and using triggers to maintain it makes it easy and seamless.
    I'm sure there will be naysayers who don't like triggers; in that case you can do it manually at the point of associating a user with a new message.
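A rough sketch of the flag-plus-trigger idea, with loud assumptions: the `users` and `messages` layouts, the trigger name and both helper functions are made up for illustration, and SQLite replaces MySQL so the example is runnable (MySQL's trigger syntax differs slightly, for example it requires FOR EACH ROW).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id           INTEGER PRIMARY KEY,
        has_messages INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        recipient INTEGER NOT NULL REFERENCES users (id),
        body      TEXT NOT NULL
    );
    -- Raise the flag whenever a message for a user is recorded.
    CREATE TRIGGER messages_after_insert AFTER INSERT ON messages
    BEGIN
        UPDATE users SET has_messages = 1 WHERE id = NEW.recipient;
    END;
    INSERT INTO users (id) VALUES (1);
""")


def has_messages(conn, user_id):
    """Primary-key lookup of the flag; no message rows are counted."""
    row = conn.execute(
        "SELECT has_messages FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return bool(row[0])


def mark_read(conn, user_id):
    """Clear the flag once the user has opened their messages."""
    conn.execute("UPDATE users SET has_messages = 0 WHERE id = ?", (user_id,))


flag_initial = has_messages(conn, 1)
conn.execute("INSERT INTO messages (recipient, body) VALUES (1, 'hello')")
flag_after_insert = has_messages(conn, 1)
mark_read(conn, 1)
flag_after_read = has_messages(conn, 1)
print(flag_initial, flag_after_insert, flag_after_read)
```

The trade-off is an extra write on every insert and on every read acknowledgement, in exchange for a single-row lookup when polling.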





























    • Triggers aside, looking up a row using the PK and also reading it to check the boolean is still more expensive than executing a single loose index scan. It gets worse when you also add a WHERE clause to check the boolean flag, because of the low cardinality, even if you index that field. Sorry to tell you that, but you have a misunderstanding there.

      – Kaii
      Apr 8 at 21:01












    • @Mjh I know about that, but it's definitely more expensive than my suggested methods, because it involves (at least) one update plus one select.

      – FeHora
      Apr 8 at 21:23






    • 3





      @Kaii SELECT has_messages FROM users WHERE id = 1; is the fastest query there is. It's an eq_ref, which is far faster than counting a number of records in the table. The boolean field is not in the WHERE clause; the primary key is. Please assume better next time. Regarding updating the table: the update is fast as well, since it handles a single row located using the primary key. If the field already contains the value that you're updating to, no actual disk I/O occurs and there's a minimal performance penalty, much less than counting the records. You can measure.

      – Mjh
      Apr 8 at 21:36












    • Well ... Taking into account that we're talking about [1 trigger including 1 lookup and 1 update to set the flag + 1 lookup and 1 update to unset the flag] vs [1 loose index scan], I think it's obvious which is more overhead. But sure, you can measure. ;-) You are right that eq_ref is the fastest kind of lookup, but doing it four times, including two updates, just doesn't compare to a single, very simple operation.

      – Kaii
      Apr 12 at 22:31












    • @Kaii A "loose" index scan means you have to go through the dataset (which can be in RAM but doesn't have to be) every time you want the data; the alternative is a simple lookup and a simple operation that takes less CPU time and incurs less I/O wait. The bottom line is that you obtain the data faster every time you need it (are there messages or not), as opposed to counting every single record every time you want a yes/no. For some reason, you can't seem to grasp that simple optimization step. I can't explain it any more simply; I doubt you're even reading.

      – Mjh
      2 days ago


























    answered Apr 8 at 20:56









    Mjh
























    -2














    If you need to know the number of new messages, then using Select count(*) from Messages where user_id in (sender, recipient) and id > last_seen_id would be your best option.



    I'm a fan of using exists where possible, so to determine IF there are new messages, my query would be Select exists(Select 1 from Messages where user_id in (sender, recipient) and id > last_seen_id). The benefit of exists is that as soon as it finds 1 record it returns true.



    Edit: To avoid any confusion in reading this answer, both of those queries would also include a check for other_user_id in (sender, recipient) in order to only return the messages between 2 specific users.
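Both queries can be sketched as follows, including the second `in (sender, recipient)` check mentioned in the edit. The table layout, the sample rows and the two helper functions are assumptions made for illustration, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, sender TEXT NOT NULL, recipient TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages (sender, recipient) VALUES (?, ?)",
    [("alice", "bob"), ("bob", "alice"), ("carol", "bob"), ("alice", "bob")],
)


def count_new(conn, user_id, other_user_id, last_seen_id):
    """Number of messages between the two users with an id above last_seen_id."""
    return conn.execute(
        "SELECT COUNT(*) FROM messages "
        "WHERE ? IN (sender, recipient) AND ? IN (sender, recipient) AND id > ?",
        (user_id, other_user_id, last_seen_id),
    ).fetchone()[0]


def any_new(conn, user_id, other_user_id, last_seen_id):
    """EXISTS variant: only answers whether at least one such message exists."""
    return bool(conn.execute(
        "SELECT EXISTS(SELECT 1 FROM messages "
        "WHERE ? IN (sender, recipient) AND ? IN (sender, recipient) AND id > ?)",
        (user_id, other_user_id, last_seen_id),
    ).fetchone()[0])


print(count_new(conn, "bob", "alice", 1))  # messages 2 and 4 qualify
print(any_new(conn, "bob", "alice", 4))    # nothing newer than id 4
```

Whether the database can answer either query from an index depends on the indexes available on sender and recipient, which this sketch does not model.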











































        edited Apr 9 at 3:35

























        answered Apr 9 at 3:30









        Aaron





















            -2














            @FeHora You talk about not using keys to save db space. The table design wastes more db space.



            ID - Auto Increment, Primary Key (Bigint)


            Is bigint really necessary? Let us assume that a message is sent every second. Then an int unsigned is enough for about 136 years. And if you really have that many messages, a key is mandatory.
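A quick arithmetic check of how long an auto-increment id lasts, as a sketch only: the constant and function names are made up, and the second rate (278 inserts per second, roughly a million messages per hour) is an illustrative assumption. At one message per second an unsigned 32-bit id lasts roughly 136 years; at higher rates it runs out proportionally sooner.

```python
# Largest value an unsigned 32-bit integer column can hold.
INT_UNSIGNED_MAX = 2**32 - 1
# Largest value a signed 64-bit BIGINT column can hold.
BIGINT_MAX = 2**63 - 1

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, including leap days


def years_until_exhausted(max_id, inserts_per_second):
    """Years until an auto-increment column reaches max_id at a steady insert rate."""
    return max_id / inserts_per_second / SECONDS_PER_YEAR


print(round(years_until_exhausted(INT_UNSIGNED_MAX, 1)))        # one message per second
print(round(years_until_exhausted(INT_UNSIGNED_MAX, 278), 2))   # about a million per hour
```

So the choice between int unsigned and bigint depends heavily on the sustained insert rate the system actually sees.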



            Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table
            Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table


            Why not use the UserID (usually an int unsigned)?



            Then I would add a seen flag. By the way, you can add the NOT NULL attribute to all fields.



            seen tinyint not NULL.


            Last but not least, I recommend @Mjh's variant: define a flag has_messages, or new_messages, or both in the user record. Usually the user record is loaded anyway, so it is NOT an additional database query.
























            • 1





              This messaging system is for a government-ish organization; 90% of messages are sent to users from systems (for example, "temperature in room is above 30C"). It can generate millions of messages per hour, which is why I need to optimize every microsecond of server time. I cannot use the UserID key because of reverse engineering and GDPR (EU thing). Long story short: I need to have everything encrypted and fast. Every additional data field can take up a lot of extra, unwanted database storage space.

              – FeHora
              Apr 9 at 6:16











            • @FeHora If what you wrote is true, then the accepted answer is exactly what you want to avoid. A million records per hour is only 278 inserts per second. Even old mechanical drives were able to pull off ~400 IOPS, current SSDs start at 5k IOPS, and getting a 250k IOPS drive is not expensive any more. If it's a government asset, I take it you won't run it on a Raspberry Pi but on a server with sufficient RAM and storage (128GB of RAM, a few TB of SSD). That just means that your micro-optimizations aren't worth it. However, suggesting a varchar(32) key for a foreign key is... just bad.

              – Mjh
              Apr 9 at 11:14












            • Why, @Mjh? The change (whether there is new mail) is written only once (in the end-user Android app's cookies), so it's not torturing the database system or its performance. There is only one select until the user opens the new messages tab. The app has mobile notifications and works only in the local area (intranet app). So the accepted answer is exactly the one that costs minimal server resources. Right now I have 800+ users logged in and the DB server/web server load is ~2%. I am using failover, so the data must also be shipped to the backup server in real time. 2% is really not too much.

              – FeHora
              Apr 9 at 11:21











            • @FeHora Because it's not the fastest solution. Your load will remain low, but concluding that the chosen method is the best because server load is low is a false conclusion. Currently, you're unaware whether you're I/O or CPU bound (you'd be I/O bound; 99.9% of DB operations are I/O bound). Designing your database while avoiding foreign key constraints is awful, and proof that premature optimization is the root of all evil. You chose to have inconsistent data under the pretense of performance. You never measured what your server can do and where it shows signs of slowing down.

              – Mjh
              Apr 9 at 11:26












            • It's clear that you're conscious about what you're doing, but you went about it entirely wrong. Neither will one server be sufficient, nor can you choose to leave features out and ignore consistency because you think it contributes to downgraded performance. Even now, when your thing is running - you saw that your load is abysmal.

              – Mjh
              Apr 9 at 11:27















            -2














            @FeHora You talk about not using keys to save DB space, but the table design itself wastes more DB space.



            ID - Auto Increment, Primary Key (Bigint)


            Is bigint really necessary? Let us assume that a message is sent every second: an int unsigned is then enough for about 136 years. And if you really have that many messages, a key is mandatory.
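A quick check of this arithmetic, as a minimal sketch: a 4-byte unsigned integer holds values up to 2^32 - 1, an 8-byte one up to 2^64 - 1. The helper name `years_until_exhausted` is made up for illustration, and the second rate (one million messages per hour) is only an assumption based on the load mentioned in the comments.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, leap days included

INT_UNSIGNED_MAX = 2**32 - 1      # largest value of a 4-byte unsigned integer
BIGINT_UNSIGNED_MAX = 2**64 - 1   # largest value of an 8-byte unsigned integer


def years_until_exhausted(max_id: int, inserts_per_second: float) -> float:
    """Years until an auto-increment counter reaches max_id at a steady insert rate."""
    return max_id / inserts_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # One message per second, as assumed in the answer.
    print(f"int unsigned, 1 msg/s: {years_until_exhausted(INT_UNSIGNED_MAX, 1):.1f} years")
    # Hypothetical heavier load: one million messages per hour.
    heavy_rate = 1_000_000 / 3600
    print(f"int unsigned, 1M msg/h: {years_until_exhausted(INT_UNSIGNED_MAX, heavy_rate):.2f} years")
    print(f"bigint unsigned, 1M msg/h: {years_until_exhausted(BIGINT_UNSIGNED_MAX, heavy_rate):.3g} years")
```

At one insert per second the 4-byte counter lasts roughly 136 years; at a sustained million inserts per hour it would last only about half a year, so the right column type depends on the real message rate.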



            Sender - Varchar (32) // Foreign Key to UserID hash from Users DB Table
            Recipient - Varchar (32) // Foreign Key to UserID hash from Users DB Table


            Why not use the UserID (usually an int unsigned)?



            Then I would add a seen flag. By the way, you can add the NOT NULL attribute to all fields.

            seen tinyint NOT NULL


            Last but not least, I recommend @Mjh's variant: define a flag has_messages, or new_messages, or both, in the user record. The user record is usually loaded anyway, so it is NOT an additional database query.
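The suggestions above could be combined roughly as in the following sketch. It uses Python's built-in sqlite3 module only so that the example runs anywhere; SQLite has no unsigned types, so plain INTEGER stands in for MySQL's INT UNSIGNED and TINYINT. Apart from seen and has_messages, every table, column, and function name here is an assumption made for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create hypothetical users/messages tables with integer keys and NOT NULL columns."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE users (
            id           INTEGER PRIMARY KEY,
            name         TEXT    NOT NULL,
            has_messages INTEGER NOT NULL DEFAULT 0  -- flag kept in the user record
        );
        CREATE TABLE messages (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            sender    INTEGER NOT NULL REFERENCES users(id),
            recipient INTEGER NOT NULL REFERENCES users(id),
            body      TEXT    NOT NULL,
            seen      INTEGER NOT NULL DEFAULT 0
        );
        CREATE INDEX idx_messages_recipient_seen ON messages (recipient, seen);
        """
    )


def send_message(conn: sqlite3.Connection, sender: int, recipient: int, body: str) -> None:
    """Store a message and raise the recipient's flag in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )
        conn.execute("UPDATE users SET has_messages = 1 WHERE id = ?", (recipient,))


def load_user(conn: sqlite3.Connection, user_id: int) -> dict:
    """Load the user record; the flag arrives with it, so no extra query is needed."""
    row = conn.execute(
        "SELECT id, name, has_messages FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"id": row[0], "name": row[1], "has_messages": bool(row[2])}


def mark_all_seen(conn: sqlite3.Connection, user_id: int) -> None:
    """Mark the user's messages as seen and clear the flag."""
    with conn:
        conn.execute(
            "UPDATE messages SET seen = 1 WHERE recipient = ? AND seen = 0", (user_id,)
        )
        conn.execute("UPDATE users SET has_messages = 0 WHERE id = ?", (user_id,))
```

With this layout, checking for new mail needs neither COUNT(*) nor MAX(id) on the messages table: the application reads has_messages from the user row it already loads.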






            answered Apr 9 at 5:52

            Wiimm







            • This messaging system is for a government-ish organization; 90% of messages are sent to users by systems (such as "temperature in room is above 30C", etc.). It can generate millions of messages per hour, which is why I need to optimize every microsecond of server time. I cannot use the UserID key because of reverse engineering and GDPR (an EU thing). Long story short: I need everything to be encrypted and fast. Every additional data field can cause a lot of unwanted extra database storage.

              – FeHora
              Apr 9 at 6:16











            • @FeHora if what you wrote is true, then the accepted answer is exactly what you want to avoid. A million records per hour is only 278 inserts per second. Even old mechanical drives were able to pull off ~400 IOPS; current SSDs start at 5k IOPS, and a 250k IOPS drive is not expensive any more. If it's a government asset, I take it you won't run it on a Raspberry Pi but on a server with sufficient RAM and storage (128 GB of RAM, a few TB of SSD). That just means that your micro-optimizations aren't worth it. However, suggesting a varchar(32) key for a foreign key is... just bad.

              – Mjh
              Apr 9 at 11:14
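The rate quoted in this comment can be reproduced with a short calculation. This is only a sketch: the function name is made up, the IOPS figures are the ones quoted in the comment rather than measurements, and one insert does not necessarily correspond to exactly one I/O operation.

```python
def inserts_per_second(messages_per_hour: float) -> float:
    """Convert an hourly message volume into an average per-second insert rate."""
    return messages_per_hour / 3600


if __name__ == "__main__":
    rate = inserts_per_second(1_000_000)
    print(f"{rate:.0f} inserts per second")
    # IOPS figures as quoted in the comment, used here purely for comparison.
    quoted_iops = [("mechanical drive", 400), ("entry-level SSD", 5_000), ("fast SSD", 250_000)]
    for label, iops in quoted_iops:
        print(f"{label}: {rate / iops:.1%} of the quoted IOPS")
```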












            • Why, @Mjh? The change (whether there is new mail) is written only once (in the end-user Android app's cookies), so it's not torturing the database or hurting performance. There is only one select until the user opens the new-messages tab. The app has mobile notifications and works only in the local area (intranet app). So the accepted answer is exactly the one that costs minimal server resources. Right now I have 800+ users logged in and the DB server/web server load is ~2%. I am using failover, so the data must also be shipped to the backup server in real time. 2% is really not much.

              – FeHora
              Apr 9 at 11:21











            • @FeHora because it's not the fastest solution. Your load will remain low, but concluding that the chosen method is the best because server load is low is a false conclusion. Currently, you don't know whether you're I/O or CPU bound (you'd be I/O bound; 99.9% of DB operations are I/O bound). Designing your database while avoiding foreign key constraints is awful, and proof that premature optimization is the root of all evil. You chose to have inconsistent data under the pretense of performance. You never measured what your server can do or where it shows signs of slowing down.

              – Mjh
              Apr 9 at 11:26












            • It's clear that you care about what you're doing, but you went about it entirely wrong. Neither will one server be sufficient, nor can you choose to leave features out and ignore consistency because you think they degrade performance. Even now, with your system running, you saw that your load is negligible.

              – Mjh
              Apr 9 at 11:27











