Changeset 721

Show
Ignore:
Timestamp:
09/26/07 14:13:45 (1 year ago)
Author:
ioguix
Message:

un peu plus sur le chapitre benchmarks

Files:

Legend:

Unmodified
Added
Removed
Modified
Copied
Moved
  • materials/advocacy/trunk/Why_PostgreSQL_Instead_of_MySQL.txt

    r717 r721  
    369369 
    370370// Sun Microsystems 2007 jAppServer2004 Benchmark Results 
    371  
    372 Sun Microsystems, a neutral vendor selling hardware that runs many database types, has recently submitted test results on the well regulated SPECjAppServer2004 using both PostgreSQL and MySQL. There are just enough hardware differences between the two systems that it isn't fair to directly compare the two results. But the fact that both scores are close to one another and the configuration is similar does suggest that while there may be performance differences between the two database systems, the magnitude of that difference is not particularly large with this application type. 
    373  
    374 For comparison sake, an Oracle on HP result offers a similar magnitude of performance on less impressive hardware, suggesting both open-source databases still lag the best of the commercial products in absolute performance efficiency. Some suggest Oracle's lead is even larger if you pick examples to put it in a better light, but be sure to read Benchmark Brou-Ha-Ha for some comments on actual pricing here (and to pick up some comments on a second PostgreSQL result using a smaller server). Note that Josh Berkus is a Sun employee whose role there includes being a member of the PostgreSQL Core Team, and his commentary should be evaluated accordingly. 
    375  
    376 If you do a fair comparison that includes software licensing costs, the performance per dollar figures for both PostgreSQL and MySQL are both similar to one another and very good relative to the average for the database industry. It would however be untrue to say that these open-source solutions are always a better choice than commercial offerings like Oracle just based on that; certainly the feature sets and absolute performance of each solution need to be considered as well. 
     371Résultats des tests de performances de Sun Microsystems 2007 jAppServer2004 
     372 
     373// Sun Microsystems, a neutral vendor selling hardware that runs many database types, has recently submitted test 
     374// results on the well regulated SPECjAppServer2004 using both PostgreSQL and MySQL. There are just enough hardware 
     375// differences between the two systems that it isn't fair to directly compare the two results. But the fact that both 
     376// scores are close to one another and the configuration is similar does suggest that while there may be performance 
     377// differences between the two database systems, the magnitude of that difference is not particularly large with this 
     378// application type. 
     379Sun Microsystems, un constructeur neutre vendant du matériel qui supporte beaucoup de bases de données différentes, a 
     380récement publié les résultats de ses tests sur le trés réglementé SPECjAppServer2004 en utilisant PostgreSQL et MySQL. 
     381Le peu de différences matériels entre les deux systèmes suffit pour ne pas comparer les deux résultats directement. 
     382Mais le fait que les deux résultats soient assé proches avec une configuration similaire suggère que malgrès que les 
     383performances puissent être différentes entre les deux bases, l'importance de cette différence n'est pas particulièrement 
     384grande avec ce type d'application. 
     385 
     386// For comparison sake, an Oracle on HP result offers a similar magnitude of performance on less impressive hardware, 
     387// suggesting both open-source databases still lag the best of the commercial products in absolute performance 
     388// efficiency. Some suggest Oracle's lead is even larger if you pick examples to put it in a better light, but be sure 
     389// to read Benchmark Brou-Ha-Ha for some comments on actual pricing here (and to pick up some comments on a second 
     390// PostgreSQL result using a smaller server). Note that Josh Berkus is a Sun employee whose role there includes being 
     391// a member of the PostgreSQL Core Team, and his commentary should be evaluated accordingly. 
     392 
     393// If you do a fair comparison that includes software licensing costs, the performance per dollar figures for both 
     394// PostgreSQL and MySQL are both similar to one another and very good relative to the average for the database 
     395// industry. It would however be untrue to say that these open-source solutions are always a better choice than 
     396// commercial offerings like Oracle just based on that; certainly the feature sets and absolute performance of each 
     397// solution need to be considered as well. 
    377398 
    378399// Transaction Locking and Scalability 
    379400 
    380 PostgreSQL uses a robust locking model called 
    381 MVCC that limits situations where individual clients interfere with each other. A short summary of the main benefit of MVCC would be "readers are never blocked by writers". MVCC is used to implement a pessimistic implementation of the four SQL standard transaction isolation levels: "when you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you select." The default transaction isolation level is "read committed". 
    382  
    383 MySQL's InnoDB implements MVCC using a rollback segment, inspired by Oracle's design; their new Falcon engine works similarly. InnoDB databases supports all four SQL standard transaction isolation levels, with the default being "repeatable read". 
    384  
    385 When comparing the two models, PostgreSQL enforces client separation where the data operated on is always consistent under all circumstances; as the MVCC documentation states, "the reason that PostgreSQL only provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture." MySQL allows configurations where client code that doesn't commit transactions properly can result in a data view that would be considered inconsistent by PostgreSQL's stricter standards. However, in situations where it's acceptable for data being read to have small inconsistencies, being able to use a less strict locking could be a performance advantage for MySQL. 
    386  
    387 Even when both systems are configured to one of the strict levels of transaction locking, the differences between the two implementations are subtle enough that which implementation will work better for a particular application is hard to state definitively. Recommended reading to understand this complicated topic is 
    388 "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control" by Weikum & Vossen. Speaking in the terminology used there, PostgreSQL uses multi-version timestamp ordering (MVTO) while InnoDB and Oracle use multi-version read consistency (MVRC). The main difference is that PostgreSQL is with-REDO/no-UNDO because it stores every row version in the main table, while Oracle/InnoDB implements with-REDO/with-UNDO where they reconstruct a block and/or row image from the log to provide read consistency. If you're willing to consider a third architecture, that of IBM's DB2, as a comparison point additional good references on this topic are A not-so-very technical discussion of Multi Version Concurrency Control and Leverage your PostgreSQL V8.1 skills to learn DB2. IBM is clearly not a fan of the MVCC approach. 
    389  
    390 Partially because the PostgreSQL locking implementation is very mature (it's always active and performance of the associated code is accordingly critical), even in situations where MySQL initially appears faster PostgreSQL can pull ahead and scale to higher throughput when the number of simultaneous users becomes large. A good example of such a situation is demonstrated in the tweakers.net database test. 
     401//PostgreSQL uses a robust locking model called MVCC that limits situations where individual clients interfere with 
     402// each other. A short summary of the main benefit of MVCC would be "readers are never blocked by writers". MVCC is 
     403// used to implement a pessimistic implementation of the four SQL standard transaction isolation levels: "when you  
     404// select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really  
     405// get Serializable, so the actual isolation level may be stricter than what you select." The default transaction  
     406// isolation level is "read committed". 
     407 
     408// MySQL's InnoDB implements MVCC using a rollback segment, inspired by Oracle's design; their new Falcon engine works 
     409// similarly. InnoDB databases supports all four SQL standard transaction isolation levels, with the default being 
     410// "repeatable read". 
     411 
     412// When comparing the two models, PostgreSQL enforces client separation where the data operated on is always consistent 
     413// under all circumstances; as the MVCC documentation states, "the reason that PostgreSQL only provides two isolation  
     414// levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency  
     415// control architecture." MySQL allows configurations where client code that doesn't commit transactions properly can  
     416// result in a data view that would be considered inconsistent by PostgreSQL's stricter standards. However, in  
     417// situations where it's acceptable for data being read to have small inconsistencies, being able to use a less strict  
     418// locking could be a performance advantage for MySQL. 
     419 
     420// Even when both systems are configured to one of the strict levels of transaction locking, the differences between  
     421// the two implementations are subtle enough that which implementation will work better for a particular application 
     422// is hard to state definitively. Recommended reading to understand this complicated topic is 
     423// "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control" by Weikum & Vossen. 
     424// Speaking in the terminology used there, PostgreSQL uses multi-version timestamp ordering (MVTO) while InnoDB and  
     425// Oracle use multi-version read consistency (MVRC). The main difference is that PostgreSQL is with-REDO/no-UNDO  
     426// because it stores every row version in the main table, while Oracle/InnoDB implements with-REDO/with-UNDO where they 
     427// reconstruct a block and/or row image from the log to provide read consistency. If you're willing to consider a third  
     428// architecture, that of IBM's DB2, as a comparison point additional good references on this topic are A not-so-very  
     429// technical discussion of Multi Version Concurrency Control and Leverage your PostgreSQL V8.1 skills to learn DB2. IBM 
     430// is clearly not a fan of the MVCC approach. 
     431 
     432// Partially because the PostgreSQL locking implementation is very mature (it's always active and performance of the  
     433// associated code is accordingly critical), even in situations where MySQL initially appears faster PostgreSQL can  
     434// pull ahead and scale to higher throughput when the number of simultaneous users becomes large. A good example of 
     435// such a situation is demonstrated in the tweakers.net database test. 
    391436 
    392437// Counting rows in a table 
    393  
    394 One operation that PostgreSQL is known to be slow performing is doing a full count of rows in a table, typically using this SQL: 
    395  
     438Compter les lignes d'une table 
     439 
     440// One operation that PostgreSQL is known to be slow performing is doing a full count of rows in a table, typically 
     441// using this SQL: 
     442 
     443// SELECT COUNT(*) FROM table 
    396444SELECT COUNT(*) FROM table 
    397445 
    398 The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table; PostgreSQL must walk through all rows, in some sense. This normally results in a sequential scan reading information about every row in the table. 
    399  
    400 Some DBMSes provide the ability for "COUNT(*)" queries to work via consulting an index. Unfortunately, in PostgreSQL, this strategy does not work, as MVCC visibility information is not stored at the index level. It is necessary to actually examine the rows themselves to determine if they are visible to the transaction or not. 
    401  
    402 In MySQL, MyISAM tables cache the row count information, making this type of count operation almost instant. That is the reason why there exists so much MySQL code that uses this construct assuming it's a trivial operation. But if you're using InnoDB instead, this is no longer the case. See 
    403 COUNT(*) for Innodb Tables and COUNT(*) vs COUNT(col) for notes on the limitations of MySQL in this area. MySQL designs that may be deployed on InnoDB can't assume that a full row count will be fast, and therefore are hampered by similar limitations to those present in PostgreSQL. 
    404  
    405 It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with a "WHERE" clause like 
     446// The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple  
     447// transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to 
     448// summarize data across the whole table; PostgreSQL must walk through all rows, in some sense. This normally results 
     449// in a sequential scan reading information about every row in the table. 
     450 
     451// Some DBMSes provide the ability for "COUNT(*)" queries to work via consulting an index. Unfortunately, in 
     452// PostgreSQL, this strategy does not work, as MVCC visibility information is not stored at the index level. It is 
     453// necessary to actually examine the rows themselves to determine if they are visible to the transaction or not. 
     454 
     455// In MySQL, MyISAM tables cache the row count information, making this type of count operation almost instant. That 
     456// is the reason why there exists so much MySQL code that uses this construct assuming it's a trivial operation. But 
     457// if you're using InnoDB instead, this is no longer the case. See "COUNT(*) for Innodb Tables" and  
     458// "COUNT(*) vs COUNT(col)" for notes on the limitations of MySQL in this area. MySQL designs that may be deployed  
     459// on InnoDB can't assume that a full row count will be fast, and therefore are hampered by similar limitations to  
     460// those present in PostgreSQL. 
     461 
     462// It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with 
     463// a "WHERE" clause like 
    406464 
    407465// SELECT COUNT(*) FROM table WHERE status = 'something'