| 371 | | |
|---|
| 372 | | Sun Microsystems, a neutral vendor selling hardware that runs many database types, has recently submitted test results on the well regulated SPECjAppServer2004 using both PostgreSQL and MySQL. There are just enough hardware differences between the two systems that it isn't fair to directly compare the two results. But the fact that both scores are close to one another and the configuration is similar does suggest that while there may be performance differences between the two database systems, the magnitude of that difference is not particularly large with this application type. |
|---|
| 373 | | |
|---|
| 374 | | For comparison sake, an Oracle on HP result offers a similar magnitude of performance on less impressive hardware, suggesting both open-source databases still lag the best of the commercial products in absolute performance efficiency. Some suggest Oracle's lead is even larger if you pick examples to put it in a better light, but be sure to read Benchmark Brou-Ha-Ha for some comments on actual pricing here (and to pick up some comments on a second PostgreSQL result using a smaller server). Note that Josh Berkus is a Sun employee whose role there includes being a member of the PostgreSQL Core Team, and his commentary should be evaluated accordingly. |
|---|
| 375 | | |
|---|
| 376 | | If you do a fair comparison that includes software licensing costs, the performance per dollar figures for both PostgreSQL and MySQL are both similar to one another and very good relative to the average for the database industry. It would however be untrue to say that these open-source solutions are always a better choice than commercial offerings like Oracle just based on that; certainly the feature sets and absolute performance of each solution need to be considered as well. |
|---|
| | 371 | Résultats des tests de performances de Sun Microsystems 2007 jAppServer2004 |
|---|
| | 372 | |
|---|
| | 373 | // Sun Microsystems, a neutral vendor selling hardware that runs many database types, has recently submitted test |
|---|
| | 374 | // results on the well regulated SPECjAppServer2004 using both PostgreSQL and MySQL. There are just enough hardware |
|---|
| | 375 | // differences between the two systems that it isn't fair to directly compare the two results. But the fact that both |
|---|
| | 376 | // scores are close to one another and the configuration is similar does suggest that while there may be performance |
|---|
| | 377 | // differences between the two database systems, the magnitude of that difference is not particularly large with this |
|---|
| | 378 | // application type. |
|---|
| | 379 | Sun Microsystems, un constructeur neutre vendant du matériel qui supporte beaucoup de bases de données différentes, a |
|---|
| | 380 | récement publié les résultats de ses tests sur le trés réglementé SPECjAppServer2004 en utilisant PostgreSQL et MySQL. |
|---|
| | 381 | Le peu de différences matériels entre les deux systèmes suffit pour ne pas comparer les deux résultats directement. |
|---|
| | 382 | Mais le fait que les deux résultats soient assé proches avec une configuration similaire suggère que malgrès que les |
|---|
| | 383 | performances puissent être différentes entre les deux bases, l'importance de cette différence n'est pas particulièrement |
|---|
| | 384 | grande avec ce type d'application. |
|---|
| | 385 | |
|---|
| | 386 | // For comparison sake, an Oracle on HP result offers a similar magnitude of performance on less impressive hardware, |
|---|
| | 387 | // suggesting both open-source databases still lag the best of the commercial products in absolute performance |
|---|
| | 388 | // efficiency. Some suggest Oracle's lead is even larger if you pick examples to put it in a better light, but be sure |
|---|
| | 389 | // to read Benchmark Brou-Ha-Ha for some comments on actual pricing here (and to pick up some comments on a second |
|---|
| | 390 | // PostgreSQL result using a smaller server). Note that Josh Berkus is a Sun employee whose role there includes being |
|---|
| | 391 | // a member of the PostgreSQL Core Team, and his commentary should be evaluated accordingly. |
|---|
| | 392 | |
|---|
| | 393 | // If you do a fair comparison that includes software licensing costs, the performance per dollar figures for both |
|---|
| | 394 | // PostgreSQL and MySQL are both similar to one another and very good relative to the average for the database |
|---|
| | 395 | // industry. It would however be untrue to say that these open-source solutions are always a better choice than |
|---|
| | 396 | // commercial offerings like Oracle just based on that; certainly the feature sets and absolute performance of each |
|---|
| | 397 | // solution need to be considered as well. |
|---|
| 380 | | PostgreSQL uses a robust locking model called |
|---|
| 381 | | MVCC that limits situations where individual clients interfere with each other. A short summary of the main benefit of MVCC would be "readers are never blocked by writers". MVCC is used to implement a pessimistic implementation of the four SQL standard transaction isolation levels: "when you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level may be stricter than what you select." The default transaction isolation level is "read committed". |
|---|
| 382 | | |
|---|
| 383 | | MySQL's InnoDB implements MVCC using a rollback segment, inspired by Oracle's design; their new Falcon engine works similarly. InnoDB databases supports all four SQL standard transaction isolation levels, with the default being "repeatable read". |
|---|
| 384 | | |
|---|
| 385 | | When comparing the two models, PostgreSQL enforces client separation where the data operated on is always consistent under all circumstances; as the MVCC documentation states, "the reason that PostgreSQL only provides two isolation levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency control architecture." MySQL allows configurations where client code that doesn't commit transactions properly can result in a data view that would be considered inconsistent by PostgreSQL's stricter standards. However, in situations where it's acceptable for data being read to have small inconsistencies, being able to use a less strict locking could be a performance advantage for MySQL. |
|---|
| 386 | | |
|---|
| 387 | | Even when both systems are configured to one of the strict levels of transaction locking, the differences between the two implementations are subtle enough that which implementation will work better for a particular application is hard to state definitively. Recommended reading to understand this complicated topic is |
|---|
| 388 | | "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control" by Weikum & Vossen. Speaking in the terminology used there, PostgreSQL uses multi-version timestamp ordering (MVTO) while InnoDB and Oracle use multi-version read consistency (MVRC). The main difference is that PostgreSQL is with-REDO/no-UNDO because it stores every row version in the main table, while Oracle/InnoDB implements with-REDO/with-UNDO where they reconstruct a block and/or row image from the log to provide read consistency. If you're willing to consider a third architecture, that of IBM's DB2, as a comparison point additional good references on this topic are A not-so-very technical discussion of Multi Version Concurrency Control and Leverage your PostgreSQL V8.1 skills to learn DB2. IBM is clearly not a fan of the MVCC approach. |
|---|
| 389 | | |
|---|
| 390 | | Partially because the PostgreSQL locking implementation is very mature (it's always active and performance of the associated code is accordingly critical), even in situations where MySQL initially appears faster PostgreSQL can pull ahead and scale to higher throughput when the number of simultaneous users becomes large. A good example of such a situation is demonstrated in the tweakers.net database test. |
|---|
| | 401 | //PostgreSQL uses a robust locking model called MVCC that limits situations where individual clients interfere with |
|---|
| | 402 | // each other. A short summary of the main benefit of MVCC would be "readers are never blocked by writers". MVCC is |
|---|
| | 403 | // used to implement a pessimistic implementation of the four SQL standard transaction isolation levels: "when you |
|---|
| | 404 | // select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really |
|---|
| | 405 | // get Serializable, so the actual isolation level may be stricter than what you select." The default transaction |
|---|
| | 406 | // isolation level is "read committed". |
|---|
| | 407 | |
|---|
| | 408 | // MySQL's InnoDB implements MVCC using a rollback segment, inspired by Oracle's design; their new Falcon engine works |
|---|
| | 409 | // similarly. InnoDB databases supports all four SQL standard transaction isolation levels, with the default being |
|---|
| | 410 | // "repeatable read". |
|---|
| | 411 | |
|---|
| | 412 | // When comparing the two models, PostgreSQL enforces client separation where the data operated on is always consistent |
|---|
| | 413 | // under all circumstances; as the MVCC documentation states, "the reason that PostgreSQL only provides two isolation |
|---|
| | 414 | // levels is that this is the only sensible way to map the standard isolation levels to the multiversion concurrency |
|---|
| | 415 | // control architecture." MySQL allows configurations where client code that doesn't commit transactions properly can |
|---|
| | 416 | // result in a data view that would be considered inconsistent by PostgreSQL's stricter standards. However, in |
|---|
| | 417 | // situations where it's acceptable for data being read to have small inconsistencies, being able to use a less strict |
|---|
| | 418 | // locking could be a performance advantage for MySQL. |
|---|
| | 419 | |
|---|
| | 420 | // Even when both systems are configured to one of the strict levels of transaction locking, the differences between |
|---|
| | 421 | // the two implementations are subtle enough that which implementation will work better for a particular application |
|---|
| | 422 | // is hard to state definitively. Recommended reading to understand this complicated topic is |
|---|
| | 423 | // "Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control" by Weikum & Vossen. |
|---|
| | 424 | // Speaking in the terminology used there, PostgreSQL uses multi-version timestamp ordering (MVTO) while InnoDB and |
|---|
| | 425 | // Oracle use multi-version read consistency (MVRC). The main difference is that PostgreSQL is with-REDO/no-UNDO |
|---|
| | 426 | // because it stores every row version in the main table, while Oracle/InnoDB implements with-REDO/with-UNDO where they |
|---|
| | 427 | // reconstruct a block and/or row image from the log to provide read consistency. If you're willing to consider a third |
|---|
| | 428 | // architecture, that of IBM's DB2, as a comparison point additional good references on this topic are A not-so-very |
|---|
| | 429 | // technical discussion of Multi Version Concurrency Control and Leverage your PostgreSQL V8.1 skills to learn DB2. IBM |
|---|
| | 430 | // is clearly not a fan of the MVCC approach. |
|---|
| | 431 | |
|---|
| | 432 | // Partially because the PostgreSQL locking implementation is very mature (it's always active and performance of the |
|---|
| | 433 | // associated code is accordingly critical), even in situations where MySQL initially appears faster PostgreSQL can |
|---|
| | 434 | // pull ahead and scale to higher throughput when the number of simultaneous users becomes large. A good example of |
|---|
| | 435 | // such a situation is demonstrated in the tweakers.net database test. |
|---|
| 398 | | The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table; PostgreSQL must walk through all rows, in some sense. This normally results in a sequential scan reading information about every row in the table. |
|---|
| 399 | | |
|---|
| 400 | | Some DBMSes provide the ability for "COUNT(*)" queries to work via consulting an index. Unfortunately, in PostgreSQL, this strategy does not work, as MVCC visibility information is not stored at the index level. It is necessary to actually examine the rows themselves to determine if they are visible to the transaction or not. |
|---|
| 401 | | |
|---|
| 402 | | In MySQL, MyISAM tables cache the row count information, making this type of count operation almost instant. That is the reason why there exists so much MySQL code that uses this construct assuming it's a trivial operation. But if you're using InnoDB instead, this is no longer the case. See |
|---|
| 403 | | COUNT(*) for Innodb Tables and COUNT(*) vs COUNT(col) for notes on the limitations of MySQL in this area. MySQL designs that may be deployed on InnoDB can't assume that a full row count will be fast, and therefore are hampered by similar limitations to those present in PostgreSQL. |
|---|
| 404 | | |
|---|
| 405 | | It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with a "WHERE" clause like |
|---|
| | 446 | // The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple |
|---|
| | 447 | // transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to |
|---|
| | 448 | // summarize data across the whole table; PostgreSQL must walk through all rows, in some sense. This normally results |
|---|
| | 449 | // in a sequential scan reading information about every row in the table. |
|---|
| | 450 | |
|---|
| | 451 | // Some DBMSes provide the ability for "COUNT(*)" queries to work via consulting an index. Unfortunately, in |
|---|
| | 452 | // PostgreSQL, this strategy does not work, as MVCC visibility information is not stored at the index level. It is |
|---|
| | 453 | // necessary to actually examine the rows themselves to determine if they are visible to the transaction or not. |
|---|
| | 454 | |
|---|
| | 455 | // In MySQL, MyISAM tables cache the row count information, making this type of count operation almost instant. That |
|---|
| | 456 | // is the reason why there exists so much MySQL code that uses this construct assuming it's a trivial operation. But |
|---|
| | 457 | // if you're using InnoDB instead, this is no longer the case. See "COUNT(*) for Innodb Tables" and |
|---|
| | 458 | // "COUNT(*) vs COUNT(col)" for notes on the limitations of MySQL in this area. MySQL designs that may be deployed |
|---|
| | 459 | // on InnoDB can't assume that a full row count will be fast, and therefore are hampered by similar limitations to |
|---|
| | 460 | // those present in PostgreSQL. |
|---|
| | 461 | |
|---|
| | 462 | // It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with |
|---|
| | 463 | // a "WHERE" clause like |
|---|