Blog de Javier Smaldone

Todos los días se aprende algo viejo

16 comentarios

Florian dijo:
30 de Diciembre de 2008 a las 11:20  

Just use a real database then, smth. like Postgres or Oracle etc.

Florian dijo:
30 de Diciembre de 2008 a las 11:20  

Everybody who ever touched mysql knows it’s a glorified textfile editor.

Keith dijo:
30 de Diciembre de 2008 a las 11:32  

It is generally bad practice to use SELECT * in any query. In some RDBMS, the columns specified in the SELECT are considered by the optimizer when it identifies indexes for execution plans. I know this is a fact with SQL Server. I would retry your query only specifying the columns that are needed by the application and see if the MySQL optimizer comes up with a different execution plan.

Martin dijo:
30 de Diciembre de 2008 a las 12:15  

I’m not very familiar with the MySQL optimizer, but in general, an optimizer can’t do more than doing the most obvious thing in a particluar situation and not ‘try’ several scenarios and select the best. In your specific example, it is very possible that, when selecting an entire table, it is faster to do a sequential sort instead of using an index, because indices in general works best when only a subset of a table is selected. PostgreSQL does it that way, so I wouldn’t be surprised if MySQL does the same. It will always be a challenge to find the best optimization by tuning the configuration of your RDBMS. It is a bit harsh to call MySQL a ‘toy RDBMS’, based on one single issue you experience. PostgreSQL has similar issues (in one case a query performed 100 times faster when we added a ‘limit 1′ to a query which returned one record anyway). I bet Oracle and SQL Server has similar issues. The cause is that optimizers are not hard science and the results may vary in particular scenarios. You better get used to it, otherwise you will experience a lot of frustration in the future.

Jakub dijo:
30 de Diciembre de 2008 a las 12:53  

To answer your question – no, MySQL is not a toy. I would suggest that you have a read through section 7.4.5 of the manual as to why you are seeing this particular behaviour (http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html):

“Sometimes MySQL will not use an index, even if one is available. One way this occurs is when the optimizer estimates that using the index would require MySQL to access a large percentage of the rows in the table. (In this case, a table scan is probably much faster, because it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, because it can much more quickly find the few rows to return in the result.

Hash indexes have somewhat different characteristics than those just discussed:
* They are used only for = or comparisons (but are very fast).
* The optimizer cannot use a hash index to speed up ORDER BY operations. (This type
of index cannot be used to search for the next entry in order.)
* MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM table to a hash-indexed MEMORY table.
* Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)”

I would suggest that sorting 100000 rows would probably in the manual’s terms “require MySQL to access a large percentage of the rows in the table”. Have a read through the order-by optimization section of the manual and you will probably find a way to instruct it to do what you want (http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html).

30 de Diciembre de 2008 a las 13:02  

I’ve ran into the same kind of issues with MySQL. But I’ve also had problems with PostgreSQL. As other comments point out, query optimizers sometimes behave in an unexpected way.

I wouldn’t call MySQL a toy RDBMS for that reason. Not handling relationships between tables unless you use InnoDB (which doesn’t support other stuff, like full-text search) is a better reason.

What would be interesting to see is a documentation for all reported shortcomings of various RDBMS and their solutions.

Javier dijo:
30 de Diciembre de 2008 a las 13:03  

Jokub:

I agree with you: sorting 1M (all) rows requires “to access a large percentage of the rows in the table”. But anyway, how can you explain the fact that the query takes 3 secs if doesn’t involves the last row and 1 minute if it does? (despite of the filesort).

I must insist: If I force the optimizer to use the index, the query performs well.

Scott Swank dijo:
30 de Diciembre de 2008 a las 13:43  

I suspect that you are not retrieving all 1,000,000 rows when you specify “select * from … order by …”. Of course it is faster to fetch the top N rows in some client application by index, however it would be much slower to fetch all of the rows in the table via this index. So in effect you are hiding information from the optimizer. You only want the top N rows (i.e. the ones you are going to work with), however you do not tell it this crucial detail.

To get an accurate sense of what is going on, try each approach (full table scan and index read) in a situation where you consume all of the rows, say by an insert/select statement that puts all of the data into a separate table. You will see that getting all of the rows by index is much slower.

The above is not specific to MySQL, but applies to any well-designed rdbms optimizer.

Scott Swank dijo:
30 de Diciembre de 2008 a las 13:45  

I do, however, agree that this sounds like a bug in the optimizer:

“But the problem is still there: If the limit reaches (or surpases) the last row, then the optimizer throws away the index and generates a filesort (taking more than 1 minute, making the query totally useless…)”

sam pho dijo:
30 de Diciembre de 2008 a las 16:46  

you’re fired.. go and read the different philosophies of database, and come back and give a write up on something useful like why and what the difference between databases like oracle/postgress and mysql are, and explain why one philosophy differs from the other. Then come back, then maybe the people you work with can slap you around for being a moron.

Charles dijo:
30 de Diciembre de 2008 a las 16:47  

No, MySQL isn’t a toy… but it’s not a serious tool either. It still can be the right tool for the job, just as long as the people using the tool understand the problems they’ll face.

The optimizer is horrid, and the query planner is a joke. Correlated subqueries, anybody? The most complex query in my employer’s codebase is only so complex because it’s designed to work around how utterly braindead the MySQL query planner is. I’ve even encountered times when FORCE INDEX() won’t actually force the use of that index, if the optimizer thinks that using another index will be faster.

Honestly though, if it wasn’t for the out-of-the-box it-just-works replication, I’d have switched to something else ages ago.

MFR dijo:
31 de Diciembre de 2008 a las 20:53  

Actually, retrieving 1 million record query is already a bad practise to begin with.

Mark dijo:
14 de Abril de 2009 a las 16:44  

Hi,
I don’t know if I’ll be able to link here, but Shlomi Noach goes into some detail on this issue – and in his case he’s using a WHERE clause to restrict the number of rows, yet the optimizer isn’t using the index as the ORDER BY clause invokes the primary key. So it’s not just the “select *” part which is causing problems.

http://code.openark.org/blog/mysql/7-ways-to-convince-mysql-to-use-the-right-index

Once you’re aware of this as a developer or administrator, you can take relevant action. I’m still happy to use Mysql on my sites; well until most hosts provide PostgreSQL. Having used both Oracle and SQL*Server in the past, I’m well aware that there are gotchas and glitches with all of them. Well, ok, too many on Mysql, but with luck that’ll provide interesting subjects for my blog.

Saludos desde Inglaterra

DBMark

flony dijo:
29 de Octubre de 2009 a las 18:06  

I put my itune info in iphone but I want to remove it. I am afraid if someone just download any expensive application or songs. I want to put itune info only when I want.

________________
unlock iphone

Jeff dijo:
6 de Marzo de 2010 a las 22:49  

Thank you for these comments. I was having the same problem. I have only a few thousand records and it blows chunks even with an index. It seems to not be fetching in the background and returning the first row immediately, like Oracle does when it uses nested loops.

For users who just want to feel like the database is responsive, the first page needs to come back fast, as well as paginate quickly on through. Its counterintuitive for it to be otherwise. The first page of an encyclopedia is just as accessible as the first page of a baby book.

Pingback & Trackback
mygif
3 de Junio de 2009 a las 18:42  

[...] in mass adoption over time, which in turn forced the once web bound database to evolve into a near fully fledged enterprise [...]

Artículo al azar

Deja tu comentario: