Database access optimization¶

Django's database layer provides various ways to help developers get the most out of their databases. This document gathers together links to the relevant documentation, and adds various tips, organized under a number of headings that outline the steps to take when attempting to optimize your database usage.

Profile first¶

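As with general programming practice, this goes without saying. Find out what queries you are running and what they are costing you. Use QuerySet.explain() to understand how specific QuerySets are executed by your database. You may also want to use an external project such as django-debug-toolbar, or a tool that monitors your database directly.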

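Remember that you may be optimizing for speed, for memory, or for both, depending on your requirements. Sometimes optimizing for one is detrimental to the other, but sometimes they help each other. Also, work done by the database process might not have the same cost (to you) as the same amount of work done in your Python process. It is up to you to decide what your priorities are and where the balance must lie, and to profile all of these as required, since this depends on your application and server.

With everything that follows, remember to profile after every change to ensure that the change is a benefit, and a big enough benefit given the decrease in readability of your code. All of the suggestions below come with the caveat that in your circumstances the general principle might not apply, or might even be reversed.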

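Use standard DB optimization techniques¶

...including:

Indexes. This is the number one priority, once you have determined from profiling which indexes should be added. Use Field.db_index or Meta.index_together to add them from Django. Consider adding indexes to fields that you frequently query using filter(), exclude(), order_by(), etc., as indexes may help to speed up lookups. Note that determining the best indexes is a complex, database-dependent topic that depends on your particular application. The overhead of maintaining an index may outweigh any gains in query speed.

Appropriate use of field types.

We will assume you have done the obvious things listed above. The rest of this document focuses on how to use Django in such a way that you are not doing unnecessary work. This document also does not address other optimization techniques that apply to all expensive operations, such as general purpose caching.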

Understand `QuerySet`s¶

Understanding QuerySets is vital to getting good performance with simple code. In particular:

Understand `QuerySet` evaluation¶

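To avoid performance problems, it is important to understand that QuerySets are lazy, when they are evaluated, and how their data is held in memory.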

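Understand cached attributes¶

As well as caching of the whole QuerySet, there is caching of the result of attributes on ORM objects. In general, attributes that are not callable will be cached. For example, assuming the example Weblog models: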

>>> entry = Entry.objects.get(id=1)
>>> entry.blog   # Blog object is retrieved at this point
>>> entry.blog   # cached version, no DB access

In contrast, callable attributes generally cause a DB lookup every time:

>>> entry = Entry.objects.get(id=1)
>>> entry.authors.all()   # query performed
>>> entry.authors.all()   # query performed again

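Be careful when reading template code - the template system does not allow the use of parentheses, but it calls callables automatically, which hides the distinction above.

Be careful with your own custom properties - it is up to you to implement caching when required, for example by using the cached_property decorator.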

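Use the `with` template tag¶

To make use of the caching behavior of QuerySet, you may need to use the with template tag.

Use `iterator()`¶

When you have a lot of objects, the caching behavior of the QuerySet can cause a large amount of memory to be used. In this case, iterator() may help.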

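Use `explain()`¶

QuerySet.explain() gives you detailed information about how the database executes a query, including the indexes and joins that are used. These details may help you find queries that could be rewritten more efficiently, or identify indexes that could be added to improve performance.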

Do database work in the database rather than in Python¶

For instance:

At the most basic level, use filter and exclude to do filtering in the database.
Use F expressions to filter based on other fields within the same model.
Use annotate to do aggregation in the database.

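If these aren't enough to generate the SQL you need:

Use `RawSQL`¶

A less portable and harder-to-maintain, but more powerful, method is the RawSQL expression, which allows some SQL to be explicitly added to the query. If that still isn't powerful enough: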

Use raw SQL¶

Write your own custom SQL to retrieve data or populate models. Use django.db.connection.queries to find out what Django is writing for you and start from there.

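Retrieve individual objects using a unique, indexed column¶

There are two reasons to use a column with unique or db_index when using get() to retrieve individual objects. First, the query will be quicker because of the underlying database index. Second, the query could run much slower if multiple objects match the lookup; having a unique constraint on the column guarantees this will never happen.

So, using the example Weblog models: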

>>> entry = Entry.objects.get(id=10)

will be quicker than:

>>> entry = Entry.objects.get(headline="News Item Title")

because id is indexed by the database and is guaranteed to be unique.

Doing the following is potentially quite slow:

>>> entry = Entry.objects.get(headline__startswith="News")

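First of all, headline is not indexed, which makes the underlying database fetch slower.

Second, the lookup doesn't guarantee that only one object will be returned. If the query matches more than one object, it will retrieve and transfer all of them from the database. This penalty could be substantial if hundreds or thousands of records are returned. The penalty is compounded if the database lives on a separate server, where network overhead and latency also play a part.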

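Retrieve everything at once if you know you will need it¶

Hitting the database multiple times for different parts of a single 'set' of data, all of which you will need, is in general less efficient than retrieving it all in one query. This is particularly important if you have a query that is executed in a loop and could therefore end up doing many database queries when only one was needed. So:

Use `QuerySet.select_related()` and `prefetch_related()`¶

Understand select_related() and prefetch_related() thoroughly, and use them where appropriate.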

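Don't retrieve things you don't need¶

Use `QuerySet.values()` and `values_list()`¶

When you only want a dict or list of values, and don't need ORM model objects, make appropriate use of values(). These can be useful for replacing model objects in template code - as long as the dicts you supply have the same attributes as those used in the template, you are fine.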

Use `QuerySet.defer()` and `only()`¶

Use defer() and only() to avoid loading database columns that you know you won't need (or won't need in most cases). Note that if you do access a deferred field later, the ORM has to fetch it in a separate query, making this a pessimization if you use it inappropriately.

Also, be aware that there is some (small extra) overhead incurred inside Django when constructing a model with deferred fields. Don't be too aggressive in deferring fields without profiling as the database has to read most of the non-text, non-VARCHAR data from the disk for a single row in the results, even if it ends up only using a few columns. The defer() and only() methods are most useful when you can avoid loading a lot of text data or for fields that might take a lot of processing to convert back to Python. As always, profile first, then optimize.

Use `QuerySet.count()`¶

...if you only want the count, rather than doing len(queryset).

Use `QuerySet.exists()`¶

...if you only want to find out if at least one result exists, rather than if queryset.

But:

Don't overuse `count()` and `exists()`¶

If you are going to need other data from the QuerySet, just evaluate it.

For example, assuming an Email model that has a body attribute and a many-to-many relation to User, the following template code is optimal:

{% if display_inbox %}
  {% with emails=user.emails.all %}
    {% if emails %}
      <p>You have {{ emails|length }} email(s)</p>
      {% for email in emails %}
        <p>{{ email.body }}</p>
      {% endfor %}
    {% else %}
      <p>No messages today.</p>
    {% endif %}
  {% endwith %}
{% endif %}

It is optimal because:

Since QuerySets are lazy, this does no database queries if 'display_inbox' is False.
Use of with means that we store user.emails.all in a variable for later use, allowing its cache to be re-used.
The line {% if emails %} causes QuerySet.__bool__() to be called, which causes the user.emails.all() query to be run on the database, and at least the first row to be turned into an ORM object. If there aren't any results, it will return False, otherwise True.
The use of {{ emails|length }} calls QuerySet.__len__(), filling out the rest of the cache without doing another query.
The for loop iterates over the already filled cache.

In total, this code does either one or zero database queries. The only deliberate optimization performed is the use of the with tag. Using QuerySet.exists() or QuerySet.count() at any point would cause additional queries.

Use `QuerySet.update()` and `delete()`¶

Rather than retrieving a load of objects, setting some values, and saving them individually, use a bulk SQL UPDATE statement via QuerySet.update(). Similarly, do bulk deletes where possible.

Note, however, that these bulk update methods cannot call the save() or delete() methods of individual instances, which means that any custom behavior you have added for these methods will not be executed, including anything driven from the normal database object signals.

Use foreign key values directly¶

If you only need a foreign key value, use the foreign key value that is already on the object you've got, rather than getting the whole related object and taking its primary key. i.e. do:

entry.blog_id

instead of:

entry.blog.id

Don't order results if you don't care¶

Ordering is not free; each field to order by is an operation the database must perform. If a model has a default ordering (Meta.ordering) and you don't need it, remove it on a QuerySet by calling order_by() with no parameters.

Adding an index to your database may help to improve ordering performance.

Insert in bulk¶

When creating objects, where possible, use the bulk_create() method to reduce the number of SQL queries. For example:

Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
])

...is preferable to:

Entry.objects.create(headline='This is a test')
Entry.objects.create(headline='This is only a test')

Note that there are a number of caveats to this method, so make sure it's appropriate for your use case.

This also applies to ManyToManyFields, so doing:

my_band.members.add(me, my_friend)

...is preferable to:

my_band.members.add(me)
my_band.members.add(my_friend)

...where Bands and Artists have a many-to-many relationship.
