Django: annotate, values, F Object, distinct

Django 공식문서를 살펴보면

annotate : Annotates each object in the QuerySet with the provided list of query expressions. An expression may be a simple value, a reference to a field on the model (or any related models), or an aggregate expression (averages, sums, etc.) that has been computed over the objects that are related to the objects in the QuerySet. Each argument to annotate() is an annotation that will be added to each object in the QuerySet that is returned. SQL Query의 as와 비슷한 역할을 한다고 생각하면 될 듯 하다.
values : Returns a QuerySet that returns dictionaries, rather than model instances, when used as an iterable. Each of those dictionaries represents an object, with the keys corresponding to the attribute names of model objects. 지정한 필드만 dictionary(key: value) 형태로 리턴된다.

The values() method also takes optional keyword arguments, **expressions, which are passed through to annotate(). values()안에서 annotate()기능을 사용할 수 있다.

>>> from django.db.models.functions import Lower
>>> Blog.objects.values(lower_name=Lower('name'))
<QuerySet [{'lower_name': 'beatles blog'}]>

An aggregate within a values() clause is applied before other arguments within the same values() clause. If you need to group by another value, add it to an earlier values() clause instead. 아래와 같이 계산되는 필드를 group by하기 위해선 values()를 먼저 쓰고, 그 다음 annotate를 사용해 계산한다.

>>> from django.db.models import Count
>>> Blog.objects.values('entry__authors', entries=Count('entry'))
<QuerySet [{'entry__authors': 1, 'entries': 20}, {'entry__authors': 1, 'entries': 13}]>
>>> Blog.objects.values('entry__authors').annotate(entries=Count('entry'))
<QuerySet [{'entry__authors': 1, 'entries': 33}]>

F object : An F() object represents the value of a model field or annotated column. It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory. 가장 유용한 개념 중 하나라고 생각한다. 모델 필드 혹은 annotated 컬럼의 값까지 데이터베이스에서 Python 메모리로 끌어낼 필요 없이 모델 필드 값을 참조하고 이를 사용하여 데이터베이스 작업을 수행할 수 있다.

# annotate, values, distinct 활용해보기

import pytz
from dateutil.parser import parse
from django.utils import timezone
from datetime import timedelta
tz = pytz.timezone('Asia/Seoul')
start_date = tz.localize(parse('2019-01-01'))
contents=[11592,11589,12681,12682]

freepass_users = UserVideoLecturePlayLog.objects.filter(
    user_content__content__in=contents, created_at__gte=start_date,
    user_content__parent__content__content_type=6, play_time__gt=timedelta(seconds=0)
    ).annotate(
        user_info_id = F('user_content__order__user_info_id'),
        # annotate를 활용하여 join한 필드 간편하게 사용
        freepass_id = F('user_content__parent__content_id')
    ).values(
        'user_info_id', 'freepass_id'  # values를 활용하여 내가 사용할 필드 data만 가져오기
    ).order_by('user_info_id', 'freepass_id')  # order_by 사용하여 data 정렬

freepass_users.count()
# 253192

# distinct에 사용하는 필드와 그 순서에 따라 data가 달라진다
distinct_user_id = freepass_users.distinct('user_info_id').count()
# 6091
distinct_user_and_freepass = freepass_users.distinct('user_info_id', 'freepass_id').count()
# 6298

유저별 시험 점수 data 만들기

프론트 단의 실수로 유저 데이터가 중복으로 입력되는 경우가 발생했다. 이 때문에 통계 데이터를 만들 때, 중복 제거를 해야만 했다. 작업 중 아래와 같은 에러를 만났다.

# BAD CASE
user_scores = TrialUserAnswer.objects.filter(
        question__area__type=2, question__trial_id=trial_id
    ).distinct('user', 'question', 'answer'
    ).values('user', 'question__area', 'question__area__trial_time'
    ).annotate(
        correct_count = Sum(Case(
            When(answer=F('question__answer'), then=1),
            output_field=IntegerField())),
    ).order_by('user', 'question__area')

NotImplementedError: annotate() + distinct(fields) is not implemented

distinct한 question field를 annotate에서 사용할 수 없다는 에러로 보인다.

이를 해결하고 데이터를 추출하기 위해서 아래와 같은 방법을 사용할 수 있었다. 먼저, distinct + values_list 활용하여 중복 제거한 list 만든다.

distinct는 order_by와 함께 쓴다. 명시적으로 확실히 표현하기 위함. 정렬 후 중복 제거 !

distinct_trial_answers = TrialUserAnswer.objects.filter(
        question__area__type=2, question__trial_id=trial_id
    ).order_by('user', 'question', 'answer'
    ).distinct('user', 'question', 'answer').values_list('id', flat=True)

그리고 distinct_trial_answers에 담긴 id list로 필터링한 유저별 correct_count를 계산한다.

user_scores = TrialUserAnswer.objects.filter(
        id__in = distinct_trial_answers
    ).values('user', 'question__area', 'question__area__trial_time'
    ).annotate(
        correct_count = Sum(Case(
            When(answer=F('question__answer'), then=1),
            output_field=IntegerField())),
    ).order_by('user', 'question__area')

Published 12 Nov 2019

Write posts i learned or thought. Simple code is better.Groot on Twitter