Make組 Blog

hirokiky writes here about Python, web applications, and product and service development.

Announcing matcha 0.1 and the story behind it

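I have released matcha, a WSGI dispatcher. It is a WSGI library/middleware whose main purpose is calling WSGI applications with PATH_INFO taken into account.

In this post I will introduce matcha briefly, and in the second half write what I really wanted to say: what I learned from implementing dispatchers.

Advantages of matcha

matcha has exactly one selling point: you can write routes much like Django's urls.py. Basically, you write them like this: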

>>> from matcha import Matching as m, bundle
>>> 
>>> from yourproject import home_app
>>> from yourproject.blog import post_list_app, post_detail_app
>>> 
>>> matching = bundle(
...     m('/', home_app, 'home'),
...     m('/post/', post_list_app, 'post_list'),
...     m('/post/{post_slug}/', post_detail_app, 'post_detail'),
... )

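Only the names, such as bundle and Matching, are a little different.

matcha is also convenient in the following respects:

  • It can be written declaratively as above, or procedurally
  • URLs can be reverse-resolved from names (such as 'post_list' in the example above)
  • It can be used for things other than calling WSGI applications

Functionally, matcha is not much different from other dispatchers. There are several other dispatchers around; apart from matcha, WebDispatch looks like a good one.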

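Thoughts after writing about four dispatchers

Having written two dispatchers that I published and two or three that I did not, my impression is that dispatching by URL should be provided as a separate piece.

There are some features you want a URL dispatcher to have:

  • Side effects on PATH_INFO and SCRIPT_NAME in environ
  • Extracting arguments from the URL
  • Reverse-resolving a URL from a name or target application

Implementing these features while also building "a flexible dispatcher that is not tied to URLs" was hard (details below).

matcha (at least in the 0.1 release) provides only matching on URLs. Dispatching on any other condition is left to users, or to developers of web frameworks built on matcha.

If you add something like route_name on top, it can be used like Pyramid; if you offer nothing but URL dispatch, it ends up like Django.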
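As a rough illustration of the three features listed above, here is a minimal sketch of a URL dispatcher in plain WSGI. This is not matcha's implementation or API: SimpleDispatcher, its add and reverse methods, and the example.url_args environ key are all hypothetical names made up for this sketch. It also assumes route templates contain only literal path characters and {name} placeholders, and that they do not end with a slash.

```python
import re


class SimpleDispatcher(object):
    """Toy URL dispatcher; a hypothetical sketch, not matcha's real API."""

    def __init__(self):
        self.routes = []  # list of (name, template, compiled regex, app)

    def add(self, template, app, name):
        # Turn '/post/{post_slug}' into '/post/(?P<post_slug>[^/]+)'.
        pattern = re.sub(r'\{(\w+)\}', r'(?P<\1>[^/]+)', template)
        self.routes.append((name, template, re.compile(pattern), app))

    def reverse(self, name, **kwargs):
        # Reverse lookup: build a URL from a route name and its arguments.
        for route_name, template, _, _ in self.routes:
            if route_name == name:
                return template.format(**kwargs)
        raise LookupError(name)

    def __call__(self, environ, start_response):
        path_info = environ.get('PATH_INFO', '')
        for _, _, regex, app in self.routes:
            matched = regex.match(path_info)
            if matched is None:
                continue
            consumed = matched.group(0)
            # Side effect: move the matched prefix from PATH_INFO to SCRIPT_NAME.
            environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + consumed
            environ['PATH_INFO'] = path_info[len(consumed):]
            # Arguments extracted from the URL, under an illustrative key.
            environ['example.url_args'] = matched.groupdict()
            return app(environ, start_response)
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']
```

Routes are tried in the order they were added, so more specific templates should be registered first. The called application sees the shifted SCRIPT_NAME and PATH_INFO, which is what lets it be mounted under any prefix.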

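matcha, me, and sometimes gargant.dispatch

Only about a month ago I released gargant.dispatch and introduced it in a lightning talk at PyCon APAC 2013, and yet I started writing matcha. Here is how that came about.

gargant.dispatch is quite capable and its implementation is interesting, but partly because I tried to make it too flexible, it fell short as a URL dispatcher in the following respects:

  • It causes no side effects on PATH_INFO and SCRIPT_NAME
  • It cannot reverse-resolve a URL from a name or target application
  • Getting arguments out of the URL is a hassle

These are the points I touched on earlier.

The reason is that gargant.dispatch was built on the idea that "matching should not target only PATH_INFO or REQUEST_METHOD", so features that necessarily depend on PATH_INFO, such as URL reversing, were hard to add.

Of course it could be done one way or another, but supporting those constraints (the WSGI specification and the features commonly used in web applications) while keeping the flexibility was harder (more tedious) than I expected.

Well, gargant.dispatch taught me a lot, so I am fine with it fading away as it is. The gargant namespace in the package is meant as a place for building an experimental web framework anyway. Still, I want to build practical things properly and not only experimental ones, so matcha is also the culmination of that work.

Maybe I'll write a template engine

That is what I am thinking. I am not really "the guy who wants to write dispatchers"; it is just that when I looked at what a web application needs on the server side from the bottom up, the dispatcher was there. I have already written a bunch of web servers and got bored of them, and Request/Response objects were tedious, so I am bored of those too.

So next I am thinking of trying to write a template engine.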

Introducing django-websettings

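I released a new package named django-websettings. It is a third-party Django application that provides a web interface for editing a second, settings.py-like set of values.

If you are comfortable reading English, please read the README of django-websettings.

The rest of this post was originally written in Japanese.

Introducing django-websettings

If you like my poor GitHub English, go read the README right now.

django-websettings is a third-party Django application that provides something like settings.py whose values can be set from a web interface.

I built it because I wanted users to be able to enter, through a web interface, the kind of values they should be able to tweak quickly themselves. See the README for installation instructions too.

Basically, you write a file named websettings.py, whose location you specify in advance, like this: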

# In websettings.py
DRUM = 'Ritsu Tainaka'
BASS = 'Mio Akiyama'

The values can be read from websettings.

>>> from websettings import websettings
>>> websettings.BASS
'Mio Akiyama'

Once you write websettings.py, a web interface is generated automatically, and these settings can be entered there.

image

Edit it like this (I made Jun the bassist):

image

After submitting this, go back to the shell and the value has changed:

>>> websettings.BASS
'Jun Suzuki'

That is what it does. See the README for details.

Tips: for Initial arg of Django's Form fields

Django's Form fields take an initial argument that specifies the initial value for the field.

For more details about initial, see the "initial" section of the Form fields documentation.

Basically…

As written there, if you want the initial value to be the result of a callable, pass the callable itself as the argument, like this:

class DateTimeForm(forms.Form):
    now = forms.DateTimeField(initial=datetime.datetime.now)

With arguments

Mistake

Now consider the case where the callable passed to initial needs some arguments. A common mistake looks like this:

# Don't do this
class DateTimeForm(forms.Form):
    now = forms.DateTimeField(initial=datetime.datetime.now(tz))

In this case, the value of now is fixed when the class is defined, that is, when the server starts. You should pass a callable directly, but then how do you pass arguments to that callable?

Elegant solution

Combine the call with partial function application. The following is a simple example using functools.partial.

>>> from functools import partial
>>>
>>> zero_until_ten = partial(range, 10)
>>> list(zero_until_ten())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Yes, you have made a callable that takes no arguments. The same technique applies to the previous example:

class DateTimeForm(forms.Form):
    now = forms.DateTimeField(initial=partial(datetime.datetime.now, tz))

initial now receives a callable, so the value is computed when it is needed instead of being fixed at startup.

It’s beautiful.
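To see the difference without a running Django project, here is a small standalone sketch. resolve_initial is a hypothetical stand-in for what a form does with initial (call it if it is callable, otherwise use it as-is), and tz is assumed to be UTC purely for illustration.

```python
import datetime
from functools import partial

tz = datetime.timezone.utc  # assumed timezone, for illustration only


def resolve_initial(initial):
    # Hypothetical stand-in for how a form treats `initial`:
    # call it if it is callable, otherwise use the value unchanged.
    return initial() if callable(initial) else initial


fixed = datetime.datetime.now(tz)          # evaluated once, right here
lazy = partial(datetime.datetime.now, tz)  # evaluated on every call

first = resolve_initial(lazy)
second = resolve_initial(lazy)

print(resolve_initial(fixed) is fixed)  # the pre-computed value never changes
print(first, second)                    # each call produces a fresh timestamp
```

The fixed value is what the "Don't do this" version hands to the field; the partial object is what the recommended version hands over.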

Inelegant solution

Alternatively, you can write it like this:

class DateTimeForm(forms.Form):
    now = forms.DateTimeField()

    def __init__(self, *args, **kwargs):
        super(DateTimeForm, self).__init__(*args, **kwargs)
        # Set the initial value each time the form is instantiated.
        self.fields['now'].initial = datetime.datetime.now(tz)

The declarative beauty is lost.

Other developers must read __init__, and you must be careful not to forget to call super.

Not recommended. Use the functools.partial function.

Search view with long long parameters

Most web applications have a so-called search view. The view takes some request parameters and displays search results based on them.

A search view does not cause any side effects, so it should expect a GET request. The URL will be like this:

/search?code=R100100&code=R1001001

and the view will be like this:

def search_students_view(request):
    f = SearchForm(data=request.GET)

    # Default to no results so the join below also works for invalid input.
    students = []
    if f.is_valid():
        codes = f.cleaned_data['code']
        students = get_students(codes=codes)

    return HttpResponse(",".join(students))

(This is sample code; I have not verified its correctness.)

In most cases, a GET search view is good enough. Unfortunately, it has some limitations.

With long long parameter

Let's consider searching with very long parameters. Suppose you must build a view that performs an OR search over 1,000 codes…

The URL will be like this:

/search?code=R100100&code=R100101&code=R100102&code=R102000&code=...

Then what would happen? (You may well doubt the requirement itself.)

The request line will be very long too, so it will cause "Request-URI Too Large (414)".

Gunicorn's maximum size of a request line is 4094 bytes by default.

In Nginx, it is 8k bytes.

If you use a reverse proxy, you should also consider the proxy_buffers and proxy_buffer_size settings.

Tedious.
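To get a feel for how far past those limits such a request lands, the following sketch builds the request line for 1,000 codes and compares its length with the two defaults mentioned above. The codes themselves are made up for illustration, and the request line assumes a plain HTTP/1.1 GET to /search.

```python
from urllib.parse import urlencode

GUNICORN_DEFAULT_LIMIT = 4094  # bytes, default quoted above
NGINX_DEFAULT_LIMIT = 8 * 1024  # bytes, the "8k" quoted above

# 1,000 made-up codes in the same shape as the example URL.
codes = ['R%06d' % n for n in range(100100, 101100)]

query = urlencode([('code', code) for code in codes])
request_line = 'GET /search?%s HTTP/1.1' % query

print(len(codes), 'codes ->', len(request_line), 'bytes')
print('exceeds Gunicorn default:', len(request_line) > GUNICORN_DEFAULT_LIMIT)
print('exceeds Nginx default:', len(request_line) > NGINX_DEFAULT_LIMIT)
```

Raising the limits (for Gunicorn, the limit_request_line setting) is possible, but every server and proxy in the chain has to agree.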

With POST

Another solution is to use a POST request for the search view.

But if you do this naively, you will run into trouble with browser history navigation. If you go back to a POST result, most browsers show a warning (like 'Web page has expired').

Re-POSTing is treated as a dangerous operation in order to prevent things like duplicate registration. Also, POST results are generally not cached, because a POST request usually causes side effects. (Some browsers such as Chrome and Firefox will cache them, but IE will not, even if you provide a Cache-Control header.)

To avoid this, redirect to a GET result page after the POST. The POSTed parameters are stored in the session, and the GET page is rendered from them.

Like this:

def search_students_view(request):
    if request.method == 'POST':
        # Keep the parameters in the session, then redirect to the GET page.
        request.session['search_params'] = request.POST
        return HttpResponseRedirect('/search')
    else:
        try:
            params = request.session['search_params']
        except KeyError:
            params = {}
        f = SearchForm(data=params)

        # Default to no results so the join below also works for invalid input.
        students = []
        if f.is_valid():
            codes = f.cleaned_data['code']
            students = get_students(codes=codes)

        return HttpResponse(",".join(students))

(Yes, this is just sample code.)

It is not beautiful.

If you want to paginate these results, you should get the page number from a GET parameter (puke).

And more

Serializing and compressing the GET parameters on the client side may also be a good solution.
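One way this could look is sketched below: JSON-encode the parameters, compress them with zlib, and make the result URL-safe with base64. In practice the packing side would run in the browser (in JavaScript); both directions are shown in Python here only to keep the round trip in one place. pack_params and unpack_params are hypothetical helper names, and the choice of JSON, zlib, and base64 is an assumption of this sketch.

```python
import base64
import json
import zlib


def pack_params(params):
    """Serialize, compress, and URL-safe encode a dict of parameters."""
    raw = json.dumps(params, separators=(',', ':')).encode('utf-8')
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode('ascii')


def unpack_params(token):
    """Reverse pack_params: decode, decompress, and parse the JSON."""
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode('ascii')))
    return json.loads(raw.decode('utf-8'))
```

The view would then read a single parameter (for example /search?q=<token>) and call unpack_params on it before validating the result with the form. How much this saves depends on how repetitive the values are, so the URL length limits still need checking.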

Don't format strings before logging in python

You should not pass a pre-formatted string to loggers in Python, like this:

logger.info('Logged in: %s' % username)

You should write like this:

logger.info('Logged in: %s', username)

Why?

In most cases, the result is the same. But internally, the latter keeps more information: which part of the string represents the username.

You should realize that the first argument (the message) can also be used as a signature.

If you want to aggregate logs, you can group them by message, like this:

  • message: ‘Logged in: %s’, args: (‘Ritsu Tainaka’,)
  • message: ‘Logged in: %s’, args: (‘Mio Akiyama’,)

Yes, you can group these logs. They are the same log; only the username differs.

OK, now consider this grouping with formatted strings. It will not work:

  • message: ‘Logged in: Ritsu Tainaka’, args: ()
  • message: ‘Logged in: Mio Akiyama’, args: ()

They will be handled as different logs. Of course: these messages are entirely different strings.
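The grouping above can be reproduced with the standard logging module alone. The sketch below collects LogRecord objects with a small custom handler and groups them by record.msg; CollectingHandler and the logger name example.login are made up for this illustration.

```python
import collections
import logging


class CollectingHandler(logging.Handler):
    """Keep every LogRecord in memory so it can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger('example.login')
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
handler = CollectingHandler()
logger.addHandler(handler)

for username in ('Ritsu Tainaka', 'Mio Akiyama'):
    logger.info('Logged in: %s', username)   # template and args kept apart
    logger.info('Logged in: %s' % username)  # already formatted, args empty

# Group by the raw message, the way an aggregator could.
groups = collections.Counter(record.msg for record in handler.records)
for message, count in groups.items():
    print(count, repr(message))
```

The two unformatted calls share one group keyed by the template, while each pre-formatted call ends up in a group of its own. record.getMessage() still returns the same final text in both styles.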

Practical

Sentry, an error logging and aggregation platform, groups the logs it displays by these messages.

So, if you use Sentry, you should pass unformatted messages to loggers. Otherwise, every log containing a variable will be handled as a different one. Yes, as thousands of different logs.

Django's TestCase.multi_db attribute is mistake

Django's test framework (django.test.TestCase) has an attribute multi_db. It should be set to True when testing against multiple databases (it is False by default).

If you forget this setting and your tests use multiple databases, the test suite flushes only the 'default' database (more precisely, django.db.utils.DEFAULT_DB_ALIAS) and leaves the other databases unflushed.

Stale data is then left in those other databases, and later tests risk failing for absurd reasons. I handle multiple databases at work, and I forget this setting every time. The mistake is difficult to notice. Of course, there are no errors; this is the expected behavior.

This behavior (flushing only 'default') is a feature for speed, because flushing a database is slow and it runs for each test.

But I think it should flush all databases by default.

If your application uses multiple databases but a TestCase uses only 'default', you could then set flush_only_default = True (for example) to make the test suite flush only the 'default' database.

Of course, if your application uses only the 'default' database, only 'default' is flushed even without setting flush_only_default = True.

I think the speed-up behavior should be optional. By default, the suite should behave in the safe, unsurprising way even if that is slow.

The change would look like this:

diff --git a/django/test/testcases.py b/django/test/testcases.py
index a9fcc2b..0558476 100644
--- a/django/test/testcases.py
+++ b/django/test/testcases.py
@@ -466,11 +466,11 @@ class TransactionTestCase(SimpleTestCase):
     def _databases_names(self, include_mirrors=True):
         # If the test case has a multi_db=True flag, act on all databases,
         # including mirrors or not. Otherwise, just on the default DB.
-        if getattr(self, 'multi_db', False):
+        if getattr(self, 'flush_only_default', False):
+            return [DEFAULT_DB_ALIAS]
+        else:
             return [alias for alias in connections
                     if include_mirrors or not connections[alias].settings_dict[
-        else:
-            return [DEFAULT_DB_ALIAS]

     def _reset_sequences(self, db_name):
         conn = connections[db_name]

I asked about this proposal on the Django developers IRC channel (#django-dev on freenode). Some people answered me (thanks a lot!), mostly on the disagreeing side. Certainly, this proposal does not bring a great gain, and it carries a big risk of breaking compatibility. I realized it is not a good proposal, but I still claim the multi_db behavior is a mistake.

So I will set multi_db = True on a base class and subclass it in my own tests.

Sentry with django-newauth

django-newauth is a library that adds a customizable user model (it was developed before Django 1.5). Sentry is a platform that collects and aggregates logs.

Sentry can collect errors raised by Django applications and display information about them.

Problem

Sentry can tell us which user encountered an error. But if the application uses django-newauth, the user provided by newauth is not displayed.

I tried to write a Sentry plugin to display newauth's user information. As it turned out, I could not do it; it was more difficult than I thought.

Solution

I wrote a package to solve this: raven_django_newauth.

You can set SENTRY_CLIENT = 'raven_django_newauth.client.DjangoNewauthClient', and then Sentry's User interface shows the newauth user's information.

Sentry and Raven

Sentry collects information sent by raven. Sentry defines general interfaces (for example, User), and raven translates the collected data into a form that fits these interfaces. A client sends a dictionary whose keys are the paths to each interface (like 'sentry.interfaces.User').

The User interface is sentry.interfaces.User. In a Django application, the raven client is raven.contrib.django.client.DjangoClient. The client gets user information from its get_user_info method and stores it in the dictionary under the key 'sentry.interfaces.User'.