Go to any local attraction and you'll see people taking selfies with statues, towers, all sorts of sights. Once, walking past tourists taking pictures of Greyfriars Bobby in Edinburgh, I reckoned that some of those would be posted on Twitter. I should be able to find these, and similar pictures, automatically. How difficult could it be?
Problem: porn. If you search the raw firehose for tweets containing "selfie", you'll find it's about 60% porn. I suspect the equivalent live search on Twitter does some sort of filtering.
Even apart from the porn problem, there's no guarantee people will use the term "selfie" at all, so we need a different kind of filter.
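Since the keyword can't be relied on, one plausible first pass is to ignore the text entirely, keep every tweet that carries a photo, and hand those images to an image-based check. The sketch below assumes tweets arrive as Twitter API v1.1 JSON objects (the 2015-era streaming format), where attached media are listed under `extended_entities` (all items) and `entities` (first item only). The function name is hypothetical, and this is not necessarily how the original code gathered its candidates.

```python
from typing import Any, Dict, List


def photo_urls(tweet: Dict[str, Any]) -> List[str]:
    """Return the HTTPS URLs of any photos attached to a tweet.

    Assumes the Twitter API v1.1 tweet layout: "extended_entities" lists
    every attached media item, while "entities" lists only the first one,
    so prefer the former when it is present.
    """
    ents = tweet.get("extended_entities") or tweet.get("entities") or {}
    return [
        m["media_url_https"]
        for m in ents.get("media", [])
        # Skip videos and animated GIFs; only still photos go to face detection.
        if m.get("type") == "photo" and "media_url_https" in m
    ]
```

Tweets for which this returns an empty list can be dropped before any image is downloaded, which keeps the expensive image processing to a minimum.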
My hypothesis was that the selfies I was after would have one person in the foreground on the bottom left or right, with the object of interest in the background. I coded this up, and left it running for a bit.
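The post doesn't say which face detector or thresholds were used. As a minimal sketch, assuming a face detector (for example OpenCV's Haar cascades, whose `detectMultiScale` returns `(x, y, w, h)` rectangles) has already produced bounding boxes, the position rule could be expressed like this. The `Box` type, the function name, and all threshold values are hypothetical choices for illustration.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """A face bounding box in pixels, origin at the image's top-left corner."""
    x: int
    y: int
    w: int
    h: int


def looks_like_landmark_selfie(
    faces: Sequence[Box],
    img_w: int,
    img_h: int,
    min_face_frac: float = 0.08,  # hypothetical: face at least 8% of image width ("foreground")
    side_frac: float = 1 / 3,     # hypothetical: centre within the leftmost or rightmost third
    bottom_frac: float = 0.5,     # hypothetical: centre within the lower half
) -> bool:
    """True if there is exactly one reasonably large face whose centre
    lies in the bottom-left or bottom-right region of the image."""
    if len(faces) != 1:
        return False
    face = faces[0]
    # A small face is probably in the background, not the photographer.
    if face.w < min_face_frac * img_w:
        return False
    cx = face.x + face.w / 2
    cy = face.y + face.h / 2
    in_bottom = cy >= (1 - bottom_frac) * img_h
    in_left = cx <= side_frac * img_w
    in_right = cx >= (1 - side_frac) * img_w
    return in_bottom and (in_left or in_right)
```

Nothing in this rule looks at what is in the rest of the frame, which plausibly explains why anything with a single face-like shape in a bottom corner can slip through.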
Umm, ok.
LMAO pic.twitter.com/aEJcf5iYjR
— ฐานิ (@Purepareal) August 2, 2015
Not replying doesn't mean I'm stuck-up, but you're the one who named your Facebook like that pic.twitter.com/DHBqWxAhS7
— งัวพีช (Official) (@PeacHZ_YYY) June 8, 2014
I seem to have written a chat app detector. What else fits my description?
TV shows?
One year ago today, ISIS attacked Şengal
Thousands of Yazidis were massacred, thousands more taken captive...
http://t.co/kuIMIw8465 pic.twitter.com/ybijb5dvEV
— Nurcan Baysal (@baysal_nurcan) August 3, 2015
— خالد العوسي (@ousika) August 3, 2015
Framed quotes, inspirational or otherwise.
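The truth about the "Murayama Statement" revealed!
It was announced like a sneak attack, ignoring the cabinet!
The statement uses phrases often used by the Chinese Communist Party's "People's Daily".
Suspicion that it was coordinated with China in advance = Diet member Wada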
http://t.co/E6DdpoLt99 pic.twitter.com/yOl4Rah2WY
— ななみ9019 (@superneoblack) August 3, 2015
Privatisation is nothing more than shifting wealth from everybody to a privileged few.
@Corbyn4Leader will stop it. pic.twitter.com/sI0xK175Wk
— George Aylett (@GeorgeAylett) August 2, 2015
And headlines :-/
Larries: pic.twitter.com/zU3BUbTnHW
— liv (@drunkpaIs) August 4, 2015
With the examples above, it's easy to see how they matched, but some others are a bit more obscure. For each of the following, have a look at the tweet image for a bit before clicking through to the highlighted face.
Swooning to pieces, totally wrecked with feels: any excuse to drag him into a corner, any excuse for a hug #ริวจิ #AF12 pic.twitter.com/gUjExo6Tn2
— หนุงหนิง (@niing_jaoka) August 3, 2015
It's 2 A Day Monday.Good luck to Harlandale Lady Indians Volleyball.U had a Great Camp last week. #E3P #Winning pic.twitter.com/OC3UOjuMxz
— E3Ptraining (@E3Ptraining) August 3, 2015
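[Piyo Ken Ranbu] Finally got all the Piyos!!! Thank you to everyone who cheered me on with RTs and favs!!!!! The chicks are so cute!!!!! Please support me again when the next one is added (the Piyos in the image are in the order of the list I always use) pic.twitter.com/0vkwJ7UbAP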
— ミラで原稿しよう (@sauri_uto) July 29, 2015
You can see more of these examples, and other types, in my GitHub repo.
But did I actually find anything? It took me about an hour of trawling through the roughly one thousand tweets it had found, but here's a perfect example of what I wanted:
Without a doubt one of the most beautiful and magical cities! @RestoDelMundo13 #IvanEnParís pic.twitter.com/B32iQUbFas
— Facundo Gambandé (@facugambande) August 3, 2015
A one-in-a-thousand hit rate is not great, but given this was barely a day's worth of implementation and tweaking, it's not bad. It's also obviously improvable. For example:
This is a work in progress, and I'm quite happy with what I have so far, so I might leave it for a bit. There are always other projects on the go!
Finally, I'll end with one example I found which doesn't quite fit my original intent but which I like nonetheless:
Soon. pic.twitter.com/fuPfEfsRqh
— Arsenal Related (@ArsenalsRelated) August 2, 2015