Flickr Strips Copyright Metadata

Flickr has a wee problem. While it loves the metadata in your photographs when you upload them, merrily adding the camera information to its database, turning keywords into tags, and the like, it then turns around and does something absolutely horrible: it strips most of the metadata out of every resized image it makes from your originally uploaded photograph. That includes the all-important copyright and other IPTC metadata that describe who created the photograph and can carry along a caption, title, and keywords for the image.

Boiler Bay

This means that the 500-pixel-on-the-longest-side images you see on Flickr as you browse somebody's photostream carry no record of who made them. Neither does any other size besides the originally uploaded file. Sure, when you're viewing the image on its Flickr page, you don't need that information. But when a user saves the image to their local disk, or embeds the photo in their own website (even if it's still being served out of Flickr's farm), there's no way to link it back to its maker.

The problem gets worse as the image spreads. For example, if you upload an image to Flickr, someone then uploads it to Wikipedia, and from there it makes its way into a news story on CNET or onto a blog somewhere, finding out who took the photo gets harder with each successive step and often comes down to asking people "Where'd you get that image?" Given the number of images we all see in a day, that approach doesn't scale.

Furthermore, with data going every which way on the Internet today, failing to carry along IPTC information is a recipe for creating orphan works. An orphan work is a work whose copyright owner may be impossible to identify and locate. Orphan works have become such a big problem that the U.S. Copyright Office has prepared a report on them. According to the Center for the Study of the Public Domain at Duke, orphan works probably make up the majority of the record of 20th-century culture.

As one of the huge repositories of online imagery, and by stripping metadata from almost every image it serves, Flickr is unintentionally pouring fuel on the fire of the orphan works problem. Sure, creator and copyright metadata take up a wee bit of space in the file, but in today's Internet environment the few bytes needed to preserve this information are negligible, especially compared to the value of passing along this all-too-critical information.

Try it for yourself. Grab two versions of an image from Flickr, such as the image above: the originally uploaded file and any of the resized versions. Then look at each in any tool that lets you examine a file's metadata. Here's an example from one of my images. The info box on the left is from the original file; the info box on the right is from a resized version, specifically the image embedded above.

[Screenshots: metadata panel for the original file (flickr-meta-original.png, left) and for the resized file (flickr-meta-resized.png, right)]

As you can see, the panel on the left has all sorts of information about the photo: its caption, its title, and the creator information. If somebody finds this image and wants to find me, they can type my name into any search engine and find my contact info very quickly. The resized image, on the other hand, the one the panel on the right belongs to, doesn't have any IPTC data embedded in it at all. Imagine you got hold of this image from somewhere out on the net, maybe a Facebook page where somebody had included it to show a friend, and you wanted to contact me. Could you?
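If you'd rather check from a script than a GUI tool, the sketch below walks a JPEG's header segments and reports whether a "Photoshop 3.0" APP13 segment containing an IPTC-NAA resource (ID 0x0404) is present, which is where JPEG files conventionally carry IPTC fields. It is a minimal, standard-library-only Python sketch assuming a well-formed JPEG; `iter_segments` and `has_iptc` are hypothetical names, the resource check is a simple byte search, and XMP metadata (which can also hold creator and copyright fields) is not examined.

```python
import struct

APP13 = 0xED  # APP13 marker, where Photoshop-style IPTC data lives in a JPEG
SOS = 0xDA    # Start of Scan: compressed image data follows, no more headers
PHOTOSHOP_SIG = b"Photoshop 3.0\x00"
IPTC_RESOURCE = b"8BIM\x04\x04"  # Photoshop image resource 0x0404 (IPTC-NAA)


def iter_segments(data: bytes):
    """Yield (marker, payload) for each JPEG header segment up to and including SOS."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        # The 2-byte length counts itself plus the payload, but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        if marker == SOS:
            return
        pos += 2 + length


def has_iptc(data: bytes) -> bool:
    """True if the JPEG carries a Photoshop APP13 segment with an IPTC resource."""
    return any(
        marker == APP13 and payload.startswith(PHOTOSHOP_SIG) and IPTC_RESOURCE in payload
        for marker, payload in iter_segments(data)
    )
```

Calling something like `has_iptc(open("original.jpg", "rb").read())` on each download (the file names are placeholders) should, if the observation above holds, come back `True` for the original and `False` for the resized copy.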

What should Flickr do to fix this problem? There are two steps, one of which should be relatively easy and the other of which is hard.

  1. Flickr needs to change its resize engine to preserve the IPTC metadata of uploaded files in every resized copy it makes. This should be straightforward.
  2. Flickr needs to reprocess its collection so that the IPTC metadata in original files is propagated to all of the resized images already on Flickr. This is the tough one, since there is so much data in Flickr at this point.
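Step 1 can be sketched at the byte level. In a JPEG, IPTC data conventionally lives in an APP13 segment tagged "Photoshop 3.0", so a resizer could copy that segment verbatim from the original into the header of each resized file. The Python below is a minimal, standard-library-only sketch assuming well-formed JPEG input; `split_header` and `copy_iptc` are hypothetical names, not anything Flickr is known to use.

```python
import struct

APP0 = 0xE0   # JFIF header segment
APP13 = 0xED  # Photoshop resource segment that conventionally carries IPTC
SOS = 0xDA    # Start of Scan: compressed image data follows
PHOTOSHOP_SIG = b"Photoshop 3.0\x00"


def split_header(data: bytes):
    """Split a JPEG into ([(marker, raw_segment_bytes), ...], rest).

    The list holds the header segments between SOI and SOS; `rest` is the SOS
    segment plus the compressed image data, which is passed through untouched.
    """
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG file (missing SOI marker)")
    segments, pos = [], 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker == SOS:
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        segments.append((marker, data[pos:end]))
        pos = end
    return segments, data[pos:]


def copy_iptc(original: bytes, resized: bytes) -> bytes:
    """Return `resized` with the original's Photoshop/IPTC APP13 segments embedded."""
    orig_segments, _ = split_header(original)
    iptc = [raw for marker, raw in orig_segments
            if marker == APP13 and raw[4:].startswith(PHOTOSHOP_SIG)]
    segments, rest = split_header(resized)
    # Drop any APP13 the resizer already wrote so the output holds only one copy.
    kept = [(marker, raw) for marker, raw in segments if marker != APP13]
    # Keep a leading JFIF APP0 segment first, where JFIF readers expect it.
    insert_at = 1 if kept and kept[0][0] == APP0 else 0
    out = [raw for _, raw in kept]
    out[insert_at:insert_at] = iptc
    return b"\xff\xd8" + b"".join(out) + rest
```

Copying the raw segment rather than re-encoding individual fields means captions, keywords, and any other IPTC datasets come along unchanged. A production version would probably also want to carry XMP (APP1) and to drop Photoshop resources, such as embedded thumbnails, that no longer match the resized pixels.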

Making sure that creator, copyright and other IPTC information, including titles and captions, is preserved in every image sourced from Flickr—especially the 500px default sizes—will be a huge benefit to all of us as the years go on and the images that people source from Flickr are used in more places than we can imagine. Otherwise, all of those images that are being served from Flickr right now will eventually become orphans, causing problems for both creators and consumers of that work.

I dig Flickr a lot, and Flickr has served me very well over the years, including just over the last few days when I asked readers of this blog to mark photos for possible inclusion in a catalog of images to be sold as prints. The stripping of metadata, however, is a big problem for me. I've chimed in on a feature request at Flickr for preserving IPTC metadata. If you think that preserving IPTC data in photographs is important, and you're a Flickr user, please consider joining in and making your voice heard.

This is one of 187 blog posts on duncandavidson.com. If you care to read more, two posts I recommend are Dear Speakers, a set of thoughts for public speakers that I pulled together in March 2009, and Tilting at the Windmill, One Last Time, a call to Flickr to include important EXIF and IPTC metadata in the photographs they provide to the public.

15 Comments

Wow - that's an incredible oversight/problem.

I'm going to point out the obvious problem here. 2 billion photos times 5 resized views of every image times (say) 1k of metadata for titles is just shy of 10TB of extra storage that has to come from _somewhere_. Which might be an overestimate, of course - not everyone _has_ metadata on their photos, but nowadays I'd expect most photos to have at least whatever their cameras added.
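The storage estimate above is easy to check. Using the commenter's own rough figures, and assuming "1k" means 1,000 bytes, the total is exactly 10 TB in decimal units, or about 9.09 TiB in binary units, so "just shy of 10TB" holds if terabytes are counted in binary.

```python
# The commenter's rough figures; "1k" is taken as 1,000 bytes here (an assumption).
PHOTOS = 2_000_000_000   # photos on Flickr
RESIZED_PER_PHOTO = 5    # resized copies of each photo
METADATA_BYTES = 1_000   # metadata carried in each copy

extra_bytes = PHOTOS * RESIZED_PER_PHOTO * METADATA_BYTES
print(f"{extra_bytes / 10**12:.1f} TB")  # decimal terabytes: 10.0 TB
print(f"{extra_bytes / 2**40:.2f} TiB")  # binary tebibytes: 9.09 TiB
```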

More worryingly, there are plenty of photos in the wild with embedded colour profiles that are several times larger in bytes than the photo itself. Hanging onto those for the small thumbnails rapidly becomes silly - I've seen 2 megabyte 64px square thumbnails served from twitter, for instance. So you have to start picking and choosing which bits of metadata you're going to keep. And that'll just lead to _more_ arguments.

You're still right, of course. The underlying problem of orphaned data is too huge to dismiss on perfectly solvable technical grounds. But it's worth thinking about those technical grounds while considering the solution.

This would be a nice feature. Do any of the online services preserve metadata in resized images? I'm pretty sure SmugMug only keeps it in the original.

You are obviously right; alas, it seems that most services that provide an outlet for photographers (professional or amateur) are not concerned about protecting their users' rights (or preserving the relevant photo metadata), especially if doing so means committing additional programming and processing resources.

A very recent Adobe Photoshop Express TOS fiasco (see my longer comment) is a good example of companies not only ignoring photographers' rights, but indirectly (even if perhaps unintentionally) encouraging the abuses you described in your recent "Copyright Conspiracy" post. I guess the only way to try and change it is to talk and blog about it until the importance of those issues "registers" among more people, including the casual users who make up the majority of people using services like Flickr, Picasa, or Adobe's new PS-Express site.

Jerakeen: With Flickr now taking in, hosting, and serving video--I'm not really thinking the extra n'th percentage of storage space that keeping full metadata around will require is going to make any kind of economic impact. Zero. Zippo. Nada. As far as what kind of metadata to keep, sure, you can make useful arguments about big ICC profiles--and I'm happy to convert thumbs and micro images to sRGB and let them be if that's what it takes--but IPTC metadata is pretty darn insignificant, and IPTC Creator and Copyright at a minimum should be a no brainer.
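To put a number on how small a Creator-and-Copyright-only IPTC block is, here is a standard-library-only Python sketch that builds one as a JPEG APP13 segment, the "Photoshop 3.0" container where JPEG files conventionally carry IPTC data. The function names, the photographer's name, and the copyright string are hypothetical placeholders, and the record-version value of 4 is an assumption; with these inputs the whole segment comes to 96 bytes.

```python
import struct


def iim_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Encode one IPTC IIM dataset: 0x1C tag marker, record, dataset, 2-byte length, data."""
    return bytes([0x1C, record, dataset]) + struct.pack(">H", len(value)) + value


def app13_segment(creator: str, copyright_notice: str) -> bytes:
    """Build a JPEG APP13 segment whose IPTC block holds only Creator and Copyright Notice."""
    # ASCII text assumed; non-ASCII text would also need a 1:90 character-set dataset.
    iim = (
        iim_dataset(2, 0, struct.pack(">H", 4))                  # 2:00 Record Version (4 assumed)
        + iim_dataset(2, 80, creator.encode("ascii"))            # 2:80 By-line (Creator)
        + iim_dataset(2, 116, copyright_notice.encode("ascii"))  # 2:116 Copyright Notice
    )
    # Photoshop image resource 0x0404 (IPTC-NAA): "8BIM", resource ID, empty name
    # padded to 2 bytes, 4-byte data size, then the data padded to an even length.
    resource = (
        b"8BIM"
        + struct.pack(">H", 0x0404)
        + b"\x00\x00"
        + struct.pack(">I", len(iim))
        + iim
        + (b"\x00" if len(iim) % 2 else b"")
    )
    payload = b"Photoshop 3.0\x00" + resource
    # The segment's length field counts itself (2 bytes) plus the payload.
    return b"\xff\xed" + struct.pack(">H", len(payload) + 2) + payload


segment = app13_segment("Jane Photographer", "Copyright 2008 Jane Photographer")
print(len(segment))  # 96
```

Even with longer real names the block stays in the low hundreds of bytes, tiny next to the size of a 500-pixel JPEG.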

Ben: I'm not sure. I've not tried out some of the others to see. Good question.

George: indeed, it's another one of those "Community Content" dark sides. It's all good when it looks good for the aggregator, but if there's something that needs to be done that's not in the aggregator's interest, it takes more oomph for things to happen. Talking about it is one way. This is why I encourage everyone to go hit up that Flickr page and make their voice heard.

Yah. 12 hours after I worry about hard disk space, video turns up. Now I look _really_ foolish.

Wonder what Microsoft's lawyers think about this? IMHO, they'll probably make Flickr add the IPTC metadata to every resized image.

Ben: I just checked out the resized info in Zenfolio images and they do preserve Copyright and other EXIF/IPTC information in their resized images. You have to try pretty hard to grab an image thanks to some of their anti-copying behavior, but even once you do snag the image, it's got everything there.

Failures such as these are why the costs of technology still outweigh its benefits in so many cases. Weren't we all promised the world of the future? But now that it's here we spend what seems like five-fold the time and energy to maintain it. This is the perfect example: Flickr is an amazing pool of imagery (especially Creative Commons-licensed photos), but in order to adhere to the terms of the licensing agreements to use the photos we must spend more time manually recording the URLs we accessed and the creators of each image. No wonder sites like iStockPhoto with their rock-bottom pricing on RF images are so popular.

And now we have S.2913 and H.R. 5439 which would essentially put any photo on flickr into the public domain. Copyright conspiracy indeed.

JDD: On the flickr thread you mention this:

On the no-regeneration on a preference change, I get that it's something that shouldn't happen on a preference flip, but I have 6000 images in Flickr that need copyright/creator info. If you enable this for new images but there's no way to go back and mark the old photos, I've still got the same problem for a vast library of interesting, useful, and valuable photos--including every O'Reilly conference shot for the last three and a half years. Would I really have to re-upload all of those and break links? Or replace each image by hand? That'd be a problem as well. I'd just as soon kill my account as try to do that by hand.

While I agree 110% with your issue with Flickr [and I am very happy you brought it up some months ago, as I had no idea, and since then I have stopped posting anything other than what I consider snapshots], I think there is a way around having to manually replace all the photos. If you're in Aperture and using FlickrExport, there is a feature that lets you re-upload using Flickr's "replace" feature.

So for me, I would simply load up my smart albums that have images I have uploaded to flickr and re-upload them using the 'replace' checkbox. No tags/titles/etc need to be edited; you just need to select all, and patiently wait for all your images to finish being uploaded.

This assumes, of course, that flickr will actually do something about this. I do not know if they have, so I need to check a few of my recent uploads and see if that is true or not.

Indeed, FlickrExport out of Aperture would work wonders for replacing photos, as long as you had uploaded all of your flickr photos with it. For those photos uploaded outside of FlickrExport, I don’t think there would be an automated solution.

I don’t believe that Flickr has made any move toward actually fixing the metadata problem, either. I’m steadily losing hope.

"I guess the only way to try and change it is to talk and blog about it", you're kidding right?

Since copyrights are outlined in the Constitution, any violation of any one of the creator's exclusive rights is a federal offense and falls within the jurisdiction of the Federal Bureau of Investigation. I'll bet they can get this straightened out one helluva lot faster than talking and blogging ever could.

http://www.fbi.gov/pressrel/pressrel02/outreach071702.htm

Phone number to file a report is at the bottom. You can also contact your local FBI unit 411 for the number.

Oops, please disregard my previous comment. I just read Flickr (yahoo) TOS and found this:

However, with respect to Content you submit or make available for inclusion on publicly accessible areas of the Service, you grant Yahoo! the following worldwide, royalty-free and non-exclusive license(s), as applicable:

a. . . . with respect to the Content (text) . . . [it includes] the license to use, distribute, reproduce, modify, adapt, publicly perform and publicly display

b. . . . with respect to photos, graphics, audio or video . . . [this includes] the license to use, distribute, reproduce, modify, adapt, publicly perform and publicly display

c. With respect to Content other than photos, graphics, audio or video you submit or make available for inclusion on publicly accessible areas of the Service other than Yahoo! Groups, the perpetual, irrevocable and fully sublicensable license to use, distribute, reproduce, modify, adapt, publish, translate, publicly perform and publicly display such Content (in whole or in part) and to incorporate such Content into other works in any format or medium now known or later developed.

The only good news is that this license exists only for as long as you elect to continue to include such content on the service. So the easiest way for you to reclaim your copyrights is to use your own host and post your own licensing terms.

Betterphoto.com appears to be doing this, also, at least on my images. Zenfolio does not do this and in fact they display your copyright metadata with your image. I think that a) we all need to write our congressional reps to oppose the Orphan Works legislation in its current, overly broad form and b) at the same time request that the deliberate removal of copyright info be made a crime.
