Dr. Ana Jofre is an Assistant Professor in Creative Arts and Technology at SUNY Polytechnic in Utica, NY. She has a PhD in Physics from the University of Toronto and an MFA in Interdisciplinary Arts Media and Design from OCAD University. Her publications and conference presentations cover a wide range of intellectual interests, from physics to critical theory, and she has exhibited her artwork internationally. Her creative and research interests include figurative sculpture, interactive new media, internet art, human-computer interaction, and data visualization.
Dr. Vincent Berardi is an Assistant Professor of Computational Psychology at Chapman University (Orange, CA) and is the director of the Computational Analysis of Health Behavior Laboratory (CAHB Lab). His work focuses on identifying trends in intensive longitudinal data, both in digital humanities studies and in health behavior interventions.
Dr. Kathleen P.J. Brennan is a Postdoctoral Research Fellow in the School of Political Science and International Studies (POLSIS) at the University of Queensland (Brisbane, Australia). She completed her PhD in Political Science at the University of Hawai’i at Mānoa in 2016 and her MSc in International Relations Theory at the London School of Economics in 2009. Her work draws on the intersections of political theory, IR, popular culture, and media studies.
Aisha Cornejo is a recent graduate of Chapman University, with a double major in psychology and philosophy.
Carl Bennett is a recent graduate of SUNY Polytechnic, with a BS in Computer Science, and is currently a software developer at General Motors.
John Harlan is a recent graduate of SUNY Polytechnic, with a BS in Interactive Media and Game Design. He is a computer programmer, currently at Shiprite Software, specializing in procedural design and user interfaces.
We describe the development of web-based software that facilitates large-scale, crowdsourced image extraction and annotation within image-heavy corpora that are of interest to the digital humanities. An application of this software is then detailed and evaluated through a case study where it was deployed within Amazon Mechanical Turk to extract and annotate faces from the archives of
This is a case study that demonstrates how the authors' software can be used to extract and annotate faces from a magazine archive.
The amount of multimedia data available is steadily increasing
Our specific interest is in identifying and labeling images of faces from the
Our methods are illustrated through a case study where the software was used to crop and label human faces from an archive of
This work was motivated by an interest in using large, image-heavy corpora, in particular periodical archives, to gain insights into cultural history. Interpreting large cultural corpora requires both quantitative methods drawn from data science and qualitative methods drawn from technology, cultural, and social studies. From this perspective, we are interested in questions concerning what the faces in a magazine archive could reveal about the larger historical context of a publication, questions such as how gender, race, and age representation has changed over time, and how these correlate with the magazine’s text and with broader cultural trends.
The archive under consideration for our case study consists of approximately 4,500 issues from
The data we collected using the crowdsourcing methods described in this paper has
been published as a dataset in the Journal of Cultural Analytics
Previous studies have successfully used crowdsourcing to achieve goals similar to
ours. For instance, when examining features of traffic intersections, the
correlation between crowdsourced results and experts was 0.86 for vehicles, 0.90
for pedestrians, and 0.49 for cyclists
While there are many other solutions for researchers seeking to perform image extraction and annotation via crowdsourcing, we believe that our software fills a unique niche for humanities researchers who want to have full control of the data collection and quality controls. Most solutions are geared towards machine learning researchers and provide these services as a bundle, where the client receives the requested clean data. These include LabelBox (https://labelbox.com/product/platform), LionBridge (https://lionbridge.ai/services/image-annotation/), Hive (https://thehive.ai/), Figure Eight (https://www.figure-eight.com/), and Appen (https://appen.com/). Such black-box solutions are not suitable for the humanities, where we must be mindful of who is doing the tagging. Our software allows the researcher to track individual workers to examine their effect on the data. Furthermore, it is platform-independent, allowing it to be deployed on any crowdsourcing site. We are aware of one other standalone image cropping and tagging software package, labelImg (https://github.com/tzutalin/labelImg), but it is not web-based, which limits its deployment.
The software package and methodology we developed are intentionally flexible,
both in the corpora they can analyze and in the crowdsourcing platform on which
they can be deployed. For the former, our motivation was to allow our tools to
be used with a variety of sources, such as the Look Magazine archive, hosted by
the Library of Congress
In preliminary work, project leaders identified the following nine facial features of interest:
1. Gender, classified as Male, Female, or Unknown;
2. Race, classified according to current U.S. census categories as American Indian, Asian, Black, Pacific Islander, White, or Unknown;
3. Emotion, classified according to Ekman’s six basic emotions as Anger, Disgust, Fear, Happy, Sad, or Surprise (Ekman and Friesen 1986);
4. Racial Stereotype, classified as Yes or No;
5. Magazine Context, classified as Advertisement, Cover, or Feature Story;
6. Image Type, classified as Photograph versus Illustration;
7. Image Color, classified as Color or Black & White;
8. Multiple Faces in the Image, classified as Yes or No; and
9. Image Quality, classified as Good, Fair, or Poor.
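For readers planning to adapt the annotation schema, the sketch below shows one way these nine categories and their levels could be encoded and rendered as radio buttons in a PHP form. The array and variable names are illustrative, not the ones used in our production code.

```php
<?php
// Illustrative encoding of the nine annotation categories and their levels.
// Names are hypothetical; the production survey.php defines its own variables.
$annotation_categories = [
    'gender'         => ['Male', 'Female', 'Unknown'],
    'race'           => ['American Indian', 'Asian', 'Black', 'Pacific Islander', 'White', 'Unknown'],
    'emotion'        => ['Anger', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise'],
    'stereotype'     => ['Yes', 'No'],
    'context'        => ['Advertisement', 'Cover', 'Feature Story'],
    'image_type'     => ['Photograph', 'Illustration'],
    'image_color'    => ['Color', 'Black & White'],
    'multiple_faces' => ['Yes', 'No'],
    'image_quality'  => ['Good', 'Fair', 'Poor'],
];

// Render each category as a group of radio buttons (cf. the Task 2 interface).
foreach ($annotation_categories as $name => $levels) {
    echo '<fieldset><legend>' . htmlspecialchars($name) . '</legend>';
    foreach ($levels as $level) {
        echo '<label><input type="radio" name="' . $name . '" value="'
           . htmlspecialchars($level) . '"> ' . htmlspecialchars($level) . '</label> ';
    }
    echo "</fieldset>\n";
}
```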
One issue from each of the ten decades spanned by the data (1920s-2010s) was
selected at random and analyzed by student research assistants. The student
coders proceeded through all pages in an issue (range: 50-160), identified
faces, and annotated the features according to the above categories.
Throughout this process, coders were asked to keep track of anomalous faces
that were not easily classified, a process that was extremely valuable in
refining our procedures. For example, due to the presence of animal faces
and masks, the operational definition of a classifiable face was changed to
human faces where at least one eye and clear facial features are present.
Single color images required the Image Color classification levels to be
changed to Color versus Monochrome and an
With the updated feature list established, three coders reviewed a single
issue and annotated the 185 faces that were identified by all three
individuals when reviewing the issue. To assess interrater reliability
(IRR), Cohen’s kappa (
To scale up data collection, we created a web-based form in PHP, coupled to an SQL database, that could be deployed within crowdsourcing platforms to perform the two tasks required to obtain the data of interest. In Task 1, a magazine page was presented, and participants were instructed to crop any faces that are present; in Task 2, participants were instructed to categorize the faces identified in Task 1 according to the specifications in Table 1. The data collection protocol was to first complete Task 1 (cropping) on all our selected pages before moving on to the annotation phase, which allowed cropping errors to be eliminated before sending the extracted images for annotation. Task 1 was separated from Task 2 so that crowdsource workers would only have to be trained for and perform one scope of work.
While the data-collection interface is platform-independent and can be used
to directly collect data, we found it beneficial to use AMT to recruit
participants and manage payments. Jobs
(or human intelligence tasks
(HITs) in AMT vernacular) were deployed in AMT as a survey link. For Task 1,
each job consisted of reviewing 50 pages and cropping all of the observed
faces within each page. AMT workers were paid $5 USD (all payment rates
cited here are in USD) for each completed job, which was based on the time
it took student coders to complete similarly-sized jobs (30-40 minutes) with
a goal of paying between $8-$10/hour, above U.S. federal minimum wage
In this task, workers were presented with a job consisting of 50 images, 47 of which were randomly-selected magazine pages and three of which were validation pages. On each assigned page, AMT workers were asked to crop a rectangle around individual faces by clicking and dragging from one corner of a rectangle to the opposite corner (see Figure 1). If there was more than one face on the page, workers selected an option to remain on the page and continue cropping. Once all the faces were cropped, or if there were no faces on the page, workers selected an option to move on to the next page in their job. We observed that workers often abandoned an assigned job after the first few pages, resulting in incomplete jobs within our system. To eliminate these jobs, a script was created that ran in the background to look for pages that had been assigned within a job that had been inactive (i.e. no faces cropped) for more than 2 hours. Any data collected from these jobs was deleted and the pages within them were made available for a new job assignment.
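A minimal sketch of such a cleanup script is shown below. It assumes a hypothetical pages table with assigned and assigned_at columns and a data table keyed by page_id; the actual schema in our repository differs in its details.

```php
<?php
// Sketch of the background cleanup for abandoned jobs (e.g., run from cron).
// Table and column names are hypothetical placeholders.
$db = new mysqli('localhost', 'user', 'password', 'magazine_crops');

// Find pages that were assigned to a job more than 2 hours ago and never completed.
$stale = $db->query(
    "SELECT id FROM pages
     WHERE assigned = 1
       AND assigned_at < (NOW() - INTERVAL 2 HOUR)"
);

while ($row = $stale->fetch_assoc()) {
    $id = (int) $row['id'];
    // Discard any partial crops and release the page back into the pool.
    $db->query("DELETE FROM data WHERE page_id = $id");
    $db->query("UPDATE pages SET assigned = 0, assigned_at = NULL WHERE id = $id");
}
$db->close();
```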
Within each job, 3 of the 50 pages that the workers analyzed were validation
pages, whose inclusion was meant to help detect workers that attempted to
quickly receive payment by repeatedly indicating that there were no faces on
each page, regardless of content. These pages were selected randomly from a
database which contains a list of magazine pages and the known number of
faces on each page, as determined by trained project personnel. These are
our ground-truth
faces. Worker quality was assessed by comparing the
number of cropped faces on these pages to the known number of faces.
A worker’s validation page was flagged if they cropped more than one face on a
validation page with only one face, or cropped
To facilitate the further inspection of AMT workers with a high number of flags, an easy-to-use, in-house review interface was built (Figure 2). On a single webpage, this interface displayed all of the magazine pages assigned to any worker, along with frames around the image areas that the worker selected for cropping. Using this interface, project personnel were able to rapidly scroll through the pages, inspect the work, and make note of pages with mistaken crops or faces left uncropped. If a worker had errors on more than half of their pages, then payment was not provided and all pages in their job were re-analyzed. We paid all other workers but used our revision process to identify pages with egregious errors, which were returned to the pool to have their analysis redone.
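As a rough sketch of the count-based check on a validation page (the function and variable names are illustrative, and the flagging rule used in our deployment was somewhat more nuanced, as described above):

```php
<?php
// Sketch of the Task 1 validation check: compare the number of faces a worker
// cropped on a ground-truth page with the number identified by project personnel.
// Names are illustrative only.
function flag_cropping(int $cropped_count, int $ground_truth_count): int
{
    // One simple rule: any mismatch raises a flag for manual review.
    // (Pages with extra crops were later treated leniently, since additional
    // crops often corresponded to small, poor-quality faces.)
    return ($cropped_count === $ground_truth_count) ? 0 : 1;
}

// Example: a worker cropped 1 face on a validation page known to contain 3.
echo flag_cropping(1, 3); // 1 (flagged for review)
```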
In this task, workers were presented with a job consisting of either 25 or 50
images of faces, and were asked to enter appropriate tags for each face. The
faces were randomly selected from the images that were cropped in Task 1.
Procedures similar to those outlined in Task 1 were used to simultaneously manage multiple jobs, ensure that a sufficient number of images were available to populate each job, and cancel jobs that had timed out. For
each face in a job, workers classified facial features according to the
categories in Table 1 with an additional not a
face
option that served as a quality check for the collection of
cropped faces. To maximize task efficiency, the options for each
classification were presented as clickable radio buttons, rather than as
drop-down menus. As in Task 1, once the job was completed, the workers were
given a randomly-generated completion code that was used to secure payment
through the AMT platform.
In a similar process to Task 1, each job contained 3 validation faces, also known as ground-truth faces, each of which was consistently labeled the same by three student coders over all categories. To create a flagging system, we focused on the three categories that had the highest rates of agreement in our preliminary data collection: gender, image color, and image type. Magazine context had the second-highest interrater reliability, but as will be discussed in section 3.4.1, our software was configured to assess this feature in two different ways so it could not be used for validation. When the classifications matched the known values for a given validation image, the flag value was set to zero. Each mismatch contributed a value of 1 to the flag, with a maximum of 3. Images with large flag values were subject to further scrutiny. For the cases where an AMT worker had mismatches with the validation images, it was not possible to build a succinct visual inspection tool for all images as was done in Task 1 since category selections cannot easily be represented visually. Furthermore, there is a degree of subjectivity and ambiguity in certain categories, such as the presence of a smile, so we chose not to develop explicit criteria for processing AMT payments and all workers were paid. To navigate the potential for erroneous data and/or ambiguous categories, we obtained multiple annotations for each face, which were aggregated to obtain a crowdsourced label. As will be described in a subsequent section, we had each face annotated twice and resolved inconsistencies by choosing the label associated with the worker who was most consistent with other workers over all annotated faces, and who had the lowest number of flags.
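A minimal sketch of this per-validation-face flag computation, assuming hypothetical array keys for the three checked categories:

```php
<?php
// Sketch of the Task 2 flag computation for a single validation face.
// The flag counts mismatches on the three high-agreement categories
// (gender, image color, image type); the array keys below are illustrative.
function flag_annotation(array $worker, array $ground_truth): int
{
    $checked = ['gender', 'image_color', 'image_type'];
    $flag = 0;
    foreach ($checked as $category) {
        if ($worker[$category] !== $ground_truth[$category]) {
            $flag += 1; // each mismatch adds 1, up to a maximum of 3
        }
    }
    return $flag;
}

// Example: the worker disagrees with the ground truth on gender only.
$worker       = ['gender' => 'Male',   'image_color' => 'Color', 'image_type' => 'Photograph'];
$ground_truth = ['gender' => 'Female', 'image_color' => 'Color', 'image_type' => 'Photograph'];
echo flag_annotation($worker, $ground_truth); // 1
```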
We also took this opportunity to examine how variations in the interface affected annotation results (see Figure 3). In particular, we were curious about whether faces taken out of context were more likely to be erroneously labeled. For example, a closely cropped face may not include gender cues, such as hair and clothing. To address this question, we developed two different annotation interfaces. In the context-free version, we show only a cropped face to workers, who then determine the characteristics. Because there is no context around the face, the magazine context (ad, feature, cover, etc.) and multi-face (whether the face being tagged is accompanied by other faces) categories were required to be determined in Task 1 while workers did the cropping. In the second (default) version of the task, workers see the full page with a rectangle around the face of interest when labeling the face, and they answer questions about the face as well as about the context around it. We default to this latter version of the interface since we were able to automate Task 1 (see section 5.2), requiring the context annotations to be assigned in Task 2. We found that, despite there being only two additional questions in the default version of the interface compared to the context-free version, it took almost twice as long to complete the labeling tasks, which is why AMT jobs consisted of 25 rather than 50 faces with the default version.
A case study was performed using a subset of our magazine archive consisting of one July issue selected from every year between 1961 and 1991, which corresponds to our historical period of interest. Additionally, the one-per-decade issues that the student coders manually labeled during our preliminary studies were used as a second data set. The first data set was denoted as 30YR (it spans 30 years), while the second was called OPD (as we selected One issue Per Decade). After being cropped, both the 30YR and OPD data were each labeled by two distinct AMT workers.
A total of 87 AMT workers cropped 3,722 pages in Task 1. Due to various glitches that were discovered during deployment and eventually rectified, certain jobs contained fewer than 50 pages, with the average being 47.18 pages per job. The average time to complete a job was 47 minutes. Three validation pages were randomly included within each job to address concerns about individuals incorrectly indicating there were no faces on a given page. However, this behavior was not widely observed, as less than 5% of all validation pages were characterized as having no faces. More common errors appear to have been cropping only a fraction of the faces present on a given page or including many faces within a single crop. For example, 20.0% of validation pages with 3 or more ground truth faces were characterized as having only 1 face. The cropping error rate was significantly reduced when workers were required to acknowledge that they read our instructions before beginning the job. Overall, for 72.8% of validation pages, the number of faces identified by the AMT workers agreed with the known number of faces. For an additional 7.6% of validation pages, AMT workers cropped more faces than the known number. It is likely that these cases represent genuine attempts at completing the task, where the known faces along with additional small, poor quality faces were cropped. Processes were implemented to eliminate poor quality faces (see section 4.3). Therefore, the cropping accuracy should consider true positives to be those validation pages where the number of cropped faces either matched or exceeded the ground truth, which led to an effective accuracy of 80.4%. Each page was verified with the inspection interface described above, and crop errors were corrected before proceeding to Task 2.
In Task 2, a total of 342 workers annotated 9,369 faces. One AMT assignment consisted of either 25 or 50 faces, depending on whether the default or context-free interface was being used. Technical glitches, which were later corrected, occasionally caused the number of faces in a job to slightly vary. The average time to complete a job was 30 minutes using the context-free interface, with a job consisting of 50 faces, and 25 minutes using the default interface with a job consisting of 25 faces. Table 2 illustrates the consistency of image annotations with the known labels of the validation images. With the exception of image quality, the accuracy for each category was above 87%.
As described in section 3.4.1, Task 2 was deployed with two different interfaces. In the default case, faces were presented in the context of the original page they were cropped from, while in the context-free case, the face alone was presented. To investigate whether the interfaces affected the labeling task, we used the default interface for both rounds of OPD labeling, but varied the interface for the 30YR data, as shown in Table 3. We then examined the consistency of labels over these two cases.
For each of the 10 labeled features, the proportion of images where the
ratings agreed was calculated for both the 30YR and OPD data sets. The
results are illustrated in Table 4. According to
Interestingly, the correspondence in magazine context was larger across
different interfaces in the 30YR data than across the consistent interfaces
in the OPD data. The observed statistically significant differences may be due to the large sample size; this interpretation is bolstered by effect sizes (Cohen’s f) that are well below 0.1 in every case (typically, a moderate effect is considered 0.3). As a result, we conclude
that the differences in annotation quality according to the interface design
are relatively small.
We next explored the effect of image quality on the consistency between
raters. Each image was classified as having Satisfactory
Quality (SQ) if both raters scored its quality as either good or
fair, or Non-Satisfactory Quality (NSQ) otherwise.
Approximately 27% of the observations were classified as NSQ. The proportion
of matches for each feature was then calculated separately for both the SQ
and NSQ cases. The results are illustrated in Table 5. For 6 of the 10
features, the effect sizes (Cohen’s f) were larger than when comparing the 30YR and OPD data, with the adult and image quality features approaching a moderate effect.
The results in Table 5 indicate that it may be advantageous to eliminate NSQ
data from subsequent analyses. Before doing so, it is important to determine
if this will introduce a bias. Due to changes in printing technology and
subject matter over the 90+ years spanned by the data, there is the
potential for image quality to differ by time. This possibility was assessed
by separately calculating the frequency of SQ and NSQ images in each issue.
A
Each face was annotated twice, each time by distinct AMT workers. While the
majority of labels (~ 80%) were in agreement, we required a methodology to
resolve disagreements between labels in order to have a definitive value for
each annotation. When crowdsourcing data, this is often achieved by having
multiple individuals rate a given image and then using a majority rules
approach for each feature
Table 6 compares our two methods of calculating the proficiency score with the flagging system for image annotations. The sum of the flags for each participant was calculated and proficiency scores were stratified by these values. As shown in Table 6, lower proficiency scores were associated with larger flag values, which indicates that our flagging system provides a reasonably good indicator of worker proficiency. An ANOVA test indicated that the differences in proficiency score values among the flag values were significant (p<0.001) for both varieties of the proficiency score.
Prior to deploying the proficiency score methodology to resolve annotation inconsistencies throughout the entire corpus, it was necessary to determine the consistency of this methodology with the more established majority-rules procedure. To assess this, a subset of 1,000 SQ images was selected from the corpus at random and then submitted to AMT for three additional annotations (i.e., five total annotations). The annotation label selected most frequently was chosen for each image, with ties between annotation labels (< 1% of all annotations) broken at random. Table 7 summarizes the proportion of faces for which the annotation labels in the five-rater consensus and proficiency score (using the all rated images option) matched. These results indicate that the proficiency scoring procedure is sufficiently accurate to allow future iterations of this system to proceed with only two raters per image, which will allow for a more resource-efficient project.
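As an illustration of the five-rater benchmark, the sketch below implements a simple majority rule with random tie-breaking; the function name is ours, not the repository's.

```php
<?php
// Sketch of the five-rater majority-rules aggregation used as a benchmark:
// the most frequent label wins, and ties are broken at random.
function majority_label(array $labels): string
{
    $counts = array_count_values($labels);
    // Collect all labels tied for the highest count.
    $top = array_keys($counts, max($counts));
    // Ties (< 1% of annotations in our data) are resolved at random.
    return $top[array_rand($top)];
}

// Example: five raters label the gender of one face.
echo majority_label(['Female', 'Female', 'Male', 'Female', 'Unknown']); // Female
```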
While this software was built for our specific purpose of cropping and annotating faces from
As a demonstration of this flexibility, we hosted a proof-of-concept workshop
in December 2018 demonstrating the use of our tool on selected pages from
the GQ Magazine corpus
The cropping part of the software (Task 1) is particularly easy to adapt for cropping other objects. In our own research, we are currently using it to extract the advertisements from the corpus. The software is also being used to identify measures of neighborhood distress (graffiti, abandoned vehicles, etc.) in a study that examines the role of environmental factors in promoting physical activity.
Our case-study data has provided us with a corpus-specific training set that
we have used to train a RetinaNet detector
We have also trained classifiers to automatically label the gender of the
face by fine-tuning a pre-trained VGG Face CNN Descriptor network
We created an additional piece of software, also available on our GitHub page (https://github.com/Culture-Analytics-Research-Group/Metadata-Analysis), that pulls the data directly from the database where the crowdsourced annotations are stored and creates visual summaries of image annotations versus time. The user can select any annotation category and easily generate a chart of the selection as a function of time, aggregated by year or by month. In addition, the tool allows users to select subsets of categories. The example in Figure 4 shows the percentage of women’s faces out of the subset of faces identified in the context of advertisements. This tool is intended for preliminary analysis that allows researchers to quickly identify temporal trends and patterns.
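The kind of aggregation behind such a chart can be expressed as a single query. The sketch below computes the yearly percentage of faces labeled Female among faces annotated as appearing in advertisements; the column names (year, gender, context) are assumed for illustration and may differ from the actual data table.

```php
<?php
// Sketch of the aggregation behind a chart like Figure 4: the yearly percentage
// of faces labeled Female among faces annotated as appearing in advertisements.
// Column names are assumed for illustration.
$db = new mysqli('localhost', 'user', 'password', 'magazine_crops');

$result = $db->query(
    "SELECT year,
            100.0 * SUM(gender = 'Female') / COUNT(*) AS pct_female
     FROM data
     WHERE context = 'Advertisement'
     GROUP BY year
     ORDER BY year"
);

while ($row = $result->fetch_assoc()) {
    echo $row['year'] . ': ' . round($row['pct_female'], 1) . "%\n";
}
$db->close();
```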
The data we collected with these methods have allowed us to generate more
data via machine learning, and have allowed us to ask the following questions
In our own work, we used the data collected through this method (as well as the automatically-extracted data that this work made possible) to examine how the percentage of female faces found in
We were successful in building and deploying software to manage the crowdsourced extraction and labeling of features from an image-heavy corpus. While the software is generalizable, we focused on an application where faces were required to be extracted and labeled from
Our case-study results show that the differences between labeling performed
on context-free versus context-rich interfaces were small. However, there
was a notable difference when we instead compared images that were tagged as satisfactory versus non-satisfactory quality. Anecdotally, the default interface also proved less tedious than the context-free interface: viewing pages from vintage magazines was more entertaining than viewing decontextualized images of faces. In
the end, we likely will opt for the default interface in our future studies.
This is in part because we have been able to fully automate image
extraction, but also because the context-rich environment seems to increase
the readability of the selected face. An image of a face alone loses the
rich contextual information of the complete page in which it appeared.
Using the methods described in this case study, we successfully collected
data that was 1) used to train an object detector and an image classifier,
2) published and made accessible to other digital humanities researchers
While AMT offers multiple options, including developer tools and a sandbox,
for creating image cropping and tagging interfaces, we chose to build our
own web-based application for several reasons. For one, this allows complete
customizability, which was beneficial as we tweaked our approach in response
to preliminary data. Also, this web form enables us to collect data in a
manner that is independent of any service providers, which allows us to use
different services without compromising our methods. In this work, we used
AMT to provide a proof-of-principle, but we plan to deploy this system on
other crowdsourcing platforms. The stand-alone interface also opens the
possibility of collecting data with volunteer crowdsourcing, as has been
done in projects from the New York Public Library
From a humanistic perspective, there is a limitation in using only visual data to classify race and gender. In the case of gender, our data does not distinguish between someone who identifies as a woman (or man) and someone who presents as female (or male), and the automatic classification trained on this data assumes that gender is binary, which is problematic. Human coders, who see the context of the page, can mitigate this problem by labeling the gender as ‘unknown’, which accounted for 6% of the faces. However, upon closer inspection, we found that none of these were actually gender non-binary adult individuals: many were not faces at all (errors in the face extraction), many were very small, low-resolution images that were hard to read, some were non-gendered cartoon illustrations (a face drawn onto an object, for example), and some were infants or small children. So, while problematic, the assumption of a binary gender may be suitable for examining certain mainstream 20th-century publications such as
A second, more practical, limitation is that this software requires that the user have some familiarity with PHP and with managing SQL databases. Our goal was to make a useful tool for researchers, rather than a polished commercial product. Researchers using this software need to have someone on their team with basic programming experience. The tradeoff, however, is that this software allows researchers to have full control of the data collection and quality controls.
Our next steps are to continue using the crowdsourced data we collected to automate the classification of other categories, and to undertake a close examination of the context in which faces appear, particularly advertisements. To this end, we are using our software to crowdsource the extraction of all advertisements from selected issues of the corpus. These will be used to train an algorithm that will extract all the advertisements from the corpus. Using this advertising data in conjunction with our face data will allow us to undertake a study on trends in advertising in this particular media outlet.
The ultimate goal of this project is to create web-based interactive visualizations of the data we extract from our
In addition to gaining insights from our corpus and making these publicly accessible, we also aim to develop novel methodologies for the visual analytics of large, image-based data sets that can be applied to a variety of projects and shared with other researchers.
We would like to acknowledge Michael Reale for his help with automating image extraction and tagging. We would also like to acknowledge generous research support from our institutions, SUNY Polytechnic and Chapman University, for the start-up funding that made this research possible. Finally, we acknowledge IPAM at UCLA for bringing this collaboration together at the Culture Analytics Long Program and for equipping us with the tools to undertake this research.
This is a web interface for gathering data from images on a large scale. Users should serve it alongside the accompanying writable SQL databases. We provide the database structures here and on GitHub, along with the code.
This web-based interface facilitates gathering data from images: it allows users to crop a selection from a larger image and to input information about the crop. In our case, we are selecting faces out of images from a magazine archive, but with some minor edits this code can be used to select anything else from an image archive (cars, trains, signs, etc.).
This web interface is platform independent. Users only need a link to access it.
The code comprises three different data-gathering surveys.
The first survey allows participants to select and save a cropped portion of an image. The survey contains multiple pages (in our case, 50), and the participant has to select and submit all the faces from each page. To access the cropping survey, use the link survey.php?load=crop.
For cropping, we used imgareaselect by Michal Wojciechowski (https://github.com/odyniec/imgareaselect).
The second survey allows users to classify the already-cropped images using a selection of categories. To access the tagging survey, use the link survey.php?load=tag.
The third survey is simply a demographics survey that allows users to enter their demographic information, and is presented at the end of each of the previous two surveys.
The code of this survey is split into 4 different files
If the job is to crop images, the URL survey.php?load=crop should be used. The image to be cropped is presented, and users are asked whether the object to be cropped (faces, in the case of the original purpose) is present in the image. If the object is present, users can crop it by clicking and dragging over the object in the image. If multiple objects are present, users may select that there are more objects (faces) on the page. Any previously cropped objects will be covered when cropping another object. If the object is not present, users may simply indicate that it is not there and move to the next image.
If the job is classifying images that were previously cropped, the URL survey.php?load=tag should be used. The user is presented with the image from which an object of interest was cropped, with the cropped portion highlighted, along with questions about the classification of the object.
Each job within the survey has a configurable total number of images to be completed at one time, along with three check points that can be set (in
$job — PHP $_GET variable that indicates whether the job is for cropping or tagging so that the proper page is loaded. It is obtained from the URL; for example, with the URL survey.php?load=crop, $job is set to crop.
$batch_size — variable controlling the number of images per job
$check — array variable that specifies when ground truth images will be shown in the job
$face_total — variable for cropping that keeps track of the number of objects cropped from a specific image
$file_array — holds the image file names so that a group number can be assigned to them at the end of each job
$check_data1, $check_data2, $check_data3 — holds data submitted by users on each of the three ground truth images
db_connect() — returns a mysqli connection object for connecting to the database; set the $servername, $username, $password, and $database you wish to connect to (see the configuration sketch after this list)
select($job, $batch_size, $connection) — selects images one at a time as long as there are enough images available for another job; otherwise, users are presented with a message that requests are currently at capacity. This function also marks pages as being worked on in the database and adds a timestamp for clearing data on a job that was never finished. The file name of the image is returned
check_select($job, $connection) — similar to select, except it selects ground truth images from their tables.
parse_filename($job, $filename) — parses information from the file name of the image. If the job is cropping, then this information is used to create the path that cropped images will be stored in. If the job is classifying, then this information is used to determine the path of the original image. The parsed data is stored in the $file_data array to later be displayed and submitted to the database. This function is based on the file name scheme of the images originally used with this code.
display($job, $file_data) — handles what is displayed for the user depending on what the job is. Inputs for the survey questions are printed out as radio buttons
hidden($job, $batch_current, $filename, $file_data, $file_array, $check_data1, $check_data2, $check_data3) — prints out the hidden inputs for each job, mainly the data parsed from the filename. If the job is cropping, the hidden inputs containing the cropping information are also printed out.
post_hidden() — prints out hidden inputs for
crop_image() — handles the cropping of images for the crop job and accounts for offset of different window resolutions and sizes.
post_variables($job) — sets the $_POST variables that will be submitted to the database for each job, along with variables needed for the post functions
submit($job, $connection) — submits data to the database for each job and marks images as no longer being worked on. If the job is cropping and no object was cropped, then no data is submitted. If the job is cropping and the page is a ground truth page, a temporary entry is made in a table so that covering previously cropped objects on pages with multiple objects will work properly.
final_submit($job, $connection) — submits the demographics information to the database. A group number is generated by selecting the highest group number from the database group tables for each job and adding one.
This group number is assigned to each image that was part of the job. It is also inserted into the check table for each job along with possible flags raised from the information in the check arrays and a randomly generated code that will be presented to the user. This code is for admins to manage payment via Amazon Mechanical Turk.
demographic($job, $file_array, $check_data1, $check_data2, $check_data3)- displays the form and the inputs for users to enter their demographic information
coverfaces($job, $connection, $filename, $file_data) — if the job is set to crop, covers previously cropped objects (faces) on images where multiple objects need to be cropped, by selecting previously submitted x and y coordinates from the database. If the image is a ground truth image, then it selects from the temporary entry in the table for crop checks. If the job is set to tag, this function is used to find the coordinates and draw the rectangle around the object to be classified.
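As a minimal configuration sketch (referenced from the db_connect() entry above), the snippet below shows illustrative values for $batch_size and $check together with a db_connect() along the lines described; the defaults and credentials in the repository will differ.

```php
<?php
// Illustrative configuration; the defaults in the repository may differ.
$batch_size = 50;            // number of images per job
$check      = [10, 25, 40];  // positions of ground-truth images in the job (hypothetical values)

// db_connect() as described above: set the credentials for your own server.
function db_connect(): mysqli
{
    $servername = 'localhost';
    $username   = 'survey_user';
    $password   = 'change-me';
    $database   = 'magazine_crops';

    $connection = new mysqli($servername, $username, $password, $database);
    if ($connection->connect_error) {
        die('Database connection failed: ' . $connection->connect_error);
    }
    return $connection;
}
```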
Below is the pages table structure — used for the cropping task.

Below is the crop_groups table structure — used to track workers in the cropping task.

Below is the ground_truth_crops table structure. This is the ground truth table that is used for the cropping task.

Below is the tag_groups table structure — used to track workers in the tagging task.

Below is the data table structure — this is the table that contains the collected data. Year, month, day, page, image, and coordinates are populated during the cropping task. The rest of the columns are populated in the tagging task.

The ground_truth table has the same structure as the data table — this is the ground truth table for the tagging task.

The crop_check table stores the year, month, day, page, and coordinates of the ground truth pages that the user crops. This keeps track of the objects cropped out of the ground truth pages. It is used to cover objects that a user has already cropped from a single page when multiple objects are present, and it is used to calculate the flags in the crop_groups table. Once the job is finished and the flags are calculated, the entries in this table are deleted.

Below is the tag_check table structure (this table records workers’ entries on the validation pages).
While this software was built for our specific purpose of cropping and annotating faces from a specific periodical archive, we were mindful about its generalizability and developed it with the hope that it could serve as a useful tool for other researchers. We share our code and database structure on GitHub with this intent. The code is written so that the cropping job is easily generalized and the annotation variables are easy to modify.
The most straightforward application of this software is for researchers interested in cropping and annotating objects from other magazine archives. To use our application, the archive needs to be stored as a collection of .jpg images named using the following convention: YYYY-MM-DD page X.jpg (where YYYY is the year, MM is the month, DD is the day, X is the page number). We share the database structure so that users can easily configure it from their server. Users can change column names (and corresponding variable names in the code) as needed.
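As an illustration of that convention, here is a sketch of how such file names can be parsed (the repository's parse_filename() serves this role; the helper below is only an example):

```php
<?php
// Sketch of parsing the expected file-name convention "YYYY-MM-DD page X.jpg".
// The helper name and return format are illustrative.
function parse_page_filename(string $filename): ?array
{
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2}) page (\d+)\.jpg$/', $filename, $m)) {
        return [
            'year'  => (int) $m[1],
            'month' => (int) $m[2],
            'day'   => (int) $m[3],
            'page'  => (int) $m[4],
        ];
    }
    return null; // file name does not follow the convention
}

print_r(parse_page_filename('1968-07-05 page 23.jpg'));
// Array ( [year] => 1968 [month] => 7 [day] => 5 [page] => 23 )
```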
The key part of the code consists of four PHP files:
To use the cropping task, users should list the images they want analyzed in the
When a worker crops a face with this interface, a copy of the cropped image is stored on the backend and the
To display the annotation task, users should serve the survey.php?load=tag URL. (A demo page can be viewed here: https://magazineproject.org/TIMEvault/survey.php?load=tag.) To use the tagging task, the
If users want to annotate features that are different from the ones we listed, the names of the data columns can be changed, as well as the corresponding variable names in the functions
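As a hedged illustration of that kind of change (the column name vehicle_type and the category levels below are hypothetical, as is the column being renamed; the actual form rendering lives in the repository's PHP functions):

```php
<?php
// Hypothetical adaptation: annotating vehicles instead of faces.
// The column and category names below are illustrative, not the repository's schema.
$db = new mysqli('localhost', 'user', 'password', 'magazine_crops');

// 1) Rename one of the face-specific annotation columns in the data table
//    (and in the corresponding ground-truth table).
$db->query("ALTER TABLE data CHANGE emotion vehicle_type VARCHAR(32)");

// 2) Present the new category as radio buttons in the tagging form.
$vehicle_types = ['Car', 'Truck', 'Bus', 'Bicycle', 'Unknown'];
foreach ($vehicle_types as $type) {
    echo '<label><input type="radio" name="vehicle_type" value="' . $type . '"> '
       . $type . '</label> ';
}
$db->close();
```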