When I set out to start this digital project, I had a lot of goals and ideas on where I expected to find my progress by the semester’s end. One of the biggest things I learned is how much goes into creating a project and a website; even more when it is a digital project. I had the concept thought out, but had no idea how to accomplish it. Without web design knowledge or experience, my actual concept was not possible. The building of this site taught me how vital a good project team is. Everyone has their own specialized skill when you work as a team. I know, with the right team, my concept as explained in the grant proposal would not have been such a far reach. However, this prototype is a good start.

The Blueboys at War project was first conceptualized and designed during the fall semester of 2015 at George Mason University. The basis for this project is to build a website that holds a growing history of some of the soldiers and the culture during World War I. This is done using a collection of letters sent by soldiers housed at the Khalaf Al Habtoor Archives at Illinois College in Jacksonville, Illinois. The letters serve as the main content of the website. The secondary content is all research that will be submitted to the website, using either the letters provided or outside sources, as well as the individual profiles for each soldier who has a letter presented from the collection. The ability for researchers or World War I enthusiasts to contribute to the website is what allows the project to be a growing history. Over the last three months, a prototype for Blueboys at War has been created. With this paper I will discuss the process to get the project where it is today and what steps can be taken for the future.

Goals
Going in, the main goals of this project were to give researchers and World War I enthusiasts quick access to the letters as a primary source and to create a space where a history of the war, told through the perspective of the soldiers presented on the website, could be presented and updated by the community; I am calling this a growing history. In order to reach these larger goals, I had two secondary goals of providing quality digital copies and accurate transcriptions. It cannot be said at the present time whether the main goal of the website being a growing history has been achieved, as that requires time and active visitor participation. However, the other three goals have been completed to meet the needs of the project where it stands today.

The letters found on Blueboys at War were digitized by me in the fall of 2012, when I was an undergraduate at Illinois College. The digitization of the World War I Collection was my first assignment as an archival assistant for the Khalaf Al Habtoor Archives, then known only as Illinois College Archives. Per instructions from my supervisor, all documents were carefully scanned into the computer and compiled into PDFs. My meticulous nature during this process benefitted me three years later when this project was first conceptualized and started. All of the documents were large files and easily readable. At the beginning of the project, I converted all of the PDFs into JPG form using the website pdf2jpg. This website allowed me to choose the quality of the output, and for each document an image at 300 dpi was chosen; this ensured that the documents did not lose any quality in the conversion. Preserving the quality was essential to the project. A high-quality document allows visitors to better examine the material, even without a provided transcription.

All of the documents are easy to read, but the availability of transcriptions for each document should not be overlooked. The letters are typescript copies, presumably because the owners of the originals were not willing to part with them. The typescript makes the letters easy to read, even in documents where a character is smudged or has something on it. However, I wanted visitors to be able to easily read computer text if they preferred, especially those individuals who use voice assistance tools for reading webpages. At the start of the project, I planned to use an OCR tool to quickly create transcriptions for every letter. Since the collection contains 117 letters ranging from 1-4 pages each, I planned to use the Tesseract OCR tool. However, upon closer examination of the letters, I found that using an OCR tool was not in my or researchers’ best interests. In many letters, censored words are represented as dashes. This censoring is itself information for researchers. In order to not risk the tool interpreting the dashes wrongly, I personally transcribed all presented letters into individual text documents, which also makes text mining possible for any interested parties.
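Because the censored words survive in the hand-made transcriptions as runs of dashes, they can be counted with very little code. The following is a minimal sketch in Python, assuming a censored word was typed as two or more consecutive hyphens; the sample sentence and the function name are hypothetical and are not taken from the collection.

```python
import re

# Assumption: a censored word appears in a transcription as a run of
# two or more hyphens (for example "----"). A single hyphen is treated
# as ordinary punctuation, as in "well-known".
CENSORED = re.compile(r"-{2,}")


def count_censored(text: str) -> int:
    """Return how many censored spans (dash runs) appear in the text."""
    return len(CENSORED.findall(text))


# Hypothetical sample sentence, not a quotation from any letter.
sample = "We left ---- on the 5th and expect to reach ------ soon."
print(count_censored(sample))
```

A count like this would only be as reliable as the transcription convention, which is one more reason the dashes were typed by hand rather than left to an OCR tool.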

Design/Tools
The website for this project was designed using Omeka with the Berlin theme. This theme was chosen because the color scheme was easy on the eyes and the navigation was at the top. It was important to me to have the navigation at the top of the page because I find that side navigation can sometimes look cluttered or get lost in the text on the page. The theme also appealed to me for this project because it has a blue-on-white color scheme. Not only is this easy to read, but blue and white are also the colors of Illinois College.

While designing within Omeka, I decided to not use the built-in exhibit feature. For this project, I did not like the way it presented the documents. I wanted each letter to have an individual custom page that could link to other letters and soldier profiles. Additionally, I wanted to provide the transcription as a link instead of text already viewable on the page. I chose to create simple pages for every letter and manually attach them to profiles through links. Approaching the pages this way also allows me to include an overview of each letter, to be added at a later date beneath “topics.”

Manually linking throughout the site, instead of providing every page in the navigation, took away a lot of clutter and allowed me to get more creative. For the soldier profiles and links to sort letters by year, I created matching hyperlinked images. I decided to approach it this way because I expected an image to catch the visitor’s attention more than hyperlinked text. For the time being, the individual letters are kept as hyperlinked text in order to clearly separate them from the option to sort by year. In the long run, the content in the “topics” section of each letter will be displayed as hyperlinked text as well, allowing visitors to quickly find all letters that discuss a particular topic, such as Germany. Needing to do this manually is a large downside to not using the exhibit feature of Omeka, but the project is visually better off this way.

Content
There are three main content areas for Blueboys at War: letters, history, and profiles. The letters are the main content of the website. They are what I expect most, if not all, visitors to be accessing the site for, with the exception of those who are interested in contributing or viewing soldier profiles. All letters have been given their own page and basic information is provided: title, soldier, who the letter was written to, date on letter, link(s) to the pages, pdf copy of letter, transcript link, topics discussed or mentioned in letter. The title of each letter is the soldier’s name along with a number, depending on how many letters that soldier has presented. The soldier’s name is hyperlinked to that soldier’s profile to make finding more information easier for the visitor. The letter is provided in JPG format. The transcript opens up as plain text in a separate page (fig 1). This requires someone to create their own documents if they wish to do any text mining; I would like to look into an option to download the text document in the future.

Figure 1: A screenshot of the transcript for Clay Apple Letter 01; the image cuts off the right section of the text.

The topics section of each letter’s page currently lists places, people, and events mentioned in each letter. At a later date these will all be hyperlinked to have any letters with common topics connected and listed together once a key word is selected.
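The connection between topics and letters described here is essentially a lookup from each key word to every letter that mentions it. A minimal sketch of that idea in Python follows; the letter titles, the topics, and the function name are placeholders for illustration, and nothing here reflects how Omeka itself stores or links pages.

```python
from collections import defaultdict


def build_topic_index(letters):
    """Map each topic to the sorted list of letter titles that mention it."""
    index = defaultdict(set)
    for title, topics in letters.items():
        for topic in topics:
            index[topic].add(title)
    return {topic: sorted(titles) for topic, titles in index.items()}


# Hypothetical letters and topics, used only to show the structure.
letters = {
    "Soldier A Letter 01": ["Germany", "Training"],
    "Soldier A Letter 02": ["France"],
    "Soldier B Letter 01": ["Germany", "France"],
}

index = build_topic_index(letters)
print(index["Germany"])
```

Selecting a key word such as Germany would then return every letter page that should be listed together under it.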

The history section of Blueboys at War offers information on training camps and the various forts mentioned in the letters either directly or indirectly (letter is addressed from a fort). I have chosen to include a history of the Student Army Training Corps (SATC) as many of the letters come from soldiers still in training. The SATC was a government run program that used educational institutions in WWI to train soldiers while still allowing them to attend college. Not many people know about this program as it was very short lived. Since it is mentioned directly by some of the soldiers, it is an important topic to address as information about it is not easy to come by. The second section of history will provide brief information on all the forts mentioned in the letters. This will include forts that soldiers directly talk about as well as those that are only mentioned in the address line. This section will help visitors attain a better understanding of where the soldiers were writing from. This section is not yet completed, but once it is there will be a link to the forts page for each letter that has one mentioned. A history of the war has been excluded, assuming those visiting the website already have at least a basic knowledge of World War I.

The last content area for the website is the profile pages. This is the section of Blueboys at War that will be the growing history. For now, the information each profile will hold is the soldier’s name, any images of him, places he sent letters from (to indicate where he fought), links to all letters written by him contained on the website, a biography of his life and service, and lastly a list of contributors to that specific page. At the bottom of each profile page is information about contributing. Anyone who wishes to contribute can send an email to icblueboyswwi@gmail.com, an account created specifically for this project. Unfortunately, all of the profiles are currently empty except for a link to that soldier’s letter(s). This is due to my inability to access the Archives during this project. These pages rely heavily on community involvement from members of the Jacksonville area and any family members of the soldier(s) in order to get the profiles started.

Dissemination
This project was created with permission of the Khalaf Al Habtoor Archives of Illinois College. As such, the Archives have been kept updated on the project, and once it is complete and up to the visual standards of the College, it can be integrated as a digital edition of the Archives. Having this project incorporated and accepted by the Archives will allow student workers to be assigned to add to the project’s growing history aspect and will allow it to be disseminated to the Archives’ followers on Twitter or Facebook. This project was created with the intent to incorporate it into the Khalaf Al Habtoor Archives, which makes this the most fitting route for dissemination.

Let me start off by saying I’m a little torn. I think the proposal by the AHA is necessary and an option that all new PhDs should have. That is not to say I do not support open access, because I think that is a great growing method of dissemination as well. However, open access is pointless if it is not done willingly. To be forced into open access has the potential to create resentment and a tendency to shy away from it in the future. I’m not so sure Kathleen Fitzpatrick would agree with me, as she says in her book, “The production of knowledge is of course the academy’s very reason for being, and if we cling to an outdated system for the establishment and measurement of authority at the very same time that the nature of authority is shifting around us, we run the risk of becoming increasingly irrelevant to the dominant ways of knowing of contemporary culture.” She makes a point of noting the outdated nature of the traditional monograph and how it is still needed despite that. However, she seems to see the real problem as lying with the publishers and how they need to start moving toward being a service to the universities instead of a place of business. I think that is not going to happen any time soon and is not even reasonable, since she herself states how little financial support the university presses receive from the universities. Disseminating knowledge is not free, and it never will be.

I think this is one of the main issues with requiring students to make their completed dissertations open access. The university presses are not a service to the universities, they are their own entity, a business. I can see why William Cronon took the time to defend allowing the option for an embargo. While I have no way of knowing just based on his post whether his claims are all true, it is certainly enough to make you think. Are there publishers out there who would think twice about publishing my revised future dissertation if the original was open access? How many publishers out there truly do not care? Is the difference here truly between open access journals and open access books? Are students better off leaving certain evidence out like he suggests?

It’s all quite frightening for someone considering going into academia. As much as we would all love for digital work to mean as much to institutions as the traditional monograph, the vast majority are much more impressed with a published book than a published article. And if having the original dissertation online would hinder my chances for having a revised version published, I would certainly want the option to embargo it.

However, I would agree with the AHA in that an embargo would not mean completely restricting access to the work. Having the book available through inter-library loan (1 or 2 copies) seems completely reasonable. Even the option of having a digital copy available only at the institution would be viable. Would this slow the sharing of said scholar’s work? Absolutely it would. Is this a problem? I can see how it could be, even a step backwards. The best point Cronon made, though, is how this choice should be in the hands of the writer. Giving the option of open access to new PhDs promotes the method without forcing it into their careers.

I thought the perspectives in the readings this week were interesting. I enjoyed reading about how the different ways to think about history can affect a classroom. I found Sam Wineburg’s 6 ways to think like a historian eye-opening. Although I have acquired and used all these skills over the years, it was not something I had thought about or even realized. At this point, it all comes as second nature to me, which I suppose is part of his point. These skills come easily to historians, but not necessarily to everyone else. His following article with Daisy Martin did a great job of demonstrating the importance of teaching those skills and some of the ways it can be done. I especially liked that they made videos of people thinking aloud for the Historical Thinking Matters website. I was originally confused about how listening to or watching someone question a text aloud as they go would be helpful. My first thought was that it would actually confuse students, since they may have no idea where the questions are coming from. However, that problem was avoided well by choosing subjects outside of someone’s focus and offering an explanation after the video on why the questions were relevant. I think the way this project was handled is a good example of how to approach this digitally. However, I think the same consideration needs to be made inside a classroom. Would you still get the desired effect if you had to follow a “thinking aloud” with an explanation?

I found the different ways the teachers in Ways of Seeing approached pictorial pedagogy interesting and enlightening. The activity of presenting the two different portraits of the Native American struck me in particular. When I first looked at it, I thought about assimilation and the different ways the dominant race has depicted Native Americans for their personal agendas over the years. The addition of primary sources for students to look through to dig deeper into the history is fantastic. There is so much that could be uncovered, and I could see this exercise easily adjusted to any other subject (with the right images, of course).

The biggest aspect I got from the reading was that an educator cannot assume their way is correct. Everyone will learn differently and it is important to first observe what is happening before any problems can be fixed.


For my lesson plan, I was not sure at first what I wanted to do or how to go about creating a lesson plan. Having edited plenty for the Education department at the RRCHNM, I know what I have is the roughest of rough drafts; only an idea, even. However, I think it does a good job depicting what I have in mind.

I decided to focus on the sourcing skill for historical thinking. In my first year of undergrad in my communications class, the professor showed us a website that was built with a bias and provided false information. Taking from that experience, I went searching for that website. The website is written by a white supremacy group about Dr. Martin Luther King Jr. If you have not already looked at it, I encourage you to do so. Just reading the links will cause any of us to raise our eyebrows. What is concerning about this website is that in a Google search, it comes up on the first page. Imagine how many people see this website for that reason alone.

Taking from that, I provided three other websites about Martin Luther King Jr. The activity involves students browsing the websites without instruction at first. My thought with this is that once they get the questions that ask them to consider reliability and author’s purpose, any students who did not notice the bias of the above site will then begin to and question the content they see across the web. However, this will only work if they actually click around and read. This is why I suggest having this exercise in a computer lab (teacher can watch the monitors and ensure that everyone has access at one time).

I think this is the most important skill to start with, perhaps that is why Wineburg mentions it first as well. The ability to source content is relevant in everything that everyone does, especially today. How often do we see friends posting ridiculous images about something that has or will happen on Facebook? The yearly “post or Facebook will start charging you” is the most common. If everyone checked sources and reliability before posting or sharing something, perhaps we could have a less ignorant online experience.

My Activity
For my historical thinking activity, I want to encourage students to use their prior background information on a topic to show the importance of sourcing. I have chosen to focus on Dr. Martin Luther King Jr., but this activity could easily be adapted to fit any subject.

For this activity, a computer lab should be reserved for the class period. This will ensure that all students will have access to a computer and the internet. It also avoids the “I forgot my laptop” excuse.

Before the class period, the teacher will compile a list of websites or pages from websites for the students to look through. They will all be on one subject, one that you can expect the students to have some background knowledge on. You will give the students access to the list of websites once everyone is logged onto the computers. Ask the students to browse the websites without instruction for thirty minutes. After the thirty minutes, hand out a list of questions to be completed by the next class period.

Questions
1] Organize the websites in order of most reliable to least reliable
2] For each website, write 2-3 sentences explaining why the website is reliable or not.
3] Who is the author of each website? Does each website list an author?
4] What seems to be the purpose of each website? Did the authors have a certain goal in creating it?

Websites
1] First Website
2] Second Website
3] Third Website
4] Fourth Website

The hope is that this assignment will teach students the importance of looking for an author and the author’s intent when doing research. There is a lot of information online and some of it is more reliable than others. Some of it is not reliable at all. The third link is the most important in this activity. From the first page, a historian would not trust the website’s content based on a quick glance. At the beginning of the next class, show students how to find the bias in this website. You can choose to read some of “Truth About the King” with them or show how the link for “Black Invention Myths” takes the reader to a white supremacy website.

Ask students what this means for the content and the purpose of the website. Then start your lesson on the content, hopefully with your students beginning to question everything they read.

Easily the biggest concern with open access is the cost; while there is no cost to the reader to obtain articles from open access journals, there is still a significant cost that the journals must pay to disseminate the information. This was one of the most interesting topics of this week’s readings for me. It is so obvious that there are still costs for open access journals, but this is not something I had thought about at all. I first started to think about this when John Willinsky discussed PLoS Biology and how the authors or institutions supporting the authors paid $1,500 in order to have their work be open access. That number caught me off guard. $1,500 for each article! It is so expensive to offer your work for free!
However, that is due to the nature of our trade. In order for the hard work that historians do to frame an argument and offer a credible view to matter, readers need to see that view as credible. Rick Anderson does a good job of framing the importance of peer review when disseminating research. His article brought me back to the beginning of the semester, when one of the authors discussed tenure-track and posed the question of whether or not digital work would qualify. The problem with digital work for advancing a career arises when it cannot be peer reviewed. Having research reviewed by other historians in the field lends some credibility to the topic. It tells a reader that other historians in the field have read the article and found it to be worth publishing (even for those articles only published online).

The chapter by Lawrence Lessig was very helpful in helping me understand all the complications with copyright control when dealing with the Internet. The fact that copyright controls change drastically online is both concerning and fascinating. I have to wonder which form authors and publishers prefer: the traditional copyright structure for analog books, or the copyright structure for online works, which provides more control to the authors and publishers. I was shocked by Lessig’s example of a book published online that can legally be restricted in the number of times one person can read it in a given amount of time. I understand that each time it is being read a new copy is made, but restricting the number of times it can be read? That seems absolutely ridiculous to me. I think, perhaps, working through my thoughts as I type, this restriction would make sense with a PDF download, which is likely what Lessig was referring to now that I think about it. Each download of the PDF would create a new copy, but wouldn’t it be easy to restrict the number of downloads from one IP address? Under a restriction like that, why would there be any concern about it being “read” more than the allotted number of times? Either way, another concern for open access online for historians is respecting the rights of the copyright holder and the copyright holder being able to maintain their rights.

 

The Wikipedia assignment was an…experience. I decided to look into the Student Army Training Corps (SATC) since that was the topic of my senior thesis and the way I used the World War I Collection previously. The SATC was a very short-lived government program that had the goal of keeping men in education so the country would not be left with unskilled men while simultaneously training these men for the military. I did a few searches, but this topic currently does not exist on Wikipedia. If I had more time to dedicate to secondary source research, I would love to create this article. However, I only had one secondary source in my thesis that discussed the SATC (at least that I used); the rest of my sources were primary ones from the Illinois College Archives (including letters, official documents, local newspaper clippings, and articles from the College newspaper The Rambler). Instead of trying my luck with an abundance of primary sources, I decided to edit the page on the Reserve Officers Training Corps (ROTC). The SATC and ROTC were linked, in a way, during WWI. Those students in whom instructors saw real promise through the SATC were recommended for the ROTC (one of the letters in the collection I am using mentions a student being recommended for the ROTC).
Currently, my edit is still up with no corrections made.

Sorry for this being late. My internet crashed on me and Cox Communications was anything but helpful for nearly two hours.
What’s the problem? Won’t know until Tuesday! But learn from my experience, you can connect directly to your modem and still get internet, yay!
(If you knew that I envy you.)


I was really looking forward to the activities for this week! While I did not have fun with maps last week (sorry Danielle), I had high hopes for the opportunities to visualize in different ways!

Unfortunately, Java crushed all my hopes and dreams in multiple ways.

I had a hard time following some of the readings, especially Johanna Drucker, but even without completely comprehending what each author was saying 100%, they still helped me get an idea of what I can do with these programs. Just that knowledge going in made understanding the activities that much easier.

Likely inspired by our readings, I chose to create a data set of information for banned books. I thought it would be interesting to see if there were any similarities between certain authors or genres. My data ended up being short. I used the top ten contested books for the years 2010-2014 and included the following information in the data: year contested, title, author, and reasons contested. The most interesting information from this is certain books being repeated across multiple years (Fig 1) and the frequency of certain reasons (Fig 2). I displayed this in Palladio with a graph for each. Something interesting I noticed, but which is not represented in any graph, is how the reasons for a book to be banned would change over the years. There are some interesting trends in what the hot button issues were for each year. If this were my field, that is definitely an aspect of this I would look into.
Figure 1: Banned Reasons by Year
Figure 2: Banned Titles by Year
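The two observations graphed in Palladio, titles that recur across years and how often each reason appears, can also be tallied directly from the rows of the data set. The sketch below is a rough illustration in Python; the rows are placeholders rather than the actual contested-book lists, and storing the reasons as one semicolon-separated string is an assumption about the spreadsheet layout.

```python
from collections import Counter

# Placeholder rows with the four fields described above. The titles,
# authors, and reasons are stand-ins, not real entries from the lists.
rows = [
    {"year": 2010, "title": "Title A", "author": "Author X",
     "reasons": "reason 1; reason 2"},
    {"year": 2011, "title": "Title A", "author": "Author X",
     "reasons": "reason 1"},
    {"year": 2011, "title": "Title B", "author": "Author Y",
     "reasons": "reason 2; reason 3"},
]


def repeated_titles(rows):
    """Return {title: number of distinct years} for titles seen in 2+ years."""
    years_by_title = {}
    for row in rows:
        years_by_title.setdefault(row["title"], set()).add(row["year"])
    return {t: len(y) for t, y in years_by_title.items() if len(y) > 1}


def reason_counts(rows):
    """Count how many rows cite each reason."""
    counts = Counter()
    for row in rows:
        # Assumption: reasons are separated by semicolons within one cell.
        for reason in row["reasons"].split(";"):
            reason = reason.strip()
            if reason:
                counts[reason] += 1
    return counts


print(repeated_titles(rows))
print(reason_counts(rows).most_common())
```

Grouping the same reason counts by year would be the natural next step for looking at how the reasons shift over time.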

The next activity taught me that I didn’t have Java on my computer. Now I do. However, I ran into another problem after Java was installed. Gephi will not open on Windows 10, at least not for me (anyone else have Windows 10 and have a different experience?). Thanks to the cliowired hashtag, I gave Cytoscape a try. Unfortunately, this took a lot longer to download than Gephi did and I wasn’t able to use the wonderful steps provided by Brian Sarnacki. Technology is not working for me tonight. Cytoscape would not install because it said I did not have a proper Java version and the version I did download previously was corrupted. I tried uninstalling and then re-installing Java but no dice. I cannot get Gephi to open or Cytoscape to install.
So that’s frustrating.

Next was Voyant. For this, I decided to take ten books from Project Gutenberg in the Native American category. My first attempt of entering the URLs of the plain text in separate lines did not pan out so I created text files for each of them and then uploaded to Voyant to analyze; this worked and I immediately saw the need for help from the documentation. I had no idea what I was looking at. The first step was entering my stop words. The results were giving me too many a, an, the, that as the most common words in each text. I was surprised at how many times I was editing the stop words; there are so many more words that get in the way than I first thought!
When looking through the tools in the documentation, I was curious about the word bubble one. When I clicked “use it” and it took me back to what looked like the normal Voyant page, I understood what the URL input option was for. I discovered that Java does not work in Microsoft Edge, so I switched to Firefox. Java security proceeded to block the plug-in from working, so I was not able to see the frequency of words displayed in bubbles. The two tools I was able to get to work were Bubblelines and Cirrus. With Bubblelines, I was disappointed that the words I designated as “stop words” still showed as the most frequent words. I’m not sure if there is something wrong with its reading or with me, but other words I entered disappeared, so I am confused, that’s for sure. Cirrus is neat; I like this one. It is very colorful, and while it also includes some words I designated as stop words, it shows more than only those words. Overall, I could see this being useful for creating a visualization of the topics the soldiers in my project discuss the most.
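The effect a stop word list is supposed to have can be shown in a few lines. This is only a sketch of the general idea in Python, not a description of how Voyant processes text; the sample sentence, the short stop word list, and the function name are made up for illustration.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=5):
    """Return the n most frequent words after removing stop words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in stop_words]
    return Counter(kept).most_common(n)


# A deliberately short stop word list; a real one grows with each pass.
stop_words = {"a", "an", "the", "that"}

# Hypothetical sample text.
sample = "The river and the trail: a trail that the river follows."
print(top_words(sample, stop_words, n=2))
```

Without the stop word set, "the" would top the list; with it, only the content words remain, which is the behavior I expected from every tool.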

Images: Voyant Bubblelines and Voyant Cirrus

Lastly, a program running on Java that works for me! It’s a miracle! Mallet GUI was very easy to understand and follow thanks to the Introduction linked in our readings. There is no way I would have known what it was asking for as an input and output otherwise! I really enjoyed that it created ten different topics for my different texts. It was interesting exploring each topic and seeing the frequency with which each text appeared in each topic. Some were prominent while others seemed to barely make a dent (a difference from 8923 words to only 20). It is clear that some of these topics are nearly describing an entire text with some extras in there for support. A lot of tweaking needs to be done, possibly including choosing texts that have more in common than just being classified under Native American. I would like to try this again but with treaties. I think it would be interesting to see the kind of language used between the government and the tribes.

 

Here is the map I made from my data on the Letters sent from soldiers.
I took a sample of letters from 1917 and 1918 and noticed most were from training camps in the states so I focused on that aspect.

My second map shows soldiers from the same sample serving overseas.
I focused on France as twelve of the letters specified either a city in France or an undisclosed location.
There was one letter that came from Santo Domingo, but the latitude and longitude I retrieved online kept putting the dot in the ocean. I left it on, but did not focus on it when deciding on a map.
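A dot that lands in the ocean is often caused by a dropped minus sign or by latitude and longitude being entered in the wrong order, though that is only a guess about what happened with this point. The Python sketch below tries those common variations and reports which ones fall inside a rough bounding box; the box around the island of Hispaniola and the sample coordinates are approximations chosen for illustration, and the function names are hypothetical.

```python
def candidate_fixes(lat, lon):
    """Return the entered pair plus common mis-entry corrections."""
    candidates = {
        "as entered": (lat, lon),
        "longitude sign flipped": (lat, -lon),
        "latitude sign flipped": (-lat, lon),
        "latitude and longitude swapped": (lon, lat),
    }
    # Drop any pair that is not a valid coordinate at all.
    return {
        label: (la, lo)
        for label, (la, lo) in candidates.items()
        if -90 <= la <= 90 and -180 <= lo <= 180
    }


def in_box(lat, lon, box):
    """Check a point against a (south, west, north, east) bounding box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


# Very rough bounding box for Hispaniola (an approximation).
HISPANIOLA = (17.5, -74.5, 20.0, -68.3)

# Approximate coordinates with the longitude's minus sign left off.
for label, (la, lo) in candidate_fixes(18.47, 69.90).items():
    print(label, (la, lo), in_box(la, lo, HISPANIOLA))
```

If only one variation falls inside the box, that is a good hint about which part of the value was entered incorrectly.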

My story map looks at letters sent by soldiers who were currently at training camps in the States

Sparse timeline of WWI

Maps are not easy, even with the help of Lincoln Mullen. Merely finding appropriate maps for my data was a challenge! When I did finally find a usable map of the United States for my first datamap, it took me a while to figure out how to extract the already georeferenced map from David Rumsey’s Collection. I opted to use Lincoln Mullen’s guide to Mapwarper and did it myself. I think it was better this way, even though I did realize how to extract it after my Mapwarper struggles (it kept crashing on me!). I learned a lot more about geospacing than I ever could have from grabbing a ready-to-go map. It required a lot of close examination when I was working on my second map of France. However, the original map of France was pitched as things got…wonky and literally sideways.

Getting my data sets into CartoDB made me realize how unclean the data I made last week was. I used multiple sheets, but had not grabbed all the important information into one database. Additionally, I failed to separate the different data for each map. I have a lot of blank maps on my account, but after many trials and errors, I finally understood Lincoln’s steps. In the future, I would like to clean the map up with different data. My pop-up info currently shows all the information in separate fields like my “clean” data has it. I would like to instead show Date, Name, and Location under one heading.

Creating the story map was a little harder for me in just visualizing what to do. I wanted to stick with using the letters throughout all the activities. This led to what I believe to be a very boring storymap. Letter after letter after letter. I at least tried to “spice it up” with the opening slide showing students of Illinois College practicing digging trenches on campus as part of the Student Army Training Corps (SATC).
As unappealing as my story is, at least to me, it did allow me to get an understanding of what that tool is capable of and everything that can be done with it. With some more time and research, I could see it being used in my project to map the life of one of the soldiers (before, during, and after the war).

Lastly, to have a little more fun with images and text, I did an assortment of World War I history for my timeline. This was a lot of fun! I was confused on how this would all turn out in the end from the spreadsheet, but I sure do love technology! It was nice and easy to bring this timeline together. I just grabbed facts from my senior thesis on the SATC, which is why it all leads to that before BAM. War over. This may be a preferable option for me in telling a soldier’s life story. I don’t like jumping aimlessly (for my data) around a map so much.

I’d like to start by saying wow, data is hard. There is so much to think about when it comes to creating and managing a data set. I have so much to think about for the future of my project. What kind of information will I provide in a data set? What will I deem irrelevant? Should I deem any data I can think of irrelevant? I don’t think I should. Dan Cohen mentions in “Eliminating the Power Cord” how we can never anticipate exactly how someone will use our product or data. I may find the exact page or word count of each letter meaningless, but already having that data prepared could be very useful to a researcher. If I can imagine it, I will include it. At least for now.

I created a Google Spreadsheet that can be viewed here.

Originally, I misread our goal for the activity this week. I created my own data linked above from 41 out of 117 possible letters. I created a tab with identifier information that would include the file name on my computer for ease of finding. I used this to reference back to from the main set of data in the first sheet. This data has the first and last name of the writer of the letter separated as well as the date it was written and the receiver of the letter. I made sure to separate variables into their own fields as advised by Hadley Wickham. I also used the ID structure to separate some data as seen when he discusses the weekly top hits chart. I found a lot of this article hard to wrap my head around, especially once he started talking about plyr and R. All the code confused me and I had no idea what I was looking at.

I thought the talk about concatenating data by Groot was interesting. I loved the idea of being able to take from my rows and combine the data to equal a name or date (or in the case of his examples, a street name). I did find myself wondering why go through the trouble of separating the data if only to concatenate it all in the spreadsheet. However, I see now that it is not for the original data set, but instead for pulls from the data set. I feel silly having had that misconception now, but live and learn.
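To make the idea concrete for myself, here is a small sketch of keeping the fields separate in the stored data and only joining them when a display label is needed. The rows are placeholders I made up, not letters from the collection, and the column names (including splitting the date into year, month, and day) are assumptions for the example rather than the layout of my actual spreadsheet.

```python
import csv
import io

# Hypothetical rows, one variable per column. The names and the camp are
# placeholders, not real entries from the letters.
SAMPLE = """id,first_name,last_name,year,month,day,location
1,John,Doe,1918,3,7,Camp Example
2,Richard,Roe,1917,11,21,
"""


def display_label(row):
    """Join the separate fields into one label for a pop-up or heading.

    The stored row is left untouched; the label is built only on demand.
    """
    name = f"{row['first_name']} {row['last_name']}"
    date = f"{int(row['year']):04d}-{int(row['month']):02d}-{int(row['day']):02d}"
    parts = [date, name]
    if row["location"]:  # skip the location when the field is blank
        parts.append(row["location"])
    return ", ".join(parts)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labels = [display_label(row) for row in rows]

if __name__ == "__main__":
    for label in labels:
        print(label)
```

Because the original columns are never overwritten, the same rows can still be sorted by year or filtered by last name, and a different label format can be built later without touching the data.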

Realizing my mistake with the assignment after hours going through my own data and working on transcriptions, I attempted to find data repositories that I could use focusing on WWI. I did a Google search and found what I thought were very promising sites. Unfortunately, I could not figure out a way to extract the data I had found so that was a dead end. I looked at IPUMS-USA for their 1850 to present records. I was excited to see that they had WWI Veteran records. Usable data! And I can extract it? Perfect! I selected the WWI records and focused my samples on the 1910-1930 census. I was discouraged once I was brought to the log-in screen and saw that I needed to request an account. I feared I would have to wait hours or days for that request. Imagine my relief when I discovered it was actually a registration, not a request per se. Then the waiting. I was told I would receive an email when my data was ready to extract. Roadblock. I was not sure how long that would be so I went on another fruitless search for usable data repositories. Reusable data is hard to find. Luckily, my data was ready in approximately five minutes.

Unfortunately, I had no idea how to use or even access it. I followed the guide given by IPUMS, but that led me nowhere. The guide uses language discussing command files and I had no idea what to do with that. Do I need to download a separate program to view the data? I think so. I tried looking up a tutorial on uploading a .dat file into Excel and found this simple guide, which led me to these results:

excel fail

Very interesting data it gave me on WWI Veterans. Okay, I did something wrong. What is that? I’m currently trying to figure that out. Next I will explore downloading Stata (one of the programs referenced in the IPUMS guide). I should be able to figure something out, right?
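One possible explanation for the Excel result, offered as a guess rather than a diagnosis: extracts like this are typically fixed-width text, where each variable occupies a set range of character positions on the line and the command file or codebook says which positions belong to which variable. Excel has no way to know those positions on its own. The sketch below shows the general idea. The field names, column positions, and sample lines are all hypothetical and were not taken from my actual extract.

```python
# Hypothetical layout in the style a codebook might list it:
# (field name, start column, end column), 1-based and inclusive.
LAYOUT = [
    ("CENSUS_YEAR", 1, 4),
    ("AGE", 5, 7),
    ("VETERAN_FLAG", 8, 8),
]

# Made-up lines standing in for the contents of a .dat file.
SAMPLE_LINES = [
    "19300342",
    "19300191",
]


def parse_fixed_width(line, layout):
    """Slice one fixed-width line into named fields using the layout."""
    record = {}
    for name, start, end in layout:
        # Convert the 1-based inclusive positions to a Python slice.
        record[name] = line[start - 1:end].strip()
    return record


records = [parse_fixed_width(line, LAYOUT) for line in SAMPLE_LINES]

if __name__ == "__main__":
    for record in records:
        print(record)
```

With the real column positions from the codebook, the same slicing could turn each line into a row that a spreadsheet can open, without needing a statistics package just to look at the data.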

 

EDIT

Okay, I gave it another go! I found a 14-day trial for SPSS that took an hour to download (not exaggerating unfortunately) and was itself a confusing process. Once I restarted my computer like it asked I could not find the program, only a statistics analyzer. I tried opening the files from IPUMS anyway, but no luck. Hoping to ease some frustration I looked through what some others did and saw a trend of using data from National Historical GIS. I gave that a go, hoping I could also have some form of success. The first data set I downloaded seemed far too small column-wise to get any real practice in; it only had 5-7 columns. I created a new extract and had more than enough to work with! I was not able to work with data related to my project, but I got as close as I could. I used census information from Chicago for 1920.

There were not many fields I needed to separate, really none at all. I thought about separating the area name, but that somehow did not seem appropriate. There were some columns I completely deleted as those fields were completely blank for the set of data I was working with. Although, thinking back now, perhaps I should have inserted NULL for each entry to indicate that an entry could appear there. Looks like I created a silence in my “tidy” data.

I created a separate sheet for the nationality information and age information. I found the way the data was originally organized for nationality counts overwhelming and hard to read. I decided to show all the nationalities in one column so someone can see across a row how many of each nationality was counted total or for each tract #. I also put male and female information next to each other by tract #. I thought it was foolish to have the male and female completely separated in the original.
For the age information, I did not change much. I noticed that the headers changed pattern. At first it went over 21 and then under 21 for each main heading (example: Male_over21, Male_under21, Female_over21, Female_under21). After the initial four, it separated the information by over 21 and under 21. I put all main headings next to each other for comparison. However, I feel as though this was not sufficient. I think there is a much tidier way to clean this section of the data. Since each column has over 21 and under 21 in common, it may have been wise to make those the two columns or rows. However, how would I include the identifier information for reference?
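One way to answer my own question, sketched here under assumptions: reshape the table from wide to long, so that the sex and age group packed into each header become their own columns, and repeat the tract identifier on every row. The header names follow the pattern in the example above, but the tract IDs and counts are invented for illustration and the function name is my own.

```python
# Invented wide-format rows: one row per tract, one column per
# sex and age-group combination.
WIDE_ROWS = [
    {"tract": "T001", "Male_over21": 120, "Male_under21": 80,
     "Female_over21": 110, "Female_under21": 90},
    {"tract": "T002", "Male_over21": 200, "Male_under21": 150,
     "Female_over21": 190, "Female_under21": 160},
]


def melt_age_columns(wide_rows, id_field="tract"):
    """Turn each 'Sex_agegroup' column into its own row.

    The identifier is copied onto every output row, so each count can
    still be traced back to its tract.
    """
    long_rows = []
    for row in wide_rows:
        for column, count in row.items():
            if column == id_field:
                continue
            # "Male_over21" -> sex "Male", age group "over21"
            sex, age_group = column.split("_", 1)
            long_rows.append({
                id_field: row[id_field],
                "sex": sex,
                "age_group": age_group,
                "count": count,
            })
    return long_rows


long_rows = melt_age_columns(WIDE_ROWS)

if __name__ == "__main__":
    for long_row in long_rows:
        print(long_row)
```

The table gets longer, with four rows per tract instead of one, but every column now holds a single kind of value and the tract number stays attached to each count.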

Data is still very much a difficulty for me.

View my Excel spreadsheets: Clio Tidy Data Assignment

I realized I forgot to attach a link to my new Omeka site.
For any details on why there is a new one, please see the “Omeka” page above.

Also, if there is a way to add your own thumbnail to items, that would be great because my exhibit is PDFs and looks…rather boring.
I may need to change their format from PDF to JPEG or PNG.
Now I know!

Connecting to the Homefront