3. What are linked lists?
A linked list is a data structure in which each node points to
the next node, or marks the end of the list.
http://en.wikipedia.org/wiki/Linked_list
4. What are the types of linked lists?
Singly: each node has a data field as well as a next field, which points to
the next node in the linked list.
Doubly: each node contains, besides the next-node link, a second link
field pointing to the previous node in the sequence. The two links may be called forward(s)
and backward(s), or next and prev(ious).
Multiply: each node contains two or more link fields, each field being
used to connect the same set of data records in a different order.
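The three node layouts can be sketched in C++ roughly as follows. All names here (SinglyNode, DoublyNode, CatalogNode, length, insertAfter) are made up for illustration and do not come from any library; CatalogNode is only one hypothetical way to thread the same records in two orders.

```cpp
#include <cassert>
#include <cstddef>

// Singly linked: a data field plus one link to the next node.
struct SinglyNode {
    int data;
    SinglyNode* next;   // nullptr marks the end of the list
};

// Doubly linked: adds a second link back to the previous node.
struct DoublyNode {
    int data;
    DoublyNode* prev;
    DoublyNode* next;
};

// Multiply linked: the same records are threaded in two different orders.
struct CatalogNode {
    int year;
    char initial;
    CatalogNode* nextByYear;     // next record in year order
    CatalogNode* nextByInitial;  // next record in alphabetical order
};

// Count the nodes of a singly linked list by following the next links.
std::size_t length(const SinglyNode* head) {
    std::size_t count = 0;
    for (const SinglyNode* n = head; n != nullptr; n = n->next) {
        ++count;
    }
    return count;
}

// Insert `node` right after `pos` in a doubly linked list,
// fixing up both the forward and the backward links.
void insertAfter(DoublyNode* pos, DoublyNode* node) {
    assert(pos != nullptr && node != nullptr);
    node->prev = pos;
    node->next = pos->next;
    if (pos->next != nullptr) {
        pos->next->prev = node;
    }
    pos->next = node;
}
```

Note that the doubly linked insert has to update up to four pointers, which is the price paid for being able to walk the list in both directions.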
5. How are linked lists used?
They can be used to implement several other common
abstract data types, including stacks, queues, associative
arrays, and symbolic expressions, and they show up inside
databases and file systems.
Example: MongoDB storage engine
http://blog.fiesta.cc/post/13975691790/mongosv-live-blog-mongodbs-storage-engine-bit-by-bit
6. Linked List use example: Stack
C++
typedef struct stackNode
{
    int data;                  /* value stored in this node */
    struct stackNode *nxtptr;  /* next node down the stack, or NULL at the bottom */
} StackNode_t;
http://stackoverflow.com/questions/5552394/typedef-and-linked-list
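To show how that node type becomes a stack, here is a minimal sketch of push and pop. The node definition is repeated so the block stands on its own; the helper names push, pop and isEmpty are assumptions for illustration, not part of the linked Stack Overflow answer.

```cpp
#include <cassert>

// Same node layout as on the slide.
typedef struct stackNode
{
    int data;
    struct stackNode *nxtptr;
} StackNode_t;

// Push: the new node becomes the top and points at the old top.
void push(StackNode_t*& top, int value) {
    top = new StackNode_t{value, top};
}

// Pop: unlink the top node, free it, and return its value.
// Popping an empty stack is a programming error in this sketch.
int pop(StackNode_t*& top) {
    assert(top != nullptr);
    StackNode_t* old = top;
    int value = old->data;
    top = old->nxtptr;
    delete old;
    return value;
}

// An empty stack is just a null top pointer.
bool isEmpty(const StackNode_t* top) {
    return top == nullptr;
}
```

Both operations only touch the head of the list, so they run in constant time regardless of how many nodes are on the stack.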
7. Why am I talking about Linked Lists?
http://funfax.tumblr.com/
8. Earlier this week Harvard released its MARC21 library
metadata.
http://openmetadata.lib.harvard.edu/
9. This is a big deal.
Bibliographic MARC data licensing is very expensive.
Marcive charges $1,400/year on the low end for academic titles.
Amazon-like datasets can reach five figures per year.
http://home.marcive.com/index.php?option=com_content&view=article&id=52&Itemid=31
10. What is MARC21?
MAchine-Readable Cataloging (MARC) is a metadata
transmission standard based on the ANSI Z39.2 standard
(ISO 2709 internationally).
http://en.wikipedia.org/wiki/MARC_standards
http://www.loc.gov/marc/
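As a rough illustration of what the format looks like on the wire: an ISO 2709 record starts with a 24-byte leader, followed by a directory of 12-byte entries (3-character tag, 4-digit field length, 5-digit start offset) that point at the field data. The sketch below reads that directory. The layout details come from the standard rather than from these slides, the names DirectoryEntry, parseDirectory and fieldData are hypothetical, and error handling is deliberately minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One 12-byte directory entry: it points at a field elsewhere in the record.
struct DirectoryEntry {
    std::string tag;  // three-character field tag, e.g. "245"
    int length;       // field length in bytes, including its terminator
    int start;        // field offset relative to the base address of data
};

const char kFieldTerminator = '\x1E';  // ends the directory and every field
const std::size_t kLeaderSize = 24;
const std::size_t kEntrySize = 12;

// Read directory entries until the field terminator that closes the directory.
// std::stoi throws on non-numeric input; a real parser would validate first.
std::vector<DirectoryEntry> parseDirectory(const std::string& record) {
    std::vector<DirectoryEntry> entries;
    for (std::size_t pos = kLeaderSize;
         pos + kEntrySize <= record.size() && record[pos] != kFieldTerminator;
         pos += kEntrySize) {
        DirectoryEntry e;
        e.tag = record.substr(pos, 3);
        e.length = std::stoi(record.substr(pos + 3, 4));
        e.start = std::stoi(record.substr(pos + 7, 5));
        entries.push_back(e);
    }
    return entries;
}

// Follow a directory entry to its field data, dropping the trailing terminator.
std::string fieldData(const std::string& record, const DirectoryEntry& e) {
    assert(record.size() >= kLeaderSize);
    // Leader positions 12-16 hold the base address of the data area.
    const int base = std::stoi(record.substr(12, 5));
    return record.substr(base + e.start, e.length - 1);
}
```

Each directory entry behaves like a link: it stores where the field lives instead of the field itself, which is the connection to the linked-list slides.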
11. This is the MARC21 field list:
http://www.loc.gov/marc/bibliographic/ecbdlist.html
This is one record:
http://caffed.net/record.txt
http://hipsterorjesus.com/
12. The Harvard dataset is 10 GB in MARC21 format, separated into 14 files.
Linked lists are not random-access data formats.
To use the data, it must be converted to something else.
A document-store-based system would be the best fit.
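The random-access problem can be made concrete with a small sketch. In an ISO 2709 file the first five bytes of each record give that record's total length, so the only way to find record N is to hop through the N records before it. The helper names allDigits and recordOffsets are assumptions for illustration, and the whole file is assumed to be already loaded into a string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if every character is a decimal digit (and the string is not empty).
bool allDigits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

// Walk the buffer record by record and collect each record's start offset.
// Every hop needs the length stored in the previous record, so this is
// sequential access, much like following next pointers in a list.
std::vector<std::size_t> recordOffsets(const std::string& data) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (pos + 5 <= data.size()) {
        const std::string lengthField = data.substr(pos, 5);
        if (!allDigits(lengthField)) break;       // not a valid length field
        const std::size_t length = std::stoul(lengthField);
        if (length < 5 || pos + length > data.size()) break;  // truncated
        offsets.push_back(pos);
        pos += length;
        assert(pos <= data.size());
    }
    return offsets;
}
```

Building this offset table once is a cheap way to get random access to the raw files, but it does nothing for searching by field, which is why a document store is the target here.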
13. My goal is to pull the data, save it to a Mongo database,
and make it searchable via a web front end.
My current progress: creating a parsing script that
traverses the files and saves the records to the Mongo database.
Next steps:
- Create REST interface to MongoDB server
- Create simple front end that searches using the REST interface
- Release source on Github
- …
- Profit!!!