Columbia State Community College Finney Memorial Library

Why Google is not enough – using article databases

Defining the “Internet”

The Internet has been defined as “a network of networks.” It’s a system of computers linked together by phone lines and cable systems. Many different kinds of information are stored on these computers and are thus available over the Internet, so many different kinds of research can be done there.

Some of this information can be found by using a “search engine” website, such as Google, Yahoo, Lycos, or Ask Jeeves, or by typing in a known URL (Uniform Resource Locator, or Internet address). The quality and accuracy of information found on these free websites varies greatly, as will be explained below.

Other information you find on the Internet is the electronic version of print resources, such as encyclopedias and other reference books, and periodical indexes (searchable lists of articles from newspapers, popular magazines, and scholarly journals.) Generally speaking, a much larger percentage of this information is going to be accurate. This information is stored in what we call article databases. Access to these databases seems free to you, but that’s because you are a student at Columbia State. Your tuition, fees, and taxes pay for the library to give you access to these databases.

That’s not to say that you should never use a search engine, or that you should never believe information you see on a website you found using a search engine. Which approach you use depends on the type of research you are doing. To decide how to approach your research, let’s first talk a little bit about the two types of websites and how they are created.

What do Search Engines do?

(This section adapted from Finding Information on the Internet: A Tutorial)

Search engines don’t search the Internet directly. Each one searches lists of web pages chosen from all the web pages in existence. So when you search the Web with a search engine, you are using information that’s not completely up to date, although the webpages you get when you click on a link in your search are the up-to-date pages.

Computer programs called spiders create these lists, usually referred to as “indexes”. The spider programs don’t “think,” and they can’t use judgment to “decide” how to look up all the websites on a specific subject, or figure out different ways to look up a subject to be sure they have found all the websites pertaining to it. Remember, computers are not smart; they are just dumb very fast.

So how do these programs find the websites they include in an index? By following the links in websites already in the index. The index is built out of the structure of the Web itself, without any judgment as to the validity or usefulness of the websites listed.

After spider programs discover webpages that are not already in their indexes, another program categorizes the information in the page and stores it in the search engine’s index files. When you use search engines, whether you do a keyword search or use the engine’s directory search, you are searching these index files.
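The link-following process described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the four pages in TOY_WEB, their addresses, and the crawl and search functions are all made up for this example, and the "web" is just a dictionary in memory.

```python
from collections import deque

# Hypothetical miniature "web": each address maps to the page's text
# and the links that appear on that page. All names are invented.
TOY_WEB = {
    "a.example/home": {"text": "library research help",
                       "links": ["a.example/articles", "b.example/blog"]},
    "a.example/articles": {"text": "scholarly articles and journals",
                           "links": ["a.example/home"]},
    "b.example/blog": {"text": "my opinions about research",
                       "links": []},
    "c.example/orphan": {"text": "research nobody links to",
                         "links": []},
}

def crawl(start_pages):
    """Follow links from already-known pages and build a word -> pages index."""
    index = {}
    seen = set(start_pages)
    queue = deque(start_pages)
    while queue:
        url = queue.popleft()
        page = TOY_WEB.get(url)
        if page is None:
            continue  # a dead link: nothing to index
        # Index every word on the page, with no judgment about its quality.
        for word in page["text"].split():
            index.setdefault(word, set()).add(url)
        # Discover new pages only through links on pages already visited.
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, word):
    """A 'search' looks only at the stored index, never at the live pages."""
    return sorted(index.get(word, set()))

index = crawl(["a.example/home"])
print(search(index, "research"))
```

Notice two things in this sketch. The opinion blog is listed right next to the library page, because the program makes no judgment about either one. And the page that no other page links to never makes it into the index at all, even though it mentions the search word.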

How do websites get on the Internet?

A website is a collection of files, stored on a computer connected to the Internet. These files can be many types. Some are files of text (and usually images) designed to be used as is. You open them up, read them, and close them without making any changes to them. These are called “static” pages, because they aren’t created or changed by the user. Others are databases of information, such as the search engine’s indexes, the library’s catalog database which lists everything the library owns, or the reference or periodical indexes mentioned above. The pages where you input your search terms are static pages, but the pages of results are temporary, or “dynamic” pages. These dynamic pages are not stored on the website’s computers, so the spiders cannot find them.

No Internet "police"

A website can be put up by any person or organization that can afford to buy space on an Internet-connected computer. Anybody can put up websites that claim anything they like. There is no organization that polices the Internet, no Board of Directors that evaluates which websites relay accurate information, no editors to check out the claims made on the Web. This is the biggest problem with using search engines to do research. Remember our discussion of spider programs above? These programs make no judgments about the quality or accuracy of the websites they index.

Libraries and Materials Selection

On the other hand, lots of human judgment goes into the materials made available to you through a library, whether these are “hard-copy” (paper, videotape, CD or DVD, etc) or electronic.

First, the editorial staff of the company producing the material decides whether they want to publish it or not. Now, admittedly, not every publisher considers the accuracy of the materials the top criterion for producing it. But as savvy media consumers, we have a feel for which publications or publishers are generally trustworthy. Most people take the information published by, say, The National Enquirer, with a big grain of salt. Time and Newsweek magazines are more reliable, but because of their general reader audience, they may publish some generalizations that an expert in that field would object to. The New England Journal of Medicine, since it’s written for professionals in the medical field, is more reliable yet.

(We are not considering here the political bias or social agenda of any given publication; that falls outside the scope of this essay.)

In a library, the people buying the books and subscribing to the periodicals add another layer of human judgment in deciding what to include. The library staff also uses judgment in deciding which reference databases and periodical indexes to provide to their patrons. We look at the reputation of the publisher and their track record in the past and we read reviews printed in reviewing publications with a good track record for accuracy. We also consider such things as the appropriateness of the material to our patrons, and, in the case of an academic library, whether our school teaches about the subject covered in the publication.

Based on all these factors, we choose whether or not to add materials to our collection. For example, the Columbia State library catalog lists eight times as many books on nursing as it does on astronomy, since we don’t teach astronomy here, but we have a big nursing program. And the Columbia State library doesn’t provide access to periodical databases designed for children the way your local public library does.

So if you are doing academic research, or if you are looking for information that was published in hard copy at some point, you want to use the library’s article databases, or its online catalog. And please do come into the library, the first few times you do this. Yes, Columbia State gives you access from off campus to the online catalog and to our article databases. And it may be appealing to “do research in your bunny slippers”, but when you are first learning how to do electronic research, having a librarian help you navigate all the different sources can save you a lot of time and effort. If the person behind the reference desk looks busy, it’s just because we keep busy until someone comes along to fulfill our primary purpose in sitting there -- to help our patrons.

And we don’t just help with the catalog and the article databases. Depending on your research, you may also want to use websites. But, since the search engines are not evaluating what is listed in their indexes and directories, you will need to do some evaluation of these sources yourself. We can also help you here.

Evaluating websites

The quickest way to evaluate a website is to figure out the author of the site, and the credentials of that author.

One way to do this is to look at the URL, or Internet address. Although there isn’t anyone out there policing the Internet, there is an organization responsible for assigning URLs. This organization has some rules for assigning URLs. Let’s look at a few URLs and decode them.

http://www.columbiastate.edu/index.htm    The URL for Columbia State’s homepage

http://www.mtv.com The URL for the MTV homepage

Remember, websites are collections of files. Just as the hard drive of your computer has folders, the computer on which these files are stored has directories. So just as an assignment you have saved on the hard drive may have a path of C:\My Documents\CIS 013\First Assignment, website addresses have pathnames as well. Many times you will see a homepage with an address that ends in “index.htm” or “index.html”, such as Columbia State’s homepage. Other websites don’t use that naming convention.

http://www.columbiastate.edu/library/Tutorials/tutorials_index.htm

 Here’s a much longer URL. It’s on the Columbia State website, in the “library” directory, which has in it a directory named “Tutorials”, and the file it is pointing to is “tutorials_index.htm”.

(Notice that, unlike Microsoft Windows pathnames and filenames, URLs don’t have spaces in them.)
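If you would like to see this kind of decoding done by a program, here is a minimal sketch using Python’s standard urllib.parse module. The decode_url function is a name made up for this example, and its rule for spotting the file (the last piece of the path counts as a file only if it contains a dot) is a simplifying assumption, not an official rule.

```python
from urllib.parse import urlsplit

def decode_url(url):
    """Split a URL into its domain, its directories, and its file name."""
    parts = urlsplit(url)
    # Break the path on "/" and drop the empty pieces.
    segments = [s for s in parts.path.split("/") if s]
    # Assumption for this sketch: a final segment containing a dot is a file.
    if segments and "." in segments[-1]:
        directories, filename = segments[:-1], segments[-1]
    else:
        directories, filename = segments, None
    return {
        "domain": parts.hostname,
        "directories": directories,
        "file": filename,
    }

info = decode_url(
    "http://www.columbiastate.edu/library/Tutorials/tutorials_index.htm"
)
print(info["domain"])       # www.columbiastate.edu
print(info["directories"])  # ['library', 'Tutorials']
print(info["file"])         # tutorials_index.htm
```

A URL with no file name at the end, such as http://www.mtv.com, simply comes back with an empty list of directories and no file.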

Here are some more URLs:
 

http://www.irs.gov/pub/irs-pdf/f1040.pdf

The URL for the IRS 1040 tax form

http://www.donhr.navy.mil/Jobs/default.asp

The URL for the Department of the Navy job listings

http://www.pbs.org/wgbh/masterpiece

The URL for PBS’s Masterpiece Theatre

http://www.mtv.com/onair/realworld/season14

The URL for MTV’s Real World San Diego (Season 14)

http://www.earthlink.net/about/policies/privacy.html

The URL for the privacy policies of Earthlink, an Internet Service Provider

To tell something about who is responsible for a website, find the first part of the URL (the domain). Look at the 3-letter code at the end of the domain name (the extension).

http://www.irs.gov/pub/irs-pdf/f1040.pdf

Websites that are owned by governmental entities in the United States end in .gov .

These include such useful sites as the IRS, the National Library of Medicine (http://www.nlm.nih.gov), and the Federal Trade Commission (http://www.ftc.gov/bcp/menu-internet.htm is the URL for its webpages on e-commerce and the Internet).

Not all of these are Federal sites. For example, many (but not all) of the official websites for U.S. state governments use a URL formed by “www” and a dot, the state’s name and a dot, and the extension “gov”: www.tennessee.gov, www.alabama.gov, etc.

http://www.donhr.navy.mil/Jobs/default.asp

Websites that are owned by parts of the United States military end in .mil .

After these two designations, it gets a little fuzzier.

http://www.earthlink.net/about/policies/privacy.html

Internet networks are allowed to use the ending of .net. However, since they are in the business of selling space on their computers to other people and organizations, you could easily see a URL that looks like this:

http://home.earthlink.net/~tsmith12/index.html

This is the homepage for a (hypothetical) customer of the Earthlink company.

 The same thing can be true of the .com ending, used for commercial sites -- although many .com addresses are online stores, or sites belonging to for-profit entities:

 www.hondacars.com            www.amazon.com                 www.mtv.com

Many of the Internet Service Providers, who sell or give space to people wanting to post websites, also end their domain names with .com -- so these hypothetical webpages might have addresses like this:

 http://www.aol.com/members/~jjones/homepage.html

 http://www.TheBestISP.com/~tsmith/myblog

 A similar situation exists with the .edu code –

http://www.columbiastate.edu/links.htm is the URL for the “Important Links” page of the Columbia State website. It’s an official page of the college. If a faculty member put up a website, its URL would look something like this: http://www.columbiastate.edu/tsmith/homepage.htm

Some educational institutions allow their students to put up websites as well. So just because a URL’s domain ends in .edu, don’t assume it’s an official page of the college or university.

 Just the opposite can be true of the .org extension.

 http://www.pbs.org/wgbh/masterpiece/

Websites that are owned by (usually) non-profit organizations end in .org. However, if an organization doesn’t register its own domain name, it could have a URL like this:

 www.maurywebpages.com/mcas.htm   The URL of the Maury County Animal Shelter

So looking at that 3-letter code isn’t always foolproof. But it is a start. Keep it in mind as you look over the website, and look for a link to “about us” or other kinds of information identifying the author of the page. If you can’t figure out who the author is, beware! If authors don’t identify themselves, it’s usually because they don’t want to stand behind what they have written.
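The rules of thumb above can be collected into a small lookup table. The Python sketch below is only a first-pass aid under stated assumptions: the first_impression function and the wording of each hint are made up for this example, and treating a “~” in the path as a sign of a personal page is a heuristic based on the hypothetical addresses shown earlier, not a rule of the Internet.

```python
from urllib.parse import urlsplit

# Hints paraphrased from this tutorial; they are clues, not guarantees.
EXTENSION_HINTS = {
    "gov": "U.S. governmental entity (federal, state, or local)",
    "mil": "part of the U.S. military",
    "edu": "educational institution; may be an official, faculty, or student page",
    "org": "usually a non-profit organization",
    "com": "commercial site, or a customer page hosted by a provider",
    "net": "Internet network, or one of its customers",
}

def first_impression(url):
    """Return the domain, its extension, and a hint about who may own the site."""
    if "://" not in url:
        url = "http://" + url  # allow addresses written without "http://"
    parts = urlsplit(url)
    host = parts.hostname or ""
    extension = host.rsplit(".", 1)[-1]
    return {
        "domain": host,
        "extension": extension,
        "hint": EXTENSION_HINTS.get(extension, "not covered in this tutorial"),
        # Heuristic: "/~name" in the path often marks an individual's page.
        "looks_personal": "/~" in parts.path,
    }

for address in [
    "http://www.irs.gov/pub/irs-pdf/f1040.pdf",
    "http://home.earthlink.net/~tsmith12/index.html",
    "www.maurywebpages.com/mcas.htm",
]:
    result = first_impression(address)
    print(result["domain"], "->", result["hint"],
          "| personal page?", result["looks_personal"])
```

The animal shelter address shows why the code is only a start: the program reports a commercial site, and only a human reading the page can tell that it belongs to a county animal shelter.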

When to use which

There are many times when using a search engine and websites is a perfectly legitimate way of looking up something.

The United States government is disseminating a lot of its information over the Internet. A website ending in .gov or .mil is just as legitimate as the paper copy of the same information. The same can be said for the official websites of smaller government entities:

www.tennessee.gov - State of Tennessee official website
www.maurycounty-tn.gov - Maury County official website

Are you looking up information on a medical condition? A museum? A special interest association? Many of these have websites, and the information published here is as accurate as it would be if it had been published on paper:

www.diabetes.org - American Diabetes Association
www.memphisrocknsoul.org - Memphis Rock 'n' Soul Museum
www.nascar.com - Official NASCAR website

Academia is publishing online as well. You may not always be able to access the webpages of a professor at another institution, but if you can, they may have valuable information. Be aware, however, that information published on a campus website may very well not go through as strict a review process as an article published in an academic journal.

Popular Culture information is frequently published first on the Web. Again, keep in mind that this information will be of varying reliability. Information about your favorite TV show may be more reliable if it appears on the TV network’s “official site” for the show.

Much consumer information is available over the Internet. From designing your new car to finding the weekly sales at the local grocery, it’s available online. However, you again need to be careful when using these sites – look for the privacy and security statements before entering any personal information.

(Did you know that for many years, nothing was allowed to be sold over the Internet? Hard to believe, isn’t it?)

However, if you are doing any kind of in-depth research on historical events, scientific or medical information, or any kind of academic research, do not rely on freely available websites alone. Keep in mind that these websites are primarily self-monitored, and the information on them is only as reliable as its source. Information in article databases has been through varying degrees of editorial scrutiny and review.

And of course, if you are doing an academic assignment, follow whatever guidelines for acceptable sources your instructor gives you!

How Websites and article databases come together

A little bit earlier I said that the library staff could help you evaluate websites. There are several ways we do this. One is simply to recommend a website that we know about from our professional reading and research. Another is to point you to a reliable directory of websites, created by humans using human judgment. Some of these are:

INFOMINE  http://infomine.ucr.edu/
Academic Info  http://www.academicinfo.net/index.html
Best Information on the Net http://library.sau.edu/bestinfo
Internet Public Library http://www.ipl.org
Librarians' Index to the Internet  http://sunsite.berkeley.edu/InternetIndex
Digital Librarian http://www.digital-librarian.com

(You will notice that most of these were compiled by libraries.)

And another way is to point you to an article database that includes listings for reliable websites. Two of these are the Encyclopaedia Britannica and the SIRS Knowledge Source database.

Library resources and Internet websites – complementary sources for information

When doing research, it’s not a one-or-the-other proposition. The Internet is merely a communication medium. Some of the information on it is perfectly legitimate, just as if it had been published in paper format by a reliable publisher. Other websites are no more substantial than a flyer photocopied at your local copy shop. Consider the source, and the purpose for your research. And please let the library help – that’s what we’re here for!

 For more information:

See Columbia State’s “Doing Internet Research” webpage.

 

author: Jacklyn Egolf, 2004


Columbia State Community College is a Tennessee Board of Regents Institution
and is an affirmative action and equal opportunity employer committed to
the education of a nonracially identifiable student body. © Copyright 2007