WordSmith Tools step by step

version 5.0

by Mike Scott

© 2010 Mike Scott

Table of Contents

Part I Introduction
Part II Choosing your texts
Part III Select the right language
Part IV Concordancing
1 overview
2 making a concordance
3 seeing the source text
4 collocates and mutual information
5 concordancing tagged text (1)
6 concordancing tagged text (2)
Part V WordList
1 overview
2 making a word list
3 concordancing selected words
4 lemmatising
5 word list statistics
6 multi-word units
using an index
making a multi-word wordlist
Part VI KeyWords
1 overview
2 making a key word list
3 key words plot
4 concordancing selected key words
Index

1 Introduction

These pages are to help get you started. Screenshots take you through each stage.

This is the main screen of the WordSmith Tools Controller.

It has a saying (which keeps on changing and which you can edit), three buttons for the main Tools (Concord is shown as in use), and a series of tabs. At the moment we see the main one, showing that anthony & cleopatra.txt has been chosen for Concord.

2 Choosing your texts

To choose text files, click the File menu in the main Controller:

When you click Choose Texts, you will see something like this:

At the left is a fairly standard text file explorer, at the right an area for Files selected.

Press the browse button to find the folder where your texts are. You need plain text (.txt) files.

Click the button with the two small arrows, or drag some text files from left to right. You should see something like this:

At the moment WordSmith shows (in the status bar just above) that 6 files have been stored. You can see the file sizes, but WordSmith doesn't (yet) know how many words there are in each text file. We have chosen 6 texts for Concord (see Concord just above Files selected).

Press the green button or just close the window.

3 Select the right language

Most examples in this guide deal with texts in English. If you want to handle texts in Chinese or some other language, you need to choose the language in the main Controller.

If the language you need isn't available in the drop-down list, click Edit Languages, choose the language you want, and drag it to the right or click the button in the middle.

You'll see an option to select that language as your "main language" (with which WordSmith will start up by default), or else just as an available language.

In this screenshot, English has been selected and some suitable choices for English have been made, such as allowing apostrophes within a word, treating hyphens as separators so that forms like self-conscious count as two words, and setting Arial 10 as the default font. Finally, save your settings.
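To make the effect of the apostrophe and hyphen choices concrete, here is a minimal Python sketch of a tokeniser with two switches. It is not WordSmith's own code: the function name make_tokenizer and its two parameters are invented for illustration, and it only handles unaccented Latin letters.

```python
import re

def make_tokenizer(apostrophes_in_words=True, hyphens_separate=True):
    """Build a tokeniser from two language settings (hypothetical names)."""
    inner = ""
    if apostrophes_in_words:
        inner += "'"      # keep don't as one word
    if not hyphens_separate:
        inner += "-"      # keep self-conscious as one word
    if inner:
        pattern = rf"[A-Za-z]+(?:[{re.escape(inner)}][A-Za-z]+)*"
    else:
        pattern = r"[A-Za-z]+"
    return re.compile(pattern).findall

english = make_tokenizer()
print(english("don't be self-conscious"))
```

With the default choices the hyphenated form comes out as two words, while don't stays whole. Passing hyphens_separate=False keeps self-conscious as a single word.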

4 Concordancing

4.1 overview

A concordance looks something like this:

It's a concordance of all the occurrences of wherefore in Romeo and Juliet. There are only 5 entries. The famous one comes 6,537 words (27%) into the play.

4.2 making a concordance

When you press the Concord button in the main Controller, a new Concord Tool opens up and will be visible in the Windows Taskbar.

Now in Concord itself, choose File | New.

If no text files have been chosen, you are asked to choose some. Press the Choose Texts Now button.

Once the texts have been chosen, enter a suitable Search Word:

Here I have chosen wherefore as my search-word. Then press OK.

The concordance lists all the examples of "wherefore" which had a word-separator such as punctuation, space etc. before and after it.

Now that the concordance has been made, WordSmith knows how many words there are in each text file: romeo and juliet.txt has 24,275 altogether.
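For readers who like to see the idea in code, here is a rough Python sketch of what a concordancer does: find every whole-word match, note how far into the text it comes, and show some context either side. It is only an approximation under stated assumptions, not how Concord is implemented. The names concordance, WORD_RE and width are invented for this example, and the sample text is a single line rather than the whole play.

```python
import re

# A word is a run of letters, optionally with internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def concordance(text, search_word, width=30):
    """Return (word_number, percent_into_text, context) for each hit."""
    tokens = list(WORD_RE.finditer(text))
    total = len(tokens)
    target = search_word.lower()
    lines = []
    for n, match in enumerate(tokens, start=1):
        # Compare whole tokens, so "where" does not match "wherefore".
        if match.group().lower() != target:
            continue
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Flatten line breaks so each hit fits on one line.
        context = " ".join(f"{left}[{match.group()}]{right}".split())
        lines.append((n, round(100 * n / total), context))
    return lines

sample = "O Romeo, Romeo! wherefore art thou Romeo? Deny thy father and refuse thy name."
for number, percent, context in concordance(sample, "wherefore"):
    print(number, f"{percent}%", context)
```

The percentage is simply the word number divided by the total number of words, which is the same arithmetic as 6,537 out of 24,275 giving about 27%.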

4.3 seeing the source text

To see the source text, double-click on the line in question. Here, I clicked on the highlighted line containing wherefore art thou Romeo.

or press F8 and the lines grow fatter:

or pull the line you're interested in wider or fatter: place your cursor at the bottom of the number in the left column, and it changes shape:

and pull it down.


You can also pull it wider by putting your cursor at the right edge, just to the left of the word Set.

4.4 collocates and mutual information

Here are the collocates of AGO computed using the written section of the BNC, ordered by frequency.

There are nearly 17,000 instances of AGO, and YEARS is the top collocate, found 9,000 times near AGO. The "Relation" column is blank. What's needed is a way of knowing how closely each of these collocates of AGO is related to it. Are A, THE, WAS etc. really closely linked to AGO? If we now choose Compute | Relationships in the menu,

and select a suitable word-list to use for the comparison:

then we get the following list when sorted by clicking the Relation column:

The top items in the list now reflect much better the tendency of AGO to accompany periods of time and numbers. [The top collocates HENSLEY and GROSS only occur 5 times each with AGO, but out of small numbers altogether in the whole BNC Written.]
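Why do rare words like HENSLEY float to the top? A small Python sketch of one common mutual information formula shows the effect. This is the textbook pointwise version, and it may differ in detail from what Compute | Relationships calculates. All the frequencies below are invented for illustration and are not BNC figures.

```python
import math

def mutual_information(joint, freq_node, freq_collocate, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence."""
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(joint / expected)

# Invented figures: a corpus of 1,000,000 words, node word found 1,000 times.
# A very frequent word co-occurs often, but mostly by chance.
frequent = mutual_information(joint=400, freq_node=1000,
                              freq_collocate=60000, corpus_size=1_000_000)
# A rare word co-occurs only 5 times, but that is most of its occurrences.
rare = mutual_information(joint=5, freq_node=1000,
                          freq_collocate=8, corpus_size=1_000_000)
print(f"frequent collocate: {frequent:.2f}")
print(f"rare collocate: {rare:.2f}")
```

The rare collocate gets the higher score because nearly all of its few occurrences are next to the node word. That is why a word list for the whole corpus is needed for the comparison: it supplies the overall frequency of each collocate.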

4.5 concordancing tagged text (1)

Probably the first thing to do if your source text is tagged is to let WordSmith know. To do this, in the main Controller, choose Settings | Adjust Settings

then Tags.

If you're using the British National Corpus (world edition), choose it within Custom settings as shown above.

So far, we have told the Controller that it is to ignore all tags beginning and ending with angle brackets (< >), to translate a few entity references to symbols like %, and to cut out the header of each text. That'll do for a start.
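The three settings just described can be pictured as three small text operations. The Python sketch below is only an illustration of the idea, not WordSmith's implementation. The header marker </header> and the two entity references in the table are assumptions chosen for the example, and real corpus mark-up will need its own values.

```python
import re

# Illustrative values only: both are assumptions, not BNC specifics.
HEADER_END = "</header>"
ENTITIES = {"&percnt;": "%", "&amp;": "&"}

def clean_tagged_text(raw):
    # 1. Cut out everything up to and including the header-end marker.
    _, marker, body = raw.partition(HEADER_END)
    text = body if marker else raw
    # 2. Ignore anything that begins and ends with angle brackets.
    text = re.sub(r"<[^<>]*>", "", text)
    # 3. Translate entity references to plain symbols.
    for entity, symbol in ENTITIES.items():
        text = text.replace(entity, symbol)
    # Tidy up the spaces left behind by removed tags.
    return " ".join(text.split())

raw = "<header>Title</header><s><w>ten</w> &percnt; <w>of</w> <w>it</w></s>"
print(clean_tagged_text(raw))
```

If the marker is not found, the whole text is kept, so untagged files pass through unchanged apart from spacing.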

4.6 concordancing tagged text (2)

Now, we are going to do a concordance on a part of speech. The BNC uses mark-up like this: at the great houses

so each preposition is marked just before the preposition itself. The aim is to see all the prepositions in a selected text from the BNC. With a BNC text file chosen, type * as the search-word (the asterisk is needed because a word follows directly after the part-of-speech tag) and press OK.

WordSmith checks whether the angle-brackets are text characters or tag-openers and -closers:

Here we answer Yes.


You see the prepositions and their tags (but no other tags).

5 WordList

5.1 overview

A word list in WordSmith Tools looks something like this:

It shows how often each word occurs in the text files, what that is as a percentage of the running words in the text, and how many text files each word was found in.

5.2 making a word list

To make a word list, first press the WordList button in the main Controller.

When WordList starts up, choose your texts and then you will see something like this.

Here we're going to make one simple wordlist based on 8 text files from the play Romeo and Juliet, so press Make a word list now.

The WordList tool shows us a frequency listing. The most frequent word is "#". There are 985 of these #. Whatever has happened? Well, by default # is used to represent any number such as 65, 40 or $997.82. In this case we have line numbers in the source text.

Below #, the most frequent words are the, and, I, to, of. Beside each one you can see how frequent it is in the collection of 8 texts we used, the percentage of running words, and how many of our 8 texts each word occurred in. It seems I is a top-frequency word but even so was not present in all 8 texts.

To see the words in alphabetical order instead, click the alphabetical tab near the bottom of the window.

Now scroll down to wherefore. The results seem to confirm what we found when we made a concordance.
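The three columns of a word list (frequency, percentage of running words, number of texts) are easy to reproduce in outline. The following Python sketch assumes a simplified tokeniser and three tiny invented texts. The names tokenize and word_list are made up for this example, and the way numbers become # is only a guess at the general idea, not WordSmith's exact rule.

```python
import re
from collections import Counter

# Numbers (65, 40, $997.82) or words with optional internal apostrophes.
TOKEN_RE = re.compile(r"\$?\d+(?:[.,]\d+)*|[A-Za-z]+(?:'[A-Za-z]+)*")

def tokenize(text):
    # Any number is represented by "#"; words are upper-cased.
    return ["#" if tok[0].isdigit() or tok[0] == "$" else tok.upper()
            for tok in TOKEN_RE.findall(text)]

def word_list(texts):
    """Return rows of (word, frequency, percent, number_of_texts)."""
    freq, spread = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        freq.update(tokens)
        spread.update(set(tokens))   # count each text once per word
    total = sum(freq.values())
    rows = [(w, n, 100 * n / total, spread[w]) for w, n in freq.items()]
    # Most frequent first; ties in alphabetical order.
    return sorted(rows, key=lambda r: (-r[1], r[0]))

texts = ["1 the cat and the dog", "2 I saw the cat", "3 and I ran 40 miles"]
rows = word_list(texts)
for word, n, percent, n_texts in rows[:5]:
    print(f"{word:6} {n:3} {percent:5.2f} {n_texts}")
```

Sorting the same rows with sorted(rows) gives the alphabetical view. Because every text here starts with a line number, # comes out on top, just as in the Romeo and Juliet list.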

5.3 concordancing selected words

Once you have a word list on screen, you might want to see some of the words in it in their contexts.

Select a word (or more)


and choose Compute | Concordance.

You will get something like this (if the original texts are still where they were when the word list was first made):

5.4 lemmatising

To lemmatise manually, with a word list on screen, select an entry,

pull it onto the line you want to join it to,

and drop it:

You will then see the totals change and the items become visible in the Lemmas column. If there are a lot, you can double-click the Lemmas column to see the details:
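What happens to the totals when one entry is dropped onto another can be sketched in a few lines of Python. This is only a model of the bookkeeping, with invented counts. The function name lemmatise and its return shape are assumptions for the example.

```python
from collections import Counter

def lemmatise(freq, headword, variants):
    """Fold the counts of variants into headword; return (new_freq, lemmas)."""
    merged = Counter(freq)
    joined = []
    for form in variants:
        if form in merged and form != headword:
            # Move the variant's count onto the headword's line.
            merged[headword] += merged.pop(form)
            joined.append(form)
    return merged, {headword: joined}

# Invented counts for illustration.
freq = Counter({"LOVE": 10, "LOVES": 3, "LOVED": 2, "NIGHT": 5})
merged, lemmas = lemmatise(freq, "LOVE", ["LOVES", "LOVED", "LOVING"])
print(merged["LOVE"], lemmas)
```

The headword's total grows by the counts of the forms joined to it, the overall number of running words stays the same, and the joined forms are remembered so they can be shown in a Lemmas column.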

5.5 word list statistics

Press the statistics tab at the bottom of a word list,

and something like this should appear. Lots of numbers. Further down, the numbers are easier to understand:

There are lots of 4-letter words in Shakespeare, it seems.
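The word-length figures are the simplest of these statistics to reproduce. Here is a short Python sketch that counts how many words there are of each length. It assumes that apostrophes do not count towards a word's length, which may or may not match WordSmith's rule, and the sample line is just for illustration.

```python
import re
from collections import Counter

def word_length_counts(text):
    """Count how many words of each length (in letters) the text contains."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    # Count letters only, so an apostrophe adds nothing to the length.
    return Counter(sum(ch.isalpha() for ch in w) for w in words)

counts = word_length_counts("But soft what light through yonder window breaks")
for length in sorted(counts):
    print(f"{length}-letter words: {counts[length]}")
```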

5.6 multi-word units

5.6.1 using an index

To make a wordlist with pairs, triples or longer sequences of words (n-grams) such as

OF THE

IN THE END

ONCE UPON A TIME

etc., you will first need to compute an index file. This essentially records the position of each separate word in your corpus.

See also: making the multi-word unit wordlist

5.6.2 making a multi-word wordlist

The process is explained here and what you get looks like this.

Press Ctrl/F2 to save it, and the suggested filename will be something like _index_3-5-wordclusters. It can later be opened as an ordinary wordlist.
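To show why an index of word positions is useful for this, here is a rough Python sketch. It builds a simple in-memory index (word to list of positions) and then reads off the word sequences starting at each position. WordSmith's index file format is not described here, so everything below, including the names build_index and clusters and the minimum-frequency cut-off, is an assumption for illustration.

```python
from collections import Counter, defaultdict

def build_index(tokens):
    """Map each word to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index

def clusters(tokens, index, min_len=2, max_len=3, min_freq=2):
    """Count word sequences of min_len..max_len words, keeping repeated ones."""
    counts = Counter()
    for positions in index.values():
        for pos in positions:
            for n in range(min_len, max_len + 1):
                if pos + n <= len(tokens):
                    counts[" ".join(tokens[pos:pos + n])] += 1
    return {c: k for c, k in counts.items() if k >= min_freq}

tokens = "ONCE UPON A TIME IN THE END OF THE DAY IN THE END".split()
index = build_index(tokens)
print(index["THE"])
print(clusters(tokens, index))
```

Changing min_len and max_len to 3 and 5 would give 3-5 word clusters, matching the kind of range that appears in the suggested filename.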

6 KeyWords

6.1 overview

A key word list in WordSmith Tools looks something like this.

The key words are words which occur unusually frequently in comparison with some kind of reference corpus.

Beside each key word there are various numbers telling you how frequent each one was in the source text(s) and how that compares with its frequency in the reference corpus.

In the list above, based on the play Romeo and Juliet in comparison with all the Shakespeare plays, we see lots of names of the main characters, some pronouns like thou, plus theme words like love and night.
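The idea of "unusually frequent" can be made concrete with a keyness statistic. The Python sketch below uses a log-likelihood comparison of two frequencies, which is one widely used choice. This guide does not say which statistic or settings KeyWords applies, so treat the formula as an assumption. The frequencies are invented for illustration.

```python
import math

def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Log-likelihood keyness of a word, given its counts in two word lists."""
    total = freq_text + freq_ref
    expected_text = size_text * total / (size_text + size_ref)
    expected_ref = size_ref * total / (size_text + size_ref)
    ll = 0.0
    for observed, expected in ((freq_text, expected_text),
                               (freq_ref, expected_ref)):
        if observed > 0:   # a zero count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: a 1,000-word text against a 10,000-word reference list.
same_rate = log_likelihood(10, 1000, 100, 10000)   # 1% in both
overused = log_likelihood(50, 1000, 100, 10000)    # 5% against 1%
print(f"{same_rate:.2f} {overused:.2f}")
```

A word used at the same rate in both lists scores zero, and the score rises as the gap widens. Note that this version also gives a positive score to words that are unusually rare in the text, so a full key word list would need to check the direction of the difference as well.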

6.2 making a key word list

To make a key word list, first press the KeyWords button in the main Controller.

When KeyWords starts up, choose menu option File, then New and you will see something like this.

You have to choose word lists made and saved by WordSmith Tools. You can choose the word list files by pressing this button:

The reference corpus word list is assumed to be a big one, which will help WordSmith work out what is unusual about the words in your chosen text(s).

Once you have chosen a word list above and another for your reference below, press Make a keyword list now. (Until you have, that button won't be enabled.) Then you will see something like this:

6.3 key words plot

This is a key word plot where the text is the file a1f in the British National Corpus (BNC), compared with the whole of the BNC.

You see:

each key word (KW) (these obviously have to do with international relations)
a measure of its dispersion and its keyness
how many times each KW came in the text (hits)
a map showing where each word came.

At the left the blue line represents the start of the text, at the right the blue line represents the end. Look at Britain, Germany, Italy and century: these seem to come in bursts more or less three-quarters of the way through the text. China, Mao, Peking come together a bit later in the text.
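The map is easy to imitate in plain text. In the Python sketch below, each hit is placed on a strip according to how far through the text it comes, with the left edge standing for the start and the right edge for the end. The function name plot_line, the strip width and the hit positions are all invented for illustration. The dispersion measure that WordSmith reports is not reproduced here.

```python
def plot_line(positions, text_length, width=40):
    """Draw a strip with a mark wherever a word occurs in the text."""
    cells = ["."] * width
    for pos in positions:
        # pos is a 0-based word number, so pos < text_length.
        cells[pos * width // text_length] = "|"
    return "".join(cells)

# Invented hit positions in a text of 1,000 words.
hits = {"BRITAIN": [700, 720, 745, 760], "CHINA": [850, 870, 890]}
for word, positions in hits.items():
    print(f"{word:8} {len(positions):2} {plot_line(positions, 1000)}")
```

Words whose marks cluster in one part of the strip come in bursts. Words whose marks are spread out evenly are used throughout the text.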

6.4 concordancing selected key words

Once you have a key word list on screen, you might want to see some of the words in it in their contexts.

Select a word (or more)


and choose Compute | Concordance. Here, the rather mysterious HAH has been chosen.

You will get something like this (if the original texts are still where they were when the word list was first made):

It seems that HAH is a system for health care.

Index

- C -
choosing text files
collocates and mutual information
Concord: nearest tag
Concord: overview
concordancing on tags

- I -
introduction

- K -
KeyWords: overview

- M -
making a concordance

- N -
nearest tag

- S -
seeing source text
sorting tags

- T -
tag concordancing

- W -
WordList: overview
