Author: Chris Myrski

06. Ideas About Browsers Searching In The Internet

  • Abstract:
    This time I offer only my own ideas for improving search on the Internet, because many necessary capabilities are missing.
    Keywords: programming, browser searching, improvement, own ideas, in English.

      




IDEAS ABOUT BROWSERS SEARCHING IN THE INTERNET


     This time these are only my ideas, maybe even naked ideas (but nowadays naked ideas are allowed, aren't they?), so I want nothing from anybody; I am just sharing my opinion. These are big and complicated programs: they are expert systems, they learn, they perform grammatical analysis in different languages, and all the time they search through the web and update their access tables, and so on. Besides, I am not at all a specialist in the Internet; I was only a programmer some 25 years ago, and our time is so dynamic that I have been left far behind. Despite this, there are obvious things that simply poke you in the eye if you are not prejudiced and are not defending somebody's private position. It can't be said that the people there don't work properly. No, they work, but as if not in the right direction: they do what is easier and more impressive, not what is necessary. So, ladies and gentlemen, browser specialists (and also clients, because when the users decide to demand something it will soon emerge), if you want to listen to me, good; if you don't, then I have at least fulfilled my duty.
     So, I will begin.

     1. General impression

     The general impression when using any browser comes down to this: these are private companies and they try to jump ahead of the others with something (as, in fact, various supermarkets do), but these are usually nonessential things, dust thrown in the eyes. Were it not for their concept of showing first what a couple of sites (well, a couple of hundred, maybe) say about the given request, people would have given up using them altogether! That's how it is. I don't think I exaggerate: they have enviable achievements, but not in the area of searching. Even an old version of Word has far better search abilities: by parts of words, by a pattern, and for English there is even the strange search for sound-alike words.
     Well, there are reasons for this. The Internet can't be searched at the moment people ask; no, it is searched all the time, and tables are maintained for every word (I suppose, but how else?), and later, when the need arises, these tables are joined or intersected. And search by parts of words is, in principle, silly. But, on the other hand, they do search by one-letter words (say, the English "I", or the Russian "ja", which also means "I" and is written with one Cyrillic letter), so why not search, for example, for "multy-", or "-brev-", or "ang-", and so on? (The hyphen is not part of the request; I insert it only to stress that this isn't a whole word.) But the biggest difficulty for the browsers arises not when they do the search but when they show what they have found. And do you know why? I personally have begun to think about this only now, and for me it is clear that this is because no order relation is defined on the web, so it can't be said, in principle, which site has to be shown before which! Because of this, what any browser shows you is not ordered in any necessary order, and still less in the order in which you would have wanted it (although you have not said in what order you want things, and you have no opportunity to do so).
     What has to be done in such cases? There seem to be only two options. One is to cluster the occurrences somehow (by date, by language, by country, etc.) and show only some of these groups. The other is to introduce some counter for the display priority of a given site (there are not so many sites, it seems to me; in any case fewer than all the words in a given language, and there is a heap of languages, and each word can have many variants). I think the browser people use both, with the priority computed from the number of requests to the given site, or maybe even to a page of the site, and with a list of the most important sites added as well. They do all this, I don't say that they don't, but not well enough for the end user. Let us look at this in more detail.
     Now, when you write a single word in the window, everything goes more or less well. "More" means that first come Wikipedia and a couple of other seemingly important sites, and what follows is of no interest to you; "less" means: well, why should sites that are of no interest to you be shown at all? The only reason for showing every occurrence of a given word is to check what is correct, because the spelling of the word might be wrong, and this is also valid for combinations of words. To give an example from a language with cases, take the German "in diesem Abend" or "in diesen Abend" (to my surprise I found about 100,000 occurrences of the erroneous variant); or you may want to find out whether it is more correct in English to say "depends on" or "depends from" (here too there are many occurrences of the incorrect variant, but in another meaning). For such checks it is better not to forget the quotes. This is a very good opportunity (as a by-product), but it is entirely useless to show these occurrences; it is enough to look at the usage statistics and choose what is used more often. Well, a little extra is not so bad, after all, but the point is that quite a few sites secure a place for themselves in the results of such requests, and when you look there they show you various ads. So it turns out that showing unnecessary things serves advertising and hinders the users, or, as the saying goes, you take the bait.
     But the browsers work worst when you search for several words, because then, despite all sorts of tricks with finding the roots of the words and (as a rule) dropping the conjunctions, i.e. in spite of the grammatical analysis of your request and the forming of various variants for searching, what is applied, as a rule, is union (OR), not intersection (AND), of the words. In this way, by adding more words you don't narrow the search but, on the contrary, extend it, which contradicts common sense. And if you decide to write the words in quotes, for a literal search, then you may miss many similar words. What the user wants is some possibility, after the first search, to add something more and restrict the number of occurrences. But there is almost nothing to add, because even the language and country filters do not correspond exactly to their names: the country means something placed on the web in that country, yet it may even be in Swahili. All the attractiveness of the browsers is based first of all on maintaining many big ordered lists of frequently met requests and on the frequency of use of some sites; the systems seem quite intelligent, but their intellect is nearly the same as that of a parrot.
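     The difference between union and intersection of the per-word tables can be shown with a toy example. This is only a minimal sketch under my own assumptions: the function names and the three sample pages are hypothetical, and real search engines surely work in a far more elaborate way.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search_or(index, words):
    """Union: a page matches if it contains ANY of the words."""
    result = set()
    for word in words:
        result |= index.get(word.lower(), set())
    return result

def search_and(index, words):
    """Intersection: a page matches only if it contains ALL of the words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical sample pages, invented for illustration only.
pages = {
    1: "communism as religion",
    2: "religion in Bulgaria",
    3: "communism in Bulgaria",
}
index = build_index(pages)

print(search_or(index, ["communism", "religion"]))   # every page qualifies
print(search_and(index, ["communism", "religion"]))  # only page 1 qualifies
```

     With OR each added word can only enlarge the result, while with AND each added word can only shrink it, which is the behaviour the user expects when refining a request.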
     So it makes sense to share some of my proposals with you; how they could be implemented, and what has to be removed from the existing things if it contradicts common sense, remains, naturally, at the discretion of the specialists who make this software. But quite drastic changes are necessary, because the state of the web is approximately at the level of its beginnings (say, in 1990), while the information has surely increased more than 1,000 times since then, and what will be after a couple of decades is beyond comprehension. I will not order my proposals in any special way but will simply express some different ideas.

     2. Reliability of the source, and other types of pages

     For me it is obvious that there must be some opinion about the reliability of the source. What the official instances say (state agencies and others, or scientific organizations) can't be placed on one level with what the media say (this is generally cheating, and I think you have no other impression of them: just nice deception, which the majority of readers like), or with what various companies say (they compete and for this reason contradict one another, as the media do too), and especially not with what is said by everyone who can speak (in fact, type on a keyboard), like school children, young people, pensioners, clients, et cetera. I don't say that the first, the second, and the third must not be listened to, but they must be distinguished.
     What I mean is the following: a type, or reliability, of the site must be introduced as one of (at least) three variants: a) authorized sites, which must apply for this status and satisfy certain indicators, first of all unity and centralization of opinion, an official view of things, at least within the framework of the state. These are official instances (and not even all of their sites; there can be unauthorized sites even of Ministries), official academic and educational institutions, and so on, but always with a single, official opinion, not one person thinking this and another that; and here, surely, belong the national variants of Wikipedia; b) companies and organizations of any kind, media, societies, literary sites, and so on, which prove that they belong to this category by being registered as legal entities; and c) physical persons, i.e. anybody who wants (Sulyu and Pulyu, as we say in Bulgaria), who prove nothing; and if a source can prove nothing, it is included in this category (say, blogs where personal opinions can be added, talks and chats, questions and answers open to everybody, etc.). Then the search must by default be conducted only among authorized instances (and these should be no more than one percent, I suppose), with only the statistics of occurrences shown for the second and third categories.
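     How such a default could behave is sketched below. Everything here is an assumption made for illustration: the names Tier and search_by_tier are hypothetical, the site addresses are invented, and how a site would actually obtain its category is left open, as in the text.

```python
from collections import Counter
from enum import Enum

class Tier(Enum):
    AUTHORIZED = "a"    # official bodies, academic institutions, Wikipedia
    ORGANIZATION = "b"  # registered legal entities: companies, media, societies
    PERSON = "c"        # everybody else, and any source that proves nothing

def search_by_tier(hits, shown=(Tier.AUTHORIZED,)):
    """Split hits into results listed in full and mere counts for other tiers.

    `hits` is a list of (site, tier) pairs that already matched the request.
    """
    listed = [site for site, tier in hits if tier in shown]
    counts = Counter(tier for _, tier in hits if tier not in shown)
    return listed, dict(counts)

# Invented hits for one request; the addresses are placeholders.
hits = [
    ("stat.example.gov", Tier.AUTHORIZED),
    ("news.example.com", Tier.ORGANIZATION),
    ("shop.example.com", Tier.ORGANIZATION),
    ("blog.example.net", Tier.PERSON),
]

listed, counts = search_by_tier(hits)
print(listed)  # only the authorized site is listed
print(counts)  # the other categories appear only as statistics
```

     A user who wants to see the other categories would pass a wider `shown` tuple, so nothing is hidden for good; only the default changes.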
     Only in this case can the Internet be used as an alternative to the former encyclopedias, for education and not for deluding the gullible. But there is simply no sense in taking such measures in one country only; what is needed here is the most difficult thing, a united decision for the whole Internet, and the Internet, as far as I know, has no global administrative body. Hence it has to be built, maybe under the UN.
     Further, stricter monitoring of the languages and countries of each site is necessary, i.e. such parameters must be introduced at the beginning of each page. For example, I am writing this material in Russian and placing it in Russia, but I might place it in another country and again in Russian, or it may be (as really happens with me) that I place something in Bulgarian, or English, or German, et cetera, on a Russian site. This is also valid for any ads, because despite the efforts of many computerized translators the language is still the most important parameter of any textual material. In reality one can give credence to the Internet only as to the date of appearance of things; here everything is precise, but otherwise all is conditional.

     3. Search in vicinity

     As I said, I am not a specialist in the field of the Internet (only somewhere around it), but I have not heard anybody speak about vicinity search, and without it a search for more than one word turns out to be quite unsuccessful, because of the lack of an order relation, whereas such a search introduces some order. What I mean is the following: a sequence of words is introduced, say in square brackets. Words without quotes are expanded into all their case and other forms; words in quotes are not. The words are searched at a given distance from one another, or at a maximal distance if there are more than two words, and the size of this distance (in words) is given by the last parameter (or maybe by one more parameter for the whole group). By default 3 words should be understood, or 2 on the left or on the right, but not more than 5 for the whole group. For example

     [Myrski "Chris" 2]
or also
     [population number world 3 7]
or also
     [ [ "Chris" Myrski 1] [religion communism 2] Bulgarian 100 ]

and other variants, which surely are not so difficult, so that even housewives, as the saying goes, would be able to write such requests to the browser. If there are no quotes, it is supposed that the word can be varied, giving, say, "Myrski's" (and in another language there can be many more variants, like, rendered in English, "Myrskij"), "The communism as religion" (or "Religion of the communism", or "communist religion", etc.), and somewhere the word Bulgarian (if needed, 10,000 can be given too). The basic work of the browsers during the search will not be affected by this; what will change is the way the results are ordered when they are shown, and the results can be reduced to only one (i.e. to copies of this work on various sites). In addition, wider requests can be given in this way and then narrowed by changing some of the numbers or adding new words, and this is very significant, because, as I have said, one must decrease the number of occurrences, not increase it.
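     A bracketed request of the simplest kind can be checked against a text roughly as follows. This is a sketch under loud assumptions: it handles only a flat group with one trailing number (no nesting, no second number for the whole group), it treats quoted and unquoted words alike by matching exact lowercase forms, and the names parse_query and within_distance are hypothetical.

```python
import re
from collections import Counter

def parse_query(query):
    """Parse a flat request such as '[Myrski "Chris" 2]' into (words, distance).

    Assumption: one optional trailing number; quotes are dropped because this
    sketch does not generate word forms at all.
    """
    tokens = query.strip().strip("[]").replace('"', "").split()
    if tokens and tokens[-1].isdigit():
        return [t.lower() for t in tokens[:-1]], int(tokens[-1])
    return [t.lower() for t in tokens], 3  # default of 3 words, as in the text

def within_distance(text, words, distance):
    """True if all words occur in a stretch of text whose first and last
    matched words are at most `distance` word positions apart."""
    tokens = re.findall(r"\w+", text.lower())
    wanted = set(words)
    hits = [(i, t) for i, t in enumerate(tokens) if t in wanted]
    seen = Counter()
    left = 0
    for pos, tok in hits:
        seen[tok] += 1
        # Drop matches on the left while the window is wider than allowed.
        while pos - hits[left][0] > distance:
            old = hits[left][1]
            seen[old] -= 1
            if seen[old] == 0:
                del seen[old]
            left += 1
        if len(seen) == len(wanted):
            return True
    return False

words, distance = parse_query('[Myrski "Chris" 2]')
print(within_distance("An essay by Chris Myrski on search", words, distance))
print(within_distance("Chris wrote a long essay and signed it Myrski", words, distance))
```

     Widening or narrowing the request then means changing only the number, which is the kind of gradual refinement described above.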

     4. Search by important parameters of the page

     Now, when one looks at the abilities of contemporary browsers, one can come to the conclusion that people have always done things the way the browsers do today, yet this is not at all so. Libraries have existed for thousands of years, but nowhere and never was it possible to search by the occurrences of some words (like, say, communism, party, Government, duty of the citizens, and so on)! What was possible, and what can be done today in every library, is to search by author, by title, in Cyrillic or in Latin (or, maybe, in Arabic), by an index of the bookshelf where the books are placed, and also by thematic catalog (when you don't know exactly the author or the title, or you are interested in several similar books). That's how it is. You cannot find in them, I beg the readers' pardon, the word "arse" (or only "ass"). I agree that the new need not always look to the old, but it must somehow be coordinated with it; one should not reject the whole history before us and begin to live anew (as many of the young, probably, think). If this is how people worked until now, then such possibilities have to be available today too, and if something else can also be allowed, well, so much the better, but as a vertical upgrade that preserves all the old features. I did not invent this; it is the right way of working in any area.
     Well, as to the author and the title of the work, they will surely be found if all possible words are searched (although many citations of them will also be found, which isn't exactly the same). But what about the keywords, by which the search in fact has to be performed (not by conjunctions and incidental words), and the themes? Here I am no coryphaeus either, but library education exists and the people there know these things; these are elementary truths for them, it can't be otherwise. As to the keywords, everybody now knows this word and uses it (as I also do on various sites), but this is not correct, this is amateurism: everybody puts whatever keywords he or she wants, which is not the right way, though it may be allowed (for lack of a better one). And don't forget that if these keywords are in the language of the document's narration, then no difference can be made between their occurrences as words in the text and as keywords. For this reason they must be preceded by something that must not be detached from them, say the word "Index" or "Theme" (like ThemeDemocracy, which is my favorite theme). This is already better, but it is not enough.
     Ah well, and what is the right way, some readers may exclaim, and then I will say again: ask the specialists in library matters. They will tell you that indexes or thematic tables have to be established for all libraries (and the Internet is one enormous library), and that these must even be written at the beginning of the books, on the second page (as, if I am not wrong, I have seen in some American books, which state how they are cataloged). So here, I repeat, there must exist some administrative body for the entire world that represents the Internet, for example a "Commission on Internet by UN", and it simply has to work out the necessary requirements, and in one language; let this be English for the time being (although it leaves a lot to be desired). In general outline, special tables with the themes of all possible areas have to be approved, with translations into all possible languages and a way of calling them up in every browser in order to copy the right words, as well as some standards for giving the name of the author, the title, a short abstract, and such things. But if I begin now to explain in detail what has to be done, I may ... deprive good specialists of their deserved earnings, isn't it so? Well, jokes aside, this is not a field for enthusiasts.
     Still, I will risk proposing one brilliant idea in the next subsection, because otherwise I would not be Myrski, right?

     5. Introduction of at least one special character as letter in all alphabets

     There is no need to search long here: this is the well-known underscore ("_"). It is good in that it looks like a hyphen, yet it is not a sign for splitting words but is used, on the contrary, for joining several words. If a word merely begins with it, this will already distinguish the word in all languages, but it is far better to write, say, Ind_word or I_word, where "word", obviously, stands for any word in any language. I personally use the second variant in my unique book Urrh, so that a search can be made only for words marked in this way. A couple of other special designations can be introduced similarly, like Au_name, Tit_title, or The_theme. It might also be possible to insert several such signs if there are sub-themes of the given theme. You see how elementary everything is.
     But for everybody to be able to use this proposal at once, a very slight effort is needed on the part of the developers (and maintainers) of these software products: they only need to process this symbol as a letter in all alphabets (even Arabic or Swahili), and not to reject it or take it for a delimiter on which the previous word ends. Then what I mentioned in the beginning can also be done, what Word can do but what is impossible on the Internet, namely: to search down to every character (say, to write only "The_math*" and search for all possible variants like "mathematics", "mathematical", just "math", and others). Such luxury can be allowed for the whole web, because this will not be a word of any language, and it will be met tens of thousands (if not millions) of times more rarely than all the listed variants of the words; it is only necessary, for all words containing the character "_", to maintain indexes down to every possible symbol of the combined word.
     Well, I think I will finish here, but, as you see, there is plenty left to be wished for from all browsers, and not only in regard to their colour decoration or all their complicated functions, but in the very mechanisms of searching the web; otherwise there is no real sense in showing all the possible millions and billions of occurrences of some requested string of characters.
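     Both halves of this proposal, the underscore as a letter and the search down to every character, can be tried out in a few lines. This is a sketch only: the names marked_terms, build_marked_index, and prefix_search are hypothetical, the sample pages are invented, and a sorted list with binary search is just one simple way to get prefix lookup, not necessarily how a real index would be kept.

```python
import re
from bisect import bisect_left

def marked_terms(text):
    """Return the tokens that contain an underscore, e.g. 'The_mathematics'.

    In Python's `re` the class \\w already counts "_" as a word character and,
    for str patterns, matches letters of any alphabet, so the underscore never
    splits a word here.
    """
    return [t for t in re.findall(r"\w+", text) if "_" in t]

def build_marked_index(pages):
    """Sorted list of (lowercase term, page id) for marked terms only."""
    return sorted({(t.lower(), pid)
                   for pid, text in pages.items()
                   for t in marked_terms(text)})

def prefix_search(index, pattern):
    """Pages whose marked terms start with the pattern; a trailing '*' is
    stripped and the rest is used as the prefix."""
    prefix = pattern.rstrip("*").lower()
    start = bisect_left(index, (prefix,))
    found = set()
    for term, pid in index[start:]:
        if not term.startswith(prefix):
            break  # the list is sorted, so no later term can match
        found.add(pid)
    return found

# Invented sample pages, for illustration only.
pages = {
    1: "The_mathematics of search",
    2: "notes on The_mathematical methods and The_music",
    3: "plain mathematics text without any mark",
    4: "Тема_демократия",
}
index = build_marked_index(pages)
print(prefix_search(index, "The_math*"))
print(prefix_search(index, "Тема_дем*"))
```

     Page 3 is not found, although it contains the word "mathematics", because only the marked words enter this small index; that is exactly what keeps it small enough for search by prefix.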

     Dec 2014






